Changes

809 bytes added, 12:11, 23 September 2019
USSDCRound1980 was updated to remove fields that should have been in USVCPortCos1980 only. When normalizing, be sure to copy down only the key fields.

USMAPrivate100pc1985 was updated to reflect the MA load in LoadingScriptsV1; there wasn't a good original. We are using 1985 onward because data issues prevent downloading/extracting the 1980-1984 data. Year Completed was added as a check variable, but it might have been the source of those issues, so it was removed. Date Effective can be used instead.

USIPOComp1980 was updated to allow all exchanges (not just NNA). I couldn't require completion in the search, so that filter will have to be applied in the database.

USVCFund1980 was updated because some variables (those for the fund's name and the fund's address) had changed names.

Finally, note that USPortCoLongDesc1980 needs to be processed separately.
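To illustrate what "copy down only the key fields" means, here is a minimal Python sketch. It assumes (hypothetically) that the normalizer works on delimited rows in which continuation rows leave the key columns blank; the function name `fill_down_keys` and the choice of key columns are illustrative and are not taken from the actual normalizer script.

```python
from typing import List, Sequence


def fill_down_keys(rows: Sequence[Sequence[str]], key_cols: Sequence[int]) -> List[List[str]]:
    """Copy blank key cells down from the most recent non-blank value.

    Hypothetical sketch: only the columns in key_cols are filled. Every other
    column is left as-is, so non-key values are never duplicated onto
    continuation rows. Each key column is filled independently, and every
    row is assumed to have at least max(key_cols) + 1 cells.
    """
    filled: List[List[str]] = []
    last_seen = {col: "" for col in key_cols}
    for row in rows:
        new_row = list(row)  # copy, so the caller's rows are not mutated
        for col in key_cols:
            if new_row[col].strip():
                # Non-blank key: remember it for the rows that follow.
                last_seen[col] = new_row[col]
            else:
                # Blank key on a continuation row: copy the last seen value down.
                new_row[col] = last_seen[col]
        filled.append(new_row)
    return filled
```

For example, with `key_cols=[0]`, the rows `["Acme", "1999", "5.0"]` and `["", "2000", ""]` become `["Acme", "1999", "5.0"]` and `["Acme", "2000", ""]`: the company key is copied down, while the blank amount stays blank.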
 
===Long Description===
 
The instructions on [[Retrieving_US_VC_Data_From_SDC#Scripts_and_other_info]] were modified as follows:
#Remove the header and footer, then save the result as Process.txt using UNIX line endings and UTF-8 encoding.
#Run the regex process below (note that I modified it slightly).
#Clean the output manually.
#Remove quote characters (", ' and `).
#Put in a new header with a very long description column
#Run the normalizer
#Remove duplicate spaces from the description column: push the data through Excel, save it as In5.txt (UNIX line endings, UTF-8), and run the last regex on that file.
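Step 4 is a plain character deletion, so it can also be scripted rather than done by hand. The following is a minimal Python sketch, not the method used on this page; the function names are hypothetical, and the file variant assumes the UNIX line endings and UTF-8 encoding from step 1.

```python
# Characters removed in step 4: double quote, single quote, backtick.
QUOTE_CHARS = "\"'`"

# str.maketrans with a third argument maps each listed character to None (deletion).
_STRIP_QUOTES = str.maketrans("", "", QUOTE_CHARS)


def strip_quotes(text: str) -> str:
    """Remove every double quote, single quote and backtick from text."""
    return text.translate(_STRIP_QUOTES)


def strip_quotes_in_file(src: str, dst: str) -> None:
    """Apply strip_quotes line by line; file names are up to the caller."""
    # newline="" disables newline translation, so UNIX line endings are kept as-is.
    with open(src, encoding="utf-8", newline="") as fin, \
            open(dst, "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            fout.write(strip_quotes(line))
```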
 
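perl -pe 's/^([^ ])/###$1/' Process.txt > Out1.txt   # mark each line that does not start with a space (a new record)
perl -pe 's/\s{65,}/ /g' Out1.txt > Out2.txt         # collapse long indentation runs (continuation lines) to one space
perl -pe 's/\n//g' Out2.txt > Out3.txt               # join everything onto a single line
perl -pe 's/###/\n/g' Out3.txt > Out4.txt            # split back into one line per record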
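The four Perl passes above can be mirrored in Python for anyone without Perl at hand. This sketch applies the same regexes one line at a time, as `perl -p` does, so its output should match Out4.txt, including the empty first line. The function name `join_continuations` is hypothetical, and the 65-character threshold is simply the one used above (presumably the indentation width of continuation lines in the SDC report). Like the Perl version, it would misbehave if the data itself contained `###`.

```python
import re


def join_continuations(text: str) -> str:
    """Mirror the four perl -pe passes that join wrapped description lines."""
    # perl -p reads one line at a time, keeping each trailing newline.
    lines = re.findall(r"[^\n]*\n|[^\n]+\Z", text)
    marked = []
    for line in lines:
        # Pass 1: prefix ### to any line whose first character is not a space.
        line = re.sub(r"^([^ ])", r"###\1", line)
        # Pass 2: collapse runs of 65+ whitespace characters to a single space.
        # re.ASCII limits \s to ASCII whitespace, closer to perl's byte-mode \s.
        line = re.sub(r"\s{65,}", " ", line, flags=re.ASCII)
        marked.append(line)
    # Pass 3: delete every newline so continuation text joins its record.
    joined = "".join(marked).replace("\n", "")
    # Pass 4: turn each ### marker back into a line break.
    return joined.replace("###", "\n")
```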
 
perl -pe 's/ {2,}/ /g' In5.txt > Out5.txt   # squeeze repeated spaces only, so tabs and line breaks survive
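If the Excel round trip in step 7 is inconvenient, the space cleanup could instead be limited to the description column directly. This is a minimal sketch assuming a tab-delimited file; the function name and the column index are hypothetical. Because it only matches the space character, it leaves the tab delimiters, including the consecutive tabs that mark empty fields, intact.

```python
import re

# Two or more literal spaces (tabs are deliberately not included).
_MULTI_SPACE = re.compile(r" {2,}")


def squeeze_column_spaces(line: str, col: int, sep: str = "\t") -> str:
    """Collapse runs of spaces to a single space in one delimited column."""
    fields = line.split(sep)
    if col < len(fields):  # rows with too few fields are returned unchanged
        fields[col] = _MULTI_SPACE.sub(" ", fields[col])
    return sep.join(fields)
```

For example, `squeeze_column_spaces("Acme\t\tA  firm   that  builds\t1999", col=2)` tidies only the third field and keeps the empty second field.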
