162 bytes removed, 13:51, 29 March 2019
data\people_descriptions.csv
The SQL script get_data.sql from last year has been copied to the current Crunchbase3 directory. However, the two databases are now very different, so adjustments are necessary. To keep track of the data type of each CSV file used to load the SQL database, a script get_type.py is included in E:\projects\crunchbase3. This Python script prints the first 5 rows of every CSV file in the current directory, read as a data frame.
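The contents of get_type.py are not reproduced on this page. As a rough illustration only, here is a minimal sketch of the same idea. It assumes the files are UTF-8 CSVs with a header row, and it deliberately swaps pandas for Python's standard csv module, so every value comes back as a plain string and no types are inferred. The function name preview_csvs and the n_rows parameter are hypothetical.

```python
import csv
from itertools import islice
from pathlib import Path


def preview_csvs(directory=".", n_rows=5):
    """Return {file name: header row + first n_rows data rows} for each CSV.

    Hypothetical stand-in for get_type.py: it uses the standard csv module
    instead of pandas, so every value comes back as a plain string.
    """
    previews = {}
    for path in sorted(Path(directory).glob("*.csv")):
        # newline="" is what the csv module expects; errors="replace" keeps a
        # stray non-UTF-8 byte from aborting the preview.
        with open(path, newline="", encoding="utf-8", errors="replace") as handle:
            # islice stops after n_rows + 1 rows, so large files are not read fully.
            previews[path.name] = list(islice(csv.reader(handle), n_rows + 1))
    return previews


if __name__ == "__main__":
    # Print the previews for the current directory.
    for name, rows in preview_csvs().items():
        print(f"== {name}")
        for row in rows:
            print(row)
```

With pandas, the closer equivalent is `pd.read_csv(path, nrows=5)` followed by `.dtypes`, which also shows the column types pandas infers. That is often more useful when writing the matching CREATE TABLE statements.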
All the crunchbase3 data from drive E are now also stored in Z:/crunchbase3.
Since the data have changed a lot compared to previous years, running the whole script at once with \i load_crunchbase.sql may not be very useful. Instead, one may need to load one table at a time by pasting the corresponding SQL into the terminal.
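When loading one table at a time, it can help to generate a single psql `\copy` line per file and paste only that. A minimal sketch follows. It assumes the target table already exists with its columns in the same order as the CSV, that the CSV has a header row, and that the table name (by default the file name stem) is a valid lowercase SQL identifier. The helper copy_command is hypothetical and is not taken from get_data.sql or load_crunchbase.sql.

```python
from pathlib import PurePosixPath


def copy_command(csv_path, table=None):
    """Build one psql \\copy meta-command that loads a single CSV file.

    Hypothetical helper. Assumes the table exists, its columns match the CSV
    column order, and the CSV has a header row.
    """
    # Forward slashes also work on Windows and sidestep backslash handling in psql.
    posix_path = csv_path.replace("\\", "/")
    if "'" in posix_path:
        raise ValueError("paths containing single quotes are not handled here")
    if table is None:
        # Default: table named after the file, e.g. people_descriptions.csv -> people_descriptions
        table = PurePosixPath(posix_path).stem
    return f"\\copy {table} FROM '{posix_path}' WITH (FORMAT csv, HEADER true)"


print(copy_command("data\\people_descriptions.csv"))
# → \copy people_descriptions FROM 'data/people_descriptions.csv' WITH (FORMAT csv, HEADER true)
```

`\copy` reads the file on the client side, so the path must be reachable from the machine running psql. A server-side `COPY` would instead need a path visible to the PostgreSQL server itself.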
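All 17 datasets from the API have been copied to the PostgreSQL server on drive Z, under /bulk/crunchbase3. In CSV mode, PostgreSQL's COPY treats a quoted empty string ("") as an empty string, not as NULL, and an empty string is not a valid date-time value. For date-time columns to load properly, every "" in the CSV files has therefore been removed. The field becomes an unquoted empty value, which COPY reads as NULL. This was done with the command: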
sed 's/""//g' file.csv >file_clean.csv
The script I used for this is clean_data.sh in E:/projects/crunchbase3. A shorter script that applies the edit to every file in the directory is possible, but it may not be necessary, since not all files require this edit.
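For reference, here is a minimal sketch of such a directory-wide version in Python. It is not the contents of clean_data.sh, and the name clean_csv_files is hypothetical. It mirrors `sed 's/""//g' file.csv >file_clean.csv` line by line on raw bytes and writes `<name>_clean.csv` next to each original. It skips files that contain no "" at all, since not every file needs the edit. Like the sed command, it also deletes the doubled quotes that CSV uses to escape a literal quote inside a quoted field, so such values lose their quote characters.

```python
from pathlib import Path


def clean_csv_files(directory="."):
    """Write <name>_clean.csv for every CSV in directory that contains "".

    Mirrors sed 's/""//g': every "" is deleted, so an empty quoted field
    becomes an unquoted empty field. Files without "" and outputs of an
    earlier run (*_clean.csv) are skipped. Returns the paths written.
    """
    written = []
    for path in sorted(Path(directory).glob("*.csv")):
        if path.stem.endswith("_clean"):
            continue  # output from a previous run, not an original file
        # First pass: check whether this file needs the edit at all.
        with open(path, "rb") as src:
            if not any(b'""' in line for line in src):
                continue
        out = path.with_name(path.stem + "_clean.csv")
        # Second pass: stream line by line (as sed does) so large files stay out of memory.
        with open(path, "rb") as src, open(out, "wb") as dst:
            for line in src:
                dst.write(line.replace(b'""', b""))
        written.append(out)
    return written
```

For example, calling `clean_csv_files("Z:/crunchbase3")` would write the cleaned copies alongside the originals and return the list of files it created.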