Changes

928 bytes added, 16:24, 28 March 2019
organizations.csv
people.csv
 
03/28/2019 UPDATE
 
All the datasets from the API have been copied to the PostgreSQL server on drive Z under /bulk/crunchbase3. So that PostgreSQL parses the date-time columns correctly, every quoted empty string ("") in the CSV files has been removed, leaving an unquoted empty field, which COPY in CSV mode reads as NULL. The command used was:
sed 's/""//g' file.csv > file_clean.csv
The script I used for this is clean_data.sh in E:/projects/crunchbase3.
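The cleanup step can be sketched on a throwaway file as below. This is only an illustration, not the contents of clean_data.sh: sample.csv and its column names are made up. The sketch writes to a temporary file and then renames it, because redirecting sed's output onto its own input file truncates that file before sed reads it.

```shell
#!/bin/sh
# Hypothetical sample data; the real CSV files and columns may differ.
workdir=$(mktemp -d)
cat > "$workdir/sample.csv" <<'EOF'
"uuid","name","founded_on"
"1","Acme","2010-01-01"
"2","Beta",""
EOF

# Strip every "" pair, writing to a temporary file first and renaming
# afterwards so the input file is never truncated while it is being read.
sed 's/""//g' "$workdir/sample.csv" > "$workdir/sample.tmp" \
  && mv "$workdir/sample.tmp" "$workdir/sample.csv"

# The last row now ends in an unquoted empty field.
cat "$workdir/sample.csv"
```

Note that this pattern also removes doubled quotes that escape a literal quote inside a quoted field, so it is only safe when no field contains embedded quotes.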
All the scripts in load_crunchbase.sql have been updated and now include the correct number of rows copied from each CSV file. I have also double-checked each table by comparing the PostgreSQL version of the data against the pandas version.
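A quick row-count check can also be done from the shell. The sketch below is not the pandas comparison described above but a simpler stand-in: it counts the data rows of a CSV file (excluding the header) so the number can be compared with the table's row count. The function name count_csv_rows, the sample file, and the table name in the commented psql line are assumptions for illustration, and the count is only right when no quoted field contains a newline.

```shell
#!/bin/sh
# count_csv_rows FILE: print the number of lines after the header line.
# Assumes one record per line (no newlines embedded in quoted fields).
count_csv_rows() {
    tail -n +2 "$1" | wc -l | tr -d ' '
}

# Hypothetical sample file standing in for one of the real CSV files.
sample=$(mktemp)
printf '%s\n' '"uuid","name"' '"1","Acme"' '"2","Beta"' > "$sample"

rows=$(count_csv_rows "$sample")
echo "CSV data rows: $rows"

# On the server, the matching table count could be fetched with something like
# (table name assumed to mirror the CSV file name):
#   psql -At -d crunchbase3 -c 'SELECT count(*) FROM organizations;'
```

If the two numbers differ, embedded newlines in the CSV are the first thing to rule out before suspecting the load itself.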
 
To view the data in PostgreSQL:
1) Connect to reseacher@199.188.177.215. A password is required (ask Prof. Egan for details).
 
2) Go to /bulk/crunchbase3
cd /bulk/crunchbase3
3) Connect to the database and list its tables
psql crunchbase3
\dt
4) Run SQL queries as usual