Changes

Crunchbase Database (view source)

Revision as of 13:51, 29 March 2019

162 bytes removed, 13:51, 29 March 2019

no edit summary

data\people_descriptions.csv

To keep track of the data types in each CSV file that is copied into the SQL database, a file get_type.py is included in E:\projects\crunchbase3. This Python script prints the first 5 rows of every data frame in the current directory.
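The contents of get_type.py are not shown here, so the following is only a minimal sketch of the same idea: preview the header and first 5 rows of every CSV file in a directory. It assumes UTF-8 files with a header row and uses the standard csv module rather than data frames; the function names preview_csv and preview_directory are hypothetical.

```python
import csv
from pathlib import Path


def preview_csv(path, n_rows=5):
    """Return the header and the first n_rows data rows of a CSV file."""
    # Assumption: the file is UTF-8 and its first line is a header row.
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        # zip stops after n_rows without reading the rest of the file.
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows


def preview_directory(directory=".", n_rows=5):
    """Print a short preview of every CSV file in the directory."""
    for path in sorted(Path(directory).glob("*.csv")):
        header, rows = preview_csv(path, n_rows)
        print(f"== {path.name} ==")
        print(header)
        for row in rows:
            print(row)


if __name__ == "__main__":
    preview_directory()
```

Looking at the first few values of each column is usually enough to decide whether it should be text, integer, date or timestamp in the table definition.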

All the crunchbase3 data from drive E is now also on drive Z, in Z:/crunchbase3.

Since the data will change a lot compared to previous years, using \i load_crunchbase.sql might not be very useful, and one may need to copy one table at a time by pasting the SQL script into the terminal.
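When copying one table at a time, the per-table command can be generated rather than typed. This is a sketch only, not the contents of load_crunchbase.sql: it assumes each table is named after its CSV file, that each file has a header row, and that the table already exists. The helper name copy_command is hypothetical.

```python
from pathlib import Path


def copy_command(csv_path):
    """Build a psql client-side copy command for one CSV file."""
    path = Path(csv_path)
    # Assumption: the target table has the same name as the file stem.
    table = path.stem
    # Assumption: the CSV file starts with a header row (HEADER option).
    return f"\\copy {table} FROM '{path.as_posix()}' WITH (FORMAT csv, HEADER)"


if __name__ == "__main__":
    # Print one command per CSV file in the current directory.
    for csv_file in sorted(Path(".").glob("*.csv")):
        print(copy_command(csv_file))
```

Each printed line can be pasted into the psql terminal on its own, so a table that fails to load does not block the others.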

All the datasets (17 of them) from the API have been copied to the PostgreSQL server in drive Z under /bulk/crunchbase3. To make the date-time format in Postgres work properly, every quoted empty string ("") in the CSV files has been removed, leaving an empty field that PostgreSQL reads as NULL, with the command:

sed 's/""//g' file.csv >file_clean.csv

The script that I used to do that is clean_data.sh in E:/projects/crunchbase3. A shorter script that does this for all the files in the directory is possible, but it might not be necessary, since not all files require such an edit.
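A shorter version that handles every file in a directory could look roughly like the sketch below. It is not the contents of clean_data.sh; it applies the same substitution as the sed command above to each CSV file and writes the result to a file with a _clean suffix. The function names and the suffix parameter are hypothetical.

```python
from pathlib import Path


def strip_empty_quotes(data):
    """Same substitution as sed 's/""//g': drop every pair of adjacent double quotes."""
    return data.replace(b'""', b"")


def clean_directory(directory=".", suffix="_clean"):
    """Write a cleaned copy of every CSV file in the directory; return the new paths."""
    written = []
    for path in sorted(Path(directory).glob("*.csv")):
        if path.stem.endswith(suffix):
            continue  # skip output files left by an earlier run
        target = path.with_name(f"{path.stem}{suffix}.csv")
        # Work on raw bytes so encoding and line endings are left untouched.
        target.write_bytes(strip_empty_quotes(path.read_bytes()))
        written.append(target)
    return written


if __name__ == "__main__":
    for cleaned in clean_directory():
        print(cleaned)
```

Like the sed command, this removes every pair of adjacent double quotes, including the doubled quotes that CSV uses to escape a quote inside a quoted field, so it is best applied only to the files that actually need it.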

Hiep

