Difference between revisions of "Crunchbase 2013 Snapshot"

From edegan.com

Revision as of 16:24, 4 April 2017

Original Email

Thank you for submitting a request for Research Access to Crunchbase through our API. We have reviewed your request, and granted you Basic Access. You can now access Crunchbase data in the following ways.

  • Check out the Open Data Map
  • Explore the 2013 Snapshot
  • Visit our website for instructions on accessing Crunchbase data

To access the REST API, you'll need your user key:

6d382e4bbdaa297138f32a588b139f53


With Basic Access, API use is limited to the Open Data Map and 2013 Snapshot. Access to the full API and latest funding round data requires a license. To learn more check out our offerings.

Retrieval

Shrey and Matthew obtained the data by applying for research access to the API through the Crunchbase website. The request took about a month to be answered because Crunchbase was slow to respond. Eventually, Crunchbase granted us Basic Access.

Content

The snapshot contained two .tar.gz files, which were extracted into 181/crunchbase using the command:

tar -zxvf file.tar.gz

The CSV files (organizations.csv and people.csv) were also copied to the following location for easier access:

E:\McNair\Projects\Accelerators\Crunchbase Snapshot

The files (sizes in bytes) and their contents are:

crunchbase_2013_snapshot_mysql.tar.gz

  • license.txt 526
  • cb_objects.sql 338955612
  • cb_offices.sql 14850092
  • cb_people.sql 13253952
  • cb_ipos.sql 178397
  • cb_milestones.sql 10498840
  • cb_funds.sql 385010
  • cb_relationships.sql 48655529
  • cb_degrees.sql 13829471
  • cb_investments.sql 6185134
  • cb_acquisitions.sql 2309393
  • cb_funding_rounds.sql 14681705

odm.csv.tar.gz

  • organizations.csv 212013301
    • 459916 records with the following fields:
      • crunchbase_uuid
      • type
      • primary_role
      • name
      • crunchbase_url
      • homepage_domain
      • homepage_url
      • profile_image_url
      • facebook_url
      • twitter_url
      • linkedin_url
      • stock_symbol
      • location_city
      • location_region
      • location_country_code
      • short_description
  • people.csv 188924229
    • 521634 records with the following fields:
      • crunchbase_uuid
      • type
      • first_name
      • last_name
      • crunchbase_url
      • profile_image_url
      • facebook_url
      • twitter_url
      • linkedin_url
      • location_city
      • location_region
      • location_country_code
      • title
      • organization
      • organization_crunchbase_url
  • crunchbase_license.txt 487
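
The file sizes and record counts listed above can be re-checked with a short script. The following is a minimal Python sketch, not the method originally used to produce the numbers. The function names archive_members and csv_summary are hypothetical, and the sketch assumes the CSV files are UTF-8 encoded and start with a header row.

```python
import csv
import tarfile


def archive_members(archive_path):
    """Return (name, size_in_bytes) for each regular file in a .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return [(m.name, m.size) for m in archive.getmembers() if m.isfile()]


def csv_summary(csv_path):
    """Return (field_names, record_count) for a CSV file with a header row.

    Assumption: the file is UTF-8 encoded and its first row holds the field names.
    """
    # newline="" lets the csv module handle line breaks inside quoted fields,
    # so a multi-line description still counts as one record.
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        fields = next(reader)
        record_count = sum(1 for _ in reader)
    return fields, record_count
```

For example, csv_summary("organizations.csv") would return the header fields and the number of data rows, which can be compared against the counts listed above. Counting with a CSV parser rather than counting lines matters here because free-text fields such as short_description may contain line breaks.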

Converting MySQL to PostgreSQL

The SQL files were generated in MySQL. We need to convert them to PostgreSQL. See: https://en.wikibooks.org/wiki/Converting_MySQL_to_PostgreSQL and http://stackoverflow.com/questions/1942586/comparison-of-database-column-types-in-mysql-postgresql-and-sqlite-cross-map

The key changes are:

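MYSQL          POSTGRESQL
-----          ----------
LOCK           comment out (not needed for a one-off load); the PostgreSQL syntax is LOCK [ TABLE ] [ ONLY ] name [ * ] [, ...] [ IN lockmode MODE ] [ NOWAIT ]
UNLOCK         comment out (PostgreSQL has no UNLOCK; locks are released when the transaction ends)
decimal(x,y)   real (decimal(x,y) is also valid in PostgreSQL, so it should work as is)
datetime       timestamp
KEY            comment out (not needed); a plain KEY is an index, so the equivalent is a separate CREATE INDEX statement, while foreign keys use FOREIGN KEY ( column_name [, ... ] ) REFERENCES reftable [ ( refcolumn [, ... ] ) ]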
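
The key changes above could be applied to the dump files line by line. The following is a minimal Python sketch rather than a complete converter. It assumes the files follow the usual mysqldump layout, with one column or key definition per line and LOCK TABLES / UNLOCK TABLES statements on their own lines. The function names convert_line and convert_dump are hypothetical. Differences not listed in the table (for example backtick quoting, ENGINE options, or the trailing comma left behind when a KEY line is commented out) still have to be handled separately.

```python
import re

# Statements that are commented out entirely.
LOCK_RE = re.compile(r"^\s*(UN)?LOCK\s+TABLES\b", re.IGNORECASE)
# Plain index definitions; PRIMARY KEY lines do not match and are kept.
KEY_RE = re.compile(r"^\s*KEY\b", re.IGNORECASE)
# Column types that are rewritten.
DECIMAL_RE = re.compile(r"\bdecimal\(\s*\d+\s*,\s*\d+\s*\)", re.IGNORECASE)
DATETIME_RE = re.compile(r"\bdatetime\b", re.IGNORECASE)


def convert_line(line):
    """Apply the listed MySQL-to-PostgreSQL changes to one line of a dump."""
    # Data rows may contain these words inside string values, so leave them alone.
    if line.lstrip().upper().startswith("INSERT"):
        return line
    if LOCK_RE.match(line) or KEY_RE.match(line):
        return "-- " + line
    line = DECIMAL_RE.sub("real", line)
    return DATETIME_RE.sub("timestamp", line)


def convert_dump(lines):
    """Convert an iterable of dump lines and return them as a list."""
    return [convert_line(line) for line in lines]
```

Because the input is processed one line at a time, the large files (cb_objects.sql is over 300 MB) can be streamed from an open file handle instead of being read into memory at once, although convert_dump as written collects the result in a list.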