This page details the process of merging existing data with data pulled from Crunchbase.
The data from Crunchbase, organized into tables, is in a script found at:
===Step One: Creating UUID Matches===
We began by making sure our company names were unique, creating a one-to-one relationship (only one instance of each company name in our data, and only one in the Crunchbase data). We did so using the Matcher: we matched our sheet against itself, and the Crunchbase data against itself, to remove duplicates and leave only unique values.
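The self-matching step above can be sketched in Python. This is not the Matcher itself. It is a hypothetical stand-in that treats two names as the same after a simple normalization (case-folding and collapsing whitespace), which is an assumption. It drops every copy of a repeated name rather than keeping one, since a name that appears more than once cannot take part in a one-to-one match.

```python
from collections import Counter


def normalize(name: str) -> str:
    # Hypothetical normalization: case-fold and collapse runs of whitespace.
    return " ".join(name.casefold().split())


def unique_names(names: list[str]) -> list[str]:
    """Keep only names that occur exactly once after normalization.

    Matching a list against itself means every name matches itself;
    any name with an additional match is a duplicate and is dropped.
    """
    counts = Counter(normalize(n) for n in names)
    return [n for n in names if counts[normalize(n)] == 1]


ours = ["Acme Corp", "acme corp", "Globex", "Initech"]
print(unique_names(ours))  # ['Globex', 'Initech']
```

Running the same function on the Crunchbase names covers the other side; names that survive on both sides are the candidates for a one-to-one match.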
At Ed's instruction, we then looked at companies ''in Crunchbase'' whose company name was associated with more than one UUID. Of the 670,000 companies in Crunchbase, only 15,000 had names shared by more than one UUID. For these 15,000, we used recursive filtering on additional variables (such as company location) to determine whether any of them could be correctly matched to a company in our data.
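One possible reading of "recursive filtering" is sketched below, as an assumption rather than the method actually used: first group Crunchbase rows by name to find names tied to several UUIDs, then narrow each group one attribute at a time, stopping as soon as exactly one candidate remains. The field names `city` and `state` are hypothetical; the real Crunchbase location columns may differ.

```python
from collections import defaultdict


def names_with_multiple_uuids(rows):
    """Group (uuid, name) pairs by name; keep names tied to more than one UUID."""
    by_name = defaultdict(set)
    for uuid, name in rows:
        by_name[name].add(uuid)
    return {name: uuids for name, uuids in by_name.items() if len(uuids) > 1}


def recursive_filter(candidates, target, keys):
    """Narrow same-name Crunchbase candidates one attribute at a time.

    Returns the single remaining candidate, or None when the attributes
    run out (or eliminate everything) without leaving exactly one.
    """
    if len(candidates) == 1:
        return candidates[0]
    if not candidates or not keys:
        return None
    key, rest = keys[0], keys[1:]
    if target.get(key) is None:
        # Our record lacks this attribute, so it cannot narrow anything.
        return recursive_filter(candidates, target, rest)
    narrowed = [c for c in candidates if c.get(key) == target[key]]
    return recursive_filter(narrowed, target, rest)


# Toy data; "city" and "state" are hypothetical field names.
rows = [("u1", "Acme"), ("u2", "Acme"), ("u3", "Globex")]
candidates = [
    {"uuid": "u1", "name": "Acme", "city": "Houston", "state": "TX"},
    {"uuid": "u2", "name": "Acme", "city": "Austin", "state": "TX"},
]
target = {"name": "Acme", "city": "Austin", "state": "TX"}

print(sorted(names_with_multiple_uuids(rows)["Acme"]))  # ['u1', 'u2']
print(recursive_filter(candidates, target, ["state", "city"])["uuid"])  # u2
```

Ordering the keys from coarse (state) to fine (city) is a design choice here; a company whose candidates are all eliminated, or still tied after the last attribute, is left without a UUID rather than guessed.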
After refining our list through recursive filtering, we found __ companies that match our data and added their UUIDs accordingly.
===Step Two: Pulling Data===
We then pulled the relevant data from Crunchbase based on the unique UUID matches. In the crunchbase2 database, we used the table ''organizations'', which looks like this:
DROP TABLE organizations;
CREATE TABLE organizations (
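Pulling rows by UUID can be sketched with Python's built-in `sqlite3` module. This is a stand-in: crunchbase2 is presumably served by a different database system, and only the `uuid` column comes from this page (`name` is a hypothetical column used for illustration). The matched UUIDs are loaded into a temporary table and joined, which avoids building a very long `IN (...)` list.

```python
import sqlite3


def pull_matched_orgs(conn, matched_uuids):
    """Return (uuid, name) rows from organizations whose uuid was matched."""
    # Temporary table holding the UUIDs matched in Step One.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS matched (uuid TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM matched")
    conn.executemany(
        "INSERT OR IGNORE INTO matched (uuid) VALUES (?)",
        [(u,) for u in matched_uuids],
    )
    return conn.execute(
        "SELECT o.uuid, o.name FROM organizations AS o "
        "JOIN matched AS m ON m.uuid = o.uuid "
        "ORDER BY o.uuid"
    ).fetchall()


# Toy in-memory table; "name" is a hypothetical column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE organizations (uuid TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO organizations VALUES (?, ?)",
    [("u1", "Acme"), ("u2", "Globex"), ("u3", "Initech")],
)
print(pull_matched_orgs(conn, {"u1", "u3"}))  # [('u1', 'Acme'), ('u3', 'Initech')]
```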
We also want the organization descriptions. To get them, we can pull ''description'' from the ''organization_descriptions'' table, matching on UUID.
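The UUID match for descriptions amounts to a join, sketched here with an in-memory `sqlite3` database as a stand-in for crunchbase2. It assumes both tables key on a column named `uuid`; `name` and the sample rows are hypothetical. A `LEFT JOIN` is used so that organizations without a description row are kept, with `description` set to `NULL` (`None` in Python).

```python
import sqlite3

# Toy schema and rows; column names other than uuid/description are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organizations (uuid TEXT PRIMARY KEY, name TEXT);
CREATE TABLE organization_descriptions (uuid TEXT PRIMARY KEY, description TEXT);
INSERT INTO organizations VALUES ('u1', 'Acme'), ('u2', 'Globex');
INSERT INTO organization_descriptions VALUES ('u1', 'Makes anvils.');
""")


def orgs_with_descriptions(conn):
    # LEFT JOIN keeps organizations that have no matching description row.
    return conn.execute(
        "SELECT o.uuid, o.name, d.description "
        "FROM organizations AS o "
        "LEFT JOIN organization_descriptions AS d ON d.uuid = o.uuid "
        "ORDER BY o.uuid"
    ).fetchall()


print(orgs_with_descriptions(conn))
# [('u1', 'Acme', 'Makes anvils.'), ('u2', 'Globex', None)]
```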
Note: we may also be able to combine some of category_list, category_group_list, and category_name (from the category_groups table), merge the result with cosector in our data, and use it for [[Maxine Tao]]'s industry classifier.
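A minimal sketch of how category_list might feed into cosector, under two assumptions: that category_list is a comma-separated string, and that a hand-built lookup from Crunchbase categories to our cosector values exists. The mapping below (`CATEGORY_TO_COSECTOR`) and its entries are purely hypothetical placeholders, not real cosector values.

```python
# Hypothetical mapping from Crunchbase category names to our cosector values.
CATEGORY_TO_COSECTOR = {
    "Software": "Information Technology",
    "Biotechnology": "Life Sciences",
}


def parse_categories(category_list):
    """Split a comma-separated category string into clean names (format assumed)."""
    if not category_list:
        return []
    return [c.strip() for c in category_list.split(",") if c.strip()]


def candidate_cosectors(category_list):
    """Return the distinct cosector values the categories map to, in first-seen order."""
    seen = []
    for cat in parse_categories(category_list):
        sector = CATEGORY_TO_COSECTOR.get(cat)
        if sector is not None and sector not in seen:
            seen.append(sector)
    return seen


print(candidate_cosectors("Software, Biotechnology,Software, Retail"))
# ['Information Technology', 'Life Sciences']
```

Categories with no mapping (such as "Retail" here) are simply skipped, so a company can end up with zero, one, or several candidate cosectors for the classifier to resolve.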