===Step One: Creating UUID Matches===
We began by making sure our company names were unique, creating a 1-1-1-1 relationship (only one instance of each company name in our data and in the Crunchbase data). We did so using the Matcher: we matched our sheet against itself, and the Crunchbase information (pulled from the organizations table detailed below) against itself, to remove duplicates and leave only unique values. [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data#Collecting_Company_Information More here]
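The uniqueness check above can be illustrated with a minimal Python sketch. This is not the Matcher itself. It assumes that "leave only unique values" means keeping names that occur exactly once in a list, and the function name and sample names are hypothetical.

```python
from collections import Counter


def unique_names(names):
    """Return the names that occur exactly once, preserving input order.

    Assumption: a name that appears more than once is dropped entirely,
    rather than collapsed to a single copy.
    """
    counts = Counter(names)
    return [name for name in names if counts[name] == 1]


# Hypothetical sample; the real input would be our sheet or the
# Crunchbase organizations table.
sample = ["Acme", "Globex", "Acme", "Initech"]
print(unique_names(sample))
```

Running the same function once on our sheet and once on the Crunchbase names would give two lists in which every name appears only once.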
Upon Ed's instruction, we then looked at companies ''in Crunchbase'' whose name was associated with more than one UUID. Of the 670,000 companies in Crunchbase, only 15,000 had names with multiple UUIDs. For this list of 15,000, we used recursive filtering on additional variables (such as company location) to determine whether any could be properly matched to the corresponding company in our data.
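As a rough sketch of this step, assuming each Crunchbase row is a dictionary, the code below finds names that carry more than one UUID and then narrows the candidates one variable at a time. The keys <code>name</code>, <code>uuid</code>, and <code>city</code>, the function names, and the sample rows are all hypothetical. The filtering we actually ran may have used different variables and a different order.

```python
from collections import defaultdict


def names_with_multiple_uuids(rows):
    """Map each company name to its UUIDs, keeping names with more than one."""
    uuids_by_name = defaultdict(set)
    for row in rows:
        uuids_by_name[row["name"]].add(row["uuid"])
    return {name: ids for name, ids in uuids_by_name.items() if len(ids) > 1}


def resolve_uuid(our_company, rows, fields=("city",)):
    """Narrow same-name candidates field by field until one remains.

    Returns the UUID if exactly one candidate is left, otherwise None.
    """
    remaining = [r for r in rows if r["name"] == our_company["name"]]
    for field in fields:
        if len(remaining) <= 1:
            break
        remaining = [r for r in remaining if r.get(field) == our_company.get(field)]
    return remaining[0]["uuid"] if len(remaining) == 1 else None


# Hypothetical rows standing in for the Crunchbase organizations table.
crunchbase = [
    {"name": "Acme", "uuid": "u1", "city": "Houston"},
    {"name": "Acme", "uuid": "u2", "city": "Austin"},
    {"name": "Globex", "uuid": "u3", "city": "Dallas"},
]

print(names_with_multiple_uuids(crunchbase))
print(resolve_uuid({"name": "Acme", "city": "Austin"}, crunchbase))
```

A company that still has several candidates, or none, after every field has been tried is left unmatched rather than guessed.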
