Changes

558 bytes added, 17:57, 4 October 2019
The CIA data is then combined with [[US Incubators]] data, which is separately available in '''USIncubators.txt''', and everything is matched using name-based matching to remove duplicates within each state and keep the best available information for each incubator. The result can then be matched back to Crunchbase. There were 2155 distinct orgnames, 37 of which matched another name in the same list.
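The internal name matches were found by running the matcher with the list of distinct names as both input files:
 perl Matcher.pl -mode=2 -file1="DistinctIncubatorOrgNames.txt" -file2="DistinctIncubatorOrgNames.txt"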
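As a rough illustration of name-based deduplication within states, the sketch below standardizes each name and merges records that share a state code and standardized name, filling empty fields from later duplicates. It is not Matcher.pl and does not reproduce its matching modes: the standardization rules (lower-casing, replacing punctuation, dropping a hypothetical suffix list SUFFIXES), the "first non-empty value wins" merge rule, and the helper names standardize and dedupe_within_states are all assumptions. The field names are taken from the coverage list for the '''Incubators''' table.

```python
import re

# Hypothetical suffix list; Matcher.pl's real standardization rules may differ.
SUFFIXES = {"inc", "llc", "ltd", "corp", "co", "the"}


def standardize(name):
    """Lower-case a name, replace punctuation with spaces, drop SUFFIXES words."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in SUFFIXES)


def dedupe_within_states(records):
    """Merge records that share (statecode, standardized orgname).

    The first record seen for a key is kept; later duplicates only fill in
    fields that are still empty (an assumed rule for "best information").
    """
    merged = {}
    for rec in records:
        key = (rec["statecode"], standardize(rec["orgname"]))
        if key not in merged:
            # Store a copy of the record plus its standardized name.
            merged[key] = dict(rec, orgnamestd=key[1])
            continue
        best = merged[key]
        for field, value in rec.items():
            if value and not best.get(field):
                best[field] = value
    return list(merged.values())


# Illustrative rows only (not real data): the two CA rows merge, the TX row stays separate.
rows = [
    {"orgname": "Acme Labs, Inc.", "statecode": "CA", "url": "", "city": "Oakland", "source": "CIA"},
    {"orgname": "ACME Labs", "statecode": "CA", "url": "acmelabs.example", "city": "", "source": "USIncubators"},
    {"orgname": "Acme Labs", "statecode": "TX", "url": "", "city": "Austin", "source": "USIncubators"},
]
result = dedupe_within_states(rows)
print(len(result))  # 2
```

Under this rule the order of the input rows decides which source wins when both have a value for the same field; putting the CIA rows first, as here, prefers CIA data.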
 
The result is the table '''Incubators''' and text file '''Incubators.txt''' with 2137 records and the following coverage:
*orgnamestd --2137
*orgname --2137
*statecode --2137
*url --2031
*description --1447
*city --1955
*address --970
*zip --624
*source --2137
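The coverage figures above appear to be counts of non-empty values per field. A minimal sketch of how such counts could be produced is below, assuming '''Incubators.txt''' is tab-delimited with a header row; the delimiter and the helper name field_coverage are assumptions, not part of the original workflow.

```python
import csv
import io
from collections import Counter


def field_coverage(handle, delimiter="\t"):
    """Count non-blank values per column in a delimited file with a header row."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    counts = Counter({name: 0 for name in (reader.fieldnames or [])})
    for row in reader:
        for name, value in row.items():
            # Skip overflow cells (key None) and missing or blank values.
            if name is not None and value is not None and value.strip():
                counts[name] += 1
    return counts


# Tiny in-memory stand-in for Incubators.txt; for the real file use
# open("Incubators.txt", newline="", encoding="utf-8") instead.
sample = io.StringIO(
    "orgname\tstatecode\turl\n"
    "Acme Labs\tCA\tacmelabs.example\n"
    "Beta Hub\tTX\t\n"
)
for field, n in field_coverage(sample).items():
    print(f"*{field} --{n}")
```

The printed lines follow the same "*field --count" layout as the list above.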
 
Surprisingly, there was little overlap within the Crunchbase-INBIA-AngelList (CIA) data, or between the CIA data and US Incubators. This suggests either that each source captures different types of incubators, or that we are unlikely to have near-population coverage.
