<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>http://www.edegan.com/mediawiki/index.php?action=history&amp;feed=atom&amp;title=Old_Completed_Work_on_Hubs</id>
	<title>Old Completed Work on Hubs - Revision history</title>
	<link rel="self" type="application/atom+xml" href="http://www.edegan.com/mediawiki/index.php?action=history&amp;feed=atom&amp;title=Old_Completed_Work_on_Hubs"/>
	<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Old_Completed_Work_on_Hubs&amp;action=history"/>
	<updated>2026-05-12T23:45:52Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.34.2</generator>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Old_Completed_Work_on_Hubs&amp;diff=18734&amp;oldid=prev</id>
		<title>HiraF at 16:15, 20 June 2017</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Old_Completed_Work_on_Hubs&amp;diff=18734&amp;oldid=prev"/>
		<updated>2017-06-20T16:15:28Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 16:15, 20 June 2017&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l110&quot; &gt;Line 110:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 110:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;    --Contains: Fips	MSA	Year	Wage&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;    --Contains: Fips	MSA	Year	Wage&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;    --Lookup to CMSA was done using VLOOKUPs in Excel. See Matcher Helper vTR.xls, and other Matcher Helper ???.xls files&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt; &lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;    --Lookup to CMSA was done using VLOOKUPs in Excel. See Matcher Helper vTR.xls, and other Matcher Helper ???.xls files&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt; &lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;[[category:Internal]]&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>HiraF</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Old_Completed_Work_on_Hubs&amp;diff=18733&amp;oldid=prev</id>
		<title>HiraF: Created page with &quot;This page is referenced in Hubs (Academic Paper)  ==Venture Capital Data General Overview== The main goal of the data set is to aggregate company, fund, and round level da...&quot;</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Old_Completed_Work_on_Hubs&amp;diff=18733&amp;oldid=prev"/>
		<updated>2017-06-20T16:14:48Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;This page is referenced in &lt;a href=&quot;/wiki/Hubs_(Academic_Paper)&quot; class=&quot;mw-redirect&quot; title=&quot;Hubs (Academic Paper)&quot;&gt;Hubs (Academic Paper)&lt;/a&gt;  ==Venture Capital Data General Overview== The main goal of the data set is to aggregate company, fund, and round level da...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;This page is referenced in [[Hubs (Academic Paper)]]&lt;br /&gt;
&lt;br /&gt;
==Venture Capital Data General Overview==&lt;br /&gt;
The main goal of the data set is to aggregate company, fund, and round level data so that it can be analyzed at a combined MSA and year level. The data set comprises two major parts: a granular company/fund/round table and an aggregated CMSA-Year table. The data includes all United States venture capital transactions (moneytree) from the twenty-five-year period of 1990 through 2015.&lt;br /&gt;
&lt;br /&gt;
The Hubs data set, built from SDC Platinum data, has been constructed on the server:&lt;br /&gt;
 Data files are in 128.42.44.181/bulk/Hubs&lt;br /&gt;
 All files are in 128.42.44.182/bulk/Projects/Hubs&lt;br /&gt;
 psql Hubs2&lt;br /&gt;
&lt;br /&gt;
SQL files:&lt;br /&gt;
:&amp;lt;code&amp;gt;E:\McNair\Projects\Hubs\Data Script v10.txt&amp;lt;/code&amp;gt;&lt;br /&gt;
Note: We need to check that everything in '''Data Script v9 Ariel.txt''' has been incorporated into v10&lt;br /&gt;
&lt;br /&gt;
Table Header Rows + 5 lines:&lt;br /&gt;
:&amp;lt;code&amp;gt;E:\McNair\Projects\Hubs\Data Table List v2.txt&amp;lt;/code&amp;gt;&lt;br /&gt;
Note: This was generated by '''Data Script v10.txt'''&lt;br /&gt;
&lt;br /&gt;
===Procedure - Granular Table===&lt;br /&gt;
#Start with separate raw datasets for Companies, Funds, and Rounds - '''Locate Raw Datasets and Determine Pedigree'''&lt;br /&gt;
#Add data to each individual dataset (e.g. add MSA code)&lt;br /&gt;
#Clean and standardize names (e.g. company or fund name) for each dataset&lt;br /&gt;
#Join the Datasets (here we need to exclude undisclosed companies)&lt;br /&gt;
&lt;br /&gt;
===Procedure - CMSA-Year Table===&lt;br /&gt;
#Create a consistent CMSA-Year table to be used later&lt;br /&gt;
#From the tables built for the granular table, extract the data needed at the CMSA-Year level&lt;br /&gt;
#Join the parsed out data with the CMSA-Year Table&lt;br /&gt;
#Join these Tables&lt;br /&gt;
&lt;br /&gt;
==VC Specific Tables and Procedure==&lt;br /&gt;
===Raw data tables===&lt;br /&gt;
#'''Funds''': fund name, first investment date, last investment date, fund closing date, address, known investment, average investment, number of companies invested, MSA, MSA code.&lt;br /&gt;
#'''Rounds''': round date, company name, state, round number, stage 1, stage 2, stage 3&lt;br /&gt;
#'''Combined Rounds''': company name, round date, disclosed amount, investor&lt;br /&gt;
#'''Companies''': company name, first investment, last investment, MSA, MSA code, address, state, date founded, known funding, industry&lt;br /&gt;
#'''MSA List''': MSA, MSA code, CMSA, CMSA code&lt;br /&gt;
#'''Industry List''': maps 6 industry categories to 4: ICT, Life Sciences, Semiconductors, Other&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Granular Table (Fund-Round-Company)===&lt;br /&gt;
The final table here contains all venture capital transactions by disclosed funds and portfolio companies, together with their CMSAs.&lt;br /&gt;
To get the table, we processed the raw data sets in the following steps:&lt;br /&gt;
#Clean '''Company''' data&lt;br /&gt;
##Import raw data companies&lt;br /&gt;
##Add variable 'CMSA' from data set MSA list, update variable 'industry' by joining data set industry list&lt;br /&gt;
##Remove duplicates and remove undisclosed companies &lt;br /&gt;
#Clean '''Fund''' data&lt;br /&gt;
##Import raw data funds&lt;br /&gt;
##Add variable 'CMSA'&lt;br /&gt;
##Remove duplicates and remove undisclosed funds&lt;br /&gt;
##Match fund names against themselves using [[The Matcher (Tool) |The Matcher]] to get the standard fund names&lt;br /&gt;
#Clean '''Round''' data&lt;br /&gt;
##Import raw data rounds and combined rounds&lt;br /&gt;
##Add variables 'number of investment', 'estimated investment' and 'year'&lt;br /&gt;
##Remove duplicates and remove undisclosed funds&lt;br /&gt;
#'''Combine''' '''Companies''' and '''Rounds'''&lt;br /&gt;
##Combine cleaned companies and rounds data table on company names&lt;br /&gt;
##Add variable 'round number' and 'stage'&lt;br /&gt;
##Remove duplicates&lt;br /&gt;
#'''Combine''' '''Funds''' and '''rounds-companies'''&lt;br /&gt;
##Match fund names in rounds data table with standard fund names using [[The Matcher (Tool) |The Matcher]] to standardize fund names in rounds data table&lt;br /&gt;
##Join standard fund names to rounds-companies table&lt;br /&gt;
##Join cleaned funds table to rounds-companies table on standard fund names&lt;br /&gt;
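&lt;br /&gt;
The steps above can be sketched in Python as follows. This is a minimal illustration only, not the project's actual SQL: the field names, the sample rows, the &lt;code&gt;standard_name&lt;/code&gt; lookup (a stand-in for the output of The Matcher), and the rule that undisclosed names start with 'Undisclosed' are all assumptions made for the example.&lt;br /&gt;

```python
# Hypothetical sample rows; the field names are illustrative, not the real schema.
companies = [
    {"company": "Acme Bio", "cmsa": "Houston"},
    {"company": "Undisclosed Company", "cmsa": None},
]
rounds = [
    {"company": "Acme Bio", "fund_raw": "ABC Ventures LP", "year": 2001},
    {"company": "Acme Bio", "fund_raw": "Undisclosed Fund", "year": 2001},
    {"company": "Undisclosed Company", "fund_raw": "ABC Ventures LP", "year": 2002},
]
funds = [{"fund": "ABC Ventures", "cmsa": "Boston"}]
# Stand-in for the output of The Matcher: raw fund name to standard fund name.
standard_name = {"ABC Ventures LP": "ABC Ventures"}


def is_disclosed(name):
    """Assumption: any name starting with 'Undisclosed' is undisclosed."""
    return not name.lower().startswith("undisclosed")


def build_granular(companies, rounds, funds, standard_name):
    """Join rounds to disclosed companies and to disclosed, standardized funds."""
    company_by_name = {c["company"]: c for c in companies if is_disclosed(c["company"])}
    fund_by_name = {f["fund"]: f for f in funds if is_disclosed(f["fund"])}
    rows = []
    for r in rounds:
        company = company_by_name.get(r["company"])
        fund = fund_by_name.get(standard_name.get(r["fund_raw"]))
        if company is None or fund is None:
            continue  # inner join: skip rounds lacking a disclosed company or fund
        rows.append({
            "company": company["company"],
            "company_cmsa": company["cmsa"],
            "fund": fund["fund"],
            "fund_cmsa": fund["cmsa"],
            "year": r["year"],
        })
    return rows


granular = build_granular(companies, rounds, funds, standard_name)
```

Each resulting row carries both the company CMSA and the fund CMSA, which is what the later near-far classification would need.&lt;br /&gt;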
&lt;br /&gt;
Note: This was done by Ariel and then edited by Todd.&lt;br /&gt;
&lt;br /&gt;
===CMSA-Year Aggregated Table===&lt;br /&gt;
&lt;br /&gt;
The original MSA-to-CMSA mapping was done by Rachel and is used here. '''LOCATE THE FILE!!!'''&lt;br /&gt;
&lt;br /&gt;
The final table contains the number of companies and the amount of investment for each CMSA, categorized by distance and stage.&lt;br /&gt;
&lt;br /&gt;
We processed data as follows:&lt;br /&gt;
#Create the '''CMSA-Year''' Table&lt;br /&gt;
##Create single-variable tables: distinct CMSA, year, stage, founding year of fund, and founding year of company.&lt;br /&gt;
##Create the cross-product tables: CMSA-year, CMSA-year-fund year founded, and CMSA-year-company year founded&lt;br /&gt;
#Draw data from cleaned companies, funds and rounds tables&lt;br /&gt;
##Create a table with 'CMSA', 'number of companies' and 'year Founded' from the cleaned companies table and join it to CMSA-year founded&lt;br /&gt;
##Create a table with 'Company CMSA', 'round year', 'disclosed amount' from rounds-companies combined table, and add stage binary variables. Join it to CMSA-year-company year founded&lt;br /&gt;
##Create a table with 'CMSA', 'fund year', 'number of investors' from cleaned funds table and join it to CMSA-year-fund year founded&lt;br /&gt;
#Create '''near-far''' and stages table&lt;br /&gt;
##Add fund data to rounds-companies&lt;br /&gt;
##Create near-far and stages binary variable&lt;br /&gt;
##Count investment and deals by CMSA and year, categorized by near-far and stages&lt;br /&gt;
#Combine all tables by CMSA and round-year&lt;br /&gt;
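&lt;br /&gt;
A minimal Python sketch of the cross-product and near-far counting steps is given below. The field names and sample rows are hypothetical, and the definition of 'near' (fund and company in the same CMSA) is an assumption made for illustration, since this page does not state the rule that was actually used.&lt;br /&gt;

```python
from itertools import product

# Hypothetical granular rows; field names and values are illustrative only.
granular = [
    {"company_cmsa": "Houston", "fund_cmsa": "Houston", "year": 2001, "amount": 2.0},
    {"company_cmsa": "Houston", "fund_cmsa": "Boston", "year": 2001, "amount": 3.0},
    {"company_cmsa": "Boston", "fund_cmsa": "Boston", "year": 2002, "amount": 1.5},
]


def build_cmsa_year(granular):
    """Count deals and sum amounts per company CMSA and year, split near/far."""
    cmsas = sorted({row["company_cmsa"] for row in granular})
    years = sorted({row["year"] for row in granular})
    # Cross product so every CMSA-year pair exists, even with zero deals.
    table = {
        key: {"near_deals": 0, "far_deals": 0, "near_amount": 0.0, "far_amount": 0.0}
        for key in product(cmsas, years)
    }
    for row in granular:
        # Assumption: "near" means the fund and the company share a CMSA.
        side = "near" if row["fund_cmsa"] == row["company_cmsa"] else "far"
        cell = table[(row["company_cmsa"], row["year"])]
        cell[side + "_deals"] += 1
        cell[side + "_amount"] += row["amount"]
    return table


cmsa_year = build_cmsa_year(granular)
```

Building the full CMSA-year grid first means that CMSA-years without any deals still appear, with zero counts, in the aggregated table.&lt;br /&gt;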
&lt;br /&gt;
==Supplementary Data Sets==&lt;br /&gt;
&lt;br /&gt;
Supplementary data sets are cleaned and joined back to the CMSA-Year table on CMSA and year:&lt;br /&gt;
&lt;br /&gt;
#Number of STEM graduate students, by university and year (2005 to 2014).&lt;br /&gt;
#University R&amp;amp;D spending, by university and year (2004 to 2014).&lt;br /&gt;
#Income per capita, by MSA and year (2000 to 2012)&lt;br /&gt;
#Wages and salaries, by MSA and year (2000 to 2012)&lt;br /&gt;
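&lt;br /&gt;
As a rough illustration of joining one supplementary data set back on CMSA and year, the sketch below totals university-level STEM counts per CMSA-year and attaches them with a left join. The column names &lt;code&gt;university&lt;/code&gt;, &lt;code&gt;cmsacode&lt;/code&gt;, &lt;code&gt;year&lt;/code&gt; and &lt;code&gt;nostudents&lt;/code&gt; follow the STEM sheet described on this page; the sample values, the &lt;code&gt;deals&lt;/code&gt; field and the &lt;code&gt;stem_students&lt;/code&gt; name are hypothetical.&lt;br /&gt;

```python
from collections import defaultdict

# Hypothetical rows shaped like the STEM sheet (university, cmsacode, year, nostudents).
stem_rows = [
    {"university": "U1", "cmsacode": 100, "year": 2005, "nostudents": 40},
    {"university": "U2", "cmsacode": 100, "year": 2005, "nostudents": 60},
    {"university": "U3", "cmsacode": 200, "year": 2006, "nostudents": 10},
]
# Hypothetical CMSA-Year rows keyed by (cmsacode, year).
cmsa_year = {(100, 2005): {"deals": 7}, (100, 2006): {"deals": 9}}


def attach_stem(cmsa_year, stem_rows):
    """Sum students per (cmsacode, year) and left-join onto the CMSA-Year rows."""
    totals = defaultdict(int)
    for row in stem_rows:
        totals[(row["cmsacode"], row["year"])] += row["nostudents"]
    # Left join: keep every CMSA-Year row; None marks pairs without STEM data.
    return {
        key: dict(values, stem_students=totals.get(key))
        for key, values in cmsa_year.items()
    }


joined = attach_stem(cmsa_year, stem_rows)
```

A left join keeps CMSA-years that fall outside a supplementary data set's coverage (for example years before 2005 for STEM graduates) rather than dropping them.&lt;br /&gt;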
&lt;br /&gt;
All of these files were originally created by Rachel. Some were cleaned in Excel. No new data was added (some extra columns, no extra rows).&lt;br /&gt;
&lt;br /&gt;
The datasets can respectively be found at:&lt;br /&gt;
 E:\McNair\Projects\Hubs\STEM grads for upload v2.xls&lt;br /&gt;
   --Contains: university	zipcode	newmsacode	msa	msacode	cmsa	cmsacode	year	nostudents&lt;br /&gt;
   --CMSA code inside sheet seems to be ours. Check with Ariel.&lt;br /&gt;
 E:\McNair\Projects\Hubs\NSF spending for upload.xls&lt;br /&gt;
   --Contains: Institution	MSA	CMSA code	Year	Spending&lt;br /&gt;
   --We think the CMSA Code is ours. Check with Ariel. &lt;br /&gt;
 E:\McNair\Projects\Hubs\Income per capita upload.xls&lt;br /&gt;
   --Contains: Fips	Area	Year	Income&lt;br /&gt;
   --Lookup to CMSA was done using VLOOKUPs in Excel. See Matcher Helper vTR.xls, and other Matcher Helper ???.xls files&lt;br /&gt;
 E:\McNair\Projects\Hubs\Wage for upload v2.xls&lt;br /&gt;
   --Contains: Fips	MSA	Year	Wage&lt;br /&gt;
   --Lookup to CMSA was done using VLOOKUPs in Excel. See Matcher Helper vTR.xls, and other Matcher Helper ???.xls files&lt;/div&gt;</summary>
		<author><name>HiraF</name></author>
		
	</entry>
</feed>