Changes

431 bytes removed , 13:41, 21 September 2020

no edit summary

{{Project|Has project output=Data|Has sponsor=McNair ~~Projects~~Center

|Has title=Hubs

|Has owner=Hira Farooqi,

|Has keywords=Data

|Has project status=Active

|Does subsume=Hubs Analysis 2017,

}}

The Hubs Research Project is a full-length academic paper analyzing the effectiveness of "hubs", a component of the entrepreneurship ecosystem, in the advancement and growth of entrepreneurial success in a metropolitan area.

~~This research will primarily focused on large~~ '''Important Notice: The last update to the hubs data was done manually by Ed and ~~mid~~is in E:\projects\MeasuringHGHTEcosystems\HubsData-~~sized Metropolitan Statistical Areas (MSAs), as that is where the greater majority of Venture Capital funding is located~~RevisedSimplified. xlsx'''

~~===Primary Data Set===~~

~~The Hubs data set, from SDC Platinum, has been constructed in the server:~~

~~Data files are in 128.42.44.181/bulk/Hubs~~

~~All files are in 128.42.44.182/bulk/Projects/Ecosystem/Hubs~~

~~psql Hubs~~

The Hubs Research Project is a full-length academic paper analyzing the effectiveness of "hubs", a component of the entrepreneurship ecosystem, in the advancement and growth of entrepreneurial success in a metropolitan area. It focuses on cities in the United States as the primary unit of analysis.

~~The~~ This page contains information about data ~~set includes all United States Venture Capital transactions (moneytree) from the twenty-five year period of 1990 through 2015.Data has been aggregated at the portfolio company~~used for this research project, ~~fund~~including data sources, ~~and round level. It will be analyzed at the combined MSA level. We will be looking at in terms~~ location of ~~number of companies funded in number of funds active,~~ data on RDP and ~~flow of investment in a given MSA~~details on data processing.

~~The data set has now been uploaded to the database server, named Hubs.~~

~~There are 4 tables:~~

*Rounds: Rounddate, coname, state, roundno, stage1, etc.

*CombinedRounds: Coname, rounddate, discamount, fundname

*Companies: LastInv, FirstInv, coname, MSA, MSACode, Address, state, datefounded, totalknownfunding, industry(major)

*Funds: fundname, closingdate, lastinv, firstinv, msa, msacode, avinv, nocoinv, totalknowninv, address

~~Used variables:~~Information on initial data work done prior to Summer 2017 can be found at [[Hubs Summer 2016]].

~~Companies~~'''Note on joining: ~~Coname~~''' The city-state-year ID from VC data is used as the master ID for joining datasets. Each table (e.g. income, ~~MSACode~~nih, ~~Industry~~nsf, ~~state~~ ~~MSALookupTable: MSACode~~sbir, ~~MSASuper~~ ~~IndustryLookupTable: IndustryMajor, InduCode~~ compustat) is first joined with the VC data on city-> ~~CompanyInfo: Coname, MSASuper, InduCode,~~ state ~~(complete)~~-year ID and then the resulting tables are all joined together in the final table.

~~Funds: fundname, msacode, state~~

~~MSALookupTable: MSACode, MSASuper~~

->

~~FundInfo: fundname, msacode, state (complete)~~

~~Rounds~~===Data by zip code===*Population data, 2000-2016 - US Census Bureau (E:\McNair\Hubs\summer 2017)https: ~~coname~~//www2.census.gov/programs-surveys/popest/datasets/*Income data, ~~rounddate, stagecode, roundno~~1998-2014 - The Internal Revenue Service (E:\McNair\Hubs\summer 2017) ~~CombinedRounds~~https: ~~coname~~//www.irs.gov/uac/about-irs*DCI index, ~~rounddate, discamount, fundname~~to assess the economic well-being of communities http://eig.org/dci/interactive-maps/u-s->zip-codes ~~RoundInfoSuper~~*R&D Expenses, 1980-2016 - Wharton Research Data Services (E: ~~coname, rounddate, '''nofunds''', discamount~~ \McNair\Hubs\summer 2017) *Zipcode look-> ~~RoundInfo~~up table obtained from https: ~~Coname, roundyear, fundname, estamount~~ //www.unitedstateszipcodes.org/zip-code-database/. It's available in (~~complete~~E:\McNair\Hubs\summer 2017).

~~Then take:~~ ~~RoundInfo: Coname, roundyear, fundname, estamount~~ ~~CompanyInfo: Coname, MSASuper, InduCode, state~~ ~~FundInfo: fundname, msacode, state~~ -> ~~SuperRoundInfo: Coname, CoMSASuper, CoInduCode, CoState, FundName, FundMSASuper, FundState, RoundYear, RoundEstAmount~~ -> ~~MSAPortCos: Count(CoName) As NoPortCosFunded, CoMSASuper, RoundYear~~ ~~...~~== Data by MSA ==

~~'''Notes on Creation~~ We have principle cities of ~~Primary Data Set'''~~MSAs from the census:https://www.census.gov/geographies/reference-files/time-series/demo/metro-micro/delineation-files.html

~~Raw tables~~* companies (last investment, first investment, company name, MSA, MSA We might be able to go City -> FIPS place code~~, address, state, date founded, known funding, industry)~~ * funds (fund closing date, last investment, first investment, fund name, address, -> MSA~~, MSA code, Average investment, number companies invested (NoCos), known investment)~~ * rounds (round date, company name, state, round number, stage 1, stage 2, stage 3) * combined rounds (company name, round date, disclosed amount, investor) * msalist (changes MSAs to CMSAs— combined MSAs)*industry list (changes 6 industry categories to 4— ICT, Life Sciences, Semiconductors, Other) ?

~~Process~~*cleaned tables to eliminate duplications, undisclosed variables*changed all original characters to include CMSA Cities and ~~Industry Codes~~ their FIPS codes (~~companyinfo3, fundinfocleanfinal, roundinfoclean~~which don't perfectly correspond) *matched funds to avoid any issues with names (iare available from https://www.~~e. Fund ABC L.P~~census.gov/geo/reference/~~Fund ABC LP~~codes/~~Fund ABC)~~ *matched roundinfoclean investors to fundinfocleanfinal investors (roundinfo.txt >> cleanfundfinalplace.~~txt)~~*join by round and company conames*bridge years (1990-2016), stage, and cmsa* populate data with count of companies (Deal flow) and estimated amount ($)** data set in 181 hubs folder under summarycmsa.txt (38394)html

~~Key decisions~~The Census claims to provide city to MSA here:*Threw out undisclosed co throughhttps://www.census.gov/geo/maps-~~out as no address~~data/data/ua_rel_download.html*Count However, there is ~~done by joining round and company~~*Anything fund related must be disclosed fund*Near and far, and total invested, and fund counts, etc., are all done using disclosed funds that match onlyCBSA!

This might do it: https://www2.census.gov/geo/pdfs/maps-data/data/rel/explanation_ua_cbsa_rel_10.pdf

We can maybe track city to principal city to MSA

~~'''Glossary of Tables'''~~==COMPUSTAT Data== ~~cleanco — used to remove duplicates from companies~~ ~~cleanedcompanies — clean set of companies with no duplicates~~ ~~cmsafunds-~~ ~~cmsas— list of all CMSAs in final~~ The data set ~~(for merging)~~ ~~cmsastats- statistics not including empty years (pre-merge)~~ ~~cmsastats2 - statistics separated by year-MSA~~ ~~cmsastats3— statistics separated by year-MSA-stage~~ ~~cmsastats4~~ ~~cmsayears— empty merged table between year and cmsa~~ ~~cmsayearstage — empty merged table between cmsa/years and stage~~ ~~combinedrounds— raw sdc data for combined rounds~~ ~~combinedroundswamt— used to join rounds and combined rounds for roundinfo2~~ ~~companies- raw SDC company data~~ ~~companyinfo — cleaned companies joined with state and CMSA~~ includes information ~~companyinfo2— companyinfo1 with original industry categories~~ ~~companyinfo3— companyinfo2 with updated industry categories and codes~~ ~~companyinfo4-- clean version of companyinfo3~~ ~~companyround- combined company information with round information~~ ~~companyround2- combined company information with round information, cleaned up~~ on publicly traded firms in the US. It was obtained from ~~companyround2~~ ~~companyround3- combined company information with round information, cleaned up from companyround3~~ ~~'''finaldataset'''- final statistics by CMSA-year, see section Final Primary~~ the Wharton Research Data ~~Set for more information~~ ~~fundinfo— funds joined with CMSA info~~ ~~fundinfo2 - clean version of fundinfo1~~ ~~fundinfoclean - used in process to clean fundinfo2~~ ~~fundinfoclean2- used in process to clean fundinfo2~~ ~~fundinfocleanfinal- used in process to clean fundinfo2~~ ~~fundinfocleannodups- final clean set of fundinfo~~ ~~funds - raw SDC fund data~~ ~~Houston - analysis for Houston ecosystem team~~ ~~Houston2- analysis for Houston ecosystem team~~ ~~houston3- analysis for Houston ecosystem team~~ ~~industry — new industry codes~~ Services (~~4)— used for all future data sets~~ ~~industrylist— lookup table for new industry codes (went from 6 to 4)~~ ~~joined1- used for matching process~~ ~~joined2- used for matching process~~ ~~matchfund2- used for matching process~~ ~~matchfunds- used for matching process~~ ~~matchroundfund - used for matching process~~ ~~matchroundfund2- used for matching process~~ ~~msalist — lookup table for MSA to CMSA (used for all future data sets)~~ ~~nearfar1-- beginning set before adding nearfar~~https://~~stage variables~~ ~~nearfar2 -~~wrds- ~~added binomial variables for near~~web.wharton.upenn.edu/~~far and for each of the stages, used to build final dataset~~ ~~roundfund— not used— joined round to fund; drop~~wrds/~~ignore~~ ~~roundinfo— round info cleaned up to include number of investors in a syndicate and estimated investment per member of syndicate~~ ~~roundinfo2— roundinfo1 including name of investors/funds~~ ~~roundinfo3— clean version of roundinfo2~~ ~~roundinfoclean — final clean version of roundinfo3 (final roundinfo table~~index.cfm?) ~~rounds — raw SDC round data~~ ~~stages — table for merging stage-year-CMSA~~ ~~superinfo — ignore/drop~~ ~~temp - used for matching process~~ ~~years — table for merging stage-year-CMSA~~.

~~===Hub Candidates Data Set===~~

~~The Hubs candidate data set~~ Raw Data is ~~a list of potential hubs found~~ in MSAs throughout the country. Researchers are currently pulling qualitative and quantitative information from the candidate's websites, in an attempt to categorize what can be identified as a hub. This is a difficult data set to pull, as there is little to no quantitative information available for this category of institution, and is dependent on accessibility of information to the public on the internet.: E:\McNair\Projects\Hubs\Summer 2017 Z:\Hubs\2017

~~Characteristics/Variables~~The source file is RandDExpenditures.txt. It contains:*~~Year Founded~~Date from 1980-2017 (July). *~~Square footage~~427799 records*~~LinkedIN self-identifiers (what the organization classifies itself on its LinkedIN profile)~~ Fields include:*~~Activeness on Twitter (binomial)~~*R&D Expenditure*~~Member Directory available online~~ *Address (~~binomial~~inc. city, zip, state)*~~Number~~ *Revenue of ~~conference rooms~~firms*Price ($/month) for Flex desk Database is '''cities''' SQL script is: COMPUSTAT.sql*Offers Reserved desk (binomial)*Offers office space for rent (binomial) Output file is COMPUSTATSummary.txt. It contains:*~~Offers community membership-- not for coworking but for community events~~Variables: City, year, ~~etc~~No. ~~(binomial)~~public firms, sum R&D, sum Sales, sum total assets*~~Number of events offered per month (estimate)~~1979-2016*~~Offers code academy~~4440 cities*Mission Statement/Vision (for qualitative or key-word analysis) It is located in Z:\Hubs\2017\Output_Files

~~These characteristics/variables will be used to determine whether a candidate~~ ==NSF Data==Data is or in: E:\McNair\Projects\Hubs\Summer 2017 Z:\Hubs\2017 Database is ~~not likely to be a Hub.~~ '''cities'''

~~As of March 10th 2016, the list contains 125 Hub candidates~~SQL script is: nsf_2017.sql

The source files are: nsf2017.txt, copied from table '''nsf''', and nsf_institution copied from table '''~~Where to find~~nsf_grants_institution'''~~: The Hubs data set can be found in~~ from the ~~Ecosystem>>Hubs>>dataset folder~~biotech db. ~~It is not currently in the database due to a UTF8 issue~~

~~===Supplementary Data Sets===~~They contain:*Award ID*Award Institution*Award Effective date*Institution city*Award Value*Organization state code~~'''Patent data''': to be pulled from USPTO or SDC Platinum.~~ From 1900 - 2017

~~'''Number of STEM Graduate Students''' (NSF) and '''University R&D Spending''' (NSF):~~*University R&D Data found under Output file ~~"NSF DATA_2004 to 2011~~is nsfSummary.~~xlsx" in datasets folder (Ecosystem>>Hubs>>Datasets)~~txt. It contains:*~~R&D spending found at the university level for 2014 ("Stem Grad Students.xlsx) or at state level ("Science and Engineering Grad Students by~~ Variables: City, State ~~and Year 2000-2011.csv")~~** not uploaded to server or matched yet to CMSA codeyear, nsf_nogrants, ~~because of this discrepancy.~~ nsf_valuegrant **"Stem Grad Students.xlsx" contains categorized university by MSA, can be used for all university1900-~~based projects~~2017

===Joined NSF table===The joined nsf table with the VC table is found in db '''~~Per Capita Income~~cities''' ~~and~~ . The table is named '''~~Employment Data~~merged_nsf''' ~~(US Census Bureau):~~ *"Per Capita Personal Income by MSA 2000-2012.~~xlsx" in datasets folder (Ecosystem>>Hubs>>Datasets>>Data from Yael)~~*"Wages All the values of nogrants and ~~Salaries by MSA 2000~~valuegrant with missing values for years 1990-~~2012~~2017 are set equal to 0.~~xlsx"~~ The sql script is in ~~datasets folder (Ecosystem>>Hubs>>datasets>>Data from Yael)~~**not uploaded to server or matched yet to CMSA code Z:\HUbs\2017\sql scripts

~~'''Firm Births''' (BDS)~~==NIH Data==*Data is in ~~server 181, under table name "BDS"~~: *includes birth, death, net(birth-death) and rate(death rate) for years 1990-2013 for every msa Z:\Hubs*includes code for CMSA but is not aggregated by CMSA** i.e. BDS statistics are still separate for all the smaller MSAs in New York's CMSA (code=1) E:\McNair\Projects\Hubs\Summer 2017

~~===Resources===~~Database is '''cities'''* Yael Hochberg and Fehder (2015), located in dropboxSQL script is: nih2017.sql** Use this paper as a guideline on how to conduct the analysisThe source files are: *~~US Census Bureau data on employment by MSA: http://factfinder~~nih_1986_2001.~~census.gov/faces/tableservices/jsf/pages/productview.xhtml?pid=ACS_14_5YR_B23027&prodType=table~~csv*~~USPTO utility patents by MSA: http://www.uspto.gov/web/offices/ac/ido/oeip/taf/cls_cbsa/allcbsa_gd~~nih_2002_2012.~~htm~~txt*~~MSA level trends~~nih_2013_2015located in E: ~~http://www.metrotrends.org/data.cf~~\McNair\Projects\Federal Grant Data\NIH

~~===The Target Dataset===~~

~~We will need to process the following variables:~~

*SuperMSA - combine SanFran and SanJose, New York and Newark?, NC Research triangle, others?

*CSV mapping msas to cmsas is in the folder (and a table in the dbase)

The script that cleans NIH data and generates the summary table is titled '''nihSummary'''. It is located here:

~~Example dataset~~ Z: ~~MSA Year SeedVCInv SeedEarlyVCInv LaterVCInv NoDeals FundsInvested DistinctInvestors ....~~ ~~----------------------------------------------------------------------------------------------------------------------------~~ ~~1234 2001 1000000 20000000 30000000 4 7 7~~\Hubs\2017\sql scripts

This table includes

*year

*city

*state

*country

*nogrants (number of grants)

*valuegrant

*city_state

~~Note that the unit of observation is MSA~~*Date from 1986-~~Year.~~2015

~~Variables to be computed at~~ ===Joined NIH table===The joined NIH table with the ~~MSA level:~~*HubActive (binary)*NoHubsActive (Count)*HubSqFt*Other Hub Vars (build list!!!)*VC table is found in db '''~~SeedVCInv~~cities''' ~~(Seed/Start-up)~~*. The table is named '''~~EarlyVCInv~~merged_nih''' ~~(Early Stage)~~.*'''LaterStageVC''' (Later)*'''OtherStageVC''' (Buyout/Acq, Other)*'''NoDeals''' (done by local VCs?)**'''NoDealsNear'''**'''NoDealsFar'''*NoPortCosFundedAll the values of nih_valuegrant and nih_nogrants with missing values for years 1986-2015 are set equal to 0.*'''FundsInv''' (The sql script is in ~~an MSA)~~**'''FundsInvFromNear''' (within MSA?)**'''FundsInvFromFar''' (outside MSA?)*DistinctInvestors (?)**DistinctInvestorsNear (within MSA?)**DistinctInvestorsFar (outside MSA?)*PatentCount*NoSTEMGrads*FirmBirths (BDS data)*UniRandDSpend*PerCapitaIncome*Employment Z:\HUbs\2017\sql scripts

~~We need to~~==Clinical Trials Data==Data is in:*Check funds invested means dollars invested Z:\Hubs*Categorize near and far! Is it within MSA vs. not, within adjacent MSAs, etc.? E:\McNair\Projects\Hubs\Summer 2017

Database is '''cities'''

SQL script is: ctrials.sql

The source file is:

~~There may be a second dataset that has Hub-Industry-Year (where industry is semiconductor/non-semiconductor?)~~*medclinical.txt

~~===Final Primary Data Set===~~located in Z:\Hubs\2017

*~~Deal is a round syndicate (near/far deal is one investor that is near/far).~~Date from 1999-2017

~~Table name: finaldataset~~ ~~cmsa~~ ~~year~~ ~~totalamountinv--total amount invested~~ ~~nearamountinv--amount invested from local funds~~ ~~faramountinv-- amount invested from funds outside CMSA~~ ~~earlyinv--amount invested in early stage companies~~ ~~laterinv--amount invested in later stage companies~~ ~~startupseedinv--amount invested in seed or startup stage companies~~ ~~otherstageinv--amount invested in Acquisition/Buy-outs/Other stage companies~~ ~~investingfund--distinct funds that are investing in that CMSA-year~~ ~~investingfundnear--distinct funds from that CMSA that invested in that CMSA-year~~ ~~investingfundfar--distinct funds from outside that CMSA that invested in that CMSA-year~~ ~~deals--number of deals~~ ~~neardeals--number of deals inside a CMSA~~ ~~fardeals--number of deals from outside a CMSA --some of these deals might count in both categories, because of syndicate members being both inside and outside the CMSA~~ ~~earlystagedeals--deals with earlystage companies~~ ~~laterstagedeals--deals with later stage companies~~ ~~startupseeddeals--deals with startup/seed companies~~ ~~otherstagedeals--deals with companies in other stages~~ ~~newportcosfunded--number of portfolio companies to receive their first investment in that year~~===Joined clinical trials table===

~~===Data by zip code===~~The file which contains the number of trials in each city and year is located in:*Population data, 2000-2016 - US Census Bureau (E Z:~~\McNair~~\Hubs\~~summer~~ 2017)~~https~~ The file is in:~~//www2.census.gov/programs-surveys/popest/datasets/~~*Income data, 1998-2014 - The Internal Revenue Service (E Z:~~\McNair~~\Hubs\~~summer~~ 2017)\clean data~~https~~The name of the file is:~~//www~~ ctrialsSummary.~~irs.gov/uac/about-irs~~txt It contains:*~~DCI index, to assess the economic well-being of communities~~city~~http://eig.org/dci/interactive-maps/u-s-zip-codes~~*year*~~R&D Expenses, 1980-2016 - Wharton Research Data Services (E:\McNair\Hubs\summer 2017)~~city_state_year*~~Zipcode look~~noctrials -up number of trials The ctrials is joined with VC table ~~obtained from https~~. The joined SQL script is:~~//www~~'''new_ctrials.~~unitedstateszipcodes.org/zip-code-database/. It~~sql'''~~s available~~ and it is located in (E Z:~~\McNair~~\Hubs\~~summer~~ 2017)\sql scripts The name of the joined table is '''new_merged_ctrials'''.

~~== Data by MSA ==~~It contains:*city*state*city_state_id*city_state_year*year*noctrials*seedamtm*earlyamtm*lateramtm*selamtm*numseeds*numearly*numlater*numsel

~~==COMPUSTAT Data==~~All the values of noctrials with missing values for years 1999-2017 are set equal to 0.

==Population Data==Data is in: Z:\Hubs

E:\McNair\Projects\Hubs\Summer 2017

Database is '''cities'''

SQL script is: '''population.sql'''

The source files are:

*pop2000_2009.xlsx

*pop2010_2016.xlsx

They contain:

*State

*City name

*Year

*Population Estimates

Date from 2000-2016

===Joined population table===

Data is in:

Z:\Hubs\2017\clean data

The file names are

1_population.txt - contains data on population estimates from 2000-2009

2_population.txt - contains data on population estimates from 2010-2016

Database is '''cities'''

SQL script is: '''new_population.sql''', located in

Z:\Hubs\2017\sql scripts

The population table is joined on VC table. The table is called '''new_merged_population'''.

They contain:

*City

*State

*city_state_id to uniquely identify each city

*city_state_year to uniquely identify each city in each year

*Population estimates

*Year

*Code from the state code and Fips code

*State full name

==Income Data==

Raw data was obtained from Census data, American Communities Survey.

Raw Data is in:

E:\McNair\Projects\Hubs\Summer 2017\MSA Income_raw.zip

Date from 2005-2015

The master list with MSAs and principal cities is titled '''list2.xls'''. It is located at:

Z:\Hubs\2017

This master list includes:

*MSA code

*MSA name

*Principal City

*State

*Place code (city code)

*State Code

This master list was edited to associate each principal city with a unique state. E.g. if New York is the principal city located in New York-New Jersey MSA, it was associated with state NY-NJ. So '''list''' was edited to put New York with NY.

Cleaned Income data files are in

Z:\Hubs\2017\merging_on_ID

They contain:

*MSA code

*MSA

*Year

*Total Household Income

The MSA-City-State look up file is titled '''msa_city_state_wcode.txt'''. It is located in

Z:\Hubs\2017\merging_on_ID

The SQL file that merges income data from ACS (by MSA - Year) with the MSA-City file is titled '''income.sql'''. It is located here:

Z:\Hubs\2017\sql scripts

The final income table is in db '''cities''' titled '''merged_income'''.

It includes:

*MSA

*City

*State

*Year

*Total Household Income

The table includes 8780 observations

===Joined income table===

Data is in:

Z:\Hubs\clean data

The file names are:

INC_05.txt - INC_15.txt

Database is '''cities'''

SQL script is: merged_income.sql

They contain:

*City

*State

*city_state_id to uniquely identify each city

*Income

*Year

*Code from the state code and Fips code

==Employment Data==

Data on employment was obtained from American Communities Survey, US Census Bureau.

Raw Data is in:

E:\McNair\Projects\Hubs\Summer 2017\Employment Data by MSA

Cleaned files are in

Z:\Hubs\2017\clean data

They contain:

*MSA code

*MSA

*Year

*Employment rate of individuals 16 years or older

*Unemployment rate of individuals 16 years or older

Date from 2005-2015

The SQL file that merges employment data from ACS (by MSA - Year) with the MSA-City file is titled '''Employment.sql'''.

The file is located in:

Z:\Hubs\2017

The final table is in db '''cities''' titled '''merged_employment'''.

It includes:

*MSA

*City

*Year

*Employment rate

*Unemployment rate

===Joined employment table===

Data is in:

Z:\Hubs\clean data

The file names are:

EMP_05.txt - EMP_15.txt

Database is '''cities'''

SQL script is: '''new_employment.sql''' and it is located in

Z:\Hubs\2017\sql scripts

The final table which is joined on VC is in db cities titled '''new_merged_employment'''.

They contain:

*City

*State

*Code from the state code and Fips code

*city_state_id to uniquely identify each city

*city_state_year to uniquely identify each city in each year

*Employment rates of individuals of 16 years or older

*Unemployment rates of individuals of 16 years or older

*Year

==Schooling Data==

Data on schooling was obtained from American Communities Survey, US Census Bureau.

Raw Data is in:

E:\McNair\Projects\Hubs\Summer 2017\School Enrollment Data by MSA

Cleaned files are in

Z:\Hubs\2017\clean data

They contain:

*MSA code

*MSA

*Year

*Total number of population 3 years and over enrolled in school

*Percent of population 3 years and over enrolled in public school

*Percent of population 3 years and over enrolled in private school

Date from 2005-2015

The SQL file that merges schooling data from ACS (by MSA - Year) with the MSA-City file is titled '''schooling.sql'''.

The file is located in:

Z:\Hubs\2017

The final table is in db '''cities''' titled '''merged_schooling'''.

It includes:

*MSA

*City

*Year

*Total

*Percent_public_schooling

*Percent_private_schooling

===Joined schooling table===

Data is in:

Z:\Hubs\clean data

The file names are:

SCH_05.txt - SCH_15.txt

Database is '''cities'''

SQL script which joins this table with VC table is: '''new_merged_schooling.sql'''

The final table is in db '''cities''' titled '''new_merged_schooling'''.

It contains:

*City

*State

*city_state_id to uniquely identify each city

*city_state_year to uniquely identify each city in each year

*Total number of school enrollment

*Percentage enrolled in public schools

*Percentage enrolled in private schools

*Year

*Code from the state code and Fips code

==VC Data==

Raw Data is in:

Z:\VentureCapitalData\SDCVCData\vcdb2

The file name is roundleveloutput2.txt

It contains:

*city

*state

*year

*seedamtm - seed, amount in millions

*earlyamtm - early, amount in millions

*lateramtm - late, amount in millions

*selamtm - seed early late, amount in millions

*numseeds - number of seeds

*numearly

*numlater

*numsel

*numdeals

*numalive

Date from 1948-2017

The table is in db '''cities''' titled '''new_vc'''.

It includes:

*city

*state

*city_state_id

*city_state_year

*seedamtm

*earlyamtm

*lateramtm

*selamtm

*numseeds

*numearly

*numlater

*numsel

*numdeals

*numalive

*year

==Final Joined Data set ==

The final data set is in file '''final.txt''' and is located here:

Z:\Hubs\2017

It includes:

*city

*state

*city_state_year - (ID that data is merged on)

*year

*seedamtm - Seed Amount

*earlyamtm - Early Investment Amount

*lateramtm - Late Investment Amount

*selamtm - Seed early or late amount

*numseeds - Number of seed investments

*numearly - Number of early investments

*numlater - Number of late investments

*numsel

*numdeals - Number of deals (first contracts)

*numalive - Number of start ups alive

*income - Income per capita in each city-year

*sbir_nogrants - Number of SBIR grants

*sbir_valuegrant - Value of SBIR grants

*emp - Employment stats of each city-year

*unemp - Rate of unemployment

*popestimate - Population estimate of each city-year

*private - Enrollment in private schools

*public - Enrollment in public schools

*total -

*numfirms - Number of publicly traded firms

*randd - R&D expenditure of publicly traded firms

*revenue - Revenue of PTF

*totalassets

*nsf_nogrants - Number of NSF grants

*valuegrant - Value of NSF grants

*nih_nogrants - Number of NIH grants

*nih_valuegrant - Value of NIH grants

*noctrials - NUmber of clinical trials

== Defining Hubs ==

'''Summer 2016''' - Last year a master list of 125 "potential" hubs was used. A scorecard was developed which filtered these 125 candidate hubs to determine which of these should be included in the study sample. This method resulted in a sample size of ~ 30. The master list and the final hubs list is titled '''Hubs Data v2_'16'''. It is located here:

Z:\Hubs\2017\hubs_data

'''Summer 2017''' - In order to obtain a more statistically significant sample of hubs, we developed 5 criteria which produce a more relaxed definition of hubs than last year. These include

*Availability of co-working space

*Coding classes or tech events

*Some focus on the tech sector (this is important as our dependent variable is VC funding)

*Presence of an accelerator

*Availability of mentorship for members.

We will review the 125 candidate hubs and select those which satisfy a subset or all of these characteristics.

~~SQL script is: COMPUSTAT.sql~~

~~The source file is RandDExpenditures.txt. It contains:~~

*Date from 1980-2017 (July). All COMPUSTAT.

*427799 records

*Fields include:

**R&D Expenditure

**Address (inc. city, zip, state)

~~Output file is COMPUSTATSummary.txt. It contains~~[[category:*Variables: City, year, No.public firms, sum R&D, sum Sales, sum total assets*1979-2016*4440 citiesInternal]]

Ed

Bureaucrats, Interface administrators, Administrators (Semantic MediaWiki), Administrators

7,613

edits

Changes

Hubs (view source)

Revision as of 13:41, 21 September 2020

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Sites

Sections

Organizations

Help

Tools