4,243 bytes added ,  13:41, 21 September 2020
{{Project|Has project output=Data|Has sponsor=McNair Center
|Has title=Hubs
|Has owner=Hira Farooqi,
|Has keywords=Data
|Has project status=Active
|Does subsume=Hubs Analysis 2017,
}}
'''Important Notice: The last update to the hubs data was done manually by Ed and is in E:\projects\MeasuringHGHTEcosystems\HubsDataRevisedSimplified.xlsx'''

The Hubs Research Project is a full-length academic paper analyzing the effectiveness of "hubs", a component of the entrepreneurship ecosystem, in the advancement and growth of entrepreneurial success in a metropolitan area. It focuses on cities in the United States as the primary unit of analysis. The research concentrates primarily on large and mid-sized Metropolitan Statistical Areas (MSAs), as that is where the great majority of Venture Capital funding is located.
This page contains information about data used for this research project, including data sources, location of data on RDP and details on data processing.
 
 
 
Information on initial data work done prior to Summer 2017 can be found at [[Hubs Summer 2016]].
 
'''Note on joining:''' The city-state-year ID from VC data is used as the master ID for joining datasets. Each table (e.g. income, nih, nsf, sbir, compustat) is first joined with the VC data on city-state-year ID and then the resulting tables are all joined together in the final table.
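
The joining scheme described in the note can be sketched as follows. This is a minimal illustration in Python, not the project's actual SQL. The helper names (<code>make_id</code>, <code>join_with_vc</code>, <code>join_all</code>), the ID format (lower-cased city, state, and year joined by underscores), and all sample values are assumptions made for the example.

```python
def make_id(city, state, year):
    """Build a city-state-year key. The exact format is an assumption."""
    return f"{city.strip().lower()}_{state.strip().lower()}_{year}"


def join_with_vc(vc_rows, table_rows):
    """Left-join one table onto the VC rows, keyed by city_state_year."""
    lookup = {row["city_state_year"]: row for row in table_rows}
    joined = []
    for vc in vc_rows:
        match = lookup.get(vc["city_state_year"], {})
        joined.append({**match, **vc})  # VC values win if a column name clashes
    return joined


def join_all(vc_rows, tables):
    """Join each table with the VC data first, then merge the results."""
    final = {vc["city_state_year"]: dict(vc) for vc in vc_rows}
    for table_rows in tables:
        for row in join_with_vc(vc_rows, table_rows):
            final[row["city_state_year"]].update(row)
    return list(final.values())


# Hypothetical sample rows; the numbers are made up for illustration.
vc = [
    {"city_state_year": make_id("Houston", "TX", 2015), "numdeals": 3},
    {"city_state_year": make_id("Austin", "TX", 2015), "numdeals": 5},
]
income = [{"city_state_year": "houston_tx_2015", "income": 100}]
nih = [{"city_state_year": "austin_tx_2015", "nih_nogrants": 2}]

final = join_all(vc, [income, nih])
```

Because the VC rows act as the master list, a city-year that appears only in another table (and not in the VC data) is dropped, and a VC city-year with no match simply lacks that table's columns.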
 
 
===Data by zip code===
*Population data, 2000-2016 - US Census Bureau (E:\McNair\Hubs\summer 2017)
https://www2.census.gov/programs-surveys/popest/datasets/
*Income data, 1998-2014 - The Internal Revenue Service (E:\McNair\Hubs\summer 2017)
https://www.irs.gov/uac/about-irs
*DCI index, to assess the economic well-being of communities
http://eig.org/dci/interactive-maps/u-s-zip-codes
*R&D Expenses, 1980-2016 - Wharton Research Data Services (E:\McNair\Hubs\summer 2017)
*Zipcode look-up table obtained from https://www.unitedstateszipcodes.org/zip-code-database/. It's available in (E:\McNair\Hubs\summer 2017).
 
== Data by MSA ==
 
We have principal cities of MSAs from the census:
https://www.census.gov/geographies/reference-files/time-series/demo/metro-micro/delineation-files.html
 
We might be able to go City -> FIPS place code -> MSA?
 
Cities and their FIPS codes (which don't perfectly correspond) are available from https://www.census.gov/geo/reference/codes/place.html
 
The Census claims to provide city to MSA here: https://www.census.gov/geo/maps-data/data/ua_rel_download.html
However, there is only CBSA!
 
This might do it: https://www2.census.gov/geo/pdfs/maps-data/data/rel/explanation_ua_cbsa_rel_10.pdf
We may be able to trace city to principal city to MSA.
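
If the City -> FIPS place code -> MSA route works, the lookup could be chained roughly as below. This is only a sketch under assumptions: the function name <code>city_to_msa</code> and both dictionaries are hypothetical, the codes are placeholders rather than real Census values, and whether the Census files actually support each link is still the open question raised above.

```python
def city_to_msa(city, state, place_codes, place_to_cbsa):
    """Return the CBSA/MSA code for a city, or None if any link is missing."""
    place = place_codes.get((city.strip().lower(), state.strip().upper()))
    if place is None:
        return None  # city name did not match a FIPS place
    return place_to_cbsa.get(place)  # None if the place is outside any CBSA


# Placeholder values, NOT real Census codes.
place_codes = {
    ("exampleville", "TX"): "PLACE-0001",
    ("sampletown", "TX"): "PLACE-0002",
}
place_to_cbsa = {"PLACE-0001": "CBSA-0001"}

matched = city_to_msa("Exampleville", "tx", place_codes, place_to_cbsa)
unmatched = city_to_msa("Sampletown", "TX", place_codes, place_to_cbsa)
```

Returning <code>None</code> instead of raising an error keeps unmatched cities visible, which matters here because city names and FIPS places do not correspond perfectly.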
==COMPUSTAT Data==
The data set includes information on publicly traded firms in the US. It was obtained from the Wharton Research Data Services (https://wrds-web.wharton.upenn.edu/wrds/index.cfm?).
 Raw Data is in:
E:\McNair\Projects\Hubs\Summer 2017
Z:\Hubs\2017
Database is '''cities'''
 
SQL script is: COMPUSTAT.sql
The source file is RandDExpenditures.txt. It contains:
*Date from 1980-2017 (July). All COMPUSTAT.
*427799 records
*Fields include:
**R&D Expenditure
**Address (inc. city, zip, state)
**Revenue of firms
Output file is COMPUSTATSummary.txt. It contains:
*1979-2016
*4440 cities
 
It is located in
Z:\Hubs\2017\Output_Files
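
The step from firm-level records to a city-year summary can be illustrated with the sketch below. It is not the contents of COMPUSTAT.sql. The record layout, the function name <code>summarize_by_city_year</code>, and the sample values are assumptions, as are the choices to count one record as one firm-year and to treat missing amounts as zero.

```python
from collections import defaultdict


def summarize_by_city_year(records):
    """Aggregate firm-level rows into one row per (city, state, year)."""
    summary = defaultdict(lambda: {"numfirms": 0, "randd": 0.0, "revenue": 0.0})
    for rec in records:
        key = (rec["city"].strip().lower(), rec["state"].strip().upper(), rec["year"])
        row = summary[key]
        row["numfirms"] += 1  # assumes one record per firm-year
        row["randd"] += rec.get("randd") or 0.0  # missing R&D counted as zero
        row["revenue"] += rec.get("revenue") or 0.0
    return dict(summary)


# Hypothetical firm-level rows; values are made up for illustration.
records = [
    {"city": "Exampleville", "state": "tx", "year": 2015, "randd": 1.5, "revenue": 10.0},
    {"city": "exampleville ", "state": "TX", "year": 2015, "randd": None, "revenue": 4.0},
    {"city": "Sampletown", "state": "TX", "year": 2016, "randd": 2.0, "revenue": 3.0},
]

summary = summarize_by_city_year(records)
```

Normalizing city and state before grouping avoids splitting one city into several rows because of stray whitespace or inconsistent capitalization in the address fields.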
==NSF Data==
The script that cleans NIH data and generates the summary table is titled '''nihSummary'''. It is located here:
Z:\McNair\Projects\Hubs\Summer 2017\sql scripts
This table includes
*nogrants (number of grants)
*valuegrant
*city_state (the city-state ID that we'll merge on)
*Date from 1986-2015
Raw Data is in:
Z:\VentureCapitalData\SDCVCData\vcdb2
The file name is roundcitystateyearroundleveloutput2.txt
It contains:
*numlater
*numsel
*numdeals
*numalive
*Date from 1948-2017
The table is in db '''cities''' titled '''vcnew_vc'''.
It includes:
*numlater
*numsel
*numdeals
*numalive
*year
 
==Final Joined Data set ==
 
The final data set is in file '''final.txt''' and is located here:
Z:\Hubs\2017
 
It includes:
*city
*state
*city_state_year - (ID that data is merged on)
*year
*seedamtm - Seed Amount
*earlyamtm - Early Investment Amount
*lateramtm - Late Investment Amount
*selamtm - Seed, early, or late amount
*numseeds - Number of seed investments
*numearly - Number of early investments
*numlater - Number of late investments
*numsel
*numdeals - Number of deals (first contracts)
*numalive - Number of startups alive
*income - Income per capita in each city-year
*sbir_nogrants - Number of SBIR grants
*sbir_valuegrant - Value of SBIR grants
*emp - Employment stats of each city-year
*unemp - Rate of unemployment
*popestimate - Population estimate of each city-year
*private - Enrollment in private schools
*public - Enrollment in public schools
*total -
*numfirms - Number of publicly traded firms
*randd - R&D expenditure of publicly traded firms
*revenue - Revenue of publicly traded firms
*totalassets
*nsf_nogrants - Number of NSF grants
*valuegrant - Value of NSF grants
*nih_nogrants - Number of NIH grants
*nih_valuegrant - Value of NIH grants
*noctrials - Number of clinical trials
 
== Defining Hubs ==
'''Summer 2016''' - Last year, a master list of 125 "potential" hubs was used. A scorecard was developed to filter these 125 candidate hubs and determine which should be included in the study sample. This method resulted in a sample size of ~30. The master list and the final hubs list are both in '''Hubs Data v2_'16''', which is located here:
Z:\Hubs\2017\hubs_data
 
'''Summer 2017''' - To obtain a larger sample of hubs that better supports statistical analysis, we developed 5 criteria that produce a more relaxed definition of hubs than last year's. These are:
 
*Availability of co-working space
*Coding classes or tech events
*Some focus on the tech sector (this is important as our dependent variable is VC funding)
*Presence of an accelerator
*Availability of mentorship for members.
 
We will review the 125 candidate hubs and select those which satisfy a subset or all of these characteristics.
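
The review could be recorded and filtered along these lines. This is a sketch only: the field names in <code>CRITERIA</code>, the <code>min_criteria</code> threshold, and the sample hubs are hypothetical, since the page does not yet fix how many criteria a hub must satisfy.

```python
# One flag per criterion listed above; the field names are assumptions.
CRITERIA = (
    "coworking_space",
    "coding_classes_or_tech_events",
    "tech_focus",
    "accelerator",
    "mentorship",
)


def criteria_met(hub):
    """Count how many of the five criteria a candidate hub satisfies."""
    return sum(bool(hub.get(c)) for c in CRITERIA)


def select_hubs(candidates, min_criteria):
    """Return names of candidates meeting at least min_criteria criteria."""
    return [h["name"] for h in candidates if criteria_met(h) >= min_criteria]


# Hypothetical candidates for illustration only.
candidates = [
    {"name": "Hub A", "coworking_space": True, "coding_classes_or_tech_events": True,
     "tech_focus": True, "accelerator": True, "mentorship": True},
    {"name": "Hub B", "coworking_space": True, "tech_focus": True, "mentorship": True},
    {"name": "Hub C", "coworking_space": True},
]

strict = select_hubs(candidates, 5)   # must satisfy all five criteria
relaxed = select_hubs(candidates, 3)  # must satisfy any three
```

Keeping the threshold as a parameter makes it easy to see how the sample size changes between requiring all five criteria and accepting a subset.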
 
 
 
[[category:Internal]]
