VC Startup Matching Stata Work

Revision as of 13:48, 11 July 2018


E:\McNair\Projects\MatchingEntrepsToVC\Stata

contains all the files needed to run the analysis. All the raw datasets are in the directory itself, while do-files, log files and raw output such as Stata-to-tex tables are in their respective folders. Written reports in .tex are in the Tex folder.

Regarding the organization of the do-files, 'master.do' must be opened first. It defines the globals that make referencing directories easier and lists any extra packages required. In the future, when the analysis is more robust and clear, general instructions on what each do-file does will also be written in the master do-file.

For now, every do-file is more or less self-descriptive and self-contained.
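As an illustration of the kind of setup described above, a master do-file might define its directory globals roughly as follows. This is only a sketch, not the contents of the actual 'master.do': the global names, every subfolder name except Tex, and the choice of the estout package for Stata-to-tex tables are all assumptions.

```stata
* Hypothetical sketch of a master do-file; the real master.do may differ.
clear all
set more off

* Root directory as given on this page. Forward slashes are used because
* Stata accepts them on Windows and they avoid backslash/macro clashes.
global root "E:/McNair/Projects/MatchingEntrepsToVC/Stata"

* Subfolder globals. Only the Tex folder is named on this page;
* the other folder names are assumptions.
global dofiles "$root/Do-Files"
global logs    "$root/Logs"
global output  "$root/Output"
global tex     "$root/Tex"

* Extra packages. estout (which provides esttab) is an assumed example
* of a package for exporting tables to .tex.
capture which esttab
if _rc != 0 {
    display as text "esttab not found; install it with: ssc install estout"
}
```

Other do-files can then refer to, for example, "$tex/summary.tex" (a hypothetical file name) without hard-coding the full path.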

==Preliminary Analysis==
A written report with a detailed description of the results can be found at E:\McNair\Projects\MatchingEntrepsToVC\Stata\Tex

===Initial Look at Dataset===
Before attempting any statistical analysis, I took an initial look at the raw dataset to spot possible problems. There was a mistake in the synthetic VC's count of startups from the same sector as the current match, i.e., in the variables 'synsumprevsamesector', 'synsumprevsameindu', 'synsumprevsameindu20' and 'synsumprevsameindu10': their values contained many -1s and 0s. To correct this, I changed the SQL code. Specifically, in the JOIN that creates the table 'FirmnameInduBlowout', the weak inequality was changed to a strict inequality. Then, when creating the next table, 'FirmnameRoundInduHist', I removed the subtraction. The same changes were made to the corresponding synthetic tables.

===Summary Statistics===
Summary statistics were produced using the 'summarystats.do' do-file.

===Linear Probability Model===
A linear probability model was suggested by Jeremy Fox, where Y=1 when the match is real and Y=0 when the match is synthetic, and the independent variables are characteristics of the VCs. To run this regression, it is necessary to build a new dataset. This is done in 'lpmsynthetic.do'. At first glance, this looks like a simple case for the -reshape- command in Stata, since the original dataset is in 'wide' format, i.e., the synthetic VC and its characteristics for each observation (startup) are themselves variables (columns), and we want to turn them into observations (rows), with a dummy indicating whether the match is real or synthetic. However, the -reshape- command does not work with string variable names. Therefore the do-file performs a manual reshape.

After the results were sent to Jeremy Fox, he felt that they were not as expected and suggested some corrections.
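The manual reshape described above can be sketched on a toy dataset as follows. This is not the code in 'lpmsynthetic.do': the variable names (startupid, realvc, realexper, synvc, synexper) and the values are invented for illustration, and the regression at the end only shows the form of the linear probability model.

```stata
* Toy wide dataset: one row per startup, with the real VC and one
* synthetic VC stored as separate columns (all names hypothetical).
clear
input startupid str10 realvc realexper str10 synvc synexper
1 "Alpha" 12 "Beta"  3
2 "Gamma"  5 "Delta" 8
end

* Save the synthetic matches as their own rows, flagged real = 0.
preserve
keep startupid synvc synexper
rename (synvc synexper) (vc vcexper)
gen byte real = 0
tempfile synthetic
save `synthetic'
restore

* Keep the real matches, flag them real = 1, and stack the two pieces.
keep startupid realvc realexper
rename (realvc realexper) (vc vcexper)
gen byte real = 1
append using `synthetic'
sort startupid real

* Linear probability model: Y = 1 for real matches, 0 for synthetic ones.
* With four toy observations the estimates are meaningless.
regress real vcexper
```

The result has one row per startup-VC pair, so each startup appears once with its real VC and once with its synthetic VC.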
===Regressions===
We want to know whether VCs are more likely to match with geographically close startups, whether patents are good signals for VCs, and whether VCs prefer serial founders and startups with similar demographic characteristics. We also want to know whether startups prefer to match with VCs that have previous experience with startups in the same sector, and with VCs that prefer to invest in startups at their stage.

Since we don't have 'out-of-match' VCs and startups, I decided to run two different types of regressions. I regress VCs' all-time characteristics on the characteristics of interest of their matched startups, such as distance, patents before the match, demographics, etc. I am basically looking for correlations. If 'good' VCs tend to match with very close startups that had many patents before the match, etc., then we can say there is some evidence of positive assortative matching. On the other hand, if 'good' startups matched with VCs that were within their scope of investment and had a history of investing in similar sectors, then these characteristics are important for the startups.

Every regression has sector and VC founding year fixed effects. I also log-transformed all count variables (adding 1 first to account for zeros), as suggested by Ed Egan, and log-transformed the distance variable. Continuous variables are not log-transformed because most of them contain zeros, and adding 1 doesn't seem to make much sense.
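A minimal sketch of this specification on simulated data is given below. All variable names and the simulated values are assumptions, not the project's actual variables. The text does not say whether 1 is added to distance before taking logs, so the sketch assumes strictly positive distances and takes the plain log.

```stata
* Simulated data standing in for the matched VC-startup dataset
* (all variable names are hypothetical).
clear
set seed 12345
set obs 200
gen sector      = ceil(5*runiform())           // 5 sectors
gen vcfoundyear = 1990 + floor(10*runiform())  // VC founding year
gen patents     = rpoisson(2)                  // startup count variable
gen distance    = 1 + 3000*runiform()          // strictly positive here
gen vcdeals     = rpoisson(20)                 // VC all-time count variable

* Count variables: add 1 before taking logs to account for zeros.
gen lnpatents = ln(patents + 1)
gen lnvcdeals = ln(vcdeals + 1)

* Distance: plain log, assuming no zero distances.
gen lndistance = ln(distance)

* VC characteristic regressed on matched-startup characteristics,
* with sector and VC founding year fixed effects entered as dummies.
regress lnvcdeals lnpatents lndistance i.sector i.vcfoundyear
```

Entering the fixed effects as factor variables (i.sector, i.vcfoundyear) keeps the sketch free of extra packages; with many categories, absorbing them with a dedicated fixed-effects command would be an alternative.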

Marcoslee
