
Estimating Unobserved Complementarities between Entrepreneurs and Venture Capitalists (view source)

Revision as of 20:01, 18 July 2019


*Chenyu's code and datasets are in .\matlab

*.\linearmodel is the current STATA work

Chenyu's box (available until Oct 31st 2019) is here: https://rochester.app.box.com/s/nvtqgpmyygjykes3lcx9s53sfzmu8c27

It contains:

*working folder

*sample batch files

Both folders were cloned to E:\projects\unobservedcomplementarities\Chenyusbox on 17th July 2019.

Notes:

*data_import3.m uses MasterRealC20YearFullPlus.txt, which is the latest dataset

===Linear Model===

The objective is to add city ranking and serials, possibly as well as no. coinvestors, and VC experience x no. coinvestors, to a linear model.

The data for the linear model should include real and synthetic matches. However, to make it comparable to Chenyu's data, we need to exclude some markets.

Marcos used Z:\VentureXpertDB\vcdb3\MasterRealOneSynth.txt as a base dataset, which contained only a single synthetic. However, in lpm_full.do, he loads MasterWithSynthcode20.txt instead. Note that some of Marcos' do files were not in the dropbox but were in E:\mcnair\Projects\MatchingEntrepsToVC\Stata\DoFiles

Data notes:

*Exit, exitvalue and related measures are going to be right censored

In data_import2.m, Chenyu has the following restrictions that I clone in STATA (counts are mine):

*He starts with MasterC20YearFull.txt, rather than MasterRealC20YearFullPlus.txt (which suggests he isn't using the latest data)

*Mkts are pccode20,dealminroundyear

*Removes unmatched VCs and startups (shouldn't be in latest dataset?)

*Requires that matched VCs have synthetics with all startups in the market (should be redundant now?)

*Requires there to be >=5 and <=10 real matches in a market

**This reduces the number of obs from 445,710 to just 59,205 (13.3%)

*Requires the year to be >=1990 and <=2001

**This keeps 142,738 out of 445,710 obs (32%) when applied alone, or 18,055 out of 59,205 (30.5%) after the real-match restriction

*Removes duplicates (should be redundant with revised data?)

*Removes markets with marketid NaN (not clear why this happens)

*In master_dyad.m, Chenyu has year bounds of 2002 and 2016. This upper bound likely has right censoring on exits.
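The market-level restrictions listed above can be sketched as follows. This is a minimal Python illustration, not Chenyu's MATLAB code or the STATA do file: the function name <code>filter_markets</code>, the tuple layout, and the toy rows are assumptions, and it assumes real matches are counted per market before any other restriction is applied.

```python
from collections import Counter

def filter_markets(rows, min_real=5, max_real=10, first_year=1990, last_year=2001):
    """Keep rows whose market has min_real..max_real real matches and an in-range year.

    rows: iterable of (pccode20, dealminroundyear, realmatch) tuples, where a
    market is the pair (pccode20, dealminroundyear) and realmatch is 1 or 0.
    """
    rows = list(rows)
    # Count real matches per market; synthetic rows do not contribute.
    real_counts = Counter((code, year) for code, year, real in rows if real == 1)
    return [
        (code, year, real)
        for code, year, real in rows
        if min_real <= real_counts[(code, year)] <= max_real
        and first_year <= year <= last_year
    ]

# Made-up toy data, not drawn from the project datasets.
toy = (
    [(1, 1995, 1)] * 6 + [(1, 1995, 0)] * 4   # 6 reals in 1995: kept
    + [(2, 1995, 1)] * 3                      # only 3 reals: dropped
    + [(1, 2005, 1)] * 7                      # year outside 1990-2001: dropped
    + [(3, 1999, 1)] * 12                     # 12 reals: dropped
)
kept = filter_markets(toy)
print(len(kept), sorted(set((c, y) for c, y, _ in kept)))
```

Only the first toy market survives, with both its real and synthetic rows. Whether the count should be taken before or after the other restrictions is a design choice that changes the reported observation counts.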

The STATA do file is in:

E:\projects\unobservedcomplementarities\linearmodel

====Rebuilding Marcos====

Marcos starts with a dataset of reals with a single synthetic, and then constructs a dataset of reals with all synthetics (in the same year and code20).

Table 1 gives some LPMs using two sets of variables with and without VC-yearmet fixed effects. These are replicated in the new do file. In order to get something close to Marcos's reported numbers, I create a one-to-one variable so that each real match has only a single synthetic match. This gives about 60k observations as compared with Marcos's 64K (and as opposed to 445k for the full sample). The coefficients are very close to those in Table 1. There are some caveats, however. Marcos is using:

*Amounts in billions (as am I) without taking logs (of 1+x)

*Firmid x year (which he refers to as VC x yearmet) fixed effects, as opposed to year (i.e., dealminroundyear) x pccode20 fixed effects, which correctly define a market

*No restrictions on timing
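One way to build the one-to-one variable described above is to keep every real match and draw a single synthetic for it. The sketch below is a Python illustration under loud assumptions: the source does not say how the single synthetic is chosen, so the random draw, the <code>real_id</code> key linking a synthetic to its real match, and the function name <code>one_to_one</code> are all hypothetical.

```python
import random

def one_to_one(rows, seed=0):
    """Keep each real row plus one randomly drawn synthetic per real match.

    rows: list of dicts with keys 'real_id' (the real match a row belongs to,
    a hypothetical linking key) and 'realmatch' (1 for real, 0 for synthetic).
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    out = []
    synthetics = {}
    for row in rows:
        if row["realmatch"] == 1:
            out.append(row)
        else:
            synthetics.setdefault(row["real_id"], []).append(row)
    # Draw exactly one synthetic for every real match that has any.
    for real_id in sorted(synthetics):
        out.append(rng.choice(synthetics[real_id]))
    return out

# Made-up toy data: two real matches with three synthetics each.
toy = []
for real_id in ("a", "b"):
    toy.append({"real_id": real_id, "realmatch": 1})
    toy.extend({"real_id": real_id, "realmatch": 0, "draw": k} for k in range(3))

sample = one_to_one(toy)
print(len(toy), "->", len(sample))
```

With one synthetic per real, the sample is half real matches by construction, which is worth keeping in mind when reading the LPM intercepts.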

Table 5 gives some LPMs before and after a Lasso. The hqdist variable was first transformed so that hqdist = hqdist/1000. Note that the matchhqdist variable is bimodal. Matchbodist is also bimodal but not as strongly. The second spike in the distribution is just over 4000km, which is the arc distance from San Francisco to Boston (4335km [https://www.distance.to/Boston/San-Francisco])
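The San Francisco to Boston arc distance can be checked with a haversine calculation. This is a rough Python sketch: the city-centre coordinates and the mean Earth radius of 6371 km are assumptions, and the result is rescaled by 1000 in the same way as hqdist.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# Approximate city-centre coordinates (assumed, not taken from the dataset).
sf = (37.7749, -122.4194)
boston = (42.3601, -71.0589)

dist_km = haversine_km(*sf, *boston)
dist_scaled = dist_km / 1000  # same rescaling as hqdist = hqdist/1000
print(round(dist_km), round(dist_scaled, 2))
```

The result lands within about one percent of the 4335km figure, consistent with the second spike being coast-to-coast pairs.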

Again the data is just a single synthetic for each real. In this analysis, Marcos also clusters the standard errors at the year level, but does not use any fixed effects.

The labels in the pdf are somewhat misleading. The margins command reports only the underlying covariates, not the interactions (unless you specifically generate the variables). An analysis of just the underlying variables without the interactions would have produced markedly different margins! The margins in table 6 column 1 of the pdf are coming from the following:

 PDF               -> source
 --------------------------------------------------------------
 hdqist            -> c.hqdist##c.hqdist
 sumprevsameindu20 -> c.sumprevsameindu20##c.sumprevsameindu20
 serials           -> c.serials##c.numprevportco
 numprevportcos    -> c.patentsprevc##c.numprevportco
 firmtenure        -> c.serials##c.firmtenure
 patentsprevc      -> c.patentsprevc##c.firmtenure

Note that STATA uses ## to report both main effects for each variable as well as an interaction, so c.hqdist##c.hqdist reports both hqdist and hqdist^2, while c.serials##c.numprevportco reports serials, numprevportco, and serials*numprevportco. Variables are omitted when duplicated as in c.serials##c.numprevportco and c.patentsprevc##c.numprevportco, which both report numprevportco.
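The expansion rule can be made concrete with a small Python sketch that mimics how a two-variable ## term unpacks into main effects plus an interaction, and how duplicated main effects are dropped across terms. The helper names <code>expand_factorial</code> and <code>expand_all</code> are hypothetical, and the sketch only handles continuous two-variable terms of the kind listed above.

```python
def expand_factorial(spec):
    """Expand a two-variable 'c.a##c.b' term into main effects plus interaction."""
    parts = spec.split("##")
    if len(parts) != 2:
        raise ValueError("this sketch only handles two-variable ## terms")
    names = [p[2:] if p.startswith("c.") else p for p in parts]
    # dict.fromkeys removes a repeated main effect (a##a) while keeping order.
    main_effects = list(dict.fromkeys(names))
    return main_effects + ["#".join(names)]

def expand_all(specs):
    """Expand several ## terms, keeping each resulting term only once."""
    terms = []
    for spec in specs:
        for term in expand_factorial(spec):
            if term not in terms:
                terms.append(term)
    return terms

print(expand_factorial("c.hqdist##c.hqdist"))
print(expand_all(["c.serials##c.numprevportco", "c.patentsprevc##c.numprevportco"]))
```

Here <code>hqdist#hqdist</code> stands for the squared term, and numprevportco appears only once in the combined list even though two terms contribute it.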

We don't get the same lasso results as Marcos:

 Variable           MarcosLasso  NewLasso
 ----------------------------------------
 hdqist             yes          yes
 sumprevsameindu20  yes          yes
 serials            yes          no
 numprevportcos     yes          no
 firmtenure         yes          yes
 patentsprevc       no           no

But Marcos's spec isn't very grounded. He clusters standard errors at the year level but uses no fixed effects. We want to know what goes on inside markets, implying market-level fixed effects. He believes that "Since non-match specific variables are not used in the structural model, we have to interact VC or Startup specific variables." I'm not sure that this is correct. He goes on to say that "Therefore, the main specification is one which every match-specific variable has a quadratic interaction, and startup and VC variables are interacted with each other. Also, we exclude industry code from the model because it is a discrete variable, and we transform VC founding year to VC tenure, which subtracts the former with year of match."

Industry certainly won't matter with market fixed effects. Marcos also used numprevportco as if it was purely a VC variable, rather than being closer to a match specific variable.
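The point about industry can be illustrated with the within transformation that market fixed effects imply. The Python sketch below uses made-up numbers and a hypothetical helper <code>demean_within</code>; it assumes a market is the pair (pccode20, dealminroundyear), so the industry code is constant within each market.

```python
from collections import defaultdict

def demean_within(values, groups):
    """Subtract the group mean from each value (the fixed-effects within transform)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]

# Made-up toy data: two markets keyed by (pccode20, dealminroundyear).
markets = [(3, 1995)] * 3 + [(7, 1995)] * 2
industry = [3.0, 3.0, 3.0, 7.0, 7.0]       # constant within each market
distance = [1.0, 2.0, 3.0, 10.0, 20.0]     # varies within each market

print(demean_within(industry, markets))    # no within-market variation remains
print(demean_within(distance, markets))    # within-market variation survives
```

Any variable that is constant within a market is wiped out by the transformation, so its coefficient cannot be identified once market fixed effects are included, whereas match-specific variables keep their within-market variation.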

I tried Marcos's approach using all of the possible variables (old and new) but always and only using firmtenurel as a VC interaction variable (as firmportcosl is used to pick the real from the list of potential reals, and as firmapportione~ml is correlated with firmportcosl). I also used only pccityoverallr~1l as the PortCo interaction variable, as that's the only PortCo variable that survives to significance.

The result was:

 . margins, dydx(*) post
 Average marginal effects                        Number of obs     =    381,882
 Model VCE    : Robust
 Expression   : Linear prediction, predict()
 dy/dx w.r.t. : pccityoverallrankm1l firmtenurel firmportcosl matchprevindu20l matchbodistl
                matchinstagenarrow matchcity matchstate
 --------------------------------------------------------------------------------------
                      |            Delta-method
                      |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
 ---------------------+----------------------------------------------------------------
 pccityoverallrankm1l |   .0055706   .0002035    27.38   0.000     .0051718    .0059694
          firmtenurel |   .0059353   .0005165    11.49   0.000     .0049229    .0069477
         firmportcosl |   .0052155   .0004399    11.86   0.000     .0043532    .0060777
     matchprevindu20l |  -.0536725   .0007413   -72.41   0.000    -.0551254   -.0522196
         matchbodistl |  -.0106516   .0003413   -31.21   0.000    -.0113205   -.0099826
   matchinstagenarrow |   .0057086   .0007494     7.62   0.000     .0042398    .0071774
            matchcity |   .0684326   .0041129    16.64   0.000     .0603715    .0764937
           matchstate |   .0436431   .0015343    28.45   0.000      .040636    .0466503
 --------------------------------------------------------------------------------------
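The dy/dx column reports average marginal effects, which in a linear specification with interactions fold the interaction coefficients into each underlying covariate. The Python sketch below shows the arithmetic for one interaction term and one quadratic term. The coefficient values and covariate samples are made up for illustration and are not estimates from the model above.

```python
def ame_interaction(b_x, b_xz, z_values):
    """Average marginal effect of x in y = b_x*x + b_z*z + b_xz*x*z.

    The marginal effect for observation i is b_x + b_xz*z_i, averaged over the sample.
    """
    return sum(b_x + b_xz * z for z in z_values) / len(z_values)

def ame_quadratic(b1, b2, x_values):
    """Average marginal effect of x in y = b1*x + b2*x^2, i.e. mean of b1 + 2*b2*x_i."""
    return sum(b1 + 2 * b2 * x for x in x_values) / len(x_values)

# Hypothetical coefficients and covariate values.
ame_x = ame_interaction(b_x=0.01, b_xz=0.002, z_values=[1.0, 2.0, 3.0])
ame_q = ame_quadratic(b1=-0.02, b2=0.003, x_values=[0.0, 1.0, 2.0, 3.0])
print(ame_x, ame_q)
```

Because the averaged effect depends on the interaction coefficient and on the distribution of the other variable, a model estimated without the interactions will generally report different margins for the same covariate.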

Finally, collapse the dataset by summing realmatch and produce a histogram and some analysis.

===Notes from Conference Call===

*Reduced form estimation: VC investment and outcomes? Logit? outcomes (exit measures). Real match explanatory variable, match characteristics, controlling

*Target: May

===Running Chenyu's code on HPCC===

Two Wharton ugrads: Stacey and Kenneth (no account yet) are going to try running Chenyu's code on the HPCC. Chenyu is going to put everything into Box and invite us all to it.

==Reference Papers==
