*ACS variables
*Run and add in chosen layers!
*Decide how to construct the instrument; first try with one hull?
===Instrumental Variable(s)===
Hull area in t = f(hull area in t-1 intersecting TIF areas in t, TIF areas in t not intersecting hull area in t-1).
Both measures vary over time, because both the TIF areas and the hull areas change from year to year.
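To make the two arguments of f() concrete, here is a minimal Python sketch. It assumes each area is approximated as a set of grid cells, with a hypothetical unit cell area. Under that assumption, "intersecting" becomes set intersection and "not intersecting" becomes set difference. This is an illustration only and does not reproduce how tifintareahm or the hull areas were actually computed.

```python
# Hypothetical sketch: approximate each area as a set of (x, y) grid cells.
# "Intersecting" is then set intersection and "not intersecting" set difference.
CELL_AREA = 1.0  # assumed area of one grid cell (arbitrary units)


def instrument_components(hull_prev, tif_now):
    """Return the two arguments of f() for one place-year.

    hull_prev: set of cells covered by the hull in t-1
    tif_now:   set of cells covered by TIF areas in t
    """
    overlap = hull_prev & tif_now       # hull area in t-1 intersecting TIF areas in t
    tif_outside = tif_now - hull_prev   # TIF areas in t not intersecting hull area in t-1
    return len(overlap) * CELL_AREA, len(tif_outside) * CELL_AREA


hull_2010 = {(x, y) for x in range(0, 4) for y in range(0, 4)}  # 16 cells
tif_2011 = {(x, y) for x in range(2, 6) for y in range(0, 2)}   # 8 cells
print(instrument_components(hull_2010, tif_2011))  # (4.0, 4.0)
```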
==Analysis Exploration==
 
Choose LHS variable:
*Growth VC: growthinv17lf. Growth rather than seed investment, as it is less shocky. Led forward one period. In 2017 or 2018 dollars to make it real. Logged to normalize it somewhat and to give a 'change' interpretation.
*Growth in growth VC: gg17pcl. Definitely logged (it's wildly non-normal without logging and somewhat normal with it), even though this gives a change-in-growth-rate interpretation. We could winsorize, but logging pretty much deals with outliers anyway. It is computed as (forward less present) over present, so it is still forward looking.
*We could also conceivably use the change in rank: rankchg (=f.overallrank-overallrank) or rankup (=overallrank-f.overallrank). It would be much 'smoother' than the other measures, and it's also pretty normal (if a little sharply peaked at 0). However, it would be a relative performance measure!
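The rank-change definitions above can be sketched in Python on a place-year panel. The sketch takes the lead the way Stata's f. operator does: it is missing unless the same place has an observation for year+1. The dict-based panel and the function name are hypothetical. It also assumes rank 1 is the best rank, so a positive rankup means the place moved up.

```python
# Hypothetical sketch of rankchg = f.overallrank - overallrank and
# rankup = overallrank - f.overallrank on a place-year panel.
# As with Stata's f. operator, the lead is missing (None) unless the
# same place is observed in year + 1.

def add_rank_changes(rows):
    """rows: list of dicts with keys 'placeid', 'year', 'overallrank'."""
    rank_at = {(r["placeid"], r["year"]): r["overallrank"] for r in rows}
    out = []
    for r in rows:
        lead = rank_at.get((r["placeid"], r["year"] + 1))  # f.overallrank
        rankchg = None if lead is None else lead - r["overallrank"]
        rankup = None if rankchg is None else -rankchg     # positive = moved up if rank 1 is best
        out.append({**r, "rankchg": rankchg, "rankup": rankup})
    return out


panel = [
    {"placeid": 1, "year": 2000, "overallrank": 12},
    {"placeid": 1, "year": 2001, "overallrank": 9},
    {"placeid": 1, "year": 2003, "overallrank": 10},  # 2002 missing, so 2001 has no lead
]
for row in add_rank_changes(panel):
    print(row["year"], row["rankchg"], row["rankup"])
# 2000 -3 3
# 2001 None None
# 2003 None None
```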
 
Choose dataset:
*Restrict on numlayers. Those with fewer than 3 layers can't form hulls, so 3 layers is the minimum for a hull-based analysis. 6 layers allows 2 hulls (in theory, anyway), though it doesn't guarantee them. 6,746 place-years have 2 or more layers (max is 1317); 4,969 have 6 or more layers.
*Restrict on time period. The data runs from 1980 to present, though note that our synthetic Houstons have years from 1 to 65 indicating how many locations were replaced. We might want to restrict to 1986 to present to get good coverage, or to 1995 to present for the 'modern era'. Note that 2019 is a half year, so we should probably throw it out.
*We are already restricting to the 198 places that had greater than 10 active at some point in their history...
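The dataset restrictions above can be expressed as a simple row filter. This is a hypothetical Python sketch: the dict keys and default thresholds are assumptions taken from the bullets. The synthetic Houstons, whose "years" run from 1 to 65, would be dropped by the year filter and would need separate handling. The 198-place restriction is assumed to have been applied already.

```python
# Hypothetical sketch of the sample restrictions described above.
MIN_LAYERS = 3          # fewer than 3 layers cannot form a hull (6 would allow 2 hulls)
START_YEAR = 1986       # or 1995 for the 'modern era'
LAST_FULL_YEAR = 2018   # 2019 is a half year, so it is excluded


def restrict(rows, min_layers=MIN_LAYERS, start_year=START_YEAR):
    """Keep place-years with enough layers inside the chosen year window.

    Note: synthetic Houston rows (years 1 to 65) fall outside any calendar
    window and are dropped here; handle them separately if they are needed.
    """
    return [
        r for r in rows
        if r["numlayers"] >= min_layers
        and start_year <= r["year"] <= LAST_FULL_YEAR
    ]


rows = [
    {"numlayers": 2, "year": 2000},
    {"numlayers": 6, "year": 2000},
    {"numlayers": 6, "year": 2019},
    {"numlayers": 4, "year": 1990},
]
print(len(restrict(rows)))                   # 2
print(len(restrict(rows, start_year=1995)))  # 1
```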
 
Choose scale regressors:
*growthinv17l numdealsl numstartupsl
===Base specification for highest1hull===
The 'highest1hull' specification (or its 2- and 3-hull analogues) identifies the highest layer that has 1 hull before any layer first has 2 hulls. That is, starting from layer 1, go up until a layer has 2 hulls and step back one layer; if no layer has 2 hulls, just take the highest layer with 1 hull.
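The selection rule can be sketched in Python as follows, given the number of hulls at each layer for one place-year. Two details are assumptions, labeled in the code: "has 2 hulls" is read as "has 2 or more hulls", and the function returns None when no layer qualifies. The function name is hypothetical.

```python
def highest_one_hull_layer(hull_counts):
    """Return the 'highest1hull' layer number (1-based), or None.

    hull_counts[i] is the number of hulls at layer i + 1.
    Assumption: 'has 2 hulls' is read as 'has 2 or more hulls'.
    """
    for i, n in enumerate(hull_counts):
        if n >= 2:
            # Layer i + 1 is the first with 2+ hulls; back up one layer (to layer i).
            return i if i >= 1 else None
    # No layer ever has 2 hulls: take the highest layer with exactly 1 hull.
    one_hull_layers = [i + 1 for i, n in enumerate(hull_counts) if n == 1]
    return max(one_hull_layers) if one_hull_layers else None


print(highest_one_hull_layer([0, 0, 1, 1, 1, 2, 2, 3]))  # 5
print(highest_one_hull_layer([0, 0, 1, 1, 1, 1]))        # 6
```

Because the rule stops at the first layer with 2 hulls, a later return to 1 hull at a higher layer is ignored.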
 
As a consequence, we can't use nohull, as it is always 1 under this specification. It also turns out that tothullcount and tothulldensity are not significant, likely because we already include numstartupsl (removing it makes tothullcount significant). avgdisthm, however, does work.
===Instrumenting highest1hull===
The instrument for highest1hull needs to work on its own, but before we start we also need a consistent estimate of the effect both for the full population and for the 13 (or 10, etc.) cities that we can instrument.
It looks like including year fixed effects in the 'standard' highest1hull spec is just too much with the reduced sample size. Excluding them looks like it might work. We can use a boom indicator instead, though:
<pre>
reg growthinv17lf growthinv17l numdealsl numstartupsl avghullarea boom i.placeid if reg1==1 & year95==1, robust
reg growthinv17lf growthinv17l numdealsl numstartupsl avghullarea boom i.placeid if reg1==1 & year95==1 & tifs==1, robust
reg avghullareal avghullarea tifintareahm boom i.placeid if reg1==1 & year95==1 & tifs==1, robust
tab placeid if tifs==1, gen(tifcity)
ivreg2 growthinv17lf growthinv17l numdealsl numstartupsl boom tifcity* (avghullarea=avghullarea_L) if reg1==1 & year95==1 & tifs==1, robust endog(avghullarea)
</pre>
Useful documentation:
*https://economics.mit.edu/files/18 -- Nice section on understanding coefficients, LATE, etc.
*https://www.nuffield.ox.ac.uk/media/3154/stata-intro-part-iii.pdf -- Using ivreg2 and doing it manually with predict
*http://www.repec.org/bocode/i/ivreg2.html -- Man page for ivreg2
*http://fmwww.bc.edu/EC-C/F2012/228/EC228.f2012.nn15.pdf -- Useful material on additional tests (endog, etc.)
*https://www.stata.com/statalist/archive/2011-04/msg00877.html -- For the Stata tests
*https://journals.sagepub.com/doi/pdf/10.1177/1536867X0800700402 -- Actual documentation!
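The Nuffield notes linked above describe doing IV "manually" with predict. Below is a stripped-down Python sketch of that two-step idea. Unlike the ivreg2 specification, which also includes growthinv17l, numdealsl, numstartupsl, boom and the tifcity dummies, it assumes a single endogenous regressor, a single excluded instrument, an intercept, and no other controls. The data are made up for illustration. In the exactly identified case the two-step slope equals the ratio cov(z, y) / cov(z, x). The standard errors from a naive second-stage OLS are not valid IV standard errors, which is one reason to let ivreg2 do the estimation.

```python
# Hypothetical sketch: just-identified 2SLS by hand with one endogenous
# regressor x, one excluded instrument z, and an intercept (no other controls).

def mean(v):
    return sum(v) / len(v)


def ols_slope_intercept(y, x):
    """Simple OLS of y on a constant and x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def two_stage_least_squares(y, x, z):
    # First stage: regress the endogenous regressor on the instrument.
    pi1, pi0 = ols_slope_intercept(x, z)
    x_hat = [pi0 + pi1 * zi for zi in z]  # the 'predict' step
    # Second stage: regress the outcome on the first-stage fitted values.
    return ols_slope_intercept(y, x_hat)


def iv_ratio(y, x, z):
    """Closed form for the exactly identified case: cov(z, y) / cov(z, x)."""
    mz, mx, my = mean(z), mean(x), mean(y)
    czy = sum((a - mz) * (b - my) for a, b in zip(z, y))
    czx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    return czy / czx


z = [1, 2, 3, 4, 5]
x = [2, 3, 5, 4, 6]
y = [5, 6, 11, 8, 13]
beta, alpha = two_stage_least_squares(y, x, z)
print(round(beta, 6), round(alpha, 6), round(iv_ratio(y, x, z), 6))  # 2.0 0.6 2.0
```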
Results are as follows:
*In the first stage, the instrument is highly statistically significant (and positive).
*Without the instrument, my estimate is small, negative and significant at 5%.
*With the instrument, the estimate is small, positive and insignificant.
*The underidentification test is significant. H0 is that the model IS underidentified, so this is rejected [https://journals.sagepub.com/doi/pdf/10.1177/1536867X0800700402 p486].
*The weak identification test has a Cragg-Donald Wald F statistic of 58.713, which is well above the Stock-Yogo critical values. The Kleibergen-Paap rk Wald F statistic, which allows for the robust errors, is 15.579: above the 15% maximal IV size critical value (8.96) but just below the 10% one (16.38).
*The overidentification test (Hansen J) is 0 and the output says "equation exactly identified": with one excluded instrument for one endogenous regressor, there are no overidentifying restrictions to test.
*The endogeneity test statistic is 3.375 with Chi-sq(1) P-val = 0.0662. The null is that avghullarea is exogenous, which is rejected at the 10% level (but not at 5%).

<pre>
. ivreg2 growthinv17lf growthinv17l numdealsl numstartupsl boom tifcity* (avghullarea = tifintareahm) if reg1==6 1 & layeryear95==highest1hulllayer > 1 & tifs==1, robust endog(avghullarea)
Warning - collinearities detected
Vars dropped:  tifcity12

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity

                                                      Number of obs =      179
                                                      F( 16,   162) =    14.82
                                                      Prob > F      =   0.0000
Total (centered) SS     =  412.5880393                Centered R2   =   0.5846
Total (uncentered) SS   =  3128.446788                Uncentered R2 =   0.9452
Residual SS             =  171.3895217                Root MSE      =    .9785

------------------------------------------------------------------------------
             |               Robust
growthin~7lf |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 avghullarea |   7.13e-06   8.44e-06     0.84   0.399    -9.42e-06    .0000237
growthinv17l |   .2036359   .1261974     1.61   0.107    -.0437064    .4509783
   numdealsl |  -.0775574   .1797158    -0.43   0.666    -.4297939    .2746791
numstartupsl |   .8311169   .2834661     2.93   0.003     .2755335      1.3867
        boom |   1.063169   .2712561     3.92   0.000      .531517    1.594821
    tifcity1 |   .3257855    .401005     0.81   0.417    -.4601698    1.111741
    tifcity2 |    .055352   .6979564     0.08   0.937    -1.312617    1.423321
    tifcity3 |  -.6024319   .7152221    -0.84   0.400    -2.004241    .7993776
    tifcity4 |   .3234907   .4568871     0.71   0.479    -.5719915    1.218973
    tifcity5 |  -.2383231   .4824386    -0.49   0.621    -1.183885    .7072392
    tifcity6 |  -.2941022    .398834    -0.74   0.461    -1.075802    .4875981
    tifcity7 |  -.8108876   1.113175    -0.73   0.466    -2.992671    1.370896
    tifcity8 |   .2262667   .5042398     0.45   0.654    -.7620251    1.214558
    tifcity9 |  -.0979371   .4359115    -0.22   0.822    -.9523079    .7564337
   tifcity10 |  -1.042832   .6154755    -1.69   0.090    -2.249142    .1634775
   tifcity11 |  -.3423386   .4180955    -0.82   0.413    -1.161791    .4771135
   tifcity12 |          0  (omitted)
       _cons |  -.0235353   .9222309    -0.03   0.980    -1.831075    1.784004
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic):             17.585
                                                   Chi-sq(1) P-val =    0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):               58.713
                         (Kleibergen-Paap rk Wald F statistic):         15.579
Stock-Yogo weak ID test critical values: 10% maximal IV size             16.38
                                         15% maximal IV size              8.96
                                         20% maximal IV size              6.66
                                         25% maximal IV size              5.53
Source: Stock-Yogo (2005).  Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments):         0.000
                                                 (equation exactly identified)
-endog- option:
Endogeneity test of endogenous regressors:                               3.375
                                                   Chi-sq(1) P-val =    0.0662
Regressors tested:    avghullarea
------------------------------------------------------------------------------
Instrumented:         avghullarea
Included instruments: growthinv17l numdealsl numstartupsl boom tifcity1
                      tifcity2 tifcity3 tifcity4 tifcity5 tifcity6 tifcity7
                      tifcity8 tifcity9 tifcity10 tifcity11
Excluded instruments: tifintareahm
Dropped collinear:    tifcity12
------------------------------------------------------------------------------
</pre>

Note that using lagged avghullarea as an instrument seems to work. Results are as follows:
*With the instrument, the estimate is close to the original: still negative and significant at the 5% level.
*The underidentification test is significant. H0 is that the model IS underidentified, so this is rejected.
*The weak identification test has a Cragg-Donald Wald F statistic of 679.195, which is far above the critical values.
*The overidentification test (Hansen J) is 0 and the output says "equation exactly identified", so again there are no overidentifying restrictions to test.
*The endogeneity test statistic is 0.037 with Chi-sq(1) P-val = 0.8465. The null is that avghullarea is exogenous, which we can't reject.
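The diagnostics above can be checked with a small Python sketch. The upper-tail p-value of a chi-squared(1) statistic is erfc(sqrt(stat / 2)), and this reproduces the P-val = 0.0662 reported for the endogeneity test. The Stock-Yogo values are copied from the ivreg2 log. As the log notes, they are for the Cragg-Donald F with i.i.d. errors, so comparing the Kleibergen-Paap F against them is only a rule of thumb, not a formal test. The function names are hypothetical.

```python
import math

# Stock-Yogo critical values (maximal IV size -> critical value) for one
# endogenous regressor and one excluded instrument, copied from the log above.
STOCK_YOGO_IV_SIZE = {0.10: 16.38, 0.15: 8.96, 0.20: 6.66, 0.25: 5.53}


def chi2_1_pvalue(stat):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def smallest_iv_size_passed(f_stat, table=STOCK_YOGO_IV_SIZE):
    """Smallest maximal-IV-size level whose critical value f_stat exceeds, or None."""
    passed = [size for size, crit in table.items() if f_stat > crit]
    return min(passed) if passed else None


print(round(chi2_1_pvalue(3.375), 4))   # 0.0662 (endogeneity test, TIF instrument)
print(smallest_iv_size_passed(58.713))  # 0.1  (Cragg-Donald F)
print(smallest_iv_size_passed(15.579))  # 0.15 (Kleibergen-Paap F, rule of thumb only)
```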
