Changes

=== V2 ===
 
==== New dataset ====
 
The new file is: MasterCode20YearV2-0.txt. It's in the dropbox!
 
==== Summary of request ====
Objectives:
Xunjie's list of variables for estimation:
* matchhqdist -- matchbodist is preferred. There were 152/500k nulls; this should be fixed now.
* matchinstagenarrow -- Now in the dataset with improved logic. Probably use matchinstagebroad instead. No nulls.
* firmfirstinvyear -- firmageatdeal is preferred. No nulls.
* matchprevportcos -- no nulls.
* pcnumperson -- this is a conceptually and operationally terrible variable! See below.
* pccitydollarsrankm1 -- had lots of missing values! This should be resolved now.
* pcexp -- similar issues to pcnumperson. See below.
Xunjie's restrictions (quoted):
* "I only keep data between 2002 and 2016."
* "And I only keep the matching markets if the number of real matches is more than or equal to 5. So less than 1/3 of the matching markets survive."
Dropping the entire market is surely too extreme. We should drop only the offending portco, and drop the market only if its number of real matches then falls below our threshold (e.g., 5). I've included some new market statistics to help with this: mktdealcount, mktnumreal, mktnumsyn, mktnumfirms, mktvalid.
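The two-step rule proposed above can be sketched in Python. This is only an illustration, not the code used to build the dataset: the row keys (market, portco, real) and the set of offending portcos are hypothetical, and it assumes that dropping a portco removes both its real and its synthetic rows.

```python
from collections import defaultdict

MIN_REAL_MATCHES = 5  # threshold from the text (e.g., 5); adjust as needed


def filter_markets(rows, bad_portcos, min_real=MIN_REAL_MATCHES):
    """Drop offending portcos first, then drop markets left with too few real matches.

    rows: list of dicts with hypothetical keys
          'market', 'portco', 'real' (1 = real match, 0 = synthetic match).
    bad_portcos: set of portco identifiers to remove (e.g., missing covariates).
    """
    # Step 1: remove only the offending portcos, not their whole market.
    kept = [r for r in rows if r["portco"] not in bad_portcos]

    # Step 2: recount real matches per market after the removal.
    real_counts = defaultdict(int)
    for r in kept:
        real_counts[r["market"]] += r["real"]

    # Step 3: keep a market only if it still meets the threshold.
    return [r for r in kept if real_counts[r["market"]] >= min_real]
```

A market that starts with six real matches and loses one offending portco survives under this rule, whereas it would have been discarded entirely under the original restriction.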
==== Review of Changes ====
==== pccitydollarsrankm1 ====
There are a number of possible explanations for why this variable has had so many missing values.
There do seem to be missing placenames: 4,855 of 69,882 PortCoSuper records don't join to PlaceYearRanking on placename and state (ignoring year), and 4,561 of these have valid zips. However, only 263 had growth VC and just 82 have non-null positive invested amounts, so this isn't the issue.
 
Ultimately, I rebuilt the underlying tables (portgeoid, etc.), created a new lookup table (PlaceStatecodeGeoid), and then reran the rankings, making sure to keep the "no activity" places for each year (tied for last place). The ranking variables should be fixed now.
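As a rough illustration of why keeping the "no activity" places matters, the following Python sketch ranks places for a single year and gives every inactive place the same last rank instead of leaving it null. The function name, the inputs, and the tie-handling scheme (ties share the best rank) are assumptions for the example; the actual ranking was rebuilt in the database and may differ.

```python
def rank_places(dollars_by_place, all_places):
    """Rank places by invested dollars for one year.

    dollars_by_place: dict of place -> invested dollars (hypothetical input).
    all_places: every place that should receive a rank, active or not.
    Places with no positive activity all tie for last place.
    """
    active = {p: d for p, d in dollars_by_place.items() if d and d > 0}

    # Competition ranking among active places: equal dollars share a rank.
    ordered = sorted(active.items(), key=lambda kv: kv[1], reverse=True)
    ranks = {}
    prev_dollars = None
    prev_rank = 0
    for position, (place, dollars) in enumerate(ordered, start=1):
        if dollars != prev_dollars:
            prev_rank = position
            prev_dollars = dollars
        ranks[place] = prev_rank

    # Inactive places are kept and tied for last, so no rank is missing.
    last_rank = len(active) + 1
    for place in all_places:
        ranks.setdefault(place, last_rank)
    return ranks
```

With this approach a portco in a place with no investment that year still gets a rank value, which is the behaviour the fix above is meant to guarantee.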
==== pcnumperson / pcexp ====
* A portco with 2 people who have each held one previous position has pcexp=2
* Non-exec board members (lawyers, investors, etc.) may have worked with lots of previous firms and be inflating this count!
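To make the inflation problem concrete, here is a small Python sketch. It assumes, based only on the example above, that pcexp is the total number of previous positions summed over a portco's people; the field names (is_exec, prev_positions) and the figures for the board member are made up for illustration.

```python
def pcexp(people, include_non_exec=True):
    """Total previous positions across a portco's people (assumed definition)."""
    return sum(
        p["prev_positions"]
        for p in people
        if include_non_exec or p["is_exec"]
    )


# Two executives, each with one previous position: pcexp is 2, as in the text.
team = [
    {"name": "founder", "is_exec": True, "prev_positions": 1},
    {"name": "cto", "is_exec": True, "prev_positions": 1},
]

# A hypothetical non-exec board member who has worked with many firms.
board_lawyer = {"name": "board lawyer", "is_exec": False, "prev_positions": 12}

with_board = pcexp(team + [board_lawyer])
execs_only = pcexp(team + [board_lawyer], include_non_exec=False)
```

Here a single non-exec board member moves the count from 2 to 14, even though the operating team's experience has not changed.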
 
I rebuilt these variables so that they have better coverage where possible. I also set doctors, serials, serialceopreses, serialfounders, prevs, prevceopreses, prevfounders to zero when missing in PortCoSuper (I left them as null in PortCoPeopleMaster).
 
We shouldn't use numperson at all. It's just horrible. Instead we should try one of the following:
* serialceopreses
* serialfounders
* serials
* doctors (maybe for something different)
* prevceopreses
* prevfounders
 
But I expect that we will have problems with variation.
==== match in stage ====
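The tail of the CASE expression for matchinstagebroad (earlier branches not shown) counts a null firmstageprefno as a match:
 WHEN firmstageprefno IS NULL THEN 1::int
 ELSE 0::int END AS matchinstagebroad,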
 
==== Other ====
 
I added together the patents and SBIR grants pre and during VC to create the following variables (each has variation issues, but maybe try in order):
* pchaspatentsvc (1/0 indicator for portco has patents)
* pcpatentsvc (number of patents)
* pcsbircountvc (number of SBIR grants)
* pcsbiramountvc (value of SBIR grants)
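The construction of these four variables can be sketched as follows. This is a minimal Python illustration, not the query that built the dataset: the input keys (patents, sbir_count, sbir_amount) are hypothetical, and treating a missing pre-VC or during-VC value as zero is an assumption.

```python
def combine_vc_vars(pre, during):
    """Add pre-VC and during-VC counts to form the combined variables.

    pre, during: dicts with hypothetical keys 'patents', 'sbir_count',
    'sbir_amount'. Missing or None values are treated as 0 (an assumption).
    """
    def total(key):
        return (pre.get(key) or 0) + (during.get(key) or 0)

    patents = total("patents")
    return {
        "pchaspatentsvc": 1 if patents > 0 else 0,  # 1/0 indicator
        "pcpatentsvc": patents,                     # number of patents
        "pcsbircountvc": total("sbir_count"),       # number of SBIR grants
        "pcsbiramountvc": total("sbir_amount"),     # value of SBIR grants
    }
```

For example, a portco with 2 patents before VC and 1 during VC would get pcpatentsvc of 3 and pchaspatentsvc of 1.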
=== Changes to date ===
