==Rebuilding the Paper==
The paper required a complete rebuild of all the results, with the data updated to the end of 2011. We can also consider several extensions to the paper, detailed in a later section.
===Main Acquisitions Data===
====Search Criteria====
Events from 1980-2011 that meet the following SDC search criteria:
*Announced: 1/1/1980 to 12/31/2011
*Target Nation: US
*Acquiror Nation: US
*Target Status: Private (V)
*Acquirer Status: Public (P)
*Percentage of Shares Owned after Transaction: 100 to 100 (will exclude those with missing data)
====SDC Variables====
The following variables were pulled:
YEARANN, YEAREFF, DA, DE, DATEANNORIG_DAYS, PCTACQ, PCTOWN, DAE, DATEEFFEXP, DUNCON, DAO, VALAMEND, VEST, STATC, VAL, ENTVAL, EQVAL, BIDCOUNT, CONSID_STRUCTURE, CONSID_STRUCT_DESC, CURRC, COUNT_CONSIDO, COUNT_CONSIDS, A_POSTMERGE_OWN_PCT, PCTOWN, PCT_STK, PCT_CASH, PCT_OTHER, PCT_UNKNOWN, AN, ANL, ANATC, ANAICP, AIN, ACU, ASTC, ASTIC, AIP, AUP, AEXCH, ACITY, AZIP, ALBOFIRM, TN, TIN, TCU, TLBOFIRM, TNL, TNATC, TNAICP, TSTC, STIC, TCITY, TZIP, IASS, COMEQ, BV, TASS, SALES, TASS, TLIA, RND, BNKRUPT, TWOSTEPSPIN, CHA, DBT_RESTRUCT, DUTCH, PRIVATIZATION, FBNK, RECAP, GOV_OWN_INVOLV_YN, JV, RESTR, LBO, LIQ, MOE, OMKT, IPO, REVERSE, RUM, SBO, SPIN, SPLIT
This provided (of particular note):
*Target Name
*Acquirer Name
*Payment Method
*Acquisition announcement date
*Acquisition announcement year
*Total assets of acquirer (if available)
*Payment method (cash/stock/mix)
*Pct of stock in the deal
*No. of bidders
*Acquirer CUSIP (for join to COMPUSTAT)
*Target NAIC
*Acquirer NAIC (if available)
*Age (of target)
*Sales (of target)
*Intangible Assets (of target)
New Flags in SDC (downloaded for exclusions):
*Bankruptcy Flag
*Failed bank Flag
*Leveraged Buyout Flag
*Reverse LBO Flag
*Spinoff Flag
*Splitoff Flag
*Target is a Leveraged Buyout Firm
*And many others. These will be reviewed and excluded.
====Processing Notes====
#The data was imported into Postgres. There were 41,572 records.
#The flag variables were reviewed for variation - some had no bite (e.g. Spinoff, TwoStepSpinOff, and Splitoff) and were ignored. Others led to data being discarded as flag exclusions.
#All variables were checked for coding, range, dispersion, etc. Those of particular note are recorded in the variable check notes below.
#Restrictions were placed on the data (Completion, flags, exclude LBOs, etc.). This reduced the data to 40,035 observations.
#Certain variables were reprocessed (see below).
#Acquiror and Target names were keyed to account for repetitions etc.
#Duplicate acquisition data (same event) was eliminated.
#Multiple acquisitions of the same target (i.e. a target is acquired, spun-off and acquired again, etc.) were eliminated.
#CUSIPs were processed into 6, 8, and 9 digit variables, by searching COMPUSTAT annual data (Jan 1978 - Jan 2012) using the 6 digit CUSIP and then finding the correct 9 digit CUSIP for a particular issue-year. Note that a 9 digit CUSIP is a 6 digit Issuer Number, a 2 digit Issue Number, and a check digit (see the sketch below). There were 27,401 acquisitions with 7,348 valid CUSIPs.
#CRSP data was retrieved and processed (see below). After processing we had data for 23,802 observations.
#COMPUSTAT data was retrieved and processed (see below).
#VC PortCo data was retrieved and processed. PortCos were flagged and portco data added for appropriate observations.
#LBO data was retrieved and processed. 164 observations were discarded.
#Acquisition Histories were calculated as the number of past acquisitions for each acquirer: Total, VC only, Non-VC only.
#Accounting vars were converted to 2011 real values using the official BEA implicit GDP price deflator index: http://www.bea.gov/national/nipaweb/TableView.asp?SelectedTable=13&ViewSeries=NO&Java=no&Request3Place=N&3Place=N&FromView=YES&Freq=Year&FirstYear=1978&LastYear=2010&3Place=N&Update=Update&JavaBox=no
#Percentage variables were multiplied by 100 to get nice coefficients.
#Every observation was assigned a unique observation number (obsno).
#Compound variables such as ''horiz'', ''vert'', and ''cong'' were calculated.
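For reference, the check-digit construction behind a 9 digit CUSIP can be sketched in a few lines of Python. This is illustrative only - the 9 digit CUSIPs themselves were taken from COMPUSTAT as described above, not computed:
<pre>
def cusip9(cusip8: str) -> str:
    """Append the standard CUSIP check digit to an 8-character issuer+issue code."""
    total = 0
    for i, ch in enumerate(cusip8.upper()):
        if ch.isdigit():
            v = int(ch)
        elif ch.isalpha():
            v = ord(ch) - ord('A') + 10
        elif ch == '*':
            v = 36
        elif ch == '@':
            v = 37
        elif ch == '#':
            v = 38
        else:
            raise ValueError("Invalid CUSIP character: %s" % ch)
        if i % 2 == 1:          # double every second character
            v *= 2
        total += v // 10 + v % 10
    check = (10 - total % 10) % 10
    return cusip8.upper() + str(check)

print(cusip9("03783310"))   # -> '037833100'
</pre>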
Variable check notes:
*1,514 had estimated announce dates. These were flagged.
*443 had their transaction value amended. These were flagged.
*41,473 had a deal code of 'C' for completed. These were kept.
*The number of bidders was always disclosed; it was 2 in 54 cases and 3 in 2 cases.
*The number of considerations offered and sought varied from 1 to 8.
*State codes were USPS official standards: https://www.usps.com/send/official-abbreviations.htm
*There was data from 35 stock exchanges. 32,177 observations recorded Amex, Nasdaq or NYSE.
*25 acquirors were LBO firms and 3 targets were LBOs. These were excluded.
*All acquirors and targets had 6 digit NAIC codes, though some were truncated (e.g. 517000) and others invalid. COMPUSTAT NAIC codes were used when SDC NAIC codes failed, where these were recorded in WRDS.
Flag Exclusions:
*Cases where the target was bankrupt or distressed as indicated by: TargetBankrupt, TargetBankInsolvent, Liquidation, Restructuring.
*Cases where the firm wasn't genuinely privately-held as indicated by: OpenMarketPurchases, GovOwnedInvolvement, JointVenture, Privatization (which capture government sales).
*Cases where there was a share recap going on concurrently with the acquisition: Recap.
*Targets that had LBO involvement (more will likely be removed in the next phase of matching to LBO targets): LBO, SecondaryBuyoutFlag, ReverseTakeOver (used for LBO'd firms doing a reverse take over), IPOFlag (likewise).
*Firms where the deal began as a rumor (so the information leakage is problematic): DealBeganAsRumor.
Processing of variables:
*The original announce date was determined as min{announcedate, announcedateorg}. Those where announcedate != announcedateorg were flagged (see the sketch below).
*Percentage stock, cash, other and unknown were reprocessed to include data from the ConsidStruct field, which tags Stock Only, Cash Only, etc.
*State codes were reprocessed to numerics using the lookup table below.
*IT, BT (Biotech), HT (Hightech) and NAIC1, NAIC2, NAIC3, Indu1, Indu2, Indu3 variables were created using the lookup tables below (see the variable descriptions for more info).
*Note that the IT, BT and HT variables were coded using aggregate codes wherever possible (i.e. 517110, etc., all appear in IT and cover the 517 code entirely, so the 517 block would be coded as IT even if SDC recorded the code as 517000).
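A minimal pandas sketch of the first two reprocessing steps above (the column names announcedate, announcedateorg, consid_struct, pct_stk and pct_cash are stand-ins for the actual field names in the database):
<pre>
import pandas as pd

def reprocess(df: pd.DataFrame) -> pd.DataFrame:
    # Original announce date = earliest of the announced and originally-announced dates
    df["announce"] = df[["announcedate", "announcedateorg"]].min(axis=1)
    df["flag_datediff"] = (df["announcedate"] != df["announcedateorg"]).astype(int)

    # Backfill payment percentages from the consideration-structure tag when missing
    df.loc[df["consid_struct"].eq("Stock Only") & df["pct_stk"].isna(), "pct_stk"] = 100.0
    df.loc[df["consid_struct"].eq("Cash Only") & df["pct_cash"].isna(), "pct_cash"] = 100.0
    return df
</pre>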
Other Notes:
*Founding year/Age of the Target was not available in the data. It is in VE for VC-backed firms only.
*The street address is multiline and problematic if included. This can be drawn separately if needed. We have the City, Zip and State, which is sufficient to get a Google Maps lookup. Likewise 'Competing Offer Flag (Y/N)', also known as COMPETE and Competing Bidder, is a multiline - with each line presumably corresponding to a different bidder identity. It was excluded.
*The NormalizeFixedWidth.pl script uses the spacing in the header to determine the column breaks. The EquityValue column has two spaces in front of its name that breaks this. Both EquityValue and EnterpriseValue needed to be imported as varchar(10), as they have the code 'np' in some observations.
*The NormalizeFixedWidth.pl script was modified so that it only drops commas in numbers and not those in names etc.
For robustness we need ARs for 5, 9, and 11 days.
===CRSP Data===
Daily return data was downloaded using 8 digit CUSIPs from CRSP. The following variables were retrieved from 1/1/1978-1/1/2012 (the latest month available):
*Cusip
*Date
*prc
*ret
*vwretd
The data was processed as follows:
#Announcedays were coded to the current or next following trading day.
#Trading days were indexed from the announcement day (day 0) for all announcement-cusip pairs.
#A refined estimation set beginning 280 and ending 30 days before the acquisition was extracted for each announceday-cusip pair.
#Cusips with multiple announcements on the same day had these announcements flagged, and a unique announceday-cusip pair index (acqno) was created.
#An announceday-cusip pair observation was included in an estimation regression provided that there were 50 continuous trading days ending at day -30.
*The parameters, errors and statistics, particularly <math>\hat{\alpha_i},\hat{\beta_i}</math>, were estimated for each announceday-cusip pair from the following regression (see the sketch below): <math>R_i = \hat{\alpha_i} + \hat{\beta_i}R_m + \epsilon</math>
*Days from -5 to +5, to allow for an 11 day window, were extracted into an event window and processed to produce:
**<math> AR_i = R_i - (\hat{\alpha_i} + \hat{\beta_i}R_m) </math>
**<math> AR^S_i = R_i - R_m </math>
**Let <math>\epsilon</math> be the residual from the mkt model regression. Then calc: <math>\sigma_{\epsilon}={\left( \mathbb{E}\left[(\epsilon - \mathbb{E} \epsilon)^2\right]\right)}^{\frac{1}{2}}</math>
**RMSE of the Mkt Model: <math>RMSE={\left( \mathbb{E}\left[(X- \mathbb{E} X)^2\right]\right)}^{\frac{1}{2}}</math> - this is in the ereturn list in STATA and will be used for the Patell Standard Errors.
*Then other variables were calculated or included:
**The cumulative return <math>CAR_i = \sum_t AR_{i,t}</math>
**The price 30 days before the acquisition was recorded for the market value calculation.
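For reference, a numpy sketch of the market-model estimation and the event-window calculations above (the production runs used STATA over the Postgres data; the column names here are illustrative):
<pre>
import numpy as np
import pandas as pd

def event_study(est: pd.DataFrame, evt: pd.DataFrame):
    """est: estimation-window rows (50+ trading days ending at day -30) with columns ret and vwretd;
    evt: event-window rows (days -5..+5) with the same columns."""
    # Market model: R_i = alpha + beta * R_m + eps, estimated by OLS
    X = np.column_stack([np.ones(len(est)), est["vwretd"].to_numpy()])
    y = est["ret"].to_numpy()
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    alpha, beta = coef
    resid = y - X @ coef
    rmse = np.sqrt(np.mean(resid ** 2))

    # Abnormal returns over the event window
    ar = evt["ret"] - (alpha + beta * evt["vwretd"])   # market-model AR
    ar_s = evt["ret"] - evt["vwretd"]                  # market-adjusted AR
    car = ar.sum()                                     # CAR over the window
    return alpha, beta, rmse, car
</pre>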
Coming back to it, the paper looks a little thin (though clearly the data is a monster already). I think it would benefit from a couple of extensions, particularly the inclusion of something that resembles an instrument. I have the following ideas, which might be feasible in the time we have:===COMPUSTAT Data===
From COMPUSTAT we drew accounting variables for all of our Cusips, then extracted data for the announcement years and the lagged announcement years. (Note that Cusip, NAIC, datayear, fiscal year and fiscal year end were included in the download. NAIC was used to supplement SDC NAICs.)
Data included:
*Total Assets
*Market Value
*Sales
*Total Liabilities
*Intangible Assets
*Shares Outstanding
Note that leverage was calculated as: <math>Leverage=\frac{Total\;Liabilities}{Total\;Assets}</math>
Variables were translated to 2011 dollars and marked ''varname11''; lagged (minus one year) variables were recorded as ''varname_m1''. In STATA, log variables were created as ''varnamel''.
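A sketch of these naming conventions in pandas (the deflator series and the cusip/year column names are assumptions; the actual deflation used the BEA index linked above and the log variables were built in STATA):
<pre>
import numpy as np
import pandas as pd

def build_vars(df: pd.DataFrame, deflator: pd.Series, cols: list) -> pd.DataFrame:
    """deflator: implicit GDP price deflator indexed by year.
    df is assumed to be sorted by cusip and year, one row per cusip-year."""
    factor = deflator.loc[2011] / df["year"].map(deflator)
    for c in cols:
        df[c + "11"] = df[c] * factor                              # 2011 real values
        df[c + "_m1"] = df.groupby("cusip")[c + "11"].shift(1)     # lagged (minus one year)
        df[c + "l"] = np.log(1 + df[c + "11"])                     # log(1+var), as in the analysis notes
    return df
</pre>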
===The VC PortCos===
The following criteria were applied to the SDC search:
*Moneytree deals (i.e. VC only)
*Company Nation: US
*Round date: 1/1/1975 to 1/1/2012
A basic variable set was downloaded including:
*PortCo Name
*Nation
*State
*Location
*Address
*Total VC Invested
*Date of First Inv
*Date of Last Inv
*Date of Founding
*No Rounds
Check flags:
*Moneytree
*Venture Related
The data was reprocessed, specifically:
*Unique PortCos were determined using Name, State and Location data.
*Duplicate records were eliminated.
*Discontinuous (multiple) records pertaining to the financing history of a single firm were assembled into single records.
*PortCos were matched to Acquisition Targets using name based matching, checking state and location information (see the sketch below).
*In a small number of cases VC appears to continue after the acquisition. This is almost surely an error in VE, but these observations are flagged.
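The name-based matching can be illustrated with a crude pandas sketch (the normalization rules and column names here are illustrative; the actual matches were checked against state and location information and validated by hand):
<pre>
import re
import pandas as pd

def normalize_name(name: str) -> str:
    """Upper-case, strip punctuation and common corporate suffixes."""
    name = re.sub(r"[^A-Z0-9 ]", " ", name.upper())
    name = re.sub(r"\b(INC|INCORPORATED|CORP|CO|LLC|LTD)\b", "", name)
    return re.sub(r"\s+", " ", name).strip()

def match_portcos(targets: pd.DataFrame, portcos: pd.DataFrame) -> pd.DataFrame:
    targets = targets.assign(key=targets["target_name"].map(normalize_name))
    portcos = portcos.assign(key=portcos["portco_name"].map(normalize_name))
    # Require the state to agree as well; 'both' in vc_match indicates a VC-backed target
    return targets.merge(portcos, on=["key", "state"], how="left", indicator="vc_match")
</pre>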
Note: The coverage of VE before 1980 is problematic, so we will discard acquisition records before 1985 in STATA before the analysis.
===Removing additional LBOs===
A set of LBO portcos was downloaded from SDC using the flags LBO=yes, PWCMoneytree=No, StdUSVentureDisbursement=No. The LBOs were matched against the acquisition targets and removed (LBO initial investment dates were checked).
===Processing NAIC Codes===
While SDC provides 6 digit NAIC codes for all acquirers, some of these NAIC codes are invalid (proprietary to SDC). These were replaced with COMPUSTAT NAIC codes whenever available. The invalid SDC NAIC codes found were:
  SDCnaic | SDCindustry
 ---------+------------------------------------------------
  BBBBBA  | Miscellaneous Retail Trade
  BBBBBA  | Business Services
  BBBBBA  | Advertising Services
  BBBBBA  | Prepackaged Software
  BBBBBB  | Business Services
  BCCCCA  | Investment & Commodity Firms,Dealers,Exchanges
  BCCCCD  | Investment & Commodity Firms,Dealers,Exchanges
  BCCCCD  | Business Services
  BCCCCD  | Social Services
  BCCCCE  | Investment & Commodity Firms,Dealers,Exchanges
Details of the IT, BT, and HT codes are below.
An acquisition was classified as:
*''Horiz'' if ''acquirornaic6''=''targetnaic6''
*''Vert'' if ''acquirornaic5''=''targetnaic5'' AND ''acquirornaic6''!=''targetnaic6''
*''Cong'' if !''Vert'' AND !''Horiz''
*''Related'' if ''Vert'' OR ''Horiz''
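A sketch of this coding in Python, under the assumption that ''Horiz'' means identical 6 digit codes and ''Vert'' means the same 5 digit prefix with a different 6 digit code (the production coding also used the IT/BT/HT aggregate blocks described below):
<pre>
def classify(acquirornaic6: str, targetnaic6: str) -> str:
    """Return 'horiz', 'vert' or 'cong' for an acquirer/target NAIC pair."""
    if acquirornaic6 == targetnaic6:
        return "horiz"
    if acquirornaic6[:5] == targetnaic6[:5]:
        return "vert"
    return "cong"

def related(code: str) -> bool:
    """'Related' is defined as Horiz or Vert."""
    return code in ("horiz", "vert")
</pre>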
===Patent Data===
NBER patent data with assignee names from 1975-2006 was used to add patent counts to the data. Only patents filed before the announcement date were included. Assignee names were matched to target names by name matching software, with matches validated by hand. A patent count and a 'has patents' variable (''patents'') were generated, and a flag was added to record targets that have their acquisition announcement before 2006. Targets acquired after 2006 have their patent applications up to and including 2006 recorded, though their true counts are right-truncated. Likewise, a target may have existed and made patent applications prior to 1975, resulting in left-truncation. Therefore year fixed-effects are warranted.
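A pandas sketch of the patent count construction (the target_key, appdate and announce column names are illustrative; the assignee-to-target matching itself was done with name matching software and checked by hand):
<pre>
import pandas as pd

def patent_counts(targets: pd.DataFrame, patents: pd.DataFrame) -> pd.DataFrame:
    """Count patents applied for before each target's announcement date."""
    merged = targets.merge(patents, on="target_key", how="left")
    before = merged[merged["appdate"] < merged["announce"]]
    counts = before.groupby("target_key").size().rename("patents")

    out = targets.join(counts, on="target_key")
    out["patents"] = out["patents"].fillna(0).astype(int)
    out["haspatents"] = (out["patents"] > 0).astype(int)
    # Right-truncation flag: the NBER data ends in 2006
    out["patentdata"] = (out["announce"].dt.year <= 2006).astype(int)
    return out
</pre>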
====Patents and Information Asymmetries====
Patents might act to certify their patent-holders in the face of information asymmetries (see, for example, Hsu and Ziedonis, 2007). Thus acquirers of targets with patents might value the certification of a venture capitalist less than when they consider targets without patents. Likewise, on average about 2/3rds of all patent citations are added by examiners (Alcacer and Gittelman, 2006; Cotropia et al., 2010). Thus citation counts might represent the search costs associated with finding information about patents. That is, patents with more citations are the ones that are easiest to find, and so mitigate information asymmetries the most successfully.
Note: I am working on the 2011 update to the NBER patent data (see: http://www.nber.com/~edegan/w/index.php) but this will NOT be done before the March 7th deadline.
===Analysis Calculations and Notes===
The following is performed on the dataset before analysis:
*Observations were dropped if yearann<1985, to give 5 years of VC data before the announcement.
*asize = market value + lagged liabilities
*rsize = tv/asize
*Log variables were calculated as log(1+var).
*The following aliases were created:
**tit -> it
**tbt -> bt
**tht -> ht (?)
**yearann -> year
*Interaction effect variables were created.
*Year x anaic2 (2 digit acquiror naic) fixed effect indicators were created.
*CARM variables (Market Model CAR) were created for the 7 day window for the figures.
*Vscore variables were created for the significance tests on CARs using the RMSE from the estimation window (see the sketch below): <math>vscore =abs\left(\frac{carm}{\left(\frac{rmse}{\sqrt{n}}\right)}\right)</math>. This was done on a per group basis, using the variable names xgroupvar, where group=it or itvc or null.
*Year range variables from 1 to 6 were created for years 1985-1989, ... , 2005-2009, 2010-2011.
*In the regression analysis we clustered standard errors on acqno (the cusip-announceday pair that could have multiple acquisitions, marked with ''sameday''=1), using STATA's vce(cluster ''clustvar'') documented here: http://www.stata.com/support/faqs/stat/robust_ref.html
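For reference, the vscore calculation as a short Python function (n here is the relevant observation count for the group, per the per-group construction above):
<pre>
import numpy as np

def vscore(carm: float, rmse: float, n: int) -> float:
    """vscore = |carm / (rmse / sqrt(n))|, as defined in the analysis notes."""
    return abs(carm / (rmse / np.sqrt(n)))
</pre>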
Notes:
#The experience variables (# Previous Acqs) are generated using the primary data, and will be truncated by the start of the dataset. We should probably consider year fixed effects to mitigate any induced bias.
#In the previous version of the paper we threw out cases when the mkt value of the acquirer was 'very small' relative to TV.
#Boom is defined as: <math>1990\le year \le 1999</math>
#The Boehmer standard errors are the cross-sectional ones generated by OLS. Clustering them isn't part of the specification, but clearly should be done.
==Supplementary Data==
To determine the information asymmetry ranking of sectors again we will need (either for 1 year or across the entire year range 1985-2011):
CRSP:
*Idiosyncratic volatility of stock returns: requires returns and mkt returns.
*Relative trading volume (this appears to be called TURNOVER, as opposed to absolute volume which is VOLUME; the measure should be relative to the exchange's trading volume).
*NAIC
COMPUSTAT:
*Intangible assets
*Total assets
*Tobin's Q: Market value/book value of assets
*NAIC
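A numpy sketch of the idiosyncratic volatility measure above, taken as the standard deviation of market-model residuals over whatever window is chosen (illustrative only):
<pre>
import numpy as np

def idiosyncratic_vol(ret: np.ndarray, mkt: np.ndarray) -> float:
    """Standard deviation of the residuals from a market-model OLS regression."""
    X = np.column_stack([np.ones(len(mkt)), mkt])
    coef, *_ = np.linalg.lstsq(X, ret, rcond=None)
    resid = ret - X @ coef
    return float(np.std(resid, ddof=2))   # ddof=2 for intercept + slope
</pre>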
==Variables==
The following is a quick description of the variables in the Version 3 dataset, in order.
===Acquisition Specific Variables===
*patentdata: Takes the value 1 if the announcement year is equal to or less than 2006, so the firm can have all of its patents recorded (from 1975 forward), and 0 if the patent data will be inherently truncated.
==State Codes==

We use the US Postal Service (USPS) Official State Codes, found at: https://www.usps.com/send/official-abbreviations.htm

 OfficialCode NumericCode State
 AK            1  ALASKA
 AL            2  ALABAMA
 AR            3  ARKANSAS
 AS            4  AMERICAN SAMOA
 AZ            5  ARIZONA
 CA            6  CALIFORNIA
 CO            7  COLORADO
 CT            8  CONNECTICUT
 DC            9  DISTRICT OF COLUMBIA
 DE           10  DELAWARE
 FL           11  FLORIDA
 FM           12  FEDERATED STATES OF MICRONESIA
 GA           13  GEORGIA
 GU           14  GUAM
 HI           15  HAWAII
 IA           16  IOWA
 ID           17  IDAHO
 IL           18  ILLINOIS
 IN           19  INDIANA
 KS           20  KANSAS
 KY           21  KENTUCKY
 LA           22  LOUISIANA
 MA           23  MASSACHUSETTS
 MD           24  MARYLAND
 ME           25  MAINE
 MH           26  MARSHALL ISLANDS
 MI           27  MICHIGAN
 MN           28  MINNESOTA
 MO           29  MISSOURI
 MP           30  NORTHERN MARIANA ISLANDS
 MS           31  MISSISSIPPI
 MT           32  MONTANA
 NC           33  NORTH CAROLINA
 ND           34  NORTH DAKOTA
 NE           35  NEBRASKA
 NH           36  NEW HAMPSHIRE
 NJ           37  NEW JERSEY
 NM           38  NEW MEXICO
 NV           39  NEVADA
 NY           40  NEW YORK
 OH           41  OHIO
 OK           42  OKLAHOMA
 OR           43  OREGON
 PA           44  PENNSYLVANIA
 PR           45  PUERTO RICO
 PW           46  PALAU
 RI           47  RHODE ISLAND
 SC           48  SOUTH CAROLINA
 SD           49  SOUTH DAKOTA
 TN           50  TENNESSEE
 TX           51  TEXAS
 UT           52  UTAH
 VA           53  VIRGINIA
 VI           54  VIRGIN ISLANDS
 VT           55  VERMONT
 WA           56  WASHINGTON
 WI           57  WISCONSIN
 WV           58  WEST VIRGINIA
 WY           59  WYOMING
              99  UNKNOWN
==Classification of IT, BT and HT==
===Information and Communications Technology (IT)===
6215 both 621512 Diagnostic Imaging Centers
===Other High Tech (HT)===
The following is our definition of other (i.e. Not IT/BT) High-tech:
211 both 211111 211111 Crude Petroleum and Natural Gas Extraction
811211 both 811211 811211 Consumer Electronics Repair and Maintenance
811219 both 811219 811219 Other Electronic and Precision Equipment Repair and Maintenance
 
===Other HT Definitions===
 
The references for the other High-Tech (HT) definitions are:
*Hecker, Daniel E.(2005), "High-technology employment: a NAICS-based update", Monthly Labor Review (July): 57-72. http://www.bls.gov/opub/mlr/2005/07/art6full.pdf
*Paytas, Jerry and Berglund, Dan (2004), "Technology Industries and Occupations for NAICS Industry Data", Carnegie Mellon University, Center for Economic Development and State Science & Technology Institute.
 
==Extending the paper==
 
Once this 'draft' is complete we can consider some extensions. I am currently working on the VC Reputations data.
 
===VC Reputations===
 
VCs might use their reputations to certify their firms, or these variables might reflect VC experience (and potentially bargaining skill). We can calculate:
*Avg or max number of previous acquisitions and/or IPOs conducted by VCs present in the last round of investment into the firm prior to the acquisition announcement.
*The number of previous acquisitions and/or IPOs conducted by the lead VC (Note: The defacto standard method of determining the lead investor is to see which (if any) investor was present from the first round in every round until the last.)
*Likewise for the average or dollar-invested weighted average of all investors in the port co.
*Last round, lead investor, or average number of previous funds raised by investors, or their fund size or total cumulative firm size (i.e. summed across all funds) at the announcement.
*Whether the VCs will raise a next fund (though this could actually be endogenous with the CAR)
 
===Outside Options===
 
Outside options affect bargaining. A VC that is near to the end of its fund when it made its last investment into the portfolio company (either in terms of dates or dollars), and particularly one that won't raise a next fund, will be unable to continue financing the portco without the acquisition and therefore has no good outside option with which to bargain. We could calculate how near last round investors are to the ends of their funds (and whether they are going to raise another) and take averages etc, to proxy for the outside option.
 
===Bargaining Superstars===
 
It might be the case that some VCs specialize in providing bargaining skills. We could test this hypothesis by:
*Creating fixed-effect variables for the presence of each repeat VC in a portfolio company
*Regressing these fixed-effects on the CARs and sorting the coefficient into quartiles/deciles etc.
*Testing the hypothesis that firms in the top decile are more likely than expected to appear in a last round of financing.
 
===VC Information Asymmetries===
 
Implicit in our argument is that VCs mitigate the information asymmetries between themselves and their portfolio firms effectively. We can refine this argument to consider the degree to which a VC is likely to be informed about their portfolio firm.
 
====Distances====
 
We can use the road or great-circle distance from the lead investor to the portfolio company as a measure of the information acquisition cost. We could also create a cruder but likely more meaningful version of this by creating a binary variable to see whether the lead investor was within a 20-minute drive of the portfolio company (this is the so called '20 minute rule' - discussed as important for monitoring in Tian, 2006). Alternatively we could consider the nearest investor, or the average of the nearest investors across all rounds, etc.
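For the great-circle version, a short Python sketch (coordinates would come from geocoding the City/State/Zip fields; driving distances and times would come from the Google Maps API mentioned below):
<pre>
from math import radians, sin, cos, asin, sqrt

def great_circle_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine great-circle distance in miles between a VC office and a portco."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 3958.8 * asin(sqrt(a))   # mean Earth radius ~ 3958.8 miles
</pre>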
 
I can get 2,500 requests per IP address (I can run 3+ concurrently from Berkeley) from the Google Maps api, with responses including driving distances and estimated driving times.
 
====Active Monitoring====
 
I can also determine whether the lead VC has a board seat at the portfolio company at the time of the acquisition, as well as the fraction of invested firms with board seats, and the total number of board seats held by VCs (or the fraction), using the identities of the executives. Though this will be particularly difficult in terms of data, I plan on doing it for another project with Toby Stuart anyway.