The Effect of Leverage Buyouts on Innovation
|Title||The Effect of Leverage Buyouts on Innovation|
|Author||Ed Egan, Brian Ayash|
|RAs||James Chen, Yimeng Tang|
|© edegan.com, 2016|
See also: LBO Lit Review
- 1 Revision following Rejection from JF
- 2 Version 2
- 2.1 How to do a run
- 2.2 Build Notes
- 2.3 Review of Old Build
- 3 Version 1
- 4 New Version
- 5 Old Version
- 5.1 Abstract
- 5.2 Research Outline
- 5.3 Work Scripts
- 5.4 Work Description
- 5.5 Project Sub-Pages
Revision following Rejection from JF
- Reviewers didn't understand what we were doing. We need to rewrite and maybe clean up the methodology.
- Go through Lerner paper again and recheck our understanding.
- The version of the paper on overleaf has version 1 results and text. Did we ever write a version 2 to go with the updated data?
Rejection feedback is in:
The latest version is in:
- On overleaf: https://www.overleaf.com/project/5e48836f70402f00015944b6
- In E:\projects\lbo
- Diff-in-diff vs. pair fixed effect
- Brian tries to rewrite what we did empirically
- Both read referee reports!
- Ed can fix tone
Standard diff-in-diff references:
- Ashenfelter and Card (1985): http://davidcard.berkeley.edu/papers/train-prog-estimates.pdf
- Card and Kruger (1995): http://www.eco.uc3m.es/docencia/EconomiaAplicada/materiales/CardKrueger94_en.pdf
Review the version 2 results
This section describes how to review the version 2 results, with the expanded sample.
The files are in E:\projects\lbo
Begin by copying the following to July2020:
Change all produced versions to 2-3:
- OldData.dta (used to load a marker to enable comparison with version 1)
Run Datasetup.do to produce MatchInput2V3.txt (and other files). Then change LBOmatchscript to use MatchInput2V3.txt and run:
julia -i .\LBOmatchscript.jl
It throws a bunch of warnings about deprecated code, but seems fine. It produces matchresults.txt, which contains 347 lines. This is identical to matchresults.txt in the parent folder, so we don't need to update MatchedMeasuresPortfolioV2-2.txt. Fix the produced version numbers in Analysis.do and then run it!
Then review the main set of figures. To do this, reprocess FullTTests.xls (and maybe PairGoodTTtests.xls).
The results are in:
TablesAndFiguresV2-3.xlsx, which is in the Current Work subfolder on dropbox, as well as in E:\projects\lbo\July2020
(Note that these figures haven't been styled. Blue is control and Red is LBO throughout). The figures don't tell the same visual story as before, presumably because the updated data has many more recent LBOs, which are markedly different from the earlier LBOs. Time and industry effects likely mask the effect.
The main results are in DIDs.xml. These are briefly reprocessed in TablesAndFiguresV2-3.xlsx and just the Diff-in-Diff coefficient is reported below:
|Indu and Year FE||-0.0973||-0.229***||-0.102**||-0.0561***||0.0141||0.0610**||-0.0183||0.304**|
|Indu and Year FE (With T<10)||-0.0319||-0.260***||-0.141***||-0.0490***||0.00367||0.0543**||-0.0121||0.118|
So, despite the graphs, the main results are here! The diff-in-diff coeffs show that an LBO reduces patent flows, grants, and purchases, and increases term expirations. Only the results on R&D are unexpected, though the hadlbo coefficient is significant, negative and about the same magnitude as the diff-in-diff coeff for R&D, so this would suggest that PE firms target firms with low R&D and then increase it back to conventional levels. The pair fixed effects leave the coefficients unchanged (versus no controls) but reduce the standard error. We should check/prove that this is correct.
NOTE: Let's also put the logit and the treatment vs. control sample T-tests here!
Get Version 1 Running Again
The following files for version 1 were copied from the main dropbox folder (./LBO_Innovation_w_Egan) to a Version1 in E:\orojects\lbo:
I also copied over the 4 main compiled results files and the notes files:
- All Results.xlsx (contains raw copies from source files plus partially processed graphs)
- Graphs - Editted.xlsx (graphs with imperfect formatting but appears most recent)
- Tables - Most Recent.xlsx (seems to be production tables)
- Graphs - Most Recent.xlsx (graphs with imperfect formatting)
- Ed's Notes.txt
- Notes by Ed.xlsx (counts of LBOs in other papers, inc. Lerner)
I checked that Datasetup wouldn't overwrite any data files and renamed the log to lbov1-r2-3.log. Then I ran it. It didn't produce any excel sheets. Then I ran Analysis.do, without uncommenting the data build section. This built:
- BeforeVsAfter.xls and .log
- PropensityScore.xml and .log
Brian asked for a range on the Tobin's Q given in table 2. Table 2 should have been produced in DataSetup.do, but the descriptive statistics section appeared missing. Possibly it was cut and paste out during the creation of Version 2. The LaTeX says that the Table was generated from the sheet Descriptives, which is in Tables-Most Recent. It appears to come from some tabstats but on data that had been adjusted for inflation with the CPI, which wasn't loaded until Analysis.do
- Seems consistent with Davis et al. for public-to-privates.
- Would spend more time explaining how it differs from the recent Cumming et al paper. Results seem effectively the
So, Steve didn't read the paper at all. But I don't see the Davis cite in our reference, though I know Brian has written a comment on at least one of his LBO papers. The relevant cite is probably:
Davis, S. J., Haltiwanger, J., Handley, K., Jarmin, R., Lerner, J., & Miranda, J. (2014). Private equity, jobs, and productivity. American Economic Review, 104(12), 3956-90.
Of course, our paper is not consistent with Davis, at least concerning their findings on TFP gains. And we do differentiate our paper from Cummings. We could perhaps differentiate it further though.
- Certainly, our work on PE and jobs, productivity, etc. suggests that public-to-private deals are real outliers relative to other transactions along many dimensions. Would it make sense to look at a broader spectrum of deals? Another advantage is that it would substantially increase your sample size.
- Seagate is a real outlier. As you saw in our paper, we excluded it. How do you handle it? Or do you not include it in your sample?
- One issue we really struggled with was spin-off of entities around the time of IPOs, and differences in policies regarding the assignment of patents. How do you think about this issue?
- The industry categories you use seem quite broad. Thus, I would like to see more exploration of the matching procedure and the robustness of the results to different approaches.
- It would be helpful to talk more about the sources for the sample of transactions, to compare against other databases and the data-sets used in other papers (e.g., the Kaplan work, Kaplan-Stromberg JEP paper, and so forth).
- I could not quite square the numbers in Table 2 with the reported number of buyouts in the overall sample in the text. In general, being more careful on sample sizes, etc., and explaining the details would be helpful.
- Can you look at R&D expenditures after the buyouts for at least some firms (e.g., if they still make security filings)?
- Josh's first and fourth comments appear somewhat disingenuous. We made a great play of how it was important to isolate public-to-private transactions and not mix them as he has done. Mixing provides the opposite results, which is good press for the industry... So let's ignore those comments.
- We address an outlier firm without naming it on page 23 ("One firm held 426 patents...") and page 28 ("Our mean sell-off statistic includes, but is not driven by, a single leveraged buyout selling 798 patents – this is the same firm that went on to have 426 patents by the end of the year of its buyout, shown in Table 4."). Going through the data, this firm is OWEN-ILLINOIS INC, and the sale was of 767 patents, not 798. Seagate is in the original dataset as SEAGATE TECHNOLOGY PLC (gvkey=150937) and SEAGATE TECHNOLOGY-OLD (gvkey=9545). Neither of these records are marked as having had an LBO. We should look at the source data to see if there is some reason why Seagate isn't in our dataset.
- If spinoffs were being assigned patents, and these assignments were recorded by the USPTO, we would see them as sales in our data. We may want to add a comment to this effect.
- Yes, the industry categories are quite broad. We could spend eternity exploring the matching procedure, so let's ignore that one.
- The comment on sample sizes and explaining details is fair. We need to be clearer about what the unit of observation is when we report results.
- Yes, we can look at R&D filings and we did!
"Very nice paper. If I remember correctly Lerner et al. did not have a control group, so the results were a bit questionable. it is nice to show that relative to a control group LBOs reduce innovation. Two quick reactions:"
- The obvious endogeneity concern is that for some reason private equity will target firms that are expected to reduce innovation (that is, innovation would have decreased irrespective of the LBO). it is impossible to completely address it, but it may be useful to discuss explicitly and perhaps try to look at whether innovation "predicts" LBOs. For example, perhaps firms with low R&D are more likely to be targeted, and this reduces innovation post-LBO.
- There is an alternative measure of innovation that looks at the market value of granted patents (Kogan et al., QJE). If it is the case that LBOs reduce the total value of innovation, that would add to your points.
He also said "I think RFS is not out of reach. The issue here as you know is that the authors of the existing paper may challenge your results if it is sent to them. Hopefully, the editor will send it to someone neutral." Which makes him practically clairvoyant, given what happened when we sent it to JF.
Addressing his concerns:
- A paragraph on the many and varied endogeneity issues is a very good idea. We should have done this more properly.
- The paper (in the dropbox folder) seems to use an event study approach to isolate the stock market reaction (in terms of market cap) to a patent issue. We can't do that - it's too much work. Also, almost all (85%+) of patent applications result in at least one issue, and the 'news' occurs when the patent application is made public (usually 18 months after filing), so that whole paper is pretty questionable. But then the team is outstanding and includes Amit Seru. They also solicited comments from the who's who of innovation economics.
Kogan, L., Papanikolaou, D., Seru, A., & Stoffman, N. (2017). Technological innovation, resource allocation, and growth. The Quarterly Journal of Economics, 132(2), 665-712.
The paper was rejected by Amit Seru as an editor of the Journal of Finance (though he was nicer than the AE or the reviewers).
Amit's comments were as follows (emphasis added):
Thank you for submitting your paper, MS 2019-0353, "The Effect of U.S. Public-to-Private Leveraged Buyouts on Innovation," for publication in the Journal of Finance. I have now received excellent feedback on your paper from two referees. The referee reports are attached to this email. Both referees recommend that I reject the paper. The AE also recommends a rejection. The main concerns of reviewing team center around the marginal contribution of the study and its empirical execution.
The AE summarized the sentiment of the reviewing team in the note to me as "I largely agree with the referees. I have concerns about the methodology (there is a lack of transparency in the way the paper is written) and even taking the results at face value I don't find the contribution large enough to publish the paper in JF":
- The topic of innovation and LBOs is already well occupied by existing papers. To make a contribution vis a vis Lerner-Sorensen-Stromberg, we would have to think a bit deeper about the economic mechanism that links research and capital structure. Why would PE target destroy value ? In LSS, LBOs seem to be creating value through some form of disciplining effect and better innovation. But LSS have a different sample, mostly made of private to private deals.
- Even if I take the results at face value, the decrease in patent stock mostly comes from the target selling patents right in the year when takeovers take place. This is the main result of the present paper. But is this value destroying ? We do not know. The million $ question in this literature is whether LBO targets really destroy value. But it's a hard one.
- The methodology is opaque. Is it Diff in Diff or matching ? There doesn't seem to be pair fixed effects (so it's closer to DD, than close to the LSS methodology than the authors claim). Even though the authors never actually write down the econometric equation. In any case, using matching is not a big innovation in the literature, many papers already do it, and in a more transparent way. The true innovation in this literature would be to find an IV.
- To make a paper on LBO publishable in JF, I think one of two things need to be done: (1) having a real source of variation or (2) providing clear and unambiguous proof that LBO target destroy value by being too short-termist. The paper unfortunately does none of the two (but these are both very hard problems, so I really don't blame the authors)
- Like the referees, I would recommend that the authors rewrite the paper to make much more transparent. Settle on a sample (not removing data points every time you run a different regressions). Use it to replicate existing finding (LSS seems like a good template, perhaps others). Confirm existing results, and then slowly build from there.
- Something I was confused about: How do the authors get information on post-LBO R&D spending? After the LBO the firm is not in compustat anymore. perhaps there is an explanation somewhere but I missed it.
I have also read your paper and found a lot to like about it. However, I was not able to convince myself to overrule the concerns raised by the reviewing team. I have therefore decided to reject your paper.
- The last comment from the AE is rather strange (and makes it unlikely that the AE was Josh). We discussed the R&D data fairly extensively.
- The main weakness was the "opaqueness of the methodology". We need to fix this.
- One instrument would be withdrawn LBOs, providing that they were withdrawn due to a change in market conditions, and not their innovation profile (C.f. Bernstein? -- his job market paper was on withdrawn IPOs and landed him a job at Stanford in around 2012).
Select comments from Reviewer 1 (uses terminology like 'forward citation' and doesn't understand 'synthetic controls'):
- The paper currently reads as if the main sales point is the methodology
- The exact method they used however is not clear. In the introduction they describe it as “difference-in-difference”, but in the text and particularly in the tables is hard to understand the empirical strategy (it doesn’t help that the equations are not written in the text).
- Instead, I would focus on what I think is the strength here, which is understanding more deeply how corporate innovation changes after LBOs, an in particular the distinction between in-house innovation and patent purchases, as well as the more strategic considerations of managing patent portfolios (expiration, renewals etc.)
- the authors argue that “…the entrepreneurial theory of LBOS stands alone in its positive connotations for innovation…” (pg. 12). Instead, I would argue that at least two other theories could predict a positive effect on innovation from LBOs. There is a long-standing hypothesis that LBOs (specially public to private—the focus here) can relieve managers from short‐term pressures from public shareholders. Similarly, LBOs can also alleviate agency issues between shareholders and managers, which can include the avoidance of innovation investments by CEOs with career concerns.
- the authors’ description of the “myopic based theory” is contradictory. I don’t see how they conclude that the treatment effect of an LBO would be to curtail innovation activities. One of their main points leading to the argument is that Bernstein (2015) shows lower innovation in public firms.
- The writing is at times confusing and contradictory. For example, in pg 13 it is not clear what the authors mean by “We also require each LBO to be matched to a control firm in the same industry, operating at approximately the same tie, and holding within 20% of the LBO target firm’s patent stock…” Do you mean to say that matched firms co-own 20% of the patents of the LBO target?
- Agree with point 3. Let's de-emphasize the methodology and instead emphasize the corporate innovation decisions.
- Clean up the description of the sample, etc.
- Let's strip down and/or clean up the theories. They aren't the main selling point. We just need a value creation story and a value consumption story.
Select comments from Reviewer 2 (likely one of Lerner, Sorenson, or Stromberg):
- This is a well-written paper that investigates an important question. While other papers have tackled this same question... there is room in the literature for a paper that revisits this question using rich new data and a careful identification strategy. The submitted paper does analyze richer patenting data than prior studies, and some of its descriptive findings are certainly interesting. However, I am afraid that the paper’s identification strategy and discussion of identification challenges falls well below the standards now expected by readers of the JF.
- the authors do not appear to fully appreciate the endogeneity challenges they face in trying to identify a causal relationship between LBOs and innovation... private equity (PE) firms do not choose LBO candidates randomly. PE firms do extensive due diligence on their target firms... the concern remains that the matched firms likely differ from LBO firms on dimensions that the authors do not observe (and even some that they do observe) and these differences, and not the LBO itself, might drive the post- LBO differences in their investment activities.
- To be sure, addressing this endogeneity challenge is a tall order, and there is no straightforward solution to it. But at the very least I would advise the authors to acknowledge the challenge and avoid overstatements such as “[w]e use a difference-in-differences approach to address issues of causality” (p. 3). Davis et al. (AER 2014) is a helpful example of how the endogeneity challenge can be discussed.
- It would also be helpful if the authors could explore the sensitivity of their results to their matching choices
- it would be helpful if the authors could provide a more thorough discussion of ... why are LBOs associated with a decrease in patenting?
- the “myopia-based theory claim[ing] that the market is inefficient and does not correctly value long term investment projects” (p. 9) has often been cited as a reason why PE firms can generate value by taking firms private and thereby allowing them to focus on long-term investments
- The authors appear to hold a more negative view of these cuts, seeing them as “worrisome” (p.
35) and thus implying that they harm the long-term value of the firms. However, it is not clear that cutting productive innovative investments will help PE firms increase the value at which they can sell their portfolio companies five years down the road. In particular, the so-called “non-productive balance sheet restructuring” that the authors discuss in page 11 would only be optimal if PE firms expect to sell their companies at a multiple of the book value of the companies’ assets
- I thought the introduction is written in a very casual tone, with several non-obvious statements that are not backed by any references. Some examples are:
- “[P]ublic-to-private leveraged buyouts remain particularly important and controversial.” (p. 1)
- “The empirical literature also leads one to strongly suspect that public-to-private leveraged buyouts cause reduced innovation. An empirical study in the late 1980s and early 1990s established the stylized fact that increased leverage is associated with reduced R&D expenditures.” (p. 2)
- “Our sample consists solely of material, innovative U.S. public-to-private buyouts that were leverage financed, and it is much bigger than previous samples of its kind.” (p. 3). I am not sure what the authors mean by “material” or what other previous samples they are comparing their sample to.
- Also, I thought that the theoretical discussion in Section 2 of how innovative activities might be affected by LBOs is biased toward theories that view LBOs as negatively impacting innovation. In particular, I am not sure the statement in page 12 that “[t]his entrepreneurial theory of LBOs stands alone in its potential positive connotations for innovation” is correct. For instance, as discussed above, short-termism and other agency theories can predict that private firms will invest more in innovation than public ones
- a) The authors write in p. 4: “Like Lerner et al. (2011), we find that patents held by innovative LBO firms receive more citations after their buyouts. However, we are able to attribute this solely to citation inflation: we find no statistically significant difference in citations received between LBO firm patents and control firm patents before or due to an LBO.” Lerner et al. (2011) are aware of such citation inflation, and they address it by scaling citations by the citations received by matching patents (pp. 459-460).
- b) The sentence in p. 19, “In our sample, firms that experience a leveraged buyout are large, and those that have patents are about half as large again as those that do not” is confusing
- c) The sentence in p. 23, “However, including either firms without patents in the year of the buyout, or firms with a large number of patents does drive our results” appears to be missing “not” (as in “does not drive…”).
- We aren't going to change the identification strategy but lets up the discussion of identification challenges and "acknowledge the [endogeneity] challenge[s] and avoid overstatements".
- Let's remove the "non-obvious statements that are not backed by any references" and instead make it a list of references. Just substitute boring for concisely informative and we are good to go!
- Lerner et al. did adjust for citation inflation. Either I misread it, or was looking at a different version of the paper.
- Rather than using "material" let's just provide the asset cut-off again.
- Add the 'not' in does NOT drive our results!
The HBS Working paper version of Lerner et al (in dropbox) on p16 says: "We observe a clear increase in the average number of citations for the patents granted to the private equity-backed firms. In part, this may reflect the increasing importance of patents in later years, but it may also reflect two other changes. As the pace of patenting world-wide accelerates, the frequency of patent citations has increased. Furthermore, as private equity investments in hightechnology industries become more common, the representation of patents in technologically dynamic industries has grown. Figure 3 captures these trends, and this figure shows a clear increase in the average number of citations, as well as the dispersion of citations, for the matching patents. Hence, it is important to control for the timing of the patent grant and its technology class. To address this concern, Table 2 also reports the relative citation counts. These are calculated as the number of citations in the calendar year of the grant and the three calendar years thereafter (citation count) less the average number of citations during this period to matching patents, which have the same grant year and primary USPTO class."
How to do a run
Before you begin
All of the following assumes that you're working on Ed's infrastructure. If you're not, you'll need to set up a dbase server, and load up the lbo dbase, either by restoring it from a backup (see below) or by running Buyouts.sql and having the appropriate source files available (in the zip "sourcefiles.zip").
To restore the backup to a postgres server
createdb lbo pg_restore -Fc -d lbo lbo_Fc.dump
To build it from source files:
unzip sourcefiles.zip createdb lbo psql lbo \i Buyouts.sql
Buyouts.sql produces Masterv2-2.txt. If you have this in hand, you'll need the following to do a run:
- LBOmatchscript.jl and LBOmatching.jl
- The lbo database loaded up on a postgres dbase server
- Some static STATA files: CPI.dta, OldData.dta.
The sequence is then:
1 Run Datasetup.do
2 Do the matching
julia -i LBOmatchscript.jl Note that this is going to throw lots of warnings about various methods being deprecated, but it works just fine. MatchInput2V2.txt must be in the same folder as the script. Edit lines 61-66 to change the matching criteria. Matching takes place at t-1! Also -i dumps you to an interactive prompt for debugging. println("Number of LBOs with complete data available for matching ", size(LBOs,1)) Should be 173 diagdf Should be 0×9 DataFrame, but otherwise will show unmatched exit()
3 Create the patent measures
psql lbo Note that you will need to open the connection from a directory containing matchresults.txt and PatentMeasures.sql \i PatentMeasures.sql
4 Run the analysis
do Analysis.do Make sure the following files are all in the directory ready to go: matchresults.txt -- from 2 MatchedMeasuresPortfolioV2-2.txt -- from 3 PreMergeRelevantV2-2.dta -- from 1 CPI.dta -- static file OldData -- static file
5 Wonder whether this was all worth it.
6 Edit the excel files (some are xml files that have to be opened from inside of Excel), optionally moving columns up to create alignment and using regexes to get rid of unwanted rows. Create and apply a chart template to simplify the standardized production of figures. Use the excel2latex plugin to export tables to LaTeX.
The plan is to build a new database made from the following components:
- LBO data with GVKEYs, preferably cleaned to deal with durations (etc.), imported directly from a spreadsheet
- New draw of the universe COMPUSTAT data, with all appropriate variables, covering 1975-2017 (or 2018 if possible), processed to remove duplicates etc.
- Imports of patenthistwoprebuys and gvkeyyearpatentdata, which are available in allpatentsprocessed
- Import of sicnaicsconversion, as we need the lookup table(s)
- Imports of patents, citations, and HistMasterFile, all of which are available in allpatentsprocessed -- we can then run the code from PatentMeasures.sql to generate the merge data once we've done the matching.
- Also import assignees, even though we won't be using it directly
Note that citations_cleaned (74,164,171) appears to be just citations (97,680,838) restricted to integer patent numbers but we are going to rebuild it anyway.
The COMPUSTAT query will be (bold vars are those important to this paper):
- Jan 1975 to July 2019 (latest allowed)
- GVKEY, Entire Dbase
- Fields (Note that it was dvt not dt):
- Company Name, Ticker, CUSIP, Stock Exchange Code, state
- naics, sic,
- addzip, city, loc, prcc_f, ipodate, fyear
- act, artfs, at, ceq, che, csho, dlc, dltt, dvt, dvc, dvp, ebitda, emp, esubc, ibc, intan, lct, lt, ni, oibdp, opiti, re, revt, sale, txditc, txt, xidoc, xint, xrd, xsga, xstfws
This was pulled twice, with OPITI included the second time (which also went to July 2019). The results are in e28f8123921e8343.txt which are saved to Z:\lbo. The request summary was saved to E:\projects\lbo.
Data Request ID e28f8123921e8343 Libraries/Data Sets compa/funda / Frequency/Date Range ann / 01Jan1975 - 31Jul2019 Search Variable GVKEY Input Codes all item(s) -all- Conditional Statements n/a Output format/Compression tab / Variables Selected CONM TIC CUSIP EXCHG FYEAR ACT ARTFS AT CEQ CEQL CHE DLC DLTT INTAN LCT LT RE TXDITC DVC DVP DVT EBITDA NI OIBDP OPITI REVT SALE TXT XINT XRD XSGA XSTFWS ESUBC IBC XIDOC CSHO EMP PRCC_F Extra Variables and Parameters Selected C INDL FS STD
The LBO data is being drawn from a spreadsheet provided by Brian. The latest version is 484_LBOs_Outcomes_Master_Table.xlsx
Ed took the following columns:
dealnumb fyear gvkey name goprdt extdt1 exttype1 sic
Other columns were excluded because they were either constant across records or derivative. In addition, one record, Aurora Electronics Inc, was excluded as it didn't have a GVKEY.
The GVKEYs have been through Excel, which treated them as numbers and omitted their leading zeros. They will have to loaded as text and then reprocessed as follows:
SELECT LPAD(gvkey::text, 6, '0') as gvkey
There are currently supposed to be 521 LBOs, and we have data on 483 (these are exported to Z:\lbo\LBOBase.txt). We are therefore missing data on 521-483=38. Their names and dealnums (including for Aurora Electronics Inc) into E:\projects\lbo\MissingLBOFirmNames.txt. The DISTINCT conm (names) and GVKEYs of all COMPUSTAT firms are in DistinctCSconm.txt (37759). These were matched using the Matcher. The results are in MissingLBOFirmNames.txt-DistinctCSconm.txt.matched, which is also a sheet called Matching in 484_LBOs_Outcomes_Master_Table.xlsx, as well as posted below.
File1:Coname |File1:Dealnu|File2:conm |File2:gvkey|Normalization ------------------------------------------------------------------------------------------------------------- Baldwin Piano & Organ | 19|BALDWIN PIANO & ORGAN CO | 12656|Hall Pamida Inc | 70|PAMIDA INC | 8306|Hall-Warning:Multiple Pamida Inc | 70|PAMIDA INC-DEL | 14838|Hall-Warning:Multiple Boys Markets Inc | 71|BOYS MARKETS INC | 13739|Hall Budget Rent-A-Car Corp| 77|BUDGET RENT-A-CAR CORP | 13394|Hall Amphenol Corp | 97|AMPHENOL CORP | 14282|Hall Eagle Food Centers | 114|EAGLE FOOD CENTERS INC | 15424|Hall Grand Union Co | 132|GRAND UNION CO | 61094|Hall-Warning:Multiple Grand Union Co | 132|GRAND UNION CO -OLD | 5261|Hall-Warning:Multiple Grand Union Co | 208|GRAND UNION CO | 61094|Hall-Warning:Multiple Grand Union Co | 208|GRAND UNION CO -OLD | 5261|Hall-Warning:Multiple Advantage Health Corp | 225|ADVANTAGE HEALTH CORP | 24927|Hall Gruntal Financial | 250|GRUNTAL FINANCIAL CORP | 11710|Hall Constar International | 294|CONSTAR INTERNATIONAL INC| 4049|Hall Woodstream Corp | 393|WOODSTREAM CORP | 11578|Hall Loehmanns Holdings Inc| 406|LOEHMANN'S HOLDINGS INC | 15346|Hall Collins Industries Inc| 491|COLLINS INDUSTRIES INC | 3178|Hall EMPIRE INC. | 541|EMPIRE CO LTD -CL A | 14312|Hall PANNILL KNITTING | 551|PANNILL KNITTING CO INC | 12325|Hall GENERAL HOST CORP | 565|GENERAL HOST CORP | 5060|Hall SBARRO INC | 568|SBARRO INC | 9441|Hall
There were 18 matches (counting both Grand Union Co), but Pamida matches to two GVKEYs and we need to work out which Grand Union Co is which. Both Pamida GVKEYs rightly belong to Pamida (they cover different time periods for the same firm), so the latter one was eliminated from the lbo dbase:
UPDATE COMPUSTAT SET gvkey='008306' WHERE GVKEY='014838';
Likewise, both Grand Unions appear to be the same firm, so their GVKEYs were merged:
UPDATE COMPUSTAT SET gvkey='061094' WHERE GVKEY='005261';
The GoPrvDate appears to be 1997 (from the spike in debt in the COMPUSTAT data), so dealnum 132 was kept (gopvt in 1998) and dealnum 208 was dropped (gopvt 1999).
There are some other issues with the data too. Two firms don't have gopvtdates, and so I can't create the last Fyear field for them or use them...
Dealnum CoName 565 GENERAL HOST CORP 568 SBARRO INC
And several firms only have data way before or somewhat after their GoPvtDate:
dealnumb fyear gvkey name goprdt extdt1 exttype1 250 1986 11710 Gruntal Financial 3/31/1997 4/15/2002 Strategic Buyer 393 1987 11578 Woodstream Corp 6/12/2003 4/15/2012 Still Private 19 1985 12656 Baldwin Piano & Organ Co 6/19/1984 7/15/1986 IPO 114 1988 15424 Eagle Food Centers 11/10/1987 2/15/2000 Bankruptcy
After this name-based matching process, there are 20 LBOs still unmatched. They are (also in StillMissingLBOs.txt in the dropbox and E:\lbo):
LBOName DealNum -------- ------- AME Inc 196 American Electric Auto Co Inc 358 Anderson Tully Co 440 Becor Western Inc 94 Borg-Warner Corp 98 Camelot Music Holdings 234 Colt Industries Inc 140 Condec Corp 14 Consolidated Hydro Inc 165 EFTC Corp 342 Finl Institutions Ins Grp Ltd 242 Florida Steel Corp 166 G N I GROUP INC 566 Genesis Health Ventures Inc 318 International Controls Corp 100 Katz Capital Corp 237 MIDAMERICAN ENERGY HLDGS CO NEW (not really an LBO) 567 Research Cottrell Inc 107 Steel Heddle Manufacturing Co 179
There are also 498 'good' LBO records, which are in E:\projects\lbo\LBOBase.txt, as well as the 'Export to Dbase' sheet of the spreadsheet. Note that GENERAL HOST CORP and SBARRO INC aren't currently included because we don't have LBO dates for them. Of the 498, 490 have CS data in t=0, t=-1 or t=-2. Extending this to t=-3 adds just a single LBO and so isn't worth it. Moving to just t=0 and t=-1 reduces the count to 434, which was a big loss.
We can't use all of the LBOs because sometimes they are the same firm having more than one LBO. The following records have that issue:
SELECT gvkey, count(*) FROM LBO GROUP BY gvkey ORDER BY COUNT(*) DESC; /* gvkey count 013657 2 029909 2 013394 2 005976 2 009502 2 */ SELECT * FROM LBO WHERE GVKEY IN ('013657', '029909', '013394', '005976', '009502') ORDER BY GVKEY;
dealnum fyear gvkey name gopvtdate gopvtyear exitdate exityear lbodurdays lboduryears exittype exitcode sic 281 1997 005976 Insilco Corp 1998-08-17 1998 2002-12-15 2002 1581 4.3315067 Bankruptcy 5 3678 178 1987 005976 Insilco Corp 1988-12-14 1988 1991-01-15 1991 762 2.0876713 Bankruptcy 5 3678 47 1984 009502 SCOA Industries Inc 1985-12-10 1985 2012-07-15 2012 9714 26.613699 Still Private 4 5331 555 1984 009502 SCOA 1985-12-01 1985 1997-03-01 1997 4108 11.254794 Financial Buyer 3 5331 77 1986 013394 Budget Rent-A-Car Corp 1986-09-30 1986 1987-04-15 1987 197 0.539726 IPO 1 191 1988 013394 Budget Rent-A-Car Corp 1989-03-30 1989 1996-08-15 1996 2695 7.3835616 Strategic Buyer 2 7510 561 1996 013657 Alliance Imaging Inc acquired by Apollo 1997-07-15 1997 1999-11-03 1999 841 2.3041096 Financial Buyer 3 ...8071 185 1987 013657 Alliance Imaging Inc 1988-11-16 1988 1991-11-15 1991 1094 2.9972603 IPO 1 8071 347 1999 029909 Petco Animal Supplies Inc 2000-10-02 2000 2002-02-15 2002 501 1.3726027 IPO 1 5990 476 2005 029909 Petco Animal Supplies Inc_2 2006-10-26 2006 2012-07-15 2012 2089 5.7232876 Still Private 4 ...5990
We are therefore going to drop the following dealnums, as they represent a second LBO of the same firm:
- 281 Insilco Corp 1998-08-17
- 47 SCOA Industries Inc 1985-12-10
- 191 Budget Rent-A-Car Corp 1989-03-30
- 561 Alliance Imaging Inc acquired by Apollo 1997-07-15
- 476 Petco Animal Supplies Inc_2 2006-10-26
We are losing a lot of LBOs because of the exchange code restriction:
SELECT exchg, count(*) FROM ( SELECT CSwNAICS.gvkey, max(exchg) as exchg FROM CSwNAICS JOIN LBOvalid ON LBOvalid.gvkey=CSwNAICS.gvkey GROUP BY CSwNAICS.gvkey ) AS T GROUP BY exchg ORDER BY COUNT(*) DESC; exchg count 11 148 NYSE 14 142 NASDAQ 1 41 Priv (Can) 3 36 LBO! 0 35 Sub/Priv 12 35 Amex 19 24 Philly 13 22 OTC 7 1 Toronto 20 1 Unlisted Evaluated Equity
Our valid exchange codes are 11, 12, 14. The COMPUSTAT lookup table is below:
Stock Exchange Code Variable Name = EXCHG This item identifies the major exchange on which the company's Common/Ordinary Stock (Capital) is traded. Below are some of the common exchange codes. A complete list is available in the Compustat manual. Code Description Exchange Country 0 Subsidiary/Private Canada 0 Subsidiary/Private United States 1 Non-traded Company or Security Canada 1 Non-traded Company or Security Germany 1 Non-traded Company or Security Switzerland 1 Non-traded Company or Security Taiwan 1 Non-traded Company or Security United States 2 Consolidated Parent or Tracking Stock Company United States 3 Leveraged Buyout United States 4 Additional Company Record-PreSFAS 94, ProForma, PreAmended United States 7 Toronto Canada 8 Montreal Stock Exchange Canada 9 Canadian Venture Exchange Canada 10 Alberta Stock Exchange Canada 11 New York Stock Exchange United States 12 American Stock Exchange United States 13 OTC Bulletin Board United States 14 NASDAQ-NMS Stock Market United States 15 Boston Stock Exchange United States 16 Midwest Exchange (Chicago) United States 17 Pacific Exchange United States 18 Philadelphia Exchange United States 19 Other-OTC United States 20 Unlisted Evaluated Equity United States
The problem seems to be that the exchange code is not temporal - it doesn't change over time, instead it is just a single fixed value for the entire COMPUSTAT record. Loc (i.e. country code) is also not temporal. Almost all of our LBOs are in the U.S.:
loc count USA 482 CAN 3
I am therefore going to restrict on being U.S. (which will cost us 3 LBOs) but not on exchange. In STATA we restrict on assets, and I'm going to trust Brian that these are all publicly traded firms undergoing LBOs. For the controls, we are just going to have to hope as well...
by gvkey: egen assetfilter = max(at >= 40 & at != .)
In the initial version of the dataset there was a supposed COMPUSTAT variable named dt which was later labelled debt total in STATA. This variable appears to have been an error. It looks like DVT (dividends total) was pulled and accidentally labelled DT. There does not appear to be a DT variable in COMPUSTAT. Then in STATA, DT was used:
generate debtcapital:"Debt to capital" = dt / (at - lt)
I now load and label a DVT variable and calculate debt to capital as
generate debtcapital:"Debt to capital" = lt / (at - lt)
The new code is in:
Database is lbo and dbase server directory is /bulk/lbo
The COMPUSTAT query is saved on the mcnair (AmoDate) account, and the last result is saved as e28f8123921e8343.txt
The SQL code assembles data for all gvkey year combinations, so many observations have blank data. LBOValid restricts the LBOs to:
- dealnum NOT IN (281,47,191,561,476)
- CS data available one or two years before the LBO, and US HQ in CS. Note that the new CS data goes up to July 2019.
SELECT (gopvtyear-csfyear) as diff, count(*) FROM lbovalid GROUP BY (gopvtyear-csfyear); diff count 1 422 2 56
The final data contains 478 LBOs, 180 of which have patents at some point in their lives.
The output of this SQL code is:
We then run DataSetup.do, which (gvkeys, LBOs, LBOs w/patents ever):
- Loads Masterv2-2.txt (83 vars, 1,294,171 obs): 30594, 478, 180
- Restricts to declared assets >= 40m at some point: 17788, 472, 179
- Requires non-LBO firms to be US non-OTC exchanges: 12131 (was 12106 with NNA exchanges), 472, 179
We find that Tobin's Q, lvg and revgrowth are heavily skewed so we log transform them or use a polynomial
gen tobinql=ln(tobinq) gen revgrowthl=ln(1+revgrowth) gen lvgsq=lvg*lvg
We also create markers:
- LBOmarker, 1/0 for when the LBO takes place
- LBOregmarker, 1/0 for the last year of CS data (-1 or -2)
- Somepatenting, 1/0 for patent(s) at some point in history
- matchfilter for whether the obs is in the match set (at -1)
- regfilter is a legacy marker, for if you don't want to move accounting variables from t-2 to t-1.
- savefilter for whether the obs will be saved for the analysis
The propensity score regression is:
. logit lbomatchmarker lvg tobinql revgrowth taxtoassets ebitdatoassets revtl if matchfilter==1, robust Iteration 0: log pseudolikelihood = -1027.1539 Iteration 1: log pseudolikelihood = -1008.441 Iteration 2: log pseudolikelihood = -988.56148 Iteration 3: log pseudolikelihood = -987.85551 Iteration 4: log pseudolikelihood = -986.7324 Iteration 5: log pseudolikelihood = -986.63042 Iteration 6: log pseudolikelihood = -986.62504 Iteration 7: log pseudolikelihood = -986.62501 Logistic regression Number of obs = 61,090 Wald chi2(6) = 129.38 Prob > chi2 = 0.0000 Log pseudolikelihood = -986.62501 Pseudo R2 = 0.0395 -------------------------------------------------------------------------------- | Robust lbomatchmarker | Coef. Std. Err. z P>|z| [95% Conf. Interval] ---------------+---------------------------------------------------------------- lvg | .2638132 .039153 6.74 0.000 .1870748 .3405516 tobinql | -1.41471 .1566152 -9.03 0.000 -1.72167 -1.107749 revgrowth | -.0303849 .0741461 -0.41 0.682 -.1757085 .1149387 taxtoassets | 2.43517 1.126206 2.16 0.031 .2278463 4.642494 ebitdatoassets | 3.148241 .7061438 4.46 0.000 1.764224 4.532257 revtl | .063724 .0270392 2.36 0.018 .0107282 .1167198 _cons | -6.485486 .2214917 -29.28 0.000 -6.919602 -6.05137 -------------------------------------------------------------------------------- Note: 46 failures and 0 successes completely determined.
With the increased sample, we now find that LBOs are associated with reduced likelihood of patenting overall. However, this results is driven by the LBOs in period 1 from 1980 to 1995. In period 2, from 1995 to 2015, there is no statistically significant association. We also tried using decades but this wasn't instructive.
The matching scripts, LBOmatchingscript.jl and LBOmatching.jl, were rewritten by Ed to run with Julia 1.1.1. They use the LBOmatchmarker and matchfilter -- we are matching at t-1 (obs with logitpreg only at t-2 have it, and the variables used to create it, copied from t-2 to t-1). Note that we only export if matchfilter==1 to MatchInput2V1.txt. All 179 patenting LBOs are in there, however, not all of them have a logitpreg:
. count if hadlbo==1 & regfilter==1 179 . count if hadlbo==1 & regfilter==1 & n2 <. 179 . count if hadlbo==1 & regfilter==1 & logitpreg <. & n2 <. 173
This is because variables just aren't available to calculate them. This was explored and is the best it reasonably can be after applying some manual fixes (see below).
su tobinql if regfilter & hadlbo //175 su revgrowthl if regfilter & hadlbo //176 su ebitdatoassets if regfilter & hadlbo //179 ...all other vars have full coverage
Note that Datasetup.do produces the PatentingLikelihood.xml (and .txt) files and analysis. There is time variation in the results now.
We then match at t-1 using contemporary variables, inluding patentstock. To run the matching script:
julia -i LBOmatchscript.jl dumps you to an interactive shell afterwards to do diagnostics: println("Number of LBOs with complete data available for matching ", size(LBOs,1)) diagdf
We also adjusted the bounds on patenting to 25% (and rounded the result to an integer) in the match script(s). This gave 173 matches from 173 candidates (i.e. we caught one unmatched at the 20% bound). See below for more information about catching the one unmatched. We also adjusted the script so that it just outputs the matches and throws out everything else. This can be uncommented if needed.
Now import the matches into the dbase and generate the patent statistics using PatentMeasures.sql. Note that it is not possible to generate the patent measures for all of Compustat -- the data is just too big.
- We are using all citations, not citations_cleaned, as the all cited and citing are bigint in citations. This means we have higher counts this time around.
- Originality is based on citations received and generality is based on citations made. There may have been an error in originality in the previous version (hard to tell - misnamed tables)
- The Match5yrCites table is big: 170,169,703 obs -- it is the explorative/exploitive measures that are very hard to build. The citation measures could likely be built for every CS firm 1974-present.
- Ultimately creates: MatchedMeasuresPortfolioV2-2.txt
Make sure that there are copies of PatentMeasures.sql and matchresults.txt in Z:/lbo. Then. to run the SQL script, ssh on to mother (192.168.2.121) as researcher, and:
cd /bulk/lbo psql lbo \i PatentMeasures.sql
Then run Analysis.do. The current versions of the datasets with their origins are:
- Datasetup.do -> MatchInput2V2.txt -> LBOmatchscript.jl -> matchresults.txt -> matchresultsV2-2.dta
- Datasetup.do -> MatchInput2V2.txt -> LBOmatchscript.jl -> matchresults.txt -> PatentMeasures.sql (on dbase lbo)-> MatchedMeasuresPortfolioV2-2.txt -> PatentStatisticsV2-2.dta
- Datasetup.do -> PreMergeRelevantV2-2.dta
Analysis.do was updated to take into account that the matches are made at t-1. Its preparatory code was also entirely rewritten. Markers for LBOs included in the old data are loaded (as olddata and olddataflag, at the pair level) for reference. None of the old matches are refound in the new data -- these are entirely new matches, likely as the logit specification was changed.
The output of Analysis.do is:
A quick build of the graphs is in MainTTests-Reprocessed.xlsx. Making a chart template is a good idea  but it won't let you save everything you'd like to do (especially it won't put on the error bars automatically). Ed's chart template was saved in C:\Users\ed\AppData\Roaming\Microsoft\Templates\Charts\LBOPaper.crtx. From what I recall the spec should be:
- Times New Roman in a large font size (12pt?)
- Black and white, with dash lines for controls and low line thickness (1pt for graph, 0.5pt for error bars)
- Axis title(s) but no title. Legend to say control or LBO on graph itself to reduce footprint.
Catching the one unmatched
173 in, 1 unmatched:
1×9 DataFrame │ Row │ gvkey │ year │ lboregmarker │ patentstock │ indu3 │ logitpreg │ hadlbo │ regfilter │ matchpair │ │ │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Float64 │ Int64 │ Int64 │ Float64 │ ├─────┼───────┼───────┼──────────────┼─────────────┼───────┼────────────┼────────┼───────────┼───────────┤ │ 1 │ 8367 │ 1983 │ 1 │ 27 │ 3 │ 0.00564947 │ 1 │ 1 │ -0.1 │
This is a firm that went private in 1985 and stayed that way (still private at exit year 2012). We are currently matching +/-1 2yrs around the reg date (i.e., -1 or -2 yrs before).
In regfilter=1, indu3=3, year>=1981 & <=1985, there are 923 observations (multiple per firm), of which 2 are LBOs:
gvkey year lboregmarker patentstock indu3 logitpreg hadlbo regfilter 7298 1983 1 0 3 0.002207293 1 1 8367 1983 1 27 3 0.005649468 1 1
But the nearest patent counts are 17,18,19,20,21...34,35. There are some close logitpregs for those counts:
gvkey year lboregmarker patentstock indu3 logitpreg hadlbo regfilter 4127 1983 0 18 3 0.006133826 0 1 4127 1984 0 20 3 0.006820232 0 1 4127 1985 0 21 3 0.008283807 0 1 4452 1981 0 35 3 0.007059322 0 1 4452 1982 0 40 3 0.00471063 0 1 5137 1981 0 34 3 0.004571791 0 1 10859 1981 0 17 3 0.008149391 0 1 10859 1982 0 17 3 0.010007644 0 1 10859 1983 0 17 3 0.006547268 0 1 10859 1984 0 19 3 0.005754435 0 1 10859 1985 0 19 3 0.005588403 0 1
We can catch several obs if we move to 25% bounds (especially if we round up/down to integers) on patenting. After doing this we have 173 matches!
Notes on manual fixes
First run matches 162 (163 but no match 66). Although the 179 are going in, some are missing indu3 (because they are missing n2) and some are missing logitpreg.
. count if hadlbo==1 & regfilter==1 179 . count if hadlbo==1 & regfilter==1 & logitpreg <. 171 . count if hadlbo==1 & regfilter==1 & logitpreg <. & indu3 <. 163 . count if hadlbo==1 & regfilter==1 & logitpreg <. & n2 <. 163
One solution was to create an additional sic to naics lookup and then union it with the official one. Because of the courseness of the newly included SIC and so NAICS codes, this will only work when using 2 digit NAICS codes. This got us to 171 in and 170 matched.
julia> diagdf 1×9 DataFrame │ Row │ gvkey │ year │ lboregmarker │ patentstock │ indu3 │ logitpreg │ hadlbo │ regfilter │ matchpair │ │ │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Float64 │ Int64 │ Int64 │ Float64 │ ├─────┼───────┼───────┼──────────────┼─────────────┼───────┼────────────┼────────┼───────────┼───────────┤ │ 1 │ 8367 │ 1983 │ 1 │ 27 │ 3 │ 0.00616416 │ 1 │ 1 │ -0.1 │
//Diagnostics on logitpreg
su gvkey if regfilter & hadlbo //179 su lvg if regfilter & hadlbo //177 su lvgsq if regfilter & hadlbo //177 su tobinql if regfilter & hadlbo //172 su revgrowthl if regfilter & hadlbo //174 su taxtoassets if regfilter & hadlbo //177 su ebitdatoassets if regfilter & hadlbo //176 su revtl if regfilter & hadlbo //177
tab gvkey if regfilter & hadlbo & revgrowthl==. Compustat | gvkey | Freq. Percent Cum. ------------+----------------------------------- 3051 | 1 20.00 20.00 FIXED -1 is 1992, which has no vals. -2 has vals. 14276 | 1 20.00 40.00 1987,-2 and has data for just that year. Would need revt for 1986. 14628 | 1 20.00 60.00 1987,-1 and no data for -2. Could use 0 but some leverage step up there already! 15309 | 1 20.00 80.00 1988,-1 and had datga for just that year. Would need revt for 1987. 25389 | 1 20.00 100.00 FIXED -1 is 1995, which has no vals. -2 has vals. ------------+----------------------------------- Total | 5 100.00
The two marked fixed where fixed in the data:
UPDATE LBOvalid SET CSfyear = 1991 WHERE gvkey='003051'; --1 UPDATE LBOvalid SET CSfyear = 1994 WHERE gvkey='025389'; --1
Next go after Tobin's q. The definition used is:
generate tobinq:"Tobin's Q" = (csho*prcc_f)+lt/at
. tab gvkey if regfilter & hadlbo & tobinq==. & gvkey!=3051 & gvkey!=25389 Compustat | gvkey | Freq. Percent Cum. ------------+----------------------------------- 6005 | 1 20.00 20.00 1994 (1992), missing prcc_f 10160 | 1 20.00 40.00 1990 (1989), missing prcc_f 14628 | 1 20.00 60.00 1988 (1987), missing prcc_f but no prev year anyway... 15309 | 1 20.00 80.00 No prev year... 118321 | 1 20.00 100.00 2007(2005), missing csho. Could use previous year's of 42.266.. ------------+----------------------------------- Total | 5 100.00
Ignoring the two with no previous year...
gvkey year incs tic conm at csho lt exchg prcc_f gopvtyear exityear csfyear hadlbo 6005 1992 1 3IGRR INTEGRATED RESOURCES INC 1338.19 8.929 2429.61 13 1994 1996 1992 1 10160 1989 1 AG1 SUNBEAM CORP 785.105 .001 444.262 0 1990 1992 1989 1 118321 2005 1 SRNA.1 SERENA SOFTWARE INC 671.61 369.614 14 23.61 2006 2012 2005 1 1.229053
Found SEC order for Sunbeam for artificially inflating its stock price: "From the last quarter of 1996 until June 1998, Sunbeam Corporation's senior management created the illusion of a successful restructuring of Sunbeam in order to inflate its stock price and thus improve its value as an acquisition target..."
Couldn't find prices for either Sunbeam or Intergrated, so used the previous year's csho for Serena and moved on.
UPDATE CS SET csho=42.266 WHERE gvkey='118321' AND fyear=2005;
Review of Old Build
This section was written as review of the old build to assist with the current build. Readers can skip it unless they are interested in legacy construction.
Restore allpatentsprocessed (as researcher):
cd /bulk/mcnair/backups createdb allpatentsprocessed pg_restore -Fc -d allpatentsprocessed allpatentsprocessed_Fc_230818.dump
It looks like DataExtension.sql in E:\mcnair\Projects\LBO\Clean\Ed Run used usptopad on the old rdp to make tables for all of COMPUSTAT and import them into allpatentsprocessed. usptopad was not archived in the Big Move -- we might want to back it (and other rdp dbases) up!
It also looks like some development notes are missing. I can't find them in either E:\mcnair\Projects\LBO\Clean\Ed Run or the LBO dropbox, or Dropbox\coauthoredprojects\NonMcNair.
However, Analysis.do loads/merges the following:
- PreMergeRelevant.dta, which is produced by Datasetup.do, after a keep if savefilter. The sole input to Datasetup.do is masterv1-0.txt
- matchresultsV7, which comes from matchresults - V7 nofe lagps 92 R0 indu3 20pc 5yr.txt, which is generated by running LBOmatchscript.jl on MatchInputV7.txt, which in turn is made by Datasetup.do
- PatentStatistics.dta, which needs to be produced using PatentMeasures.sql on allpatentsprocessed once all matches have been identified
export delimited gvkey year lboentry lboexit patentflow patentstock lagpatentstock indu3 indu5 indu6 decade logitpreg logitpregwindu3fe logitpregwfe hadlbo matchfilter somepatenting using "E:\McNair\Projects\LBO\Clean\Ed Run\MatchInputV7.txt", delimiter(tab) nolabel quote replace
PatentMeasures.sql takes MatchedGVKeys.txt (184) and builds the patent measures (see below) in allpatentsprocessed, producing MatchedMeasuresPortfolio.txt (8468), which is merged in Analysis.do after being resaved as PatentStatistics.dta.
DROP TABLE MatchedMeasuresPortfolio; CREATE TABLE MatchedMeasuresPortfolio AS SELECT gvkeyint as gvkey, year, avg(exploitive) as avgexploitive, avg(Explorative70) as avgexplorative70, avg(Explorative80) as avgexplorative80, avg(Explorative90) as avgexplorative90, avg(gen) as avggen, avg(genunbiased) as avggenub, avg(org) as avgorg, avg(orgunbiased) as avgorgub, avg(citesmade) as avgcitesmade, avg(claims) as avgclaims FROM MatchedMeasuresBase GROUP BY gvkeyint, year ORDER BY gvkeyint, year; --8648
Required input tables are:
- HistMasterFile, which is the NBER historical masterfile FROM 'historical_masterfile.csv'
All three are in allpatentsprocessed
The patentexpiry table was built in allpatentsprocessed by running PatentExpiry.sql, which is in E:\mcnair\Projects\LBO\Clean\Ed Run and was last updated in March 2018. Most of the script is comment. The key elements are:
The stack is as follows:
- patents is an input
- Maintfeeevents2 FROM 'MaintFeeEvents_20170410b.txt' (COUNT 14708262)
- PatentExpirationRules2 FROM 'patentExpirationRules2.txt' (COUNT 27)
- Maintfeeevents2,patentexpirationrules -> patentfeebaseevents (4402215)
- FULL OUTER (patents, patentfeebaseevents) -> patentfeebasefull (5463947) -> PatentExpiryBase (5463947) -> PatentExpiry (5463947)
Note that patentfeebaseevents is made from PatentExpirationRules and not PatentExpirationRules2. Both list the same fee codes and description, but PatentExpirationRules2 has a termlength column whereas PatentExpirationRules has daysincrement and daysduration columns. daysduration is the same as termlength, except when daysduration=7300 then termlength=6205. It seems from the code that PatentExpirationRules is correct, as there is a manual fix when max(daysduration)=7300.
DROP TABLE PatentExpiry; CREATE TABLE PatentExpiry AS SELECT patentnumber, FilingDate, grantdate, maxdaysduration, nomaintenance, maxtermfix1, maxtermfix2, CASE --Old patents get: grant plus 17yrs WHEN nomaintenance = 1 THEN grantdate + 6205 -- Patents grandfathered in during transition in 1995 to new file date + 20 year term limit get: Later of issue date + 17 years or file date + 20 years used WHEN (maxtermfix1 = 1 AND maxdaysduration = 7300) THEN GREATEST(FilingDate + 7300, grantdate + 6205) --Patents after June 8 1995 but before May 28 2000 get: file date + 20 years WHEN (maxtermfix2 = 1 AND maxdaysduration = 7300) THEN FilingDate + 7300 --Patents after May 28 2000 (this might not be automatic and require an appeal though) get: later of issue date + 17 years or file date + 20 years WHEN (maxtermfix2 = 2 AND maxdaysduration = 7300) THEN GREATEST(FilingDate + 7300, grantdate + 6205) ELSE grantdate + maxdaysduration END AS expdate, CASE WHEN nomaintenance = 1 OR maxdaysduration = 7300 THEN 1::int ELSE 0::int END AS HeldToTerm FROM PatentExpiryBase; --5463947
Note: It also looks like Ed did the same fixes in: E:\mcnair\Projects\LBO\Clean\Ed Run\Building the Patent Start End Table Revised.sql
DataExtension.sql was last saved in July 2018. The output of this script is masterv1-0.txt (last produced in March 2018), which is the sole input to Datasetup.do. This section therefore traces the construction of masterv1-0.txt.
- StataStaging4 is an input
- gvkeyyearpatentdata (395213)
- statastaging4, (statastaging4->) LBOSeriesProcessed (556) -> LBODetailsWDurr (1691834)
- StataStaging4 -> NaicsSICBase (36779) & ExchangeOK (16236)
- (StataStaging4,LBODetailsWDurr,gvkeyyearpatentdata,NaicsSICBase | ExchangeOK) -> Master (746856)
gvkeyyearpatentdata appears to have been built for all of COMPUSTAT:
--######################### --# Build yearslist, GvkeyYear --######################### DROP TABLE yearslist; CREATE TABLE yearslist AS SELECT generate_series(1975,2017) as year; --43 DROP TABLE Allgvkeys; CREATE TABLE Allgvkeys AS SELECT DISTINCT gvkey FROM patenthistorybase ORDER BY gvkey; --9191 DROP TABLE GvkeyYear; CREATE TABLE GvkeyYear AS SELECT gvkey, year FROM yearslist, Allgvkeys ORDER BY gvkey, year; --395213
- patentexpiry is an input
- cspatentbuys FROM 'cspatentbuys.txt' (304864), which came from usptopad
- cspatentsales FROM 'cspatentsales.txt' (451730), which came from usptopad
- stdcompnames is an input
- assignees (is an input) -> Patentsassigneessimple 5426556
- (Patentsassigneessimple,MatchOrgnameSelf (loaded from orgnamedctmatched.txt 501232), MatchstdcompnamedctOrgname (loaded from Matchstdcompnamedctorgnamestd.txt 8361,stdcompnames) -> patentgrantgvkey
- patentgrantgvkey (1329006), cspatentbuysutlgvkey (272620) ... -> patentgrantbuy
- (patentgrantbuy,patentsale,patentexpired)-> patenthistorybase (1513045)
- patenthistorybase ...,GvkeyYear -> gvkeyyearpatentdata (395213)
Therefore needed inputs to DataExtension.sql are:
Note that Patentsassigneessimple is not in allpatentsprocessed but assignee is, and it can easy be rebuilt:
SELECT COUNT(*) FROM assignees WHERE orgname IS NOT NULL AND patentnumber IS NOT NULL; --5426556
There are two obvious master tables that we could export and reuse. The first is patenthistwoprebuys. This is patenthistorybase (1513045) but WHERE startdate < enddate. It has 1,495,486 records and the unit of observation is a patent associated with a firm. The keys are gvkey and patentnumber.
Table "public.patenthistwoprebuys" Column | Type | Collation | Nullable | Default --------------+------------------+-----------+----------+--------- gvkey | text | | | patentnumber | integer | | | startdate | date | | | startyear | double precision | | | grants | integer | | | buy | integer | | | enddate | date | | | endyear | double precision | | | sell | integer | | | term | integer | | | notrenewed | integer | | | unknownexp | integer | | |
The second master table is gvkeyyearpatentdata. This is an aggregated table, where the unit of observation is gvkey-year, where year goes from 1975-2017 (see GvkeyYear above) and contains zeros. There are 395,213 observations (i.e., one for each year for 9,191 unique GVKEYs).
Table "public.gvkeyyearpatentdata" Column | Type | Collation | Nullable | Default --------------------+---------+-----------+----------+--------- gvkey | text | | | year | integer | | | nogrants | bigint | | | nobuy | bigint | | | nosell | bigint | | | noterm | bigint | | | nonotnenewed | bigint | | | nounknownexp | bigint | | | portfoliochg | bigint | | | portfolio | numeric | | | portfoliocitations | bigint | | | citesperpatent | numeric | | |
So the dataset is built from StataStaging4, as well as raw tables.
Variables used include:
- gvkey, year
- naics, sic, exchg
- datadate, fyear, tic, cusip, name_compustat, fyr, act, artfs, at, ceq, che, csho, dlc, dltt, dt, dvc, dvp, ebitda, emp, esubc, ibc, intan, lct, lt, ni, oibdp, opiti, re, revt, sale, txditc, txt, xidoc, xint, xrd, xsga, xstfws,
- exchg, prcc_f, addzip, city, loc
It therefore seems that statastaging4 is all of COMPUSTAT with some LBO data included... However, I can't find the code that actually built it. There is code to build StataStaging5 in
E:\mcnair\Projects\LBO\Clean\Ed Run\LBO Innovation SQL Script.sql
And the best instructions seem to be to run parts 1, 2, and 3A of that script.
Files are in E:\McNair\Projects\LBO\Clean\Ed Run
The main STATA scripts:
The AllResults.xlxs file is the current results set.
Notes from the call:
- We have 92 LBOs that get matched.
- Have one or more patents in your lifetime (for both LBOs and non-LBOs)
- We are matching on a logit without fe. Can match using indu3. Can't use indu5 or indu6.
- Matches require same indu3, 20pc of the patentstock year0, 5yr window (-2 to +2)
Ed has to fix matches so that they are on patentstock in year -1.
Once Ed has the FINAL dataset built, he can go back and build the following patent statistics:
- Claims made
- Citations made
Also consider (from the Manso paper):
- Exploratory patents (within-firm vs. not paper)
Find out what Lerner did for before vs. after and mimic
Data Processing notes
- In dbase allpatentsprocessed on 181. Local files in Z:\LBO\Clean
- Run LBO Innovation SQL Script.sql sections 1,2, and 3A
- Run Building the Patent Start End Table Revised.sql
- Run LBO Innovation SQL Script.sql sections 3B, 3C, and 4. This produces statastaging5.txt.
- In STATA
- Run STATAdatasetup.do. It takes statastaging5.txt and produces STATAdataset5wvars.dta
- Run STATApredictLBOclean.do. It takes STATAdataset5wvars.dta and produces STATApredictLBOclean.dta and STATApredictLBOclean5.txt
- In Julia
- Run LBOmatchscript.jl. It takes STATApredictLBOclean5.txt and produces matchresults.txt
- In STATA
- Run PreMergePrep.do. It tales matchresults.txt and produces MatchResults.dta
- Run Merge.do. It takes STATApredictLBOclean.dta and merges it with MatchResults.dta. It then does the analysis.
See also: James New Run Documentation.txt This provides an overall summary, though some of the details are a little off.
Adding Patent assignment data
The patent assignement data is loaded into usptopad on the rdp (not the dbase server). The load script and processing script are in:
E:\McNair\Projects\USPTO Patent Assignment Dataset
The process script (ProcessAssignments.sql) was modified for this project:
- Create list of distinct or_name (from assignors) for inter firm assignments
- Create list of distinct ee_name (from assignees) for inter firm assignments
Build the patent data:
James' rough notes are in E:\McNair\Projects\LBO\Working Notes. These are date ordered and more for James than anyone else.
Files are generally in
Separate code to build the patent stocks base type (start date, expiry date) is in:
Building the Patent Start End Table.sql
Note that this code is out of date and needs to be replaced.
The dbase is allpatentsprocessed
LBO Innovation SQL Script.sql is the sql script to process the data
- Has comments on matcher use
- Note - Can't run this file!
- Needs dramatic clean up
- Patent expiration needs recalculating. See Patent Expiration Rules
- Bbase txt files (in and out) in Z:\LBO\Clean
- statastaging4.txt (current final dataset) is exported from allpatentsprocessed
To work with the data in STATA:
- Run STATAdatasetup4.do
- Takes statastaging4.txt (currently as dta)
- Makes that statastaging4.dta
- Run STATApredictLBOclean.do
- Takes statastaging4.dta
- Makes STATApredictLBOcleanb.dta (which has a bug fix over STATApredictLBOclean.dta)
- Also exports STATApredictLBOclean.txt, which contains just the variables relevant for the matching
The matching scripts (in Julia - run as admin to access modules) are in folder
Note that the instructions below are for the old version. James has now fixed the matching script to export tab-delimited text of matches that can be merged back into a STATA dataset, so the diff-in-diff can be done in STATA.
The documentation for the new version can be found at Matching_LBOs_(Julia)
The old version instructions are:
- From terminal in E:\McNair\Projects\LBO\New matching
- Launch Julia: julia
- Run script: include("LBODID.jl")
The old files are
LBODID.jl Loads data from E:\McNair\Projects\LBO\Clean\STATApredictLBOclean.txt Adds col, removes dead cases, breaks data into LBO & non-LBO Line 55: mscore = :logitpw; Lines 89-96: Define criteria for what constitutes a match - store as string evaluated for truth Line 101: include("LBOmatching.jl") #Loading in user-defined methods for matching Line 104: Do the matching! Beyond line 105 are doing the diff-in-diff
This is a clean reboot of the LBO project with clearer documentation. The purpose of the project is to analyze the innovative activity of firms after they undergo LBOs. For counterfactuals in the diff-in-diff framework, we match LBO firms to non-LBO using decade, industry, patenting history and propensity score.
See the Construction of Dataset section for the steps taken in SQL to build the dataset.
- LBO_list_and_data.xlsx: Dataset from Ayash and Schuett (2016) (Hand-cleaned list of LBOs from SDC)
- first sheet = main dataset
- second sheet = raw SDC data, dealnumb to GVkey linkage (all public LBOS, including divisional buyouts)
- third sheet = COMPUSTAT lookup (unused)
- fourth sheet = Private-to-private transactions
- failedLBOs2.xlsx: list of withdrawn LBOs from SDC
- first sheet = data
- second sheet = query specifications
- compustatnames.txt: List of distinct company names from COMPUSTAT pull. Used to generate stdcompnames.txt
- LBO COMPUSTAT PULL 4.txt: Result of COMPUSTAT pull
- LBOdata_sheetX.txt: See LBO_list_and_data.xlsx in E:\McNair\Projects\LBO
- LBOdata_sheet1.txt: First sheet of LBO_list_and_data.xlsx as tab-delimited txt (NOTE: we might have included entries marked as excluded; this is dealt with in SQL code)
- LBOdata_sheet2.txt: Second sheet of LBO_list_and_data.xlsx as tab-delimited txt
- LBOdata_sheet3.txt: Third sheet of LBO_list_and_data.xlsx as tab-delimited txt
- LBOdata_sheet4.txt: Fourth sheet of LBO_list_and_data.xlsx as tab-delimited txt
- stdcompnames.txt: Standardized COMPUSTAT company names (and non-standardized name lookup, gvkey). Used for matching to LBO and patent data.
- matchorgname.txt: Cross-ref between standardized assignee names and raw assignee names
- assigneexgvkey.txt: Name-based match between COMPUSTAT gvkey and raw assignee names
- LBOmatchednames.txt: Name-based match between standardized LBO names and COMPUSTAT names; used to include additional matches of LBOs to COMPUSTAT companies
In allpatentsprocessed database:
- lbocompustat4: See LBO COMPUSTAT PULL 4.txt
- compustatnames: See compustatnames.txt
- stdcompnames: See stdcompnames.txt
- LBOdata: First sheet of Brian's data; main dataset
- LBOsheet2: Second sheet of Brian's data; used to construct gvkey-to-dealnumb link
- yearsbygvkeys: Composite primary key (all years 1970 to 2015 cross-producted with unique gvkeys); all other data is joined onto this
- gvkeyxdealnumbfinal: Final gvkey to dealnumb lookup table. Includes results of name-based matching.
- yearsbygvkeysfilled2: Compustat dataset with LBO events flagged
Construction of Dataset (SQL)
See documentation in
E:\McNair\Projects\LBO\Clean\LBO Innovation SQL Script.txt
NOTE: All following work is done in the postgresql database "allpatentsprocessed"
Basic outline is as follows:
- Imported COMPUSTAT Data
- Built composite primary key from unique gvkeys cartesian-producted with years 1970 to 2015 (all other datasets will be joined onto this)
- This is yearsbygvkeys in SQL database
- Stripped out list of unique company names from COMPUSTAT data
- Used matcher to construct standardized company names, with lookup table to original names
- Joined gvkeys back into lookup table, result exported as stdcompnames.txt
- Built composite primary key from unique gvkeys cartesian-producted with years 1970 to 2015 (all other datasets will be joined onto this)
- Imported Brian's hand-cleaned SDC data on LBOs
- Sheet 1 contains main list of LBOs
- Sheet 2 contains gvkey to dealnumb link
- Sheet 3 contains COMPUSTAT output (unused)
- Sheet 4 contains private-to-private LBOs
- Used SDC data to flag LBO events for COMPUSTAT data
- Goal is to create gvkey lookup for each dealnumb, then query for number of deals associated with each gvkey in each year
- First used appropriate columns from sheet 2 to make preliminary lookup table
- Note: This left many LBOs unmatched. Applied name-based matching to include more deals (esp. all private-to-private)
- Copied dealnumb and company name from LBO_List_and_data.xlsx into textpad
- Manually removed extraneous symbols and words, saved result as LBO_Cleaned_data.xlsx
- Performed matching by company name with stdcompnames.txt (which included gvkeys) to get alternative gvkey-dealnumb lookup table
- Appended non-contradictory results to previous gvkey-dealnumb lookup table (to make SQL table gvkeyxdealnumbfinal)
- Used this table to flag LBO events in Compustat dataset for appropriate gvkeys in appropriate year(s)
- Result is SQL table yearsbygvkeysfilled2
- Used USPTO patents database to tag yearly patent stocks and flows for all companies in COMPUSTAT
- Matched the list of standardized patent assignee names with list of standardized COMPUSTAT company names (links gvkey to patent assignee names); result is cross-ref table assigneexgvkey
- Used table patentstartend (which has start and end dates of when patent was active; see subsection below) and assigneexgvkey to determine, for each gvkey-year combination, the number of active patents
- Result is SQL table yearsbygvkeysfilled3
- Final dataset is exported as tab-delimited file STATAstaging4.txt
Patent start-end table
To construct the table of all patents and their grant dates and expiration dates (i.e., start-end of being an active patent), we did the following:
(code/detailed documentation is in
E:\mcnair\projects\LBO\Clean\Building the Patent Start End Table.txt)
- Imported all patent maintenance fee events from the file MaintFeeEvents_20160613.txt
- Imported lookup table patentExpirationRules.txt containing patent lifetime implied by each maintenance event
- For each patent, found the latest expiration date implied by any recorded maintenance event.
(Query saved in WRDS as LBO COMPUSTAT PULL 4)
Date Range: 1970-01 to 2016-10 Selection: Entire Database Output: Tab-delimited Date Format: YYMMDDs10 Saved in: Z:\LBO\Clean\LBO COMPUSTAT PULL 4.txt
Company Name Ticker Symbol CUSIP Stock Exchange Code Fiscal Year-End ADDZIP -- Postal Code CITY -- City IPODATE -- Company Initial Public Offering Date LOC -- Current ISO Country Code - Headquarters NAICS -- North American Industry Classification Code SIC -- Standard Industry Classification Code STATE -- State/Province FYEAR -- Data Year - Fiscal ACT -- Current Assets - Total ARTFS -- Accounts Receivable/Debtors - Total AT -- Assets - Total CEQ -- Common/Ordinary Equity - Total CHE -- Cash and Short-Term Investments DLC -- Debt in Current Liabilities - Total DLTT -- Long-Term Debt - Total INTAN -- Intangible Assets - Total LCT -- Current Liabilities - Total LT -- Liabilities - Total RE -- Retained Earnings TXDITC -- Deferred Taxes and Investment Tax Credit DVC -- Dividends Common/Ordinary DVP -- Dividends - Preferred/Preference EBITDA -- Earnings Before Interest NI -- Net Income (Loss) OIBDP -- Operating Income Before Depreciation OPITI -- Operating Income - Total REVT -- Revenue - Total SALE -- Sales/Turnover (Net) TXT -- Income Taxes - Total XINT -- Interest and Related Expense - Total XRD -- Research and Development Expense XSGA -- Selling, General and Administrative Expense XSTFWS -- Staff Expense - Wages and Salaries ESUBC -- Equity in Net Loss Earnings IBC -- Income Before Extraordinary Items (Cash Flow) XIDOC -- Extraordinary Items and Discontinued Operations (Cash Flow) CSHO -- Common Shares Outstanding DT -- Total Debt Including Current EMP -- Employees PRCC_F -- Price Close - Annual - Fiscal
There are three distinct pulls from SDC. Session details should be included as the second sheet of the respective Excel worksheets.
- Brian Ayash's hand-cleaned list of LBOs (with gvkeys). This exists as the list_and_data.xlsx table.
- List of failed LBOs. This exists as the failedLBOs2.xlsx table.
This academic paper investigates the recent trend of leveraged buyouts (LBOs) in the US financial market and assesses their impact on innovative activities within firms that undergo LBOs. To do this, we match firms that undergo LBOs with similar firms that do not undergo LBOs using a dynamic hazard rate model. Using these matched firms as synthetic controls, we perform a difference in difference analysis to compare the trajectory of innovative activity of LBO firms with that of non-LBO firms (e.g., patenting, intangible assets).
See Leverage Buyout Innovation (Clean) for reboot of this project with clearer documentation.
How to measure innovation/effect on innovation
- Grants - Flow vs. stock
- Patents sold (re-assignments)
- Patents securitized?
- Accounting variables:
- R&D spending (spotty coverage, Brian's data and COMPUSTAT)
- Intangible assets (Brian's data and COMPUSTAT)
Factors influencing LBO impact
- Industry (note - depends on distribution)
- High tech?
- Public to private (main focus, ~330obs)
- Private to private (23 in final STATA set for analysis)
- Only 28 of ~60 private deals in SQL have patents
- Time period
- 1980s, 90s, 00s, and teens (currently teens in with 00s)
Note: Buyout type (MBO/LBO), Exit type (IPO vs. Secondary vs. Strategic Buyer, Financial Buyer, Bankruptcy, Still Private, Out of Court Rest.), Innovator vs. Flipper (how do we decide?) and other categorizations might be useful?
See LBO Data Description for detailed work.
E:\McNair\Projects\LBO\LBO academic paper script pre compustat.txt
E:\McNair\Projects\LBO\LBO academic paper script post compustat.txt
See LBO Data Description for detailed work.
- Script for generating data labels, compound variables and lagged variables, then setting dataset as panel survival analysis data
- Script for computing summary statistics and regressions (eventually)
LBO Data and Patent Data
- Get original LBO Data, clean it, and match it against itself
- Get patent data, draw the distinct assignees, and match it against itself. Note - Redo this step with new patent data
- Match the LBO data to the patent data
- Pull patent numbers and grant dates and join those variables to matched LBO patent data
- Final table includes standardized names, number of patents, minimum grant years, maximum grant years, average grant years, and whether the LBO was public to private or private to private
Introduction of Compustat and Preparation for Stata
- Pull Compustat data, match Compustat names against themselves to standardize
- Connect Compustat data to LBO Data using the GVKeys in the SDC sheet of the original LBO excel file
- Connect Compustat data to LBO data and Patent data in a single table where each company has 47 rows of data, one for each year 1970 - 2016
- Final table includes standardized names, GVKeys, years, number of patents per year, LBO entries/exits in each year, and all pulled Compustat variables
- Generate variables, compound and lagged
- Set data up as panel
- Generate hazard model
- Conduct statistical analysis