Changes

Jump to navigation Jump to search
1,440 bytes added ,  11:21, 7 August 2018
==Instructions on Matching PortCos to Issuers and M&As From Ed==
===Company Standardizing===
Get portco keys
Now prepare to repeat that process ===MA Cleaning and Matching===First remove all of the duplicates in the MA data. Do this by running aggregate queries on every column except for M&A's and IPOsthe primary key:*For M&As your keys DROP TABLE MANoDups; CREATE TABLE MANoDups AS SELECT targetname, targetstate, announceddate, min(effectivedate) AS effectivedate, MIN(for nowacquirorname) will be targetnameas acquirorname, statecodeMIN(acquirorstate) as acquirorstate, dateannounced MAX(transactionamt) as *For IPOs your keys transactionamt, MAX(for nowenterpriseval) as enterpriseval, MIN(acquirorstatus) will be issuernameas acquirorstatus FROM mas GROUP BY targetname, statecodetargetstate, issuedateannounceddate ORDER BY targetname, targetstate, announceddate;*FIRST CLEAN EACH DATASET. The easiest way to remove duplicates (if you have lots of them) is to use an aggregate query: --119374
SELECT COUNT(*) FROM(SELECT DISTINCT targetname, targetstate, announceddate FROM manodups)a; --119374 Since these counts are equivalent, the data set is clean. Then get all the primary keys from the table and copy the distinct target names into a text file.  DROP TABLE makey; CREATE TABLE makey AS SELECT targetname, targetstate, announceddate FROM manodups; --119374  \COPY (SELECT DISTINCT targetname FROM makey) TO 'DistinctTargetName.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV; --117212 After running this list of distinct target names through the matcher, put the standardized MA list into the data base.  DROP TABLE MaStd; CREATE TABLE MaStd ( targetnamestd varchar(255), targetname varchar(255), norm varchar(100), x1 varchar(255), x2 varchar(255) );  \COPY mastd FROM 'DistinctTargetName.txt-DistinctTargetName.txt.matched' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV --117212 Then match the list of standardized names back to the makey table to get a table with standardized keys and primary keys. This will be your input for matching against port cos.   DROP TABLE IPOCoreNoDupsmakeysstd; CREATE TABLE IPOCoreNoDups asmakeysstd AS SELECT B.targetnamestd, A.* FROM makey AS A JOIN mastd AS B ON A.targetname=B.targetname; --119374  \COPY makeysstd TO 'MAMatchInput.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV --119374 ===IPO Cleaning and Matching===The process is the same for IPOs.  DROP TABLE iponodups; CREATE TABLE iponodups AS SELECT issuernameissuer, statecode, issuedate, maxMAX(var1principalamt) AS principalamt, MAX(proceedsamt) AS proceedsamt, MIN(naiccode) as var1naicode, MIN(zipcode) AS zipcode, avgMIN(var2status) as var2AS status, ... MIN(foundeddate) AS foundeddate FROM IPOCore ipos GROUP BY issuernameissuer, statecode, issuedate ORDER BY issuernameissuer, statecode, issuedate; --11149  SELECT COUNT(*) FROM(SELECT DISTINCT issuer, statecode, issuedate FROM iponodups)a; Note that you need all vars to be inside aggregates and that you should choose the aggregate function sensibly by looking at the data. Generally use MAX for amounts and MIN for dates. You can also use MAX or MIN on text strings.--11149
And now build the same stacks as before but to create Issuerkeystd and TargetKeystd (or whatever you call them). Make sure that issuerstd (and targetnamestd) is in the first column. DROP TABLE ipokeys; CREATE TABLE ipokeys AS SELECT issuer, statecode, issuedate FROM iponodups; --11149
Now match Portcokeystd to Issuerkeystd, and match Portcokeystd to Targetkeystd \COPY (SELECT DISTINCT issuer FROM ipokeys) TO 'IPODistinctIssuer.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV*Move the files into the input director as before*Run the matcher script but WITHOUT mode 2: --10803
perl Matcher.pl -file1="PortCoMatchInput.txt" -file2="IssuerMatchInput.txt"DROP TABLE ipokeysstd; perl Matcher.pl -file1="PortCoMatchInput.txt" -file2="TargetMatchInput.txt"CREATE TABLE ipokeysstd ( issuerstd varchar(255), issuer varchar(255), norm varchar(100), x1 varchar(255), x2 varchar(255) );
Open each of these files in excel and mark good matches with 1s and bad matches with 0s by adding columns to compare dates \COPY ipokeysstd FROM 'IPODistinctIssuer.txt-IPODistinctIssuer.txt.matched' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV --10803  DROP TABLE ipostd; CREATE TABLE ipostd AS SELECT B.issuerstd, states, etc, and filteringA.* FROM ipokeys AS A JOIN ipokeysstd AS B ON A.issuer=B.issuer; --11149  \COPY ipostd TO 'IPOMatchInput.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV --11149
When you are done: *Build a new sheet of just good matchesI generally use MAX for amounts and MIN for dates.*Save the excel files*Copy each of your match sheets I also chose to a use MIN on text file*CREATE TABLE to reflect the data you are going to load (include std names and keys)*\COPY the data (using the exact copy command above but changing the table and file names) into the table*Celebrate!*Next we'll deal with any firms that have an IPO and an M&A and decide which we'll keep*And then we'll join in the chosen IPO and M&A data and move on!strings.
==Cleaning IPO and MA Data==
158

edits

Navigation menu