VentureXpert Data

Revision as of 11:21, 7 August 2018

==Instructions on Matching PortCos to Issuers and M&As From Ed==

===Company Standardizing===

Get portco keys

===MA Cleaning and Matching===
First clean each dataset by removing its duplicates. The primary key for the M&A data is (targetname, targetstate, announceddate); for the IPO data it is (issuer, statecode, issuedate). The easiest way to remove duplicates (if you have lots of them) is an aggregate query that groups by the primary key and wraps every other column in an aggregate function:
 DROP TABLE IF EXISTS MANoDups;
 CREATE TABLE MANoDups AS
 SELECT targetname, targetstate, announceddate,
   MIN(effectivedate) AS effectivedate,
   MIN(acquirorname) AS acquirorname,
   MIN(acquirorstate) AS acquirorstate,
   MAX(transactionamt) AS transactionamt,
   MAX(enterpriseval) AS enterpriseval,
   MIN(acquirorstatus) AS acquirorstatus
 FROM mas
 GROUP BY targetname, targetstate, announceddate
 ORDER BY targetname, targetstate, announceddate;
 --119374
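You can try the dedup pattern on a toy table before running it on the full data. The sketch below is a minimal illustration under these assumptions:
* Python's built-in `sqlite3` module stands in for PostgreSQL.
* The handful of rows are made up and use only some of the `mas` columns.
* Dates are stored as ISO strings, so `MIN` orders them chronologically.

The sketch also runs the distinct-key count that serves as the cleanliness check.

```python
import sqlite3

# Hypothetical toy rows standing in for the mas table (a subset of its columns).
# The two "Acme Inc" rows share a primary key and differ only in other columns.
TOY_MAS = [
    ("Acme Inc", "CA", "2001-03-01", "2001-06-01", "Big Co", 100.0),
    ("Acme Inc", "CA", "2001-03-01", "2001-05-15", "Big Co", 120.0),
    ("Beta LLC", "NY", "1999-11-20", "2000-01-10", "Other Co", None),
]


def dedup_demo(rows):
    con = sqlite3.connect(":memory:")  # SQLite used here in place of PostgreSQL
    con.execute(
        "CREATE TABLE mas (targetname TEXT, targetstate TEXT, announceddate TEXT, "
        "effectivedate TEXT, acquirorname TEXT, transactionamt REAL)"
    )
    con.executemany("INSERT INTO mas VALUES (?, ?, ?, ?, ?, ?)", rows)
    # Group by the primary key; every other column sits inside an aggregate.
    con.execute(
        "CREATE TABLE manodups AS "
        "SELECT targetname, targetstate, announceddate, "
        "MIN(effectivedate) AS effectivedate, "
        "MIN(acquirorname) AS acquirorname, "
        "MAX(transactionamt) AS transactionamt "
        "FROM mas GROUP BY targetname, targetstate, announceddate"
    )
    n_rows = con.execute("SELECT COUNT(*) FROM manodups").fetchone()[0]
    # Cleanliness check: the number of distinct keys should equal the row count.
    n_keys = con.execute(
        "SELECT COUNT(*) FROM (SELECT DISTINCT targetname, targetstate, "
        "announceddate FROM manodups) a"
    ).fetchone()[0]
    acme = con.execute(
        "SELECT effectivedate, transactionamt FROM manodups "
        "WHERE targetname = 'Acme Inc'"
    ).fetchone()
    con.close()
    return n_rows, n_keys, acme


n_rows, n_keys, acme = dedup_demo(TOY_MAS)
print(n_rows, n_keys, acme)  # 2 2 ('2001-05-15', 120.0)
```

The two Acme rows collapse into one row that keeps the earliest effective date and the largest transaction amount. In PostgreSQL the date columns would be real `date` values, and `MIN` on those is chronological as well.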

Then confirm that the primary key is now unique. The count of distinct keys should equal the row count of the deduplicated table:
 SELECT COUNT(*) FROM (SELECT DISTINCT targetname, targetstate, announceddate FROM manodups) a;
 --119374
Since these counts are equal, the dataset is clean. Next, pull the primary keys into their own table and copy the distinct target names into a text file:
 DROP TABLE IF EXISTS makey;
 CREATE TABLE makey AS SELECT targetname, targetstate, announceddate FROM manodups;
 --119374
 \COPY (SELECT DISTINCT targetname FROM makey) TO 'DistinctTargetName.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
 --117212
After running this list of distinct target names through the matcher, load the standardized M&A names into the database:
 DROP TABLE IF EXISTS MaStd;
 CREATE TABLE MaStd (
   targetnamestd varchar(255),
   targetname varchar(255),
   norm varchar(100),
   x1 varchar(255),
   x2 varchar(255)
 );
 \COPY mastd FROM 'DistinctTargetName.txt-DistinctTargetName.txt.matched' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
 --117212
Then join the standardized names back to the makey table. This gives a table with both the standardized name and the primary key, which is your input for matching against the portcos:
 DROP TABLE IF EXISTS makeysstd;
 CREATE TABLE makeysstd AS
 SELECT B.targetnamestd, A.*
 FROM makey AS A JOIN mastd AS B ON A.targetname = B.targetname;
 --119374
 \COPY makeysstd TO 'MAMatchInput.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
 --119374

===IPO Cleaning and Matching===
The process is the same for IPOs. First remove the duplicates:
 DROP TABLE IF EXISTS iponodups;
 CREATE TABLE iponodups AS
 SELECT issuer, statecode, issuedate,
   MAX(principalamt) AS principalamt,
   MAX(proceedsamt) AS proceedsamt,
   MIN(naiccode) AS naiccode,
   MIN(zipcode) AS zipcode,
   MIN(status) AS status,
   MIN(foundeddate) AS foundeddate
 FROM ipos
 GROUP BY issuer, statecode, issuedate
 ORDER BY issuer, statecode, issuedate;
 --11149
 SELECT COUNT(*) FROM (SELECT DISTINCT issuer, statecode, issuedate FROM iponodups) a;
 --11149
Every non-key variable must be inside an aggregate. Choose each aggregate function sensibly by looking at the data: generally use MAX for amounts and MIN for dates. You can also use MAX or MIN on text strings.
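Every non-key column has to be wrapped in an aggregate. It can therefore help to generate the statement from a list of key columns and a per-column choice of aggregate, so that no column is left out or used as both key and aggregate by mistake.

The helper below, `build_dedup_sql`, is hypothetical and not part of the original workflow. It only assembles PostgreSQL text and does not run it. The IPO column choices mirror the query above.

```python
def build_dedup_sql(source, target, keys, aggregates):
    """Assemble a DROP/CREATE TABLE ... GROUP BY statement (hypothetical helper).

    keys: primary-key columns, selected as-is and used in GROUP BY / ORDER BY.
    aggregates: maps every non-key column to an aggregate function name.
    """
    if not keys:
        raise ValueError("at least one key column is required")
    overlap = set(keys) & set(aggregates)
    if overlap:
        raise ValueError(f"key columns cannot also be aggregated: {sorted(overlap)}")
    key_list = ", ".join(keys)
    agg_list = ", ".join(f"{fn}({col}) AS {col}" for col, fn in aggregates.items())
    return (
        f"DROP TABLE IF EXISTS {target};\n"
        f"CREATE TABLE {target} AS\n"
        f"SELECT {key_list}, {agg_list}\n"
        f"FROM {source}\n"
        f"GROUP BY {key_list}\n"
        f"ORDER BY {key_list};"
    )


ipo_sql = build_dedup_sql(
    source="ipos",
    target="iponodups",
    keys=["issuer", "statecode", "issuedate"],
    aggregates={
        "principalamt": "MAX",  # amounts: MAX
        "proceedsamt": "MAX",
        "naiccode": "MIN",      # text: MIN (MAX would also work)
        "zipcode": "MIN",
        "status": "MIN",
        "foundeddate": "MIN",   # dates: MIN
    },
)
print(ipo_sql)
```

Keeping the aggregate choice in a dictionary makes the MAX/MIN decisions explicit and easy to review column by column before the query is pasted into psql.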

Then pull the IPO primary keys into their own table:
 DROP TABLE IF EXISTS ipokeys;
 CREATE TABLE ipokeys AS SELECT issuer, statecode, issuedate FROM iponodups;
 --11149

Copy the distinct issuer names into a text file and run them through the matcher:
 \COPY (SELECT DISTINCT issuer FROM ipokeys) TO 'IPODistinctIssuer.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
 --10803
*Move the file into the input directory as before
*Run the matcher script, but WITHOUT mode 2
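Before loading the matcher output, it may be worth checking it in Python. This sketch rests on the following assumptions:
* The `.matched` file is tab-delimited and has a header row.
* Its first two columns are the standardized name and the original name, as the column order of `CREATE TABLE ipokeysstd` suggests.
* The function name `check_matched` and the sample rows are hypothetical.

Two checks matter here. The row count should equal the number of distinct issuers exported (10803 here), and each original name should appear only once, so that the later join does not duplicate keys.

```python
import csv
import io


def check_matched(fh, expected_rows):
    """Sanity-check matcher output (hypothetical helper).

    Assumes tab-delimited rows with a header line, where column 0 is the
    standardized name and column 1 is the original name.
    """
    reader = csv.reader(fh, delimiter="\t")
    next(reader)  # skip the header row, as HEADER does in \COPY
    std_by_raw = {}
    conflicts = []
    n_rows = 0
    for row in reader:
        if not row:
            continue  # ignore blank trailing lines
        n_rows += 1
        std, raw = row[0], row[1]
        if raw in std_by_raw and std_by_raw[raw] != std:
            conflicts.append(raw)  # same original name, different std names
        std_by_raw.setdefault(raw, std)
    return {
        "rows": n_rows,
        "row_count_ok": n_rows == expected_rows,
        "repeated_raw_names": n_rows - len(std_by_raw),
        "conflicting_raw_names": conflicts,
    }


# Made-up sample in the assumed layout (issuerstd, issuer, norm, x1, x2).
sample = io.StringIO(
    "issuerstd\tissuer\tnorm\tx1\tx2\n"
    "ACME\tAcme Inc.\tINC\t\t\n"
    "ACME\tACME, Inc\tINC\t\t\n"
    "BETA\tBeta Corp\tCORP\t\t\n"
)
report = check_matched(sample, expected_rows=3)
print(report)
```

In real use you would open the `.matched` file with `open(path, newline="")` instead of the in-memory sample.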

Then create a table to hold the standardized issuer names:
 DROP TABLE IF EXISTS ipokeysstd;
 CREATE TABLE ipokeysstd (
   issuerstd varchar(255),
   issuer varchar(255),
   norm varchar(100),
   x1 varchar(255),
   x2 varchar(255)
 );

Load the matcher output into it, join the standardized names back to the keys, and export the matching input:
 \COPY ipokeysstd FROM 'IPODistinctIssuer.txt-IPODistinctIssuer.txt.matched' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
 --10803
 DROP TABLE IF EXISTS ipostd;
 CREATE TABLE ipostd AS
 SELECT B.issuerstd, A.*
 FROM ipokeys AS A JOIN ipokeysstd AS B ON A.issuer = B.issuer;
 --11149
 \COPY ipostd TO 'IPOMatchInput.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
 --11149
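The repeated counts act as a check on the join: 11149 for both ipokeys and ipostd, and 119374 for both makey and makeysstd. An inner join keeps the row count only if every original name appears exactly once in the standardized table.

The sketch below shows how a repeated name inflates the count and a missing name shrinks it. It again uses Python's `sqlite3` as a stand-in for PostgreSQL, with made-up rows and a hypothetical `join_counts` helper.

```python
import sqlite3

# Hypothetical keys: "Acme Inc." has two IPO records (different states).
TOY_KEYS = [
    ("Acme Inc.", "CA", "1999-05-01"),
    ("Acme Inc.", "NY", "2003-02-11"),
    ("Beta Corp", "TX", "2000-08-30"),
]


def join_counts(std_rows):
    """Return (rows in ipokeys, rows after joining to ipokeysstd)."""
    con = sqlite3.connect(":memory:")  # SQLite used here in place of PostgreSQL
    con.execute("CREATE TABLE ipokeys (issuer TEXT, statecode TEXT, issuedate TEXT)")
    con.executemany("INSERT INTO ipokeys VALUES (?, ?, ?)", TOY_KEYS)
    # Only the two columns the join needs are modeled here.
    con.execute("CREATE TABLE ipokeysstd (issuerstd TEXT, issuer TEXT)")
    con.executemany("INSERT INTO ipokeysstd VALUES (?, ?)", std_rows)
    n_keys = con.execute("SELECT COUNT(*) FROM ipokeys").fetchone()[0]
    n_joined = con.execute(
        "SELECT COUNT(*) FROM ipokeys AS a "
        "JOIN ipokeysstd AS b ON a.issuer = b.issuer"
    ).fetchone()[0]
    con.close()
    return n_keys, n_joined


clean = join_counts([("ACME", "Acme Inc."), ("BETA", "Beta Corp")])
dup = join_counts([("ACME", "Acme Inc."), ("ACME INC", "Acme Inc."), ("BETA", "Beta Corp")])
missing = join_counts([("ACME", "Acme Inc.")])
print(clean, dup, missing)  # (3, 3) (3, 5) (3, 2)
```

If the two counts differ on the real data, look for original names that are repeated in, or absent from, the standardized table.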

Once you have reviewed the match output:
*Build a new sheet of just the good matches
*Save the Excel files
*Copy each of your match sheets to a text file
*CREATE TABLE to reflect the data you are going to load (include the std names and keys)
*\COPY the data into the table (using the exact copy command above, but changing the table and file names)
*Celebrate!
*Next we'll deal with any firms that have both an IPO and an M&A and decide which we'll keep
*And then we'll join in the chosen IPO and M&A data and move on!

Revision by Adliebster.

==Cleaning IPO and MA Data==
