Changes

VentureXpert Data (view source)

Revision as of 12:59, 14 July 2018

3,682 bytes added , 12:59, 14 July 2018

no edit summary

The third error is in an area code where 1-8 is written. This hyphen causes errors. Interestingly, the line number given by PuTTY was correct, and I found it in my text file and deleted it manually.

These were the only errors I encountered while loading this table.

==Instructions on Matching PortCos to Issuers and M&As From Ed==

Get portco keys

DROP TABLE portcokeys;

CREATE TABLE portcokey AS

SELECT coname, statecode, datefirst

FROM portcocore;

--CHECK COUNT IS SAME AS portcocore OR THESE KEYS ARE VALID AND FIX THAT FIRST

Get distinct coname and put it in a file

\COPY (SELECT DISTINCT coname FROM portcokeys) TO 'DistinctConame.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV

Match that to itself

Move DistinctConame.txt to E:\McNair\Software\Scripts\Matcher\Input

Open powershell and change directory to E:\McNair\Software\Scripts\Matcher

Run the matcher in mode2:

perl Matcher.pl -file1="DistinctConame.txt" -file2="DistinctConame.txt" -mode=2

Pick up the output file from E:\McNair\Software\Scripts\Matcher\Output (it is probably called DistinctConame.txt-DistinctConame.txt.matched) and move it to your Z drive directory

Load the matches into the dbase

DROP TABLE PortcoStd;

CREATE TABLE PortcoStd (

conamestd varchar(255),

coname varchar(255),

norm varchar(100),

x1 varchar(255),

x2 varchar(255)

);

\COPY CohortCoStd FROM 'DistinctConame.txt-DistinctConame.txt.matched' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV

--YOUR COUNT

Join the Conamestd back to the portcokeys table to create your matching table

DROP TABLE portcokeysstd;

CREATE TABLE portcokeysstd AS

SELECT B.conamestd, A.*

FROM portcokey AS A

JOIN PortcoStd AS B ON A.coname=B.coname

--CHECK COUNT IS SAME AS portcokey OR YOU LOST SOME NAMES OR INFLATED THE DATA

Put that in a file for matching (conamestd is in first column by construction)

\COPY portcokeysstd TO 'PortCoMatchInput.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV

--YOUR COUNT

Now prepare to repeat that process for M&A's and IPOs:

*For M&As your keys (for now) will be targetname, statecode, dateannounced

*For IPOs your keys (for now) will be issuername, statecode, issuedate

*FIRST CLEAN EACH DATASET. The easiest way to remove duplicates (if you have lots of them) is to use an aggregate query:

DROP TABLE IPOCoreNoDups;

CREATE TABLE IPOCoreNoDups as

SELECT issuername, statecode, issuedate, max(var1) as var1, avg(var2) as var2, ...

FROM IPOCore GROUP BY issuername, statecode, issuedate ORDER BY issuername, statecode, issuedate;

Note that you need all vars to be inside aggregates and that you should choose the aggregate function sensibly by looking at the data. Generally use MAX for amounts and MIN for dates. You can also use MAX or MIN on text strings.

And now build the same stacks as before but to create Issuerkeystd and TargetKeystd (or whatever you call them). Make sure that issuerstd (and targetnamestd) is in the first column.

Now match Portcokeystd to Issuerkeystd, and match Portcokeystd to Targetkeystd

*Move the files into the input director as before

*Run the matcher script but WITHOUT mode 2:

perl Matcher.pl -file1="PortCoMatchInput.txt" -file2="IssuerMatchInput.txt"

perl Matcher.pl -file1="PortCoMatchInput.txt" -file2="TargetMatchInput.txt"

Open each of these files in excel and mark good matches with 1s and bad matches with 0s by adding columns to compare dates, states, etc, and filtering.

When you are done:

*Build a new sheet of just good matches.

*Save the excel files

*Copy each of your match sheets to a text file

*CREATE TABLE to reflect the data you are going to load (include std names and keys)

*\COPY the data (using the exact copy command above but changing the table and file names) into the table

*Celebrate!

*Next we'll deal with any firms that have an IPO and an M&A and decide which we'll keep

*And then we'll join in the chosen IPO and M&A data and move on!

Ed

Bureaucrats, Interface administrators, Administrators (Semantic MediaWiki), Administrators

7,613

edits

Changes

VentureXpert Data (view source)

Revision as of 12:59, 14 July 2018

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Sites

Sections

Organizations

Help

Tools