Difference between revisions of "VentureXpert Data"

From edegan.com
Revision as of 15:35, 11 July 2018


McNair Project
VentureXpert Data
Project Information
Project Title: VentureXpert Data
Owner: Augi Liebster
Start Date: June 20, 2018
Deadline:
Primary Billing:
Notes:
Has project status: Active
Copyright © 2016 edegan.com. All Rights Reserved.


Augi Liebster (Work Log)

Relevant Former Projects

  1. Venture Capital (Data)
  2. Retrieving US VC Data From SDC
  3. VC Database Rebuild

Location

My scripts for SDC pulls are located on the Z drive at:

Z:\VentureXpertDB\ScriptsForSDCExtract

My successfully pulled and normalized files are stored at:

Z:\VentureXpertDB\ExtractedDataQ2

My script for loading the data is one large text file located in:

Z:\VentureXpertDB\vcdb3

The vcdb2 folder is kept for reference, to show what previous contributors had done. The ExtractedData folder is there because I first pulled the data before July 1st, and Ed asked me to repull it.


Goal

I will redesign the VC Database so that it is structured more intuitively than the previous version, and I will update it with current data.

Initial Stages

The first step of the project was to decide which primary keys to use for each major table that I create. I reviewed the primary keys used in the VC Database Rebuild, found them to be a reasonable starting point, and updated them. They are listed below:

  1. CompanyBaseCore - coname, statecode, datefirstinv
  2. IPOCore - issuer, issuedate, statecode
  3. MACore - target name, target state code, announceddate
  4. Geo - city, statecode, coname, datefirst, year
  5. DeadDate - conname, statecode, datefirst, rounddate (tentative; could still change)
  6. RoundCore - conname, statecode, datefirst, rounddate
  7. FirmBaseCore - firmname
  8. FundBaseCore - fund name (firstinvedate cannot be used because not every row has an entry)
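Before settling on keys like these, it helps to test them against the extracted data: a usable primary key must be filled in for every row and unique across rows, which is exactly why firstinvedate was ruled out for FundBaseCore. Below is a minimal sketch in Python, assuming each extract has already been read into a list of dicts (one per row); the function name and the column names in the example are hypothetical.

```python
from collections import Counter


def _is_blank(value):
    """True if a key field is absent or contains only whitespace."""
    return value is None or not str(value).strip()


def check_primary_key(rows, key_cols):
    """Test whether key_cols could serve as a primary key for rows.

    rows is a list of dicts, one per record. Returns (missing, duplicates):
    the indices of rows with a blank key field, and the key tuples that
    occur more than once among the remaining rows.
    """
    missing = [
        i for i, row in enumerate(rows)
        if any(_is_blank(row.get(col)) for col in key_cols)
    ]
    skip = set(missing)
    counts = Counter(
        tuple(str(row[col]).strip() for col in key_cols)
        for i, row in enumerate(rows) if i not in skip
    )
    duplicates = [key for key, n in counts.items() if n > 1]
    return missing, duplicates


# Column names below are hypothetical examples.
funds = [
    {"fundname": "Fund A", "firstinvdate": "01/02/1999"},
    {"fundname": "Fund B", "firstinvdate": ""},
    {"fundname": "Fund A", "firstinvdate": "01/02/1999"},
]
print(check_primary_key(funds, ["fundname", "firstinvdate"]))
# ([1], [('Fund A', '01/02/1999')])
```

A key passes only when both returned lists are empty; a non-empty `missing` list points to a column that cannot be part of the key, and a non-empty `duplicates` list means more columns are needed.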

These are my initial listings and I will come back to update them if needed.

The second part of the initial stage was to pull data from the SDC Platinum platform. I did the pull in July to ensure that it included two full quarters of data.


SDC Pull

When pulling data from SDC, first look for existing rpt files named after the pulls you need to do. They have already been set up and will save you a lot of work. The rpt files that I used are in the folder VentureXpertDB/ScriptsForSDCExtract. The files come in pairs: one saved as an ssh file and one as an rpt file. To bring a pull up to date, open the ssh file of the pair and change the date of last investment. When you open SDC, you will be given a choice of databases to pull from. For each type of file, choose the following:

  1. VentureXpert - PortCo, PortCoLong, USVC, Firms, BranchOffices, Funds, Rounds, VCFirmLong
  2. Mergers & Acquisitions - MAs
  3. Global New Issues Databases - IPOs
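With many ssh/rpt pairs, the two housekeeping steps described above (checking that every file has its partner, and changing the date in each ssh file) can be scripted. Below is a minimal sketch in Python, assuming the pairs share a base name in one folder and that the old end date appears verbatim as text inside each ssh file; that second point is an assumption about SDC's session-file format, not something confirmed here, so back the folder up first. Function names are hypothetical.

```python
from pathlib import Path


def unpaired_files(folder):
    """Return the base names that have only one half of an .ssh/.rpt pair."""
    folder = Path(folder)
    ssh = {p.stem for p in folder.glob("*.ssh")}
    rpt = {p.stem for p in folder.glob("*.rpt")}
    return sorted(ssh ^ rpt)  # symmetric difference: in one set but not both


def update_end_date(folder, old_date, new_date):
    """Replace old_date with new_date in every .ssh file in folder.

    ASSUMPTION: the end date is stored as plain text in the session file.
    Works on raw bytes so line endings and other content are left untouched.
    Returns the names of the files that were changed.
    """
    old_b, new_b = old_date.encode("ascii"), new_date.encode("ascii")
    changed = []
    for path in sorted(Path(folder).glob("*.ssh")):
        data = path.read_bytes()
        if old_b in data:
            path.write_bytes(data.replace(old_b, new_b))
            changed.append(path.name)
    return changed


# Example (dates are illustrative; back the folder up first):
# folder = r"Z:\VentureXpertDB\ScriptsForSDCExtract"
# print(unpaired_files(folder))
# print(update_end_date(folder, "12/31/2017", "06/30/2018"))
```

Reading and writing bytes rather than text keeps Windows line endings intact; open the changed ssh files in SDC afterwards to confirm the new date took effect.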

Help on pulling data from SDC is on the SDC Platinum (Wiki) page.

VCFund Pull Problem

When pulling VCFund1980-Present, I encountered two problems. First, SDC's built-in filters cannot restrict the pull to US funds only. Second, there are multiple rpt files that specify different variables for the fund pull. I pulled with both to be safe; the VC Database Rebuild page has a section on the fund pull in which Ed specifies which rpt file he used to pull data from SDC. Both extracts are saved in the ExtractedData folder. When I spoke with Ed, he told me to use the VCFund1980-present.rpt file to extract the data. I also had various problems during the extraction, including SDC freezing and "Out of Memory" errors. Check the SDC Platinum (Wiki) page for how to fix these issues.
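Since SDC cannot filter to US funds during the pull, one option is to filter the extracted file afterwards. Below is a minimal sketch in Python, assuming the extract is tab-delimited with a header row and contains a column recording each fund's country; the column name `fund_nation` and the accepted spellings of "United States" are hypothetical placeholders to be matched to the actual extract.

```python
import csv
import io

# HYPOTHETICAL defaults: the country column name and the accepted spellings
# of "United States" depend on how the fund extract was set up.
US_VALUES = {"united states", "us", "usa"}


def us_funds(tsv_text, country_col="fund_nation", us_values=US_VALUES):
    """Return the rows of a tab-delimited fund extract whose country is the US."""
    reader = csv.DictReader(
        io.StringIO(tsv_text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary text
    )
    return [
        row for row in reader
        if (row.get(country_col) or "").strip().lower() in us_values
    ]
```

For a file on disk, open it with `newline=""` and pass `f.read()` to `us_funds`. `QUOTE_NONE` is used so that a stray quote in a description cannot merge neighbouring rows while filtering.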


Loading Tables

Begin by loading the scripts that I have written and placed in the vcdb3 folder: copy and paste the code from the text files into PuTTY. When loading roundbase, I ran into a problem with one entry, Cardtronic Technology, whose description contained a word with mismatched quotation marks: "smart'. I manually removed the quotation marks, and the load then worked.
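Rather than finding such entries by hand, an extract can be scanned before loading. Below is a minimal sketch in Python, assuming the extract is tab-delimited and that the load treats `"` as a quote character, as a CSV-mode import would (an assumption; check the scripts in vcdb3). It flags fields containing an odd number of double quotes and can strip the double quotes from just those fields. Function and file names are hypothetical.

```python
def find_stray_quotes(lines, delimiter="\t"):
    """Yield (line_number, field_index, field) for every field that contains
    an odd number of double quotes, i.e. an unmatched quote."""
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split(delimiter)
        for idx, field in enumerate(fields):
            if field.count('"') % 2 == 1:
                yield lineno, idx, field


def strip_stray_quotes(lines, delimiter="\t"):
    """Return a copy of lines with double quotes removed from any field whose
    quotes are unbalanced; all other fields and line endings are kept as-is."""
    cleaned = []
    for line in lines:
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        fields = [f.replace('"', "") if f.count('"') % 2 else f
                  for f in body.split(delimiter)]
        cleaned.append(delimiter.join(fields) + ending)
    return cleaned


# Example usage (file name is hypothetical):
# with open("roundbase.txt", encoding="latin-1", newline="") as f:
#     lines = f.readlines()
# for hit in find_stray_quotes(lines):
#     print(hit)
```

Only double quotes are removed; a lone single quote such as the one left in smart' does not act as a quote character in CSV mode. Reviewing the reported hits before writing a cleaned file keeps the fix as targeted as the manual edit.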