Revision as of 17:14, 10 July 2018
VentureXpert Data | |
---|---|
Project Information | |
Project Title | VentureXpert Data |
Owner | Augi Liebster |
Start Date | June 20, 2018 |
Deadline | |
Primary Billing | |
Notes | |
Has project status | Active |
Copyright © 2016 edegan.com. All Rights Reserved. |
Relevant Former Projects
Location
My scripts for SDC pulls are located in the Z drive in the location:
Z:\VentureXpertDB\ScriptsForSDCExtract
My successfully pulled and normalized files are stored in the location:
Z:\VentureXpertDB\ExtractedDataQ2
My script for loading data is in one big text file in the location:
Z:\VentureXpertDB\vcdb3
The folder vcdb2 is there for reference to see what people before had done. ExtractedData is there because I pulled data before July 1st, and Ed asked me to repull the data.
Goal
I plan to redesign the VC Database so that its structure is more intuitive than the previous one. I will also update the database with current data.
Initial Stages
The first step of the project was to figure out what primary keys to use for each major table that I create. I looked at the primary keys used in the creation of the VC Database Rebuild and found primary keys that are decent. I have updated them and list them below:
- CompanyBaseCore - coname, statecode, datefirstinv
- IPOCore - issuer, issuedate, statecode
- MACore - target name, target state code, announceddate
- Geo - city, statecode, coname, datefirst, year
- DeadDate - coname, statecode, datefirst, rounddate (tentative, could still change)
- RoundCore - coname, statecode, datefirst, rounddate
- FirmBaseCore - firmname
- FundBaseCore - fund name (firstinvedate doesn't work because not every row has an entry)
These are my initial listings and I will come back to update them if needed.
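A quick way to test whether a candidate key is really unique is to count how often each key combination appears in the extracted file. The following is only a minimal sketch in Python: the sample rows and the helper name find_duplicate_keys are made up for illustration, and it assumes the normalized files are tab-delimited with a header row, which may not match the actual extracts.

```python
import csv
import io
from collections import Counter


def find_duplicate_keys(rows, key_columns):
    """Return the key tuples that occur more than once, with their counts."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical sample data standing in for a normalized CompanyBaseCore file.
sample = (
    "coname\tstatecode\tdatefirstinv\n"
    "Acme Corp\tTX\t2001-05-01\n"
    "Acme Corp\tCA\t2003-02-11\n"
    "Acme Corp\tTX\t2001-05-01\n"
)

rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
duplicates = find_duplicate_keys(rows, ["coname", "statecode", "datefirstinv"])
print(duplicates)  # {('Acme Corp', 'TX', '2001-05-01'): 2}
```

An empty result means the chosen columns identify every row uniquely. Any key that shows up here would need to be deduplicated or the key would need another column.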
The second part of the initial stage was to pull data from the SDC Platinum platform. I did the pull in July to ensure that I had two full quarters of data.
SDC Pull
When pulling data from SDC, first look for previously made rpt files named after the pulls you need to do. They have already been created and will save you a lot of work. The rpt files that I used are in the folder VentureXpertDB/ScriptsForSDCExtract. The files come in pairs, one saved as an ssh file and one as an rpt file. To bring the dates up to date, open the ssh file of the pair and change the date of last investment. When you open SDC, you will be given a choice of databases to pull from. For each type of file, choose the following:
- VentureXpert - PortCo, PortCoLong, USVC, Firms, BranchOffices, Funds, Rounds, VCFirmLong
- Mergers & Acquisitions - MAs
- Global New Issues Databases
Help on pulling data from SDC is on the SDC Platinum (Wiki) page.
VCFund Pull Problem
When pulling VCFund1980-Present, I encountered two problems. First, SDC cannot restrict the funds to US-only ones with its built-in filters. Second, there are multiple rpt files that specify different variables for the fund pull. I pulled from both to be safe, and I have both saved in the ExtractedData folder. The VC Database Rebuild page also has a section on the fund pull where Ed specifies which rpt file he used to pull data from SDC. After I spoke with Ed, he told me to use the VCFund1980-present.rpt file to extract the data. I also had various problems extracting the data, including the SDC program freezing and an Out of Memory error. Check the SDC Platinum (Wiki) page to fix these issues.
Loading Tables
Begin by loading the scripts that I have written and placed in vcdb3. Copy and paste the code from the text files into PuTTY. When loading roundbase, I ran into a problem with one entry, Cardtronic Technology, whose description contained a word with mismatched quotation marks: "smart'. I manually removed the quotation marks, and the load then worked.
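A stray quotation mark like the one above can be found before loading by flagging every line with an odd number of double quotes. This is only a sketch in Python, not part of the actual load scripts: the sample lines and the helper names find_unbalanced_quote_lines and strip_double_quotes are hypothetical, and it assumes each record sits on a single line.

```python
def find_unbalanced_quote_lines(lines, quote='"'):
    """Return (line_number, text) pairs whose count of `quote` is odd."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if line.count(quote) % 2 == 1
    ]


def strip_double_quotes(line):
    """Remove double quotes only; apostrophes are kept because they may be legitimate."""
    return line.replace('"', "")


# Hypothetical sample lines standing in for a tab-delimited extract.
lines = [
    "Acme Corp\tTX\tMakes widgets\n",
    "Example Card Co\tTX\tDevelops \"smart' card readers\n",
    "Beta LLC\tCA\tSells \"cloud\" software\n",
]

flagged = find_unbalanced_quote_lines(lines)
for number, text in flagged:
    print(number, repr(text))

# Clean only the flagged lines and leave balanced quotes untouched.
flagged_numbers = {number for number, _ in flagged}
cleaned = [
    strip_double_quotes(line) if number in flagged_numbers else line
    for number, line in enumerate(lines, start=1)
]
```

Only line 2 of the sample is flagged, so the balanced quotes on line 3 stay as they are. The flagged lines should still be checked by hand, since a leftover apostrophe may or may not belong in the text.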