Construction of the Cultural Homogeneity in VC Dataset

From edegan.com
Revision as of 19:18, 2 August 2009

This page details the construction of the Cultural Homogeneity in Venture Capital dataset (currently a first-draft version). This data is posted for reference (and debate) by project members. The choices made in constraints, variables, processing, and so forth are often arbitrary and will no doubt change as the project progresses.

Constraints on Retrieved Data

The following constraints were applied when retrieving data from VentureXpert (via SDC Platinum):

  • Portfolio Company Date of First Investment: 2003-2007 inclusive
  • Portfolio Company Nation: US
  • Venture Capital Fund Nation: US
  • Portfolio Company Standard US Venture Disbursement: Yes (Note: correlates almost perfectly with PWC Moneytree VC deals; excludes private equity, angel investment, and other non-VC investment, as classified by Thomson)
  • Venture Capital Fund PWC Moneytree Deals: Yes (see above)
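
The retrieval constraints can be restated as a record-level check, which may help when re-checking an export. This is a hypothetical Python sketch and not part of the SDC Platinum workflow. It assumes one flattened dict per company-fund pairing, keyed by column names from the tables below. It also assumes the export encodes nations as "United States" and the two flags as "Yes".

```python
from datetime import date

# Assumption: one flattened record per company-fund pairing, keyed by column
# names from the VentureXpert tables. The "United States" and "Yes" encodings
# are assumptions about the export format, not confirmed values.
FIRST_INVESTMENT_START = date(2003, 1, 1)
FIRST_INVESTMENT_END = date(2007, 12, 31)


def meets_constraints(rec: dict) -> bool:
    """Return True if a record satisfies every retrieval constraint."""
    first = rec["DateCompanyReceivedFirstInvestment"]
    return (
        FIRST_INVESTMENT_START <= first <= FIRST_INVESTMENT_END  # 2003-2007 inclusive
        and rec["CompanyNation"] == "United States"
        and rec["FundNation"] == "United States"
        and rec["StandardUSVentureDisbursement"] == "Yes"
        and rec["PWCMoneytreeDealsYN"] == "Yes"
    )
```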

Retrieved Data by VentureXpert Perspective

The data was processed by the Normalizer.pl Perl script to produce relational database tables in third normal form (3NF). Normalizing executive titles was problematic and required rekeying in the database.
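
Here is a minimal Python sketch of what this normalization does for portfolio companies. It assumes each flat input record carries a company name and a nested list of executives, each with their titles. It is not Normalizer.pl itself, which is a Perl script not shown here. The input layout and the `normalize` function are hypothetical, and sequential integers stand in for the CoExec-1, CoExec-2 and CoExec-3 keys.

```python
from itertools import count


def normalize(records):
    """Split flat company records into Company, CoExec and CoExecTitle rows.

    Assumed input shape (hypothetical):
      {"CompanyName": str,
       "Executives": [{"FirstName": str, "LastName": str, "Titles": [str, ...]}]}
    """
    company_ids, exec_ids, title_ids = count(1), count(1), count(1)
    company, co_exec, co_exec_title = [], [], []
    for rec in records:
        co_id = next(company_ids)  # stands in for CoExec-1
        company.append({"CoExec-1": co_id, "CompanyName": rec["CompanyName"]})
        for ex in rec["Executives"]:
            co_exec.append({
                "CoExec-1": co_id,
                "CoExec-3": next(exec_ids),  # one key per executive
                "ExecutivesFirstName": ex["FirstName"],
                "ExecutivesLastName": ex["LastName"],
            })
            for title in ex["Titles"]:
                # Like the CoExecTitle table, each title row links to the
                # company (CoExec-1) but carries no executive key.
                co_exec_title.append({
                    "CoExec-1": co_id,
                    "CoExec-2": next(title_ids),
                    "ExecutivesJobTitle": title,
                })
    return company, co_exec, co_exec_title
```

In the tables listed below, CoExecTitle carries only CoExec-1 and CoExec-2. As in this sketch, a title row therefore points to the company but not to the individual executive. One plausible reading is that this is why executive titles needed rekeying.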

Portfolio Companies

  • Company (Primary Key: CoExec-1)
    • CoExec-1, DateCompanyReceivedFirstInvestment, CompanyFoundingDate, CompanyZipCode, CompanyCity, CompanyAreaCode, CompanyCounty, CompanyName, CompanyNation, CompanyNationCode, CompanyState, CompanyStateCode, CompanyStreetAddressLine1, CompanyStreetAddressLine2, CompanyIndustryClass, CompanyIndustryMajorGroup, CompanyIndustryMinorGroup, CompanyIndustrySubGroup1, CompanyIndustrySubGroup2, CompanyIndustrySubGroup3, StandardUSVentureDisbursement
  • CoExec (Primary Key: CoExec-3)
    • CoExec-1, CoExec-3, ExecutiveisNonManagingBoardMember, ExecutiveisPrimaryContact, ExecutivesCity, ExecutivesFirstName, ExecutivesLastName, ExecutivesNamePrefix, ExecutivesEMailAddress, ExecutivesPhoneNumber, ExecutivesPreviousPosition
  • CoExecTitle (Primary Key: CoExec-2)
    • CoExec-1, CoExec-2, ExecutivesJobTitle
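
To illustrate how these three tables fit together, here is a hypothetical SQLite schema for a subset of the Portfolio Companies columns. The project's actual database engine and column types are not stated, so both are assumptions. Hyphenated key names such as "CoExec-1" must be double-quoted in SQL.

```python
import sqlite3

# Subset of the Portfolio Companies columns; the column types are assumptions.
DDL = """
CREATE TABLE Company (
    "CoExec-1" INTEGER PRIMARY KEY,
    CompanyName TEXT NOT NULL,
    CompanyZipCode TEXT,
    DateCompanyReceivedFirstInvestment TEXT
);
CREATE TABLE CoExec (
    "CoExec-3" INTEGER PRIMARY KEY,
    "CoExec-1" INTEGER NOT NULL REFERENCES Company("CoExec-1"),
    ExecutivesFirstName TEXT,
    ExecutivesLastName TEXT,
    ExecutivesNamePrefix TEXT
);
CREATE TABLE CoExecTitle (
    "CoExec-2" INTEGER PRIMARY KEY,
    "CoExec-1" INTEGER NOT NULL REFERENCES Company("CoExec-1"),
    ExecutivesJobTitle TEXT
);
"""


def build_schema() -> sqlite3.Connection:
    """Create the tables in an in-memory database with foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(DDL)
    return conn


def executives_per_company(conn: sqlite3.Connection):
    """Count executives per company, keeping companies with none."""
    return conn.execute("""
        SELECT c.CompanyName, COUNT(e."CoExec-3")
        FROM Company c
        LEFT JOIN CoExec e ON e."CoExec-1" = c."CoExec-1"
        GROUP BY c."CoExec-1"
        ORDER BY c.CompanyName
    """).fetchall()
```

With foreign keys enabled, an executive row that points to a nonexistent company is rejected with `sqlite3.IntegrityError`. This catches broken keys early, for example during rekeying.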

Fund

  • Fund (Primary Key: FundExec-1)
    • FundExec-1, FirmFundList1stCloseDateofeachFund, FirmFoundingDate, FundInitialClosing, FundAreaCode, FundCity, FundCounty, FundStageFocus, FundInvestmentType, FundMSA, FundMSACode, FundName, FundNation, FundNationCode, FundRaisingStatus, FundSequenceNo, FundSequenceType, FundSizeMil, FundState, FundStateCode, FundTargetSize000, FundTypeLongDescription, FundTypeShortDescription, FundYear, FundZipCode, FirmReportedCapitalunderMgmtMil, FirmNation, FirmName, FirmGeographyPreference, FirmIndustryPreference, FirmPreferredInvestmntRoleCode, FirmInvestmentStagePreference, FirmPreferredMaxInvestmentMil, FirmPreferredMinInvestmentMil, FirmState, FirmZipCode, FirmStateCode, PWCMoneytreeDealsYN, FirmFundListNameofeachFund
  • FundExec (Primary Key: FundExec-3)
    • FundExec-1, FundExec-3, ExecutiveisNonManagingBoardMember, ExecutiveisPrimaryContact, ExecutivesCity, ExecutivesFirstName, ExecutivesLastName, ExecutivesNamePrefix, ExecutivesPreviousPosition
  • FundExecTitle (Primary Key: FundExec-2)
    • FundExec-1, FundExec-2, ExecutivesJobTitle
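
Several Fund columns encode a unit in their name. Columns ending in "Mil" (FundSizeMil, FirmReportedCapitalunderMgmtMil, FirmPreferredMaxInvestmentMil) appear to be in millions, while FundTargetSize000 appears to be in thousands. The suffix meanings (US$ millions and US$ thousands) are an assumption based on the names alone. The hypothetical helper below converts such a column to US$ thousands so fund-level and round-level amounts can be compared.

```python
from typing import Optional

# Assumption: "Mil" = US$ millions, "000" = US$ thousands (inferred from column names).
MULTIPLIER_TO_THOUSANDS = {"Mil": 1000.0, "000": 1.0}


def to_thousands(column: str, value: Optional[float]) -> Optional[float]:
    """Convert an amount to US$ thousands based on its column-name suffix."""
    if value is None:  # missing amounts stay missing
        return None
    for suffix, factor in MULTIPLIER_TO_THOUSANDS.items():
        if column.endswith(suffix):
            return value * factor
    raise ValueError(f"no known unit suffix on column {column!r}")
```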

Disbursement

  • Round (Primary Key: Round-2)
    • Round-1, Round-2, RoundDates, CompanyStageLevel1ateachRoundDate, RoundAmtDisclosed000, RoundAmtEstimated000, RoundNumbers, NumberofInvestorseaRound
  • RoundCompany (Primary Key: Round-1)
    • Round-1, CompanyName, CompanyState, CompanyStateCode, TotalKnownAmtInvestedinCompany000
  • RoundInvestor (Primary Key: Round-3)
    • Date, DisclosedAmtk, Investor, Round-1, Round-3
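
RoundCompany and RoundInvestor share the company-level key Round-1, so investor-level disclosures can be summed per company and compared with TotalKnownAmtInvestedinCompany000. This is a hypothetical consistency check. It assumes that DisclosedAmtk is in US$ thousands and that each table has been loaded as a list of dicts. Because some investors do not disclose amounts, a mismatch is a prompt for inspection, not necessarily an error.

```python
from collections import defaultdict


def disclosed_totals(round_investor_rows):
    """Sum DisclosedAmtk per company key Round-1 (assumed US$ thousands)."""
    totals = defaultdict(float)
    for row in round_investor_rows:
        amt = row.get("DisclosedAmtk")
        if amt is not None:  # undisclosed amounts are skipped, not counted as zero
            totals[row["Round-1"]] += amt
    return dict(totals)


def flag_mismatches(round_company_rows, round_investor_rows, tol=1.0):
    """Return Round-1 keys whose investor-level sum differs from the company total."""
    sums = disclosed_totals(round_investor_rows)
    flagged = []
    for co in round_company_rows:
        known = co.get("TotalKnownAmtInvestedinCompany000")
        if known is None:
            continue  # nothing to compare against
        if abs(sums.get(co["Round-1"], 0.0) - known) > tol:
            flagged.append(co["Round-1"])
    return flagged
```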

Lookup Tables

Lookup tables for Stage, State and VentureXpert Minor Industry Code are provided on the VentureXpert page. The database records the numeric codes from these tables rather than the text labels.
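
Recording the numeric values amounts to replacing each text label with its code from the relevant lookup table. The sketch below assumes the lookup has been transcribed into a Python dict mapping label to code. The actual labels and codes are those on the VentureXpert page and are not reproduced here. The `encode` function is hypothetical. It reports unmatched labels instead of dropping them silently.

```python
from typing import Optional


def encode(labels: list[Optional[str]], lookup: dict[str, int]) -> tuple[list[Optional[int]], list[str]]:
    """Map text labels to numeric lookup codes.

    Labels missing from `lookup` become None in the output and are also
    returned (sorted, de-duplicated) so they can be reviewed by hand.
    """
    codes: list[Optional[int]] = []
    unknown: set[str] = set()
    for label in labels:
        key = label.strip() if label is not None else None
        if key is not None and key in lookup:
            codes.append(lookup[key])
        else:
            codes.append(None)
            if key:  # ignore empty strings and nulls when reporting
                unknown.add(key)
    return codes, sorted(unknown)
```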

Processing the Data

The following problems were addressed and processing steps undertaken:

  • Executives (both company and fund executives) can have multiple job titles; only the first listed title was kept (this requires rekeying)
  • Executive name prefixes were parsed for Dr and Ms and recorded as binary codes
  • Anglicized first names and genders were (incompletely) identified using the U.S. Social Security Administration Baby Names list
  • Company and fund zip codes were standardized to 5 digits
  • Companies with null or "Undisclosed" names were excluded
  • Funds named "Undisclosed Fund" were excluded, as were funds that did not participate in a round during the period of interest (i.e., within the 2003-2007 constraint on companies' date of first investment)
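
The processing steps above can be sketched as small helper functions. This is a minimal Python sketch, not the code actually used for the dataset, and every function name is hypothetical. Several details are assumptions:

  • the ";" title separator
  • pre-aggregated SSA counts in the form {name: (female, male)}
  • the 0.9 gender-share threshold
  • treating 6 to 9 digit zip codes as ZIP+4 values that lost leading zeros on import
  • case-insensitive exact matching of "Undisclosed" names

```python
import re
from typing import Optional


def first_title(raw: Optional[str], sep: str = ";") -> Optional[str]:
    """Keep only the first listed job title (the separator is an assumption)."""
    if not raw:
        return None
    first = raw.split(sep)[0].strip()
    return first or None


def prefix_flags(prefix: Optional[str]) -> tuple[int, int]:
    """Return binary (is_dr, is_ms) codes parsed from a name prefix."""
    p = (prefix or "").strip().rstrip(".").lower()
    return int(p == "dr"), int(p == "ms")


def gender_from_ssa(first_name: str, counts: dict, min_share: float = 0.9) -> Optional[str]:
    """Guess gender from SSA baby-name counts {name: (female, male)}.

    Returns None for absent or ambiguous names, so identification is
    necessarily incomplete. The 0.9 threshold is an assumption.
    """
    female, male = counts.get(first_name.strip().capitalize(), (0, 0))
    total = female + male
    if total == 0:
        return None
    if female / total >= min_share:
        return "F"
    if male / total >= min_share:
        return "M"
    return None


def zip5(raw) -> Optional[str]:
    """Standardize a zip code (str or int) to 5 digits."""
    if raw is None:
        return None
    digits = re.sub(r"\D", "", str(raw))  # drop hyphens, spaces, etc.
    if 5 < len(digits) <= 9:
        # Assumed ZIP+4; restore lost leading zeros, then keep the first 5 digits.
        digits = digits.zfill(9)[:5]
    if not 0 < len(digits) <= 5:
        return None
    return digits.zfill(5)  # e.g. 2139 imported as a number -> "02139"


def keep_company(name: Optional[str]) -> bool:
    """Drop companies with null, blank or "Undisclosed" names."""
    return bool(name and name.strip()) and name.strip().lower() != "undisclosed"


def keep_fund(name: Optional[str], round_dates, start, end) -> bool:
    """Drop "Undisclosed Fund" and funds with no round between start and end."""
    if not name or name.strip().lower() == "undisclosed fund":
        return False
    return any(start <= d <= end for d in round_dates)
```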