Difference between revisions of "Matching VentureOne (Data)"

From edegan.com

Revision as of 16:33, 6 July 2016


McNair Project
Matching VentureOne (Data)
Copyright © 2016 edegan.com. All Rights Reserved.


Overview

In this matching process, we join patent data to VentureOne companies and count the number of patents affiliated with each company.

Raw Data

The original data set of VentureOne companies can be found at: E:\McNair\Projects\Venture One Data\Venture Data 1.xlsx

  • All variables: EntityName, Employees, City, State, Zip, AreaCode, Business Status, IndustryGroup, etc.
  • Variables used for matching: EntityName

The original patent data is in our database: 128.42.44.181/bulk/allpatentsprocessed

Procedure

We first derive standardized company names for the VentureOne companies from the source VentureOne data set. We then standardize the names of the patent-holding companies (assignees) in our patent database. Finally, we join patent information to the VentureOne companies on the shared standardized names.
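The idea of joining on a shared standardized name can be sketched in Python. This is only an illustration: the suffix list and the functions `standardize_name` and `match_by_standard_name` are hypothetical and are not the rules used by the Matcher script in this project.

```python
import re

# Hypothetical legal suffixes to drop; the project's Matcher may use other rules.
SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co",
            "company", "llc", "ltd", "limited", "lp"}


def standardize_name(raw):
    """Lowercase, replace punctuation with spaces, drop trailing legal suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", raw.lower())
    tokens = cleaned.split()
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def match_by_standard_name(venture_names, assignee_names):
    """Map each venture name to the assignee names sharing its standard form."""
    by_std = {}
    for assignee in assignee_names:
        std = standardize_name(assignee)
        if std:  # skip names that reduce to nothing (e.g. "Inc.")
            by_std.setdefault(std, []).append(assignee)
    return {v: by_std.get(standardize_name(v), []) for v in venture_names}
```

Under these assumed rules, "Acme Widgets, Inc." and "ACME WIDGETS INCORPORATED" both reduce to "acme widgets" and are therefore treated as the same company.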

Final Matched Tables

  1. A summary table showing the number of patents owned and the minimum, maximum, and average grant year for each company (including companies that own no patents). It can be found at: E:\McNair\Projects\Venture One Data\venturesummary.txt
  2. A table containing all patent information for the companies that have patents. It can be found at: E:\McNair\Projects\Venture One Data\venturefullyjoined.txt
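As a rough illustration of how the summary table's columns could be computed, the sketch below counts patents and takes the minimum, maximum, and average grant year per company, keeping companies with no patents. The function `summarize_patents`, its input shape, and the output field names are assumptions made for this example, not the layout of the project's actual files.

```python
from statistics import mean


def summarize_patents(companies, patent_rows):
    """Summarize grant years per company.

    companies:   list of standardized company names (assumed unique)
    patent_rows: iterable of (company, grant_year) pairs
    """
    years = {company: [] for company in companies}
    for company, grant_year in patent_rows:
        if company in years:  # ignore patents of unmatched companies
            years[company].append(grant_year)

    summary = []
    for company in companies:
        ys = years[company]
        summary.append({
            "company": company,
            "n_patents": len(ys),
            # Companies without patents stay in the table with empty year fields.
            "min_grant_year": min(ys) if ys else None,
            "max_grant_year": max(ys) if ys else None,
            "avg_grant_year": mean(ys) if ys else None,
        })
    return summary
```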

Desired Variables


     name:  <unnamed>
      log:  C:\Users\ArielSun\Downloads\varlist.log
 log type:  text
opened on:   5 Jul 2016, 16:49:23

 . describe

 Contains data from C:\Users\ArielSun\Downloads\allpats_3sectors_06jun13.dta
 obs:        19,409                          
vars:            36                          11 Jun 2016 17:31
size:    10,655,541                          (_dta has notes)


----------------------------------------------------------------------------------------------------------------------
                storage display    value
variable name   type    format     label      variable label
----------------------------------------------------------------------------------------------------------------------
id_vone         double  %9.0g                 VentureOne id
name            str39   %39s                  startup name
patent          str9    %9s                   patno in string
apn             str6    %6s                   pat application number
nmi             str40   %40s                  inventor name
ttl             str244  %40s                  invention title
nma             str65   %65s                  original assignee
ocd             str15   %15s                  main us patent class
icd             str15   %15s                  main intl patent class
apd             float   %td                   application date
gdateold        float   %td                   Grant date
fnd_year        float   %8.0g                 startup founding year
last_yr         float   %9.0g               * OLD last_yr, 2006; see notes
source          byte    %8.0g                 1 if 2012 delphion searches; else from 2004/5 search
pdate           float   %td                   priority date, delphion; may pre-date application date if provisional apps
utility         float   %9.0g               * 1 if utility patent as initially awarded; 0 if other (reissued, reexamed, design
state_country   str3    %9s                   state/country of first inventor listed
asscode         float   %9.0g                 assignee code; basic.dta
ayear           int     %9.0g                 application year
amonth          byte    %9.0g                 application month
atype           str1    %9s                 * initial assignee type; see notes
class           str3    %9s                   3 digit us pat class
subclass        str6    %9s                   patent subclass
gdate           int     %d                    grant, or issuance, date
industry        str15   %15s                  semi, software, or med devices
state_hq        str2    %9s                   firm hq location; vone
status06        str4    %9s                 * status of firm known in 2006; rhs truncation varies by sector
exitdate        str8    %9s                   exit date, if known
exityr          str4    %9s                   exit year, if known
status08        str6    %9s                 * status of firm in 2008, see notes
last_yr08       int     %8.0g               * exityr if ipo/acq, else 2008
dcohort         float   %9.0g                 1 if founding yr during 1987-99
lastyr08_minu~r float   %9.0g
dsearch_assign  float   %9.0g                 1 if searches of pat assignment data need to be conducted; carlosn confirm?
carlos_chk      float   %9.0g                 carlos: pls confirm assignment data = compiled for these pats
entityid        long    %12.0g                unique startup id as of 2008, vone
                                            * indicated variables have notes
----------------------------------------------------------------------------------------------------------------------

Detailed Data Processing

  • Get the VentureOne data ready
  1. Source file for VentureOne data (original data source): E:\McNair\Projects\Venture One Data\Venture Data 1.xlsx
  2. Clean it up (extraneous symbols and words removed): E:\McNair\Software\Scripts\Matcher\Input\Venture Data 1.txt
  3. Match it against itself to get standardized entity names: E:\McNair\Projects\Venture One Data\Cleaned and Matched Data.xlsx
  • Get the patent data ready
  1. Draw the distinct assignees: Z:\allpatentsprocessed\DistinctAssignees2.txt
  2. Match them against themselves to get standardized org names for the patent data: Z:\allpatentsprocessed\DistinctAssignees2matched.txt
  • Match the standardized org names of the patent data to the standardized entity names of the venture data: Z:\allpatentsprocessed\Venture Patent Matched.txt
  • Join patent data to venture data to get the patent information of each venture-backed company
  1. Join patent data to assignee data, creating firstjoin_cleaned, which matches assignees to patent numbers.
  2. Join firstjoin_cleaned to the matchassignee data, creating secondjoin_cleaned, which matches standard org names to patent numbers.
  3. Join secondjoin_cleaned to the venturepatentmatched data, creating fourthjoin_cleaned, which matches standard venture company names to patent numbers.
  • Final summary tables
  1. A summary table showing the number of patents owned and the minimum, maximum, and average grant year for each company: E:\McNair\Projects\Venture One Data\venturepatentreallyfinal.txt
  2. A table of all patent information for each company that has patents: E:\McNair\Projects\Venture One Data\venturepatentfullyjoined.txt
  • Notes
  1. All data are in the allpatentsprocessed database. Access it by logging on to researcher@McNair DBServ:/bulk/allpatentsprocessed
  2. A script of the detailed processing procedure can be found at E:\McNair\Projects\Venture One Data\patent data script.txt
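
The three-step join chain described above can be sketched with an in-memory SQLite database. The table names firstjoin_cleaned, secondjoin_cleaned, fourthjoin_cleaned, matchassignee, and venturepatentmatched come from the steps listed here; the input tables `patents` and `assignees`, every column name, and the function `join_patents_to_ventures` are assumptions for illustration. The project's actual schema and SQL are in the processing script referenced in the notes and may differ.

```python
import sqlite3


def join_patents_to_ventures(conn):
    """Run the three joins and return (venture name, patent number, grant year) rows.

    Assumes these hypothetical input tables already exist in `conn`:
      patents(patentnumber, grantyear)
      assignees(patentnumber, assignee)
      matchassignee(assignee, stdorgname)
      venturepatentmatched(stdorgname, stdentityname)
    """
    conn.executescript("""
        -- Step 1: attach the raw assignee to each patent number.
        CREATE TABLE firstjoin_cleaned AS
        SELECT p.patentnumber, p.grantyear, a.assignee
        FROM patents p
        JOIN assignees a ON a.patentnumber = p.patentnumber;

        -- Step 2: replace the raw assignee with its standardized org name.
        CREATE TABLE secondjoin_cleaned AS
        SELECT f.patentnumber, f.grantyear, m.stdorgname
        FROM firstjoin_cleaned f
        JOIN matchassignee m ON m.assignee = f.assignee;

        -- Step 3: keep only org names matched to a venture company.
        CREATE TABLE fourthjoin_cleaned AS
        SELECT s.patentnumber, s.grantyear, v.stdentityname
        FROM secondjoin_cleaned s
        JOIN venturepatentmatched v ON v.stdorgname = s.stdorgname;
    """)
    return conn.execute(
        "SELECT stdentityname, patentnumber, grantyear "
        "FROM fourthjoin_cleaned ORDER BY patentnumber"
    ).fetchall()
```

Because each step is an inner join, patents whose assignee has no matched venture company drop out at step 3, so companies without patents have to be added back when the summary table is built.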