{{Project|Has project output=Data|Has sponsor=McNair Center
|Has title=USPTO Patent Litigation Data
|Has owner=Ed Egan
|Has start date=May 2017
|Has keywords=Patent, USPTO, Litigation, Data
|Has project status=Tabled
}}
==Getting the data==
 
The data is available from https://bulkdata.uspto.gov/data2/patent/litigation/2015/, which is linked directly from https://bulkdata.uspto.gov/.
 
The data comes in .dta (Stata) and .csv formats. We took the .csv files:
attorneys.csv.zip 29751808 2016-12-29 07:44
cases.csv.zip 3085347 2016-12-29 07:45
documents.csv.zip 244399591 2016-12-29 07:46
names.csv.zip 7256777 2016-12-29 07:48
pacer_cases.csv.zip 2453937 2016-12-29 07:48
csv.zip 286947372 2016-12-29 07:45
 
csv.zip appears to contain the five other files!
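
One way to confirm that csv.zip bundles the five other files is to list its members without extracting anything. The following is a minimal sketch using Python's standard zipfile module. The helper names are ours, and the comparison by base name assumes the bundle may store the files under a subfolder:

```python
import zipfile

def list_zip_members(zip_path):
    """Return (name, uncompressed_size) for every member of a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]

def bundle_covers(bundle_path, expected_names):
    """Check whether the bundle holds every expected member (compared by base name)."""
    members = {name.rsplit("/", 1)[-1] for name, _ in list_zip_members(bundle_path)}
    return set(expected_names) <= members
```

For example, bundle_covers("csv.zip", ["attorneys.csv", "cases.csv", "documents.csv", "names.csv", "pacer_cases.csv"]) should return True if the bundle really contains all five files under those names.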
 
The files are in E:\McNair\PatentData\Litigation\Raw Data
 
==Quick Review==
 
===File Tops (5 lines)===
 
names.csv
case_row_id,case_number,party_row_count,party_type,name_row_count,name
1,0:79-cv-06704-JCP,1,Plaintiff,1,Burroghs Wellcome Co.
1,0:79-cv-06704-JCP,2,Defendant,2,Generix Drug Corp.
3,0:83-cv-06860-JAG,3,Plaintiff,3,Kenneth R. Cornwall
3,0:83-cv-06860-JAG,4,Defendant,4,"U. S. COnstruction Manufacturing, Inc."
...
 
attorneys.csv
case_row_id,case_number,party_row_count,party_type,attorney_row_count,name,contactinfo,position
14,0:92-cv-00398-MJP,40,Plaintiff ,1,"Joel Wyman Collins , Jr","Collins and Lacy; PO Box 12487; Columbia, SC 29211; 803-256-2660; Fax:
803-771-4484; Email: jcollins@collinsandlacy.com",LEAD ATTORNEY; ATTORNEY TO BE NOTICED
14,0:92-cv-00398-MJP,41,Plaintiff ,2,"Joel Wyman Collins , Jr",(See above for address),LEAD ATTORNEY; ATTORNEY TO BE NOTICED
...
 
cases.csv
case_row_id,case_number,pacer_id,case_name,court_name,assigned_to,referred_to,case_cause,jurisdictional_basis,demand,jury_demand,lead_case,related_case,settlement,date_filed,date_closed,date_last_filed
54973,01-Jan-1970,223949,"ASTRAZENECA AB et al v. SANDOZ, INC.",UNITED STATES DISTRICT COURT DISTRICT OF NEW JERSEY,Judge Joel A. Pisano,Magistrate Judge Tonianne J. Bongiovanni,35:271 Patent Infringement,Federal Question,,,,,,2009-01-14,2011-06-02,
427,0:00-cv-00019,1338,Banner Engineering v. Harris Instrument,UNITED STATES DISTRICT COURT DISTRICT OF MINNESOTA,,,,,,,,,,2000-01-04,2000-03-09,2000-03-02
428,0:00-cv-00058,1377,"Advanced UroScience, et al v. Inamed Corporation, et al",UNITED STATES DISTRICT COURT DISTRICT OF MINNESOTA,,,,,,,,,,2000-01-11,2000-11-30,2001-02-28
429,0:00-cv-00172-DWF-AJB,,Farnam Companies Inc v. Miller Manufacturing,U.S. District of Minnesota (DMN),Judge Donovan W. Frank,Chief Mag. Judge Arthur J. Boylan,35:271 Patent Infringement,Federal Question,,,0:98-cv-00040-DWF-AJB,"Dist of AZ, 99-01804",,,,
...
 
documents.csv
case_row_id,case_number,doc_count,attachment,date_filed,long_description,doc_number,short_description,upload_date
1,0:79-cv-06704-JCP,1,,2000-08-03,"COPY OF PAPER DOCKET SHEET (kw, Deputy Clerk) (Entered: 08/03/2000)",37,,
1,0:79-cv-06704-JCP,2,,1982-05-31,"CASE CLOSED. Case and Motions no longer referred to Magistrate. (kw, Deputy Clerk) (Entered: 08/03/2000)",,,
3,0:83-cv-06860-JAG,1,,2004-02-13,COPY OF PAPER DOCKET SHEET (Former Deputy Clerk) (Entered: 02/13/2004),123,,
3,0:83-cv-06860-JAG,2,,1992-03-01,Case closed (Former Deputy Clerk) (Entered: 03/05/1992),,,
...
 
pacer_cases.csv
case_name,court_code,court_name,date_closed,case_number,pacer_id,date_filed
"Davis v. Favelle Favco Cranes, et al",txsd,Texas Southern District Court,08/13/2001,1:2000-cv-00003,3,2000-01-03
Monsanto v. Sierks et al,ned,Nebraska District Court,07/10/2002,4:2002-cv-00105,4,2002-03-04
"Armament Sys & Proc v. Coast Cutlery Co Inc, et al",wied,Wisconsin Eastern District Court,03/03/2008,1:2000-cv-01273,4,2000-09-20
Tektronix Inc. v. Integraph Corporation,ord,Oregon District Court,09/04/1998,3:1998-cv-00599,7,1998-07-09
...
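
The file tops above are best pulled with a CSV parser rather than by counting raw lines, because quoted fields contain commas and, if the break in the attorneys.csv sample is really in the data and not a display wrap, line breaks as well. The following is a minimal sketch. The sample string is a shortened, made-up version of the attorneys.csv row, and the encoding of the real files is something we have not checked:

```python
import csv
import io
from itertools import islice

def head_rows(text_stream, n=5):
    """Return the first n parsed rows (header included) as lists of fields."""
    reader = csv.reader(text_stream)
    return list(islice(reader, n))

# Shortened illustration of a record whose quoted field spans two physical lines.
sample = io.StringIO(
    'case_row_id,case_number,party_type,name,contactinfo\n'
    '14,0:92-cv-00398-MJP,Plaintiff ,"Joel Wyman Collins , Jr","Collins and Lacy; Fax:\n'
    '803-771-4484"\n'
)
rows = head_rows(sample)

# For a real file, open it with newline="" so the csv module sees raw line endings:
# with open("attorneys.csv", newline="") as f:
#     rows = head_rows(f)
```

Here rows holds two entries (the header and one record), even though the sample text covers three physical lines. Note that party_type keeps its trailing space ("Plaintiff "), so it may need stripping on load.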
 
===File specs===
 
pacer_cases.csv: 74,954 records
case_name varchar(255),
court_code varchar(10), --txsd, ned
court_name varchar(255),
date_closed date, --mm/dd/yyyy
case_number varchar(100), --e.g., 1:2000-cv-00003
pacer_id int, --appears to be int
date_filed date --yyyy-mm-dd
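
Because pacer_cases.csv mixes two date formats (mm/dd/yyyy for date_closed and yyyy-mm-dd for date_filed), a loader needs to accept both. The following is a minimal sketch. The function name is ours, and treating empty strings as NULL is an assumption about how missing dates appear in the files:

```python
from datetime import date, datetime

# Formats seen in the samples: yyyy-mm-dd and mm/dd/yyyy.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")

def parse_date(value):
    """Parse a date in either observed format; return None for empty values."""
    value = (value or "").strip()
    if not value:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {value!r}")
```

With this, parse_date("08/13/2001") and parse_date("2001-08-13") give the same date object, and anything in a third format raises an error instead of being loaded silently.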
 
documents.csv: 5,186,345 records
case_row_id int,
case_number varchar(100), --e.g. 0:79-cv-06704-JCP
doc_count int,
attachment varchar(255), --NULL when seen
date_filed date, --yyyy-mm-dd
long_description text,
doc_number int,
short_description varchar(255), --NULL when seen
upload_date date --NULL when seen
 
cases.csv: 74,630 records
case_row_id int,
case_number varchar(100),
pacer_id int,
case_name text,
court_name text,
assigned_to varchar(255), --Often NULL. Examples Judge Joel A. Pisano
referred_to varchar(255), --Often NULL. Examples Magistrate Judge Tonianne J. Bongiovanni
case_cause varchar(255), --Often NULL. Examples: 35:271 Patent Infringement, 35:145 Patent Infringement, etc.
jurisdictional_basis varchar(255), --Often NULL. Examples: Federal Question
demand varchar(100), --appears to be one of NULL, plaintiff, defendant, both
jury_demand varchar(100),
lead_case varchar(100), --appears to be case_number
related_case text, --appears to be a mix of things in a semicolon-separated list
settlement text, --appears always NULL
date_filed date, --yyyy-mm-dd
date_closed date, --yyyy-mm-dd
date_last_filed date --yyyy-mm-dd
 
names.csv: 561,019 records
case_row_id int,
case_number varchar(100),
party_row_count int,
party_type varchar(20), --Plaintiff or Defendant
name_row_count int,
name varchar(255)
 
attorneys.csv: 1,223,419 records
case_row_id int,
case_number varchar(100),
party_row_count int,
party_type varchar(20), --Plaintiff or Defendant
attorney_row_count int,
name varchar(255),
contactinfo varchar(255), --semicolon-separated value
position varchar(255) --semicolon-separated list e.g., LEAD ATTORNEY; ATTORNEY TO BE NOTICED
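
The semicolon-separated fields (position, contactinfo and, apparently, related_case) can be split into lists after loading. The following is a minimal sketch. The helper name is ours, and it assumes semicolons never appear inside an individual item:

```python
def split_semicolons(value):
    """Split a semicolon-separated field into trimmed, non-empty parts."""
    return [part.strip() for part in (value or "").split(";") if part.strip()]

positions = split_semicolons("LEAD ATTORNEY; ATTORNEY TO BE NOTICED")
contact = split_semicolons("Collins and Lacy; PO Box 12487; Columbia, SC 29211; 803-256-2660")
```

Commas inside an item, as in "Columbia, SC 29211", are left alone, so address parts stay intact.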
 
==Obvious Issues==
 
===There are no codified patent numbers and outcomes===
 
Some patent numbers can be found in documents.long_description, but that field holds docket entry text rather than the filings themselves. Most patent numbers are likely to appear only in the underlying documents, which we don't have and would have to OCR.
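
A first pass at pulling patent numbers out of documents.long_description could use a regular expression. The following is only a sketch: the pattern is our guess at how numbers might be written in docket text (we have not surveyed the data), the example sentence is made up, and it catches only the first number after each "Patent No." or "Patent Nos." phrase:

```python
import re

# Hypothetical pattern: "Patent No." / "Pat. Nos." followed by a 7- or 8-digit
# number, with or without thousands separators.
PATENT_RE = re.compile(
    r"Pat(?:ent|\.)\s+Nos?\.?\s*:?\s*(\d{1,2},?\d{3},?\d{3})",
    re.IGNORECASE,
)

def find_patent_numbers(description):
    """Return candidate patent numbers (digits only) found in a docket entry."""
    return [match.replace(",", "") for match in PATENT_RE.findall(description or "")]

# Made-up docket text for illustration only.
example = "COMPLAINT for infringement of U.S. Patent No. 5,123,456 (Entered: 01/04/2000)"
candidates = find_patent_numbers(example)
```

Design, reissue and plant patent numbers (prefixes such as D, RE or PP) would need additional patterns, and any hits should be treated as candidates to verify, not as codified data.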
 
We might be able to piece together outcomes from documents.long_description, but this would be very hard. Clearly, codified outcomes are part of the value that Lex Machina adds.
