2,848 bytes added ,  13:44, 21 September 2020
{{Project|Has project output=Data,Tool|Has sponsor=McNair ProjectsCenter
|Has title=USITC Data
|Has owner=Harrison Brown
The PostgreSQL server:
128.42.44.182/bulk/USITC
 
==Additional Information==
The USITC provides more information than just the 337 notices.
 
Information and a database on Section 701/731 are available here:
https://www.usitc.gov/trade_remedy/trade_research_tools
https://pubapps2.usitc.gov/sunset/
 
 
=New Work=
==USITC 337 Cases Tab Delimited Text==
USITC patent information was gathered from the investigations.json file downloaded from the USITC website (https://pubapps2.usitc.gov/337external/; click on 'Cases Instituted After 2008').
This contains information on 337 cases and their respondents/complainants and the patents that were part of the case.
The code and results for this program are here:
Projects/USITC/JSON_scraping_python
The program grabs the information, places it into lists of lists in Python, and then writes to the file names listed below. The files do not have headers and null values are set to be empty strings.
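As a rough illustration of the output step described above, the sketch below writes a list of lists to a headerless tab-delimited file and turns null values into empty strings. The function name <code>write_rows</code> and the sample row are hypothetical; this is not the actual contents of code.py.

```python
import csv

def write_rows(path, rows):
    """Write a list of lists as headerless tab-delimited text.

    Null values (None) are written as empty strings.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        for row in rows:
            writer.writerow(["" if value is None else value for value in row])

if __name__ == "__main__":
    # Illustrative row only; the id and date values are made up for the example.
    sample = [[1, "Silicon-on-Insulator Wafers", "966", "Violation", None, "2015-09-24"]]
    write_rows("investigation_info_example.txt", sample)
```

Using the <code>csv</code> module rather than joining strings by hand means a field that happens to contain a tab or a quote is still written in a form that can be read back unambiguously.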
To create the tab-delimited text files, run code.py in the JSON_scraping_python directory. All of the file names are hard-coded in the script. It creates the following files:
investigation_info.txt
Schema for this file is id, title, investigation number, investigation type, docket number, date of publication notice

complainant_info.txt
Schema for this file is investigation id, investigation number, complainant name, complainant outside party ID, complainant city, complainant country

respondent_info.txt
Schema for this file is investigation id, investigation number, respondent outside party ID, respondent name, respondent city, respondent country

patent_info.txt
Schema for this file is investigation number, patent ID, patent number, active date, inactive date
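Because the files have no headers, anything that reads them has to supply the column names. A minimal sketch of one way to do that is below; the column identifiers in <code>SCHEMAS</code> are our own labels derived from the schemas listed above, and <code>read_table</code> is a hypothetical helper, not part of the project code.

```python
import csv

# Column labels are assumptions derived from the schemas listed above,
# kept in the same order as each file's columns.
SCHEMAS = {
    "investigation_info.txt": [
        "id", "title", "investigation_number", "investigation_type",
        "docket_number", "date_of_publication_notice",
    ],
    "complainant_info.txt": [
        "investigation_id", "investigation_number", "complainant_name",
        "complainant_outside_party_id", "complainant_city", "complainant_country",
    ],
    "respondent_info.txt": [
        "investigation_id", "investigation_number", "respondent_outside_party_id",
        "respondent_name", "respondent_city", "respondent_country",
    ],
    "patent_info.txt": [
        "investigation_number", "patent_id", "patent_number",
        "active_date", "inactive_date",
    ],
}

def read_table(path, columns):
    """Read a headerless tab-delimited file into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter="\t")
        return [dict(zip(columns, row)) for row in reader]
```

For example, <code>read_table("patent_info.txt", SCHEMAS["patent_info.txt"])</code> would return one dict per patent row, with empty strings where the source value was null.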
 
==XML Information==
UPDATE: The JSON file of the data was used instead to produce the tab-delimited text.
There is an XML file that contains information on investigations. To get it, go to the link below and click the 'xml' link under the 'Cases instituted after October 2008' tab.
https://pubapps2.usitc.gov/337external/
 
The information in this file can be extracted with an XML parser. For each investigation, we can find:
* Investigation Number, ex - <entry key="investigationNo">966</entry>
* Date of Publication Notice, ex - <entry key="dateOfPublicationFrNotice">2015-09-24T04:00:00.000Z</entry>
* Title, ex - <entry key="title">Silicon-on-Insulator Wafers</entry>
* Patent Numbers, ex - <entry key="patentNumbers">
* Investigation Type, ex - <entry key="investigationType">Violation</entry>
* Respondents, found under <entry key="respondent">
* Complainants, found under <entry key="complainant">
Additional information can also be gathered from the XML document.
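The keyed entries above could be read with Python's standard XML parser along the following lines. This is only a sketch: the wrapper elements (<code>investigations</code>, <code>investigation</code>) in the sample are assumptions, since the real file's nesting is not described here, and the sample simply reuses the example values from the list above.

```python
import xml.etree.ElementTree as ET

# Assumed layout: each investigation is a container element holding
# <entry key="..."> children. The wrapper tag names are hypothetical.
SAMPLE = """<investigations>
  <investigation>
    <entry key="investigationNo">966</entry>
    <entry key="title">Silicon-on-Insulator Wafers</entry>
    <entry key="investigationType">Violation</entry>
    <entry key="dateOfPublicationFrNotice">2015-09-24T04:00:00.000Z</entry>
  </investigation>
</investigations>"""

FIELDS = ("investigationNo", "title", "investigationType",
          "dateOfPublicationFrNotice")

def parse_investigations(xml_text, record_tag="investigation"):
    """Return one dict per record, keeping only the simple text fields."""
    root = ET.fromstring(xml_text)
    records = []
    for node in root.iter(record_tag):
        # Map each direct <entry> child's key attribute to its text.
        values = {e.get("key"): (e.text or "").strip()
                  for e in node.findall("entry")}
        # Missing keys become empty strings, matching the text-file convention.
        records.append({key: values.get(key, "") for key in FIELDS})
    return records

if __name__ == "__main__":
    for record in parse_investigations(SAMPLE):
        print(record)
```

Nested entries such as patentNumbers, respondent, and complainant would need their own handling once their structure in the real file is known.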
 
To find information on cases prior to October 2008, go to the link above and click on 'Looking for cases instituted prior to October 2008?', which downloads a CSV file.
* The Investigation Number, Title, Unfair Act Alleged, Patent Numbers, Complainants, and Respondents can be read directly from the CSV.
* Target Date and Beginning and Ending Dates contain notes (some cases are extended and dates are changed), so some text processing may be needed to extract this information.
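One possible approach to the date fields that contain notes is to pull out every date-like token and keep the notes separately. The sketch below assumes dates are written as MM/DD/YYYY, which has not been checked against the actual CSV; the function name and the sample cell text are made up for illustration.

```python
import re
from datetime import datetime

# Assumption: dates in the CSV look like MM/DD/YYYY. Adjust the pattern
# and format string if the real file uses a different style.
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")

def extract_dates(cell):
    """Return every valid date found in a free-text cell, in order."""
    dates = []
    for match in DATE_RE.finditer(cell):
        try:
            dates.append(datetime.strptime(match.group(0), "%m/%d/%Y").date())
        except ValueError:
            # Looks like a date but is not one (e.g. month 13); skip it.
            continue
    return dates

if __name__ == "__main__":
    # Hypothetical cell contents, not taken from the real file.
    print(extract_dates("05/12/2006 (extended to 08/14/2006)"))
```

When a cell holds more than one date, which one is the effective date (first, last, or the one following a word such as "extended") is a judgment that would need to be settled by looking at the real notes.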
=Old Work=
==Alternative Solutions==
Data on the USITC 337 notices is available here, as an alternative to extracting it from PDFs:
337Info - Unfair Import Investigations Information System
https://pubapps2.usitc.gov/337external/advanced
To use it, select fields from the GUI at the bottom of the page.
 
 
==Status==
Currently the web scraper gathers all of the data that is available in the HTML.
There are a few cases where the Investigation Number is not listed; I need to test for those and fix that in the code.

Downloaded most of the PDFs. There were errors downloading some of the files. I need to determine which PDFs could not be downloaded and why.
Investigating other ways to gather the information.
