Changes

USITC (view source)

Revision as of 14:51, 28 September 2017

209 bytes removed , 14:51, 28 September 2017

no edit summary

contains the Investigation Title, Investigation No., link to the PDF on the website, Notice description, and date the notice was issued

I have also downloaded the PDFs from the website. These are the PDFs listed in the CSV file. Some of the PDFs could not be downloaded. The PDFs are here: E:\McNair\Projects\USITC\~~pdf_copy~~pdfs_copy

These files were downloaded using the script on the Postgres Server. Downloading PDFs directly onto the remote Windows machine runs into issues whose cause has not yet been identified.
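The download step described above could be sketched roughly as follows. This is not the script on the Postgres Server; it is a minimal illustration that reads PDF links from a CSV file, saves each file, and records the failures instead of stopping on them. The column name `pdf_link`, the function names, and the `fetch` parameter are all hypothetical.

```python
import csv
import os
import urllib.request


def fetch_url(url, timeout=30):
    """Return the raw bytes at url (raises on HTTP or network errors)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_pdfs(csv_path, out_dir, url_column="pdf_link", fetch=fetch_url):
    """Download every PDF linked in the CSV; return a list of (url, error) failures.

    url_column is a hypothetical column name, not taken from the real CSV.
    """
    os.makedirs(out_dir, exist_ok=True)
    failures = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            url = (row.get(url_column) or "").strip()
            if not url:
                continue  # no link recorded for this notice
            # Name the local file after the last path segment of the URL.
            name = url.rstrip("/").rsplit("/", 1)[-1] or "unnamed.pdf"
            try:
                data = fetch(url)
            except Exception as exc:  # keep going; record what failed
                failures.append((url, str(exc)))
                continue
            with open(os.path.join(out_dir, name), "wb") as out:
                out.write(data)
    return failures
```

Returning the failures as a list makes it easy to retry only the files that could not be downloaded, or to write them to a log for later inspection.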

==Status==

~~Check my work log to see what I have done on a day to day basis~~

Currently the web scraper gathers all of the data that is available in the HTML.

There are a few cases where the Investigation Number is not listed; I need to test for those and fix that in the code.
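One possible way to test for the missing Investigation Numbers is sketched below. It assumes the scraped records are dictionaries and that the number is stored under a key called `investigation_no`; both the key and the function name are hypothetical and not taken from the actual scraper.

```python
def split_missing_investigation_no(rows, field="investigation_no"):
    """Split scraped rows into (complete, missing) by whether field has a value.

    field is a hypothetical key name; empty strings, whitespace, and None
    all count as missing.
    """
    complete, missing = [], []
    for row in rows:
        value = (row.get(field) or "").strip()
        if value:
            complete.append(row)
        else:
            missing.append(row)
    return complete, missing
```

The rows in the `missing` list can then be reviewed by hand or passed to a fallback parser, while the `complete` rows are written to the CSV as usual.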

~~Next steps will be to parse the PDFS, currently running a script to convert them to text~~ ~~Currently running a shell script to download the PDFs.~~ Downloaded most of the PDFs. There were errors downloading some of the files.

Hbrown512

