Difference between revisions of "Patent Thicket"

Project
Patent Thicket
Project Information
Has title	Patent Thicket
Has owner	Grace Tan
Has start date	Summer 2018
Has deadline date
Has project status	Active
Is dependent on	Google Scholar Crawler, PDF Downloader, PDF to Text Converter
Has sponsor	McNair Center
Has project output	Tool
	Copyright © 2019 edegan.com. All Rights Reserved.

Latest revision as of 13:47, 21 September 2020

This program converts all PDFs to txt files. It also generates two log files, _LOG_ERR.txt and _LOG_RUN.txt, which list the names of the PDFs that could not be converted and of those that were converted successfully, respectively. Some of the successfully converted files, especially the very small ones, do not contain the text of the paper.
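The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code of pdf_to_txt_bulk_PTLR.py: the function name `convert_all`, the `extract_text` callable (a stand-in for whatever PDF library the real script uses), the `min_chars` threshold, and the one-name-per-line log layout are all assumptions made for the example.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def convert_all(pdf_dir: Path, txt_dir: Path,
                extract_text: Callable[[Path], str],
                min_chars: int = 1) -> Tuple[List[str], List[str], List[str]]:
    """Convert every *.pdf in pdf_dir and return (converted, failed, suspicious).

    extract_text is a hypothetical hook: any function that takes a PDF path
    and returns its text, raising an exception when conversion fails.
    """
    txt_dir.mkdir(parents=True, exist_ok=True)
    converted, failed, suspicious = [], [], []

    # Note: the "*.pdf" pattern is case-sensitive on most Linux file systems.
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        try:
            text = extract_text(pdf)
        except Exception:
            # Any conversion error sends the file to the error log.
            failed.append(pdf.name)
            continue

        (txt_dir / (pdf.stem + ".txt")).write_text(text, encoding="utf-8")
        converted.append(pdf.name)

        # Flag "successful" conversions that produced little or no text.
        if len(text.strip()) < min_chars:
            suspicious.append(pdf.name)

    # Assumed log layout: one file name per line.
    (txt_dir / "_LOG_RUN.txt").write_text("\n".join(converted), encoding="utf-8")
    (txt_dir / "_LOG_ERR.txt").write_text("\n".join(failed), encoding="utf-8")
    return converted, failed, suspicious
```

Returning the `suspicious` list separately is one way to surface the small files that convert without error but contain no paper text, so they can be rechecked by hand.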

There were 573 successfully converted txt files and 36 files that failed to convert. That adds up to 609 rather than 608, and I'm not sure why.
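One way to investigate the mismatch in the counts above is to compare the two logs directly. The sketch below assumes that each log holds one file name per line, which the page does not confirm; the function name `reconcile` and its return keys are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def reconcile(run_names: Iterable[str], err_names: Iterable[str],
              expected_total: int) -> Dict[str, object]:
    """Compare the success and error logs against the expected number of PDFs."""
    run = [n.strip() for n in run_names if n.strip()]
    err = [n.strip() for n in err_names if n.strip()]
    counts = Counter(run + err)
    return {
        # Total number of log entries across both files.
        "logged": len(run) + len(err),
        # Positive when more entries were logged than PDFs were expected.
        "surplus": len(run) + len(err) - expected_total,
        # Names recorded as both a success and a failure.
        "in_both": sorted(set(run) & set(err)),
        # Names that appear more than once across the two logs combined.
        "repeated": sorted(n for n, c in counts.items() if c > 1),
    }
```

With the real logs this could be called as `reconcile(open("_LOG_RUN.txt").read().splitlines(), open("_LOG_ERR.txt").read().splitlines(), 608)`. A non-empty `in_both` or `repeated` list would point to a file that was logged twice; if both are empty, the folder may simply have contained 609 PDFs.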

@@ Line 1: / Line 1: @@
-{{McNair Projects
+{{Project
+|Has project output=Tool
+|Has sponsor=McNair Center
 |Has title=Patent Thicket
 |Has owner=Grace Tan
@@ Line 37: / Line 39: @@
 This program converts all pdfs to txt files. It also generates two files _LOG_ERR.txt and _LOG_RUN.txt that includes the names of the pdfs that could not be converted and were converted successfully. Some of the files that were successfuly converted, especially the very small ones, don't have the text from the paper.
+There were 573 successful txt files and 36 files that failed to convert (which does not add up to 608 but I'm not sure why).


Contents

Location of Files

Google Scholar Crawler

Downloading PDFs

pdf_to_txt_bulk_PTLR.py
