Manual Collection

Files are in:

E:\McNair\Projects\SBIR

Each file contains a group of 1000 companies, and the files are numbered sequentially.

Rough notes

The data come from https://www.sbir.gov/sbirsearch/award/all
A Selenium WebDriver script was built and is stored in E:\McNair\Software\Scripts\Selenium Web Drivers
The script does not work because a captcha must be entered after selecting the xls download, so the files were collected manually

In your Python script:

Make sure the chromedriver path is set correctly if chromedriver is not under root. For example: webdriver.Chrome("/Users/adriansmart/PycharmProjects/SeleniumTest/chromedriver")
Use driver.find_element_by_xpath to select an element on the HTML page. The function takes an XPath as its argument, so first load the website in a browser.
Next, right-click the page element whose XPath you want and select Inspect. This launches the HTML inspector and highlights the relevant lines of code.
Right-click what looks like the right piece of code and select "Copy xpath data".
Paste the copied XPath into your Python script wherever a path is expected. For example: driver.find_element_by_xpath("//*[@id='solr-print-dropdown-button']")
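
The steps above can be put together in a short sketch. This is not the original script stored in the Selenium Web Drivers folder; it is an illustration under stated assumptions. The helper names xpath_for_id and click_element are hypothetical, and the Selenium calls use the Selenium 4 style (Service, By.XPATH), because find_element_by_xpath was removed in Selenium 4.3. The captcha on the download still has to be entered by hand, so this only clicks the element.

```python
def xpath_for_id(element_id):
    """Build the XPath that the browser inspector copies for an element with an id.

    Hypothetical helper, for illustration only.
    """
    return f"//*[@id='{element_id}']"


def click_element(chromedriver_path, url, xpath):
    """Open url in Chrome and click the element found at xpath.

    Hypothetical helper. Requires the third-party selenium package (version 4)
    and a chromedriver binary at chromedriver_path.
    """
    # Imported here so the helper above can be used without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome(service=Service(chromedriver_path))
    try:
        driver.get(url)
        # Selenium 4 equivalent of driver.find_element_by_xpath(xpath)
        driver.find_element(By.XPATH, xpath).click()
    finally:
        # Always close the browser, even if the element is not found.
        driver.quit()
```

For example, click_element("/Users/adriansmart/PycharmProjects/SeleniumTest/chromedriver", "https://www.sbir.gov/sbirsearch/award/all", xpath_for_id("solr-print-dropdown-button")) would open the search page and click the dropdown button named in the notes, assuming the page still uses that id.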

Project
Collecting SBIR Data
Project Information
Has title	Collecting SBIR Data
Has owner	Adrian Smart
Has start date	June 6, 2017
Has deadline date
Has keywords	Data, Tool
Has project status	Complete
Does subsume	SBIR Evaluation
Has sponsor	McNair Center

The objective of this project was to concatenate 162 xlsx files into one large tab-delimited text file.

The Python script can be found here:

  E:\McNair\Projects\SBIR\concat_excel.py

The resulting file is located here:

  E:\McNair\Projects\SBIR\SBIR.txt
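
For readers without access to that drive, the concatenation can be sketched as follows. This is not the contents of concat_excel.py; it is a minimal sketch that assumes the third-party pandas and openpyxl packages are installed, that all xlsx files sit in one folder and share the same columns, and that each filename contains its sequence number. The function names numeric_key and concat_excel are hypothetical.

```python
import re
from pathlib import Path


def numeric_key(path):
    """Sort key: the first integer in the file name, so 2.xlsx sorts before 10.xlsx.

    Files with no number in their name sort first.
    """
    match = re.search(r"\d+", Path(path).stem)
    return int(match.group()) if match else -1


def concat_excel(input_dir, output_path):
    """Concatenate every .xlsx file in input_dir into one tab-delimited text file.

    Returns the number of data rows written.
    """
    # Imported here so numeric_key can be used without pandas installed.
    import pandas as pd

    files = sorted(Path(input_dir).glob("*.xlsx"), key=numeric_key)
    # Read every cell as text so identifiers and zip codes keep leading zeros.
    frames = [pd.read_excel(f, dtype=str) for f in files]
    combined = pd.concat(frames, ignore_index=True)
    combined.to_csv(output_path, sep="\t", index=False)
    return len(combined)
```

Sorting by the number in the file name rather than by the name itself keeps the groups in their original sequence, since a plain alphabetical sort would place 10.xlsx before 2.xlsx.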