Changes

Jump to navigation Jump to search
5,646 bytes added ,  17:58, 10 April 2022
{{Project|Has project output=Data|Has sponsor=GeneralMcNair Center|Has title=Federal Grant Data|Has owner=Catherine Kirby, Jeemin Sim, Avesh Krishna,|Has keywords=Data|Has project status=Active|Does subsume=NSF Data, NIH Data,}}<onlyinclude>The [[Federal Grant Data]] project collects and processes [[NIH Data]], [[NSF Data]], and other federal grant information from structured government sources and imports it into a relational database for use. See also: The [[Trial Data Project]] and the [[FDA Trials Data]] project.</onlyinclude>
Finding ==Latest Update== In April 2019, Ed updated the NSF and NIHdata. ===NSF=== The update process was:*Visit https://www.nsf.gov/awardsearch/download.jsp*Retrieve 2017, 2018, and 2019 zips to E:\projects\grants\NSF*Extract them to E:\projects\grants\NSF\XML*Remove 3 bad XML files that cause the parser to crash*Move the whole thing back to E:\mcnair\Projects\Federal Grant Data\NSF\NSFExtracted Data*Fix up and run Jeemin_NSF_XML_Parser.py, so that it also takes ProgramElement (see below) and other federal grant make:***Run LoadNSF.sql to produce tables NSFGeneral, NSFInvestigator and NSFInstitution in dbase '''grants''' Note that '''program_code'''s are available from *STTR: https://www.nsf.gov/awardsearch/lookup?type=program&qrytxt=*STTR**SBIR: https://www.nsf.gov/awardsearch/lookup?type=program&qrytxt=*SBIR*  Combined SBIR/STTR codes '168E','5370','169E','166E','165E','164E','5371','163E','167E','5151','5727','4804','Y052','1532','Y813','2282','6537','Y350','4645','9311','2266','5370','168E','1591','1505','Z408'  SBIR Codes: '168E','5370','169E','166E','165E','164E','5371','163E','167E','5151','5727','4804','Y052','1532','Y813','2282','6537','Y350','4645','9311','2266' STTR Codes: '5370','168E','1591','1505','Z408' Unfortunately, we need the Program Element code, shown below, but our extractor doesn't currently pull it.  <xsd:element maxOccurs="unbounded" name="ProgramElement"> <xsd:complexType> <xsd:sequence> <xsd:element name="Code" type="xsd:int"/> <xsd:element name="Text" type="xsd:string"/> </xsd:sequence> </xsd:complexType> </xsd:element> Correction: This appears fixed. There is a programelementcode in nsfgeneral. However, the code is XXXX in 42.6% of cases. ===NIH=== The 2018 update process was:*Visit https://exporter.nih.gov/ExPORTER_Catalog.aspx*Retrieve 2016, 2017, and 2018 csvs to E:\projects\grants\NIH\Source Files*Combine them into RePORTER_PRJ_C_FY2016-17-18.csv, open in excel and save as tdt*Add old data files to same directory*Run LoadNIH.sql to product table nih in dbase '''grants''' Supporting info for NIH grants:*A data dictionary for the NIH grants is here: https://exporter.nih.gov/about.aspx*'''Activity''' Codes are here: https://grants.nih.gov/grants/funding/ac_search_results.htm*STTR codes are: R41,R42,UT1,UT2*SBIR codes are: R43,R44,U43,U44*SBIR/STTR Joint codes: RC3,SB1,UB1,UC3*'''Total_Cost''' is only available in certain instances but is the best cost variable. The 2022 update process was:*Visit https://exporter.nih.gov/ExPORTER_Catalog.aspx*Retrieve 2019, 2020, and 2021 csvs to E:\projects\grants\NIH\Source Files*Open each in excel, copy and paste to a tab-delimited text file.*Remove quotes from the file using regex (in textpad) and save as UTF8*Run clean.py to fix the bad records*Run LoadNIH.sql to load to the grants database (put input files in z:/bulk/grants) Example processing code (for selected stats for Vermont): CREATE TABLE nih_vt_stats AS SELECT extract(year from award_notice_date) as year, direct_cost_amt+indirect_cost_amt as sum_amounts, total_cost, full_project_num, CASE WHEN direct_cost_amt+indirect_cost_amt>total_cost THEN direct_cost_amt+indirect_cost_amt ELSE total_cost END AS amount, CASE WHEN org_name LIKE 'UNIVERSITY OF VERMONT%' THEN 1 ELSE 0 END AS uvm FROM nih WHERE org_state='VT'; --Updated Rows 9493 SELECT year, uvm, sum(amount), count(DISTINCT full_project_num) FROM nih_vt_stats GROUP BY year, uvm ORDER BY uvm,year; ==grants.gov== Lists agency, estimated funding, opportunity title. opportunity number, and the date. Can be used to look up who was awarded the grants. I downloaded all grants offered by the NIH since 2005 (when the datawas available) E:\McNair\Projects\Federal Grant Data\NIH
==NSF Grant Data==
The NSF grant data are in folders by year, each grant is an xml file. The format of a typical file is shown below. There is data available from 1959 onwards, and full records after 1976. In 2016 there were ~10,000 awards.
Each year can be downloaded from: [https://www.nsf.gov/awardsearch/download.jsp https://www.nsf.gov/awardsearch/download.jsp]
 
Data is located in:
E:\McNair\Projects\Federal Grant Data\NSF
 
Sample XML schema
<rootTag>
</Award>
</rootTag>
 
==Survey of Federal Funds for Research and Development Fiscal Years 2014–16==
 
Excel spreadsheets with data regarding Federal Agency budgets and spending on R&D [https://ncsesdata.nsf.gov/fedfunds/2014/]
 
E:\McNair\Projects\Federal Grant Data\NSF\NSF Data Tables
 
==NIH==
 
NIH grants for a given year can be downloaded from [https://report.nih.gov/award/index.cfm https://report.nih.gov/award/index.cfm]
 
This does not specify what the grant was for, but it does list the amount and organization who received the grant. Additionally, the city and state of the organization are included. The number of awards received by each organization in a given year is listed as well. Excel Tables can be downloaded for each each year at this site. I downloaded the table for 2016 as a sample.
 
CK: It is possible to get data. https://exporter.nih.gov/ExPORTER_Catalog.aspx
E:\McNair\Projects\Federal Grant Data\NIH
 
 
==DOD==
 
DOD grant data folder (compiled for Augusta Ecosystem project) was copied from E:\McNair\Projects\Innovation Districts\Augusta Startup Ecosystem to E:\McNair\Projects\Federal Grant Data. It contains yearly data that can be filtered, and a few files on Augusta-related grants.

Navigation menu