Difference between revisions of "Federal Grant Data"

From edegan.com
Jump to navigation Jump to search
 
(17 intermediate revisions by the same user not shown)
Line 1: Line 1:
{{McNair Projects
+
{{Project
 +
|Has project output=Data
 +
|Has sponsor=McNair Center
 
|Has title=Federal Grant Data
 
|Has title=Federal Grant Data
 
|Has owner=Catherine Kirby, Jeemin Sim, Avesh Krishna,
 
|Has owner=Catherine Kirby, Jeemin Sim, Avesh Krishna,
Line 6: Line 8:
 
|Does subsume=NSF Data, NIH Data,
 
|Does subsume=NSF Data, NIH Data,
 
}}
 
}}
 +
<onlyinclude>The [[Federal Grant Data]] project collects and processes [[NIH Data]], [[NSF Data]], and other federal grant information from structured government sources and imports it into a relational database for use. See also: The [[Trial Data Project]] and the [[FDA Trials Data]] project.</onlyinclude>
  
==General==
+
==Latest Update==
  
Working with NIH, NSF, and other federal grant data
+
In April 2019, Ed updated the NSF and NIH data.
  
==Latest Update==
+
===NSF===
  
In April 2019, Ed:
+
The update process was:
 
*Visit https://www.nsf.gov/awardsearch/download.jsp
 
*Visit https://www.nsf.gov/awardsearch/download.jsp
 
*Retrieve 2017, 2018, and 2019 zips to E:\projects\grants\NSF
 
*Retrieve 2017, 2018, and 2019 zips to E:\projects\grants\NSF
 +
*Extract them to E:\projects\grants\NSF\XML
 +
*Remove 3 bad XML files that cause the parser to crash
 +
*Move the whole thing back to E:\mcnair\Projects\Federal Grant Data\NSF\NSF Extracted Data
 +
*Fix up and run Jeemin_NSF_XML_Parser.py, so that it also takes ProgramElement (see below) and make:
 +
**
 +
*Run LoadNSF.sql to produce tables NSFGeneral, NSFInvestigator and NSFInstitution in dbase '''grants'''
 +
 +
Note that '''program_code'''s are available from
 +
*STTR: https://www.nsf.gov/awardsearch/lookup?type=program&qrytxt=*STTR*
 +
*SBIR: https://www.nsf.gov/awardsearch/lookup?type=program&qrytxt=*SBIR*
 +
 +
Combined SBIR/STTR codes
 +
'168E','5370','169E','166E','165E','164E','5371','163E','167E','5151','5727','4804','Y052','1532','Y813','2282','6537','Y350','4645','9311','2266','5370','168E','1591','1505','Z408'
 +
 +
SBIR Codes:
 +
'168E','5370','169E','166E','165E','164E','5371','163E','167E','5151','5727','4804','Y052','1532','Y813','2282','6537','Y350','4645','9311','2266'
 +
STTR Codes:
 +
'5370','168E','1591','1505','Z408'
 +
 +
Unfortunately, we need the Program Element code, shown below, but our extractor doesn't currently pull it.
 +
 +
<xsd:element maxOccurs="unbounded" name="ProgramElement">
 +
  <xsd:complexType>
 +
  <xsd:sequence>
 +
    <xsd:element name="Code" type="xsd:int"/>
 +
    <xsd:element name="Text" type="xsd:string"/>
 +
  </xsd:sequence>
 +
  </xsd:complexType>
 +
</xsd:element>
 +
 +
Correction: This appears fixed. There is a programelementcode in nsfgeneral. However, the code is XXXX in 42.6% of cases.
 +
 +
===NIH===
 +
 +
The 2018 update process was:
 
*Visit https://exporter.nih.gov/ExPORTER_Catalog.aspx
 
*Visit https://exporter.nih.gov/ExPORTER_Catalog.aspx
 
*Retrieve 2016, 2017, and 2018 csvs to E:\projects\grants\NIH\Source Files
 
*Retrieve 2016, 2017, and 2018 csvs to E:\projects\grants\NIH\Source Files
Line 21: Line 59:
 
*Add old data files to same directory
 
*Add old data files to same directory
 
*Run LoadNIH.sql to product table nih in dbase '''grants'''
 
*Run LoadNIH.sql to product table nih in dbase '''grants'''
 +
 +
Supporting info for NIH grants:
 +
*A data dictionary for the NIH grants is here: https://exporter.nih.gov/about.aspx
 +
*'''Activity''' Codes are here: https://grants.nih.gov/grants/funding/ac_search_results.htm
 +
*STTR codes are: R41,R42,UT1,UT2
 +
*SBIR codes are: R43,R44,U43,U44
 +
*SBIR/STTR Joint codes: RC3,SB1,UB1,UC3
 +
*'''Total_Cost''' is only available in certain instances but is the best cost variable.
 +
 +
The 2022 update process was:
 +
*Visit https://exporter.nih.gov/ExPORTER_Catalog.aspx
 +
*Retrieve 2019, 2020, and 2021 csvs to E:\projects\grants\NIH\Source Files
 +
*Open each in excel, copy and paste to a tab-delimited text file.
 +
*Remove quotes from the file using regex (in textpad) and save as UTF8
 +
*Run clean.py to fix the bad records
 +
*Run LoadNIH.sql to load to the grants database (put input files in z:/bulk/grants)
 +
 +
Example processing code (for selected stats for Vermont):
 +
CREATE TABLE nih_vt_stats AS
 +
SELECT extract(year from award_notice_date) as year, direct_cost_amt+indirect_cost_amt as sum_amounts, total_cost, full_project_num,
 +
CASE WHEN direct_cost_amt+indirect_cost_amt>total_cost THEN direct_cost_amt+indirect_cost_amt ELSE total_cost END AS amount,
 +
CASE WHEN org_name LIKE 'UNIVERSITY OF VERMONT%' THEN 1 ELSE 0 END AS uvm
 +
FROM nih WHERE org_state='VT';
 +
--Updated Rows 9493
 +
 +
SELECT year, uvm, sum(amount), count(DISTINCT full_project_num)
 +
FROM nih_vt_stats GROUP BY year, uvm ORDER BY uvm,year;
  
 
==grants.gov==
 
==grants.gov==

Latest revision as of 17:58, 10 April 2022


Project
Federal Grant Data
Project logo 02.png
Project Information
Has title Federal Grant Data
Has owner Catherine Kirby, Jeemin Sim, Avesh Krishna
Has start date
Has deadline date
Has keywords Data
Has project status Active
Does subsume NSF Data, NIH Data
Has sponsor McNair Center
Has project output Data
Copyright © 2019 edegan.com. All Rights Reserved.

The Federal Grant Data project collects and processes NIH Data, NSF Data, and other federal grant information from structured government sources and imports it into a relational database for use. See also: The Trial Data Project and the FDA Trials Data project.

Latest Update

In April 2019, Ed updated the NSF and NIH data.

NSF

The update process was:

  • Visit https://www.nsf.gov/awardsearch/download.jsp
  • Retrieve 2017, 2018, and 2019 zips to E:\projects\grants\NSF
  • Extract them to E:\projects\grants\NSF\XML
  • Remove 3 bad XML files that cause the parser to crash
  • Move the whole thing back to E:\mcnair\Projects\Federal Grant Data\NSF\NSF Extracted Data
  • Fix up and run Jeemin_NSF_XML_Parser.py, so that it also takes ProgramElement (see below) and make:
  • Run LoadNSF.sql to produce tables NSFGeneral, NSFInvestigator and NSFInstitution in dbase grants

Note that program_codes are available from

Combined SBIR/STTR codes
'168E','5370','169E','166E','165E','164E','5371','163E','167E','5151','5727','4804','Y052','1532','Y813','2282','6537','Y350','4645','9311','2266','5370','168E','1591','1505','Z408'
SBIR Codes:
'168E','5370','169E','166E','165E','164E','5371','163E','167E','5151','5727','4804','Y052','1532','Y813','2282','6537','Y350','4645','9311','2266'
STTR Codes:
'5370','168E','1591','1505','Z408'

Unfortunately, we need the Program Element code, shown below, but our extractor doesn't currently pull it.

<xsd:element maxOccurs="unbounded" name="ProgramElement">
 <xsd:complexType>
  <xsd:sequence>
   <xsd:element name="Code" type="xsd:int"/>
   <xsd:element name="Text" type="xsd:string"/>
  </xsd:sequence>
 </xsd:complexType>
</xsd:element>

Correction: This appears fixed. There is a programelementcode in nsfgeneral. However, the code is XXXX in 42.6% of cases.

NIH

The 2018 update process was:

  • Visit https://exporter.nih.gov/ExPORTER_Catalog.aspx
  • Retrieve 2016, 2017, and 2018 csvs to E:\projects\grants\NIH\Source Files
  • Combine them into RePORTER_PRJ_C_FY2016-17-18.csv, open in excel and save as tdt
  • Add old data files to same directory
  • Run LoadNIH.sql to product table nih in dbase grants

Supporting info for NIH grants:

The 2022 update process was:

  • Visit https://exporter.nih.gov/ExPORTER_Catalog.aspx
  • Retrieve 2019, 2020, and 2021 csvs to E:\projects\grants\NIH\Source Files
  • Open each in excel, copy and paste to a tab-delimited text file.
  • Remove quotes from the file using regex (in textpad) and save as UTF8
  • Run clean.py to fix the bad records
  • Run LoadNIH.sql to load to the grants database (put input files in z:/bulk/grants)

Example processing code (for selected stats for Vermont):

	CREATE TABLE nih_vt_stats AS
	SELECT extract(year from award_notice_date) as year, direct_cost_amt+indirect_cost_amt as sum_amounts, total_cost, full_project_num,
	CASE WHEN direct_cost_amt+indirect_cost_amt>total_cost THEN direct_cost_amt+indirect_cost_amt ELSE total_cost END AS amount,
	CASE WHEN org_name LIKE 'UNIVERSITY OF VERMONT%' THEN 1 ELSE 0 END AS uvm
	FROM nih WHERE org_state='VT';
	--Updated Rows	9493
	
	SELECT year, uvm, sum(amount), count(DISTINCT full_project_num)
	FROM nih_vt_stats GROUP BY year, uvm ORDER BY uvm,year;

grants.gov

Lists agency, estimated funding, opportunity title. opportunity number, and the date. Can be used to look up who was awarded the grants. I downloaded all grants offered by the NIH since 2005 (when the data was available)

E:\McNair\Projects\Federal Grant Data\NIH

NSF Grant Data

The NSF grant data are in folders by year, each grant is an xml file. The format of a typical file is shown below. There is data available from 1959 onwards, and full records after 1976. In 2016 there were ~10,000 awards.

Each year can be downloaded from: https://www.nsf.gov/awardsearch/download.jsp

Data is located in:

E:\McNair\Projects\Federal Grant Data\NSF

Sample XML schema

 <rootTag>
 <Award>
 <AwardTitle>
 Collaborative Research: Hybrid Organic-Inorganic Thermoelectric Materials
 </AwardTitle>
 <AwardEffectiveDate>08/31/2016</AwardEffectiveDate>
 <AwardExpirationDate>06/30/2017</AwardExpirationDate>
 <AwardAmount>95461</AwardAmount>
 <AwardInstrument>
 <Value>Standard Grant</Value>
 </AwardInstrument>
 <Organization>
 07030000
 <Directorate>
 <LongName>Directorate For Engineering</LongName>
 </Directorate>
 <Division>
 <LongName>Div Of Civil, Mechanical, & Manufact Inn</LongName>
 </Division>
 </Organization>
 <ProgramOfficer>
 <SignBlockName>Thomas F. Kuech</SignBlockName>
 </ProgramOfficer>
 <AbstractNarration>
 Thermoelectric materials are materials which can be used to convert thermal energy directly to electricity. The performance of a thermoelectric material is measured by the "figure of merit", termed ZT. There has been much research into increasing thermoelectric materials, figure of merit, however, progress in this area has been slow and most of the researched thermoelectric materials up to now are suffering from either high fabrication cost, usage of rare earth or toxic elements, or poor mechanical properties. Organic thermoelectric materials (OTEs) have recently attracted attention for low temperature applications (< 300K), especially cooling purposes, as they are flexible, low-cost and abundant, and low-cost fabrication methods for synthesizing them exist... A large class of semiconducting nanostructures (e.g. Si, CdTe, Bi, and PbTe nanowires and holely structures) combined with conjugated polymers (e.g., chemically-modified PEDOT and low bandgap polymers) and organic molecules (specifically charged chemical species attached to molecules such as CF3- substituted styrene molecules) will be simulated, synthesized and optimized to identify new hybrid materials with a potentially high ZT.
 </AbstractNarration>
 <MinAmdLetterDate>12/22/2016</MinAmdLetterDate>
 <MaxAmdLetterDate>12/22/2016</MaxAmdLetterDate>
 <ARRAAmount/>
 <AwardID>1723353</AwardID>
 <Investigator>
 <FirstName>Mona</FirstName>
 <LastName>Zebarjadi</LastName>
 <EmailAddress>mz6g@virginia.edu</EmailAddress>
 <StartDate>12/22/2016</StartDate>
 <EndDate/>
 <RoleCode>Principal Investigator</RoleCode>
 </Investigator>
 <Institution>
 <Name>University of Virginia Main Campus</Name>
 <CityName>CHARLOTTESVILLE</CityName>
 <ZipCode>229044195</ZipCode>
 <PhoneNumber>4349244270</PhoneNumber>
 <StreetAddress>P.O. BOX 400195</StreetAddress>
 <CountryName>United States</CountryName>
 <StateName>Virginia</StateName>
 <StateCode>VA</StateCode>
 </Institution>
 <ProgramElement>
 8092
 <Text>Materials Eng. & Processing</Text>
 </ProgramElement>
 <ProgramReference>
 024E
 <Text>MATERIALS DESIGN</Text>
 </ProgramReference>
 <ProgramReference>
 036E
 <Text>CIVIL INFRASTRUCTURE</Text>
 </ProgramReference>
 </Award>
 </rootTag>

Survey of Federal Funds for Research and Development Fiscal Years 2014–16

Excel spreadsheets with data regarding Federal Agency budgets and spending on R&D [1]

 E:\McNair\Projects\Federal Grant Data\NSF\NSF Data Tables

NIH

NIH grants for a given year can be downloaded from https://report.nih.gov/award/index.cfm

This does not specify what the grant was for, but it does list the amount and organization who received the grant. Additionally, the city and state of the organization are included. The number of awards received by each organization in a given year is listed as well. Excel Tables can be downloaded for each each year at this site. I downloaded the table for 2016 as a sample.

CK: It is possible to get data. https://exporter.nih.gov/ExPORTER_Catalog.aspx

E:\McNair\Projects\Federal Grant Data\NIH


DOD

DOD grant data folder (compiled for Augusta Ecosystem project) was copied from E:\McNair\Projects\Innovation Districts\Augusta Startup Ecosystem to E:\McNair\Projects\Federal Grant Data. It contains yearly data that can be filtered, and a few files on Augusta-related grants.