Trial Data Project

Revision as of 13:44, 21 September 2020 by Ed (talk | contribs)

Project Information
Has title Trial Data Project
Has owner Jeemin Sim, Catherine Kirby
Has start date
Has deadline date
Has project status Complete
Has sponsor McNair Center
Has project output Data, Tool
Copyright © 2019 All Rights Reserved.


This project reprocesses clinical trial data into structured, cleaned datasets. The data covers 239,638 studies from 2000 to the present.

Information Source

Steps Followed to Extract the Trial Data

Extracting Data from XML Files

All the historical clinical trial data is available as XML files, one file per study. Each file has a clinical_study root element, with the study's fields nested beneath it as child nodes, for example:

<clinical_study rank="61205">
    <phase>Phase 1/Phase 2</phase>

The XML nodes are extracted into the following tables:

  • Study
  • Location
  • Sponsors
  • Eligibility
  • Dates
  • MeSH (Medical Subject Headings)


The corresponding nodes for the Study table are:

<brief_title>Congenital Adrenal Hyperplasia: Calcium Channels as Therapeutic Targets</brief_title>
    <authority>United States: Federal Government</authority>
    <textblock> This study will test the ability of extended release nifedipine (Procardia XL), a blood pressure medication, to permit a decrease in the dose of glucocorticoid medication children take to treat congenital adrenal hyperplasia (CAH). </textblock>
    <textblock> This protocol is designed to assess both acute and chronic effects of the calcium channel antagonist, nifedipine, on the hypothalamic-pituitary-adrenal axis in patients with congenital adrenal hyperplasia. The multicenter trial is composed of two phases and will involve a double-blind, placebo-controlled parallel design. The goal of Phase I is to examine the ability of nifedipine vs. placebo to decrease adrenocorticotropic hormone (ACTH) levels, as well as to begin to assess the dose-dependency of nifedipine effects. The goal of Phase II is to evaluate the long-term effects of nifedipine; that is, can attenuation of ACTH release by nifedipine permit a decrease in the dosage of glucocorticoid needed to suppress the HPA axis? Such a decrease would, in turn, reduce the deleterious effects of glucocorticoid treatment in CAH. </textblock>
<phase>Phase 1/Phase 2</phase>
<study_design>Intervention Model: Parallel Assignment, Masking: Double-Blind, Primary Purpose: Treatment</study_design>
<condition>Congenital Adrenal Hyperplasia</condition>
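
A minimal sketch of how the Study nodes above could be pulled into one row with Python's standard xml.etree.ElementTree. The wrapper elements in the sample (id_info, oversight_info, brief_summary), the NCT00000000 identifier, the summary text, and the helper names text_of and study_row are assumptions for illustration; only the leaf nodes shown above come from the data.

```python
import xml.etree.ElementTree as ET

# Abbreviated, hypothetical record. Wrapper elements and the nct_id value
# are placeholders; the leaf node names match the nodes listed above.
SAMPLE = """
<clinical_study rank="61205">
  <id_info><nct_id>NCT00000000</nct_id></id_info>
  <brief_title>Congenital Adrenal Hyperplasia: Calcium Channels as Therapeutic Targets</brief_title>
  <oversight_info><authority>United States: Federal Government</authority></oversight_info>
  <brief_summary><textblock> Placeholder summary text. </textblock></brief_summary>
  <phase>Phase 1/Phase 2</phase>
  <condition>Congenital Adrenal Hyperplasia</condition>
</clinical_study>
"""


def text_of(root, path):
    """Return the stripped text at `path`, or '' when the node is missing."""
    value = root.findtext(path)
    return value.strip() if value else ""


def study_row(root):
    """Collect a subset of the Study fields into a dict keyed by column name."""
    return {
        "nct_id": text_of(root, ".//nct_id"),
        "brief title": text_of(root, ".//brief_title"),
        "oversight authority": text_of(root, ".//authority"),
        "brief summary": text_of(root, ".//brief_summary/textblock"),
        "phase": text_of(root, ".//phase"),
        "condition": text_of(root, ".//condition"),
    }


row = study_row(ET.fromstring(SAMPLE.strip()))
print(row["phase"])
```

Searching with ".//" keeps the sketch independent of the exact nesting depth, at the cost of matching the first node with that name anywhere in the record.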


The corresponding nodes for the Location table are:

        <name>Medical University of South Carolina</name>
            <state>South Carolina</state>
            <country>United States</country>
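
As a rough illustration, the Location nodes could be flattened into one row per facility as below. The location/facility/address nesting, the city and zip values, the NCT00000000 identifier, and the function name location_rows are assumptions; the name, state, and country values are the ones shown above.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: the nesting, city, zip, and nct_id are placeholders.
SAMPLE = """
<clinical_study>
  <location>
    <facility>
      <name>Medical University of South Carolina</name>
      <address>
        <city>Example City</city>
        <state>South Carolina</state>
        <zip>00000</zip>
        <country>United States</country>
      </address>
    </facility>
  </location>
</clinical_study>
"""

LOCATION_TAGS = ("name", "city", "state", "zip", "country")


def location_rows(root, nct_id):
    """Return one [nct_id, name, city, state, zip, country] list per location."""
    rows = []
    for loc in root.iter("location"):
        fields = [(loc.findtext(".//" + tag) or "").strip() for tag in LOCATION_TAGS]
        rows.append([nct_id] + fields)
    return rows


for row in location_rows(ET.fromstring(SAMPLE.strip()), "NCT00000000"):
    print(row)
```

A study with several sites yields several rows, which is why locations sit in their own table keyed by nct_id.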


The corresponding node for the Sponsors table is:

        <agency>National Center for Research Resources (NCRR)</agency>


The corresponding nodes for the Eligibility table are:

        <textblock> Inclusion Criteria: - diagnosed with Congenital Adrenal Hyperplasia (CAH) - normal ECG during baseline evaluation Exclusion Criteria: - history of liver disease, or elevated liver function tests - history of cardiovascular disease </textblock>
    <minimum_age>14 Years</minimum_age>
    <maximum_age>35 Years</maximum_age>
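
The age nodes arrive as text such as "14 Years". If numeric ages are wanted in the cleaned data, a small helper along these lines might be enough. The function name age_in_years and the choice to return None for anything that is not a whole number of years are assumptions, not the project's confirmed behaviour.

```python
import re

# Matches values like "14 Years" or "1 Year"; other units are not handled here.
AGE_PATTERN = re.compile(r"\s*(\d+)\s+Years?\s*", re.IGNORECASE)


def age_in_years(text):
    """Convert '14 Years' to 14; return None when the value is not in years."""
    match = AGE_PATTERN.fullmatch(text or "")
    return int(match.group(1)) if match else None


print(age_in_years("14 Years"), age_in_years("35 Years"))
```

Returning None rather than guessing keeps unusual values (missing ages, ages given in months) visible for manual review.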


The corresponding nodes for the Dates table are:

<verification_date>January 2004</verification_date>
<lastchanged_date>June 23, 2005</lastchanged_date>
<firstreceived_date>November 3, 1999</firstreceived_date>
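
The date nodes above come in two shapes, month and year ("January 2004") and full dates ("June 23, 2005"). One possible way to normalize both with the standard library is sketched here. The function name parse_trial_date is hypothetical, and mapping a month-only value to the first day of that month is an assumption.

```python
from datetime import datetime

# Try the full-date format first, then fall back to month and year only.
DATE_FORMATS = ("%B %d, %Y", "%B %Y")


def parse_trial_date(text):
    """Return a datetime.date for a known format, or None if nothing matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


for raw in ("January 2004", "June 23, 2005", "November 3, 1999"):
    print(raw, "->", parse_trial_date(raw))
```

Month names are matched using the current locale, so this assumes an English (or default C) locale.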


The corresponding nodes for the MeSH table are:

    <mesh_term>Adrenal Hyperplasia, Congenital</mesh_term>
    <mesh_term>Adrenogenital Syndrome</mesh_term>
    <mesh_term>Adrenocortical Hyperfunction</mesh_term>
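
Because a study can carry several MeSH terms, each term becomes its own row keyed by the study identifier. A minimal sketch, assuming a condition_browse wrapper element, the placeholder identifier NCT00000000, and a hypothetical function name mesh_rows:

```python
import xml.etree.ElementTree as ET

# The condition_browse wrapper and the nct_id are assumptions for illustration.
SAMPLE = """
<clinical_study>
  <condition_browse>
    <mesh_term>Adrenal Hyperplasia, Congenital</mesh_term>
    <mesh_term>Adrenogenital Syndrome</mesh_term>
    <mesh_term>Adrenocortical Hyperfunction</mesh_term>
  </condition_browse>
</clinical_study>
"""


def mesh_rows(root, nct_id):
    """Return one [nct_id, MeSH term] pair per non-empty mesh_term node."""
    return [[nct_id, el.text.strip()] for el in root.iter("mesh_term") if el.text]


for row in mesh_rows(ET.fromstring(SAMPLE.strip()), "NCT00000000"):
    print(row)
```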

File locations

All the files and code I worked on are in this folder: E:\McNair\Projects\FDA Trials\Jeemin_Project

Trials per zipcode: 
    output: Jeemin_Trials_per_zipcode_output.txt
General data ripping:
    output: Jeemin_FDATrial_as_key_data.ripping_output.txt
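
The trials-per-zipcode count can be sketched as follows, assuming location rows shaped like (nct_id, facility name, city, state, zip, country). The function name trials_per_zip, the sample rows, and the decision to count each trial once per zipcode are assumptions; the script that produced the output file may differ.

```python
from collections import defaultdict

ZIP_INDEX = 4  # position of 'zip' in a location row


def trials_per_zip(location_rows):
    """Count distinct nct_ids per zipcode, skipping rows with no zipcode."""
    trials = defaultdict(set)
    for row in location_rows:
        nct_id, zip_code = row[0], row[ZIP_INDEX]
        if zip_code:
            trials[zip_code].add(nct_id)
    return {zip_code: len(ids) for zip_code, ids in trials.items()}


# Hypothetical rows; identifiers and zipcodes are placeholders.
rows = [
    ["NCT00000001", "Site A", "City", "State", "00001", "United States"],
    ["NCT00000001", "Site B", "City", "State", "00001", "United States"],
    ["NCT00000002", "Site C", "City", "State", "00001", "United States"],
    ["NCT00000002", "Site D", "City", "State", "", "United States"],
]
print(trials_per_zip(rows))
```

Using a set per zipcode prevents a trial with two sites in the same zipcode from being counted twice.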


Table 1:
    row_headers1 =  ['nct_id', 'brief title', 'oversight authority', 'brief summary', 
    'detailed description', 'overall status', 'start date', 'completion date', 'phase',
    'study type', 'study design', 'condition', 'intervention type', 'intervention name', 
    'eligibility description', 'eligibility gender', 'eligibility min age',
    'eligibility max age', 'verification date', 'lastchanged date', 'firstreceived date',
    'has expanded access']
Table 2: 
    row_headers2 = ['nct_id', 'sponsor agency', 'sponsor class', 'lead or collaborator']
Table 3:
    row_headers3 = ['nct_id', 'facility name', 'city', 'state', 'zip', 'country']
Table 4:
    row_headers4 = ['nct_id', 'MeSH term']
Table 5:
    row_headers5 = ['nct_id', 'keyword']
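
Each header list above can serve as the first line of a delimited text file. The sketch below writes Table 3 with Python's csv module. The tab delimiter, the function name write_table, and the sample row (with city and zip left blank) are assumptions; the actual output format of the project files is not stated here.

```python
import csv
import io

row_headers3 = ['nct_id', 'facility name', 'city', 'state', 'zip', 'country']


def write_table(handle, headers, rows):
    """Write a header line followed by the data rows, tab-delimited."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(headers)
    writer.writerows(rows)


# Demonstration with an in-memory buffer; a real run would pass an open file.
buffer = io.StringIO()
write_table(
    buffer,
    row_headers3,
    [["NCT00000000", "Medical University of South Carolina", "",
      "South Carolina", "", "United States"]],
)
print(buffer.getvalue())
```

When writing to disk, opening the file with newline="" (as the csv documentation recommends) avoids doubled line endings on Windows.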