
Latest revision as of 13:44, 21 September 2020


Trial Data Project

Project Information
  • Title: Trial Data Project
  • Owner: Jeemin Sim, Catherine Kirby
  • Start date: (not specified)
  • Deadline date: (not specified)
  • Project status: Complete
  • Sponsor: McNair Center
  • Project output: Data, Tool
Copyright © 2019 edegan.com. All Rights Reserved.

Summary

This project documents how to reprocess the clinical trial data from ClinicalTrials.gov into structured, cleaned datasets. The data covers 239,638 studies from 2000 to the present.

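Information Source

  • https://clinicaltrials.gov/ct2/resources/download
  • https://clinicaltrials.gov/ct2/html/images/info/public.xsd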

Steps Followed to Extract the Trial Data

Extracting Data from XML Files

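All the historical ClinicalTrials.gov data is available as XML files. Here is the tree structure of the XML files as displayed in a browser's XML viewer: a leading + marks a collapsed node that has children, and a leading - marks an expanded one. Neither character is part of the XML itself.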

-<clinical_study rank="61205">
    +<required_header>
    +<id_info>
    <brief_title>
    +<sponsors>
    <source>
    +<oversight_info>
    +<brief_summary>
    +<detailed_description>
    <overall_status>Completed</overall_status>
    <phase>Phase 1/Phase 2</phase>
    <study_type>
    <study_design>
    <condition>
    +<intervention>
    +<eligibility>
    +<location>
    +<location_countries>
    <verification_date>
    <lastchanged_date>
    <firstreceived_date>
    <has_expanded_access>
    +<condition_browse>
    +<intervention_browse>
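
As a quick check of this structure, the top-level nodes of a study file can be listed with Python's standard xml.etree.ElementTree module. This is only a sketch: SAMPLE_XML is a heavily trimmed stand-in for a real study file, and top_level_tags is a hypothetical helper name, not something taken from the project scripts.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for one study file; real files contain many more nodes.
SAMPLE_XML = """<clinical_study rank="61205">
  <id_info><nct_id>NCT00000102</nct_id></id_info>
  <overall_status>Completed</overall_status>
  <phase>Phase 1/Phase 2</phase>
</clinical_study>"""


def top_level_tags(xml_text):
    """Return the tag names of the root's direct children, in document order."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root]


tags = top_level_tags(SAMPLE_XML)
print(tags)
```

For a file on disk, ET.parse(path).getroot() would give the same root element.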


Corresponding tables are:

  • Study
  • Location
  • Sponsors
  • Eligibility
  • Dates
  • MeSH (Medical Subject Headings)

Study

The corresponding nodes are:

-<id_info>
    <org_study_id>NCRR-M01RR01070-0506</org_study_id>
    <secondary_id>M01RR001070</secondary_id>
    <nct_id>NCT00000102</nct_id>
</id_info>
<brief_title>Congenital Adrenal Hyperplasia: Calcium Channels as Therapeutic Targets</brief_title>
-<oversight_info>
    <authority>United States: Federal Government</authority>
</oversight_info>
-<brief_summary>
    <textblock> This study will test the ability of extended release nifedipine (Procardia XL), a blood pressure medication, to permit a decrease in the dose of glucocorticoid medication children take to treat congenital adrenal hyperplasia (CAH). </textblock>
</brief_summary>
-<detailed_description>
    <textblock> This protocol is designed to assess both acute and chronic effects of the calcium channel antagonist, nifedipine, on the hypothalamic-pituitary-adrenal axis in patients with congenital adrenal hyperplasia. The multicenter trial is composed of two phases and will involve a double-blind, placebo-controlled parallel design. The goal of Phase I is to examine the ability of nifedipine vs. placebo to decrease adrenocorticotropic hormone (ACTH) levels, as well as to begin to assess the dose-dependency of nifedipine effects. The goal of Phase II is to evaluate the long-term effects of nifedipine; that is, can attenuation of ACTH release by nifedipine permit a decrease in the dosage of glucocorticoid needed to suppress the HPA axis? Such a decrease would, in turn, reduce the deleterious effects of glucocorticoid treatment in CAH. </textblock>
</detailed_description>
<overall_status>Completed</overall_status>
<phase>Phase 1/Phase 2</phase>
<study_type>Interventional</study_type>
<study_design>Intervention Model: Parallel Assignment, Masking: Double-Blind, Primary Purpose: Treatment</study_design>
<condition>Congenital Adrenal Hyperplasia</condition>
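
The study-level nodes above could be flattened into one row per trial roughly as follows. This is a minimal sketch, not the project's actual script: STUDY_PATHS and study_row are hypothetical names, the sample XML is abbreviated, and only some of the Table 1 columns are mapped.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the nodes shown above (text blocks omitted).
STUDY_XML = """<clinical_study>
  <id_info><nct_id>NCT00000102</nct_id></id_info>
  <brief_title>Congenital Adrenal Hyperplasia: Calcium Channels as Therapeutic Targets</brief_title>
  <oversight_info><authority>United States: Federal Government</authority></oversight_info>
  <overall_status>Completed</overall_status>
  <phase>Phase 1/Phase 2</phase>
  <study_type>Interventional</study_type>
  <condition>Congenital Adrenal Hyperplasia</condition>
</clinical_study>"""

# Output column -> element path relative to <clinical_study>.
STUDY_PATHS = {
    "nct_id": "id_info/nct_id",
    "brief title": "brief_title",
    "oversight authority": "oversight_info/authority",
    "brief summary": "brief_summary/textblock",
    "overall status": "overall_status",
    "phase": "phase",
    "study type": "study_type",
    "condition": "condition",
}


def study_row(root):
    """Build a dict of study fields; missing nodes become empty strings."""
    row = {}
    for column, path in STUDY_PATHS.items():
        text = root.findtext(path, default="")
        # Collapse the padding and line breaks found inside <textblock> nodes.
        row[column] = " ".join(text.split())
    return row


row = study_row(ET.fromstring(STUDY_XML))
```

Using findtext with a default keeps the row shape identical even when a study lacks a node, which simplifies writing a fixed-width table later.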

Location

The corresponding node is:

-<location>
    -<facility>
        <name>Medical University of South Carolina</name>
        -<address>
            <city>Charleston</city>
            <state>South Carolina</state>
            <country>United States</country>
        </address>
    </facility>
</location>
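
A study can list several locations, so each facility becomes its own row keyed by nct_id. The sketch below follows the Table 3 column order; location_rows is a hypothetical helper, and the address/zip path is an assumption because the sample node above has no zip element.

```python
import xml.etree.ElementTree as ET

LOCATION_XML = """<clinical_study>
  <id_info><nct_id>NCT00000102</nct_id></id_info>
  <location>
    <facility>
      <name>Medical University of South Carolina</name>
      <address>
        <city>Charleston</city>
        <state>South Carolina</state>
        <country>United States</country>
      </address>
    </facility>
  </location>
</clinical_study>"""


def location_rows(root):
    """Return one [nct_id, facility name, city, state, zip, country] list per facility."""
    nct_id = root.findtext("id_info/nct_id", default="")
    rows = []
    for facility in root.findall("location/facility"):
        rows.append([
            nct_id,
            facility.findtext("name", default=""),
            facility.findtext("address/city", default=""),
            facility.findtext("address/state", default=""),
            facility.findtext("address/zip", default=""),  # assumed tag name
            facility.findtext("address/country", default=""),
        ])
    return rows


rows = location_rows(ET.fromstring(LOCATION_XML))
```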

Sponsors

The corresponding node is:

-<sponsors>
    -<lead_sponsor>
        <agency>National Center for Research Resources (NCRR)</agency>
        <agency_class>NIH</agency_class>
    </lead_sponsor>
</sponsors>
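
The 'lead or collaborator' column of Table 2 can be derived from the tag name of each child of the sponsors node. In this sketch, sponsor_rows is a hypothetical helper, and the collaborator entry in the sample is invented purely for illustration (the sample above only shows a lead_sponsor).

```python
import xml.etree.ElementTree as ET

SPONSOR_XML = """<clinical_study>
  <id_info><nct_id>NCT00000102</nct_id></id_info>
  <sponsors>
    <lead_sponsor>
      <agency>National Center for Research Resources (NCRR)</agency>
      <agency_class>NIH</agency_class>
    </lead_sponsor>
    <collaborator>
      <agency>Example University (hypothetical)</agency>
      <agency_class>Other</agency_class>
    </collaborator>
  </sponsors>
</clinical_study>"""


def sponsor_rows(root):
    """Return one [nct_id, agency, class, role] list per child of <sponsors>."""
    nct_id = root.findtext("id_info/nct_id", default="")
    rows = []
    for sponsor in root.findall("sponsors/*"):
        # Assumption: anything that is not the lead sponsor is a collaborator.
        role = "lead" if sponsor.tag == "lead_sponsor" else "collaborator"
        rows.append([
            nct_id,
            sponsor.findtext("agency", default=""),
            sponsor.findtext("agency_class", default=""),
            role,
        ])
    return rows


rows = sponsor_rows(ET.fromstring(SPONSOR_XML))
```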

Eligibility

The corresponding node is:

-<eligibility>
    -<criteria>
        <textblock> Inclusion Criteria: - diagnosed with Congenital Adrenal Hyperplasia (CAH) - normal ECG during baseline evaluation Exclusion Criteria: - history of liver disease, or elevated liver function tests - history of cardiovascular disease </textblock>
    </criteria>
    <gender>Both</gender>
    <minimum_age>14 Years</minimum_age>
    <maximum_age>35 Years</maximum_age>
    <healthy_volunteers>No</healthy_volunteers>
</eligibility>
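
Age limits arrive as text such as "14 Years", so a cleaned dataset may want the number and the unit separated. The following is one possible approach, not necessarily what the project code does: parse_age and eligibility_row are hypothetical names, and values that do not look like "<number> <unit>" are returned as None.

```python
import re
import xml.etree.ElementTree as ET

ELIGIBILITY_XML = """<clinical_study>
  <eligibility>
    <criteria><textblock> Inclusion Criteria: - diagnosed with Congenital Adrenal Hyperplasia (CAH) </textblock></criteria>
    <gender>Both</gender>
    <minimum_age>14 Years</minimum_age>
    <maximum_age>35 Years</maximum_age>
    <healthy_volunteers>No</healthy_volunteers>
  </eligibility>
</clinical_study>"""

AGE_PATTERN = re.compile(r"^\s*(\d+)\s+(\w+)\s*$")


def parse_age(text):
    """Split '14 Years' into (14, 'Years'); return None for anything else."""
    match = AGE_PATTERN.match(text or "")
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def eligibility_row(root):
    """Return the eligibility fields of one study, or None if the node is absent."""
    node = root.find("eligibility")
    if node is None:
        return None
    criteria = node.findtext("criteria/textblock", default="")
    return {
        "eligibility description": " ".join(criteria.split()),
        "eligibility gender": node.findtext("gender", default=""),
        "eligibility min age": parse_age(node.findtext("minimum_age")),
        "eligibility max age": parse_age(node.findtext("maximum_age")),
    }


eligibility = eligibility_row(ET.fromstring(ELIGIBILITY_XML))
```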

Dates

The corresponding nodes are:

<verification_date>January 2004</verification_date>
<lastchanged_date>June 23, 2005</lastchanged_date>
<firstreceived_date>November 3, 1999</firstreceived_date>
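
The three date nodes above use two different layouts: month and year only ("January 2004") and a full date ("June 23, 2005"). A parser therefore has to try both. The sketch below uses datetime.strptime from the standard library; parse_trial_date is a hypothetical helper, and mapping a month-only value to the first day of that month is an assumption made here, not something stated by the source.

```python
from datetime import date, datetime

# Formats seen in the sample nodes: full date first, then month and year only.
DATE_FORMATS = ("%B %d, %Y", "%B %Y")


def parse_trial_date(text):
    """Parse a ClinicalTrials.gov style date string; return None if no format fits."""
    cleaned = (text or "").strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    return None


verification = parse_trial_date("January 2004")
last_changed = parse_trial_date("June 23, 2005")
first_received = parse_trial_date("November 3, 1999")
```

Note that %B matches month names in the current locale, so this assumes an English (or default C) locale.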

MeSH

The corresponding nodes are:

-<condition_browse>
    <mesh_term>Hyperplasia</mesh_term>
    <mesh_term>Adrenal Hyperplasia, Congenital</mesh_term>
    <mesh_term>Adrenogenital Syndrome</mesh_term>
    <mesh_term>Adrenocortical Hyperfunction</mesh_term>
</condition_browse>
-<intervention_browse>
    <mesh_term>Nifedipine</mesh_term>
</intervention_browse>
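
Table 4 holds one row per (nct_id, MeSH term) pair, so the terms under both browse nodes can be collected in a single pass. A minimal sketch, with mesh_rows as a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

MESH_XML = """<clinical_study>
  <id_info><nct_id>NCT00000102</nct_id></id_info>
  <condition_browse>
    <mesh_term>Hyperplasia</mesh_term>
    <mesh_term>Adrenal Hyperplasia, Congenital</mesh_term>
    <mesh_term>Adrenogenital Syndrome</mesh_term>
    <mesh_term>Adrenocortical Hyperfunction</mesh_term>
  </condition_browse>
  <intervention_browse>
    <mesh_term>Nifedipine</mesh_term>
  </intervention_browse>
</clinical_study>"""


def mesh_rows(root):
    """Return one [nct_id, MeSH term] list per term, conditions before interventions."""
    nct_id = root.findtext("id_info/nct_id", default="")
    rows = []
    for browse in ("condition_browse", "intervention_browse"):
        for term in root.findall(f"{browse}/mesh_term"):
            rows.append([nct_id, (term.text or "").strip()])
    return rows


rows = mesh_rows(ET.fromstring(MESH_XML))
```

This layout does not record whether a term came from the condition or the intervention node; a third column would be needed to keep that distinction.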

File locations

The files/code that I have worked on all exist in this folder: E:\McNair\Projects\FDA Trials\Jeemin_Project

Trials per zipcode: 
    code: Jeemin_Trials_per_zipcode.py
    output: Jeemin_Trials_per_zipcode_output.txt
General data ripping:
    code: Jeemin_FDATrial_as_key_data_ripping.py
    output: Jeemin_FDATrial_as_key_data.ripping_output.txt
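
The scripts themselves are not reproduced on this page. As a rough idea of what a trials-per-zipcode count could look like, the sketch below counts distinct trials per zip code from rows in the Table 3 layout. The function name trials_per_zip and all sample values are placeholders, and the real Jeemin_Trials_per_zipcode.py may work differently.

```python
from collections import defaultdict

# Placeholder rows in Table 3 order: nct_id, facility name, city, state, zip, country.
SAMPLE_ROWS = [
    ["TRIAL_A", "Site 1", "City X", "State X", "00001", "United States"],
    ["TRIAL_A", "Site 2", "City X", "State X", "00001", "United States"],
    ["TRIAL_B", "Site 3", "City X", "State X", "00001", "United States"],
    ["TRIAL_B", "Site 4", "City Y", "State Y", "00002", "United States"],
    ["TRIAL_C", "Site 5", "City Z", "State Z", "", "United States"],
]


def trials_per_zip(location_rows):
    """Count distinct nct_ids per zip code, skipping rows with no zip."""
    trials_by_zip = defaultdict(set)
    for nct_id, _name, _city, _state, zip_code, _country in location_rows:
        zip_code = zip_code.strip()
        if zip_code:
            trials_by_zip[zip_code].add(nct_id)
    return {zip_code: len(ids) for zip_code, ids in trials_by_zip.items()}


counts = trials_per_zip(SAMPLE_ROWS)
```

Collecting nct_ids in a set means a trial with two facilities in the same zip code is counted once for that zip code.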

Tables

Table 1:
    row_headers1 =  ['nct_id', 'brief title', 'oversight authority', 'brief summary', 
    'detailed description', 'overall status', 'start date', 'completion date', 'phase',
    'study type', 'study design', 'condition', 'intervention type', 'intervention name', 
    'eligibility description','eligibility gender', 'eligibility min age', 
    'eligibility max age', 'verification date', 'lastchanged date', 'firstreceived date',
    'has expanded access']
Table 2: 
    row_headers2 = ['nct_id', 'sponsor agency', 'sponsor class', 'lead or collaborator']
Table 3:
    row_headers3 = ['nct_id', 'facility name', 'city', 'state', 'zip', 'country']
Table 4:
    row_headers4 = ['nct_id', 'MeSH term']
Table 5:
    row_headers5 = ['nct_id', 'keyword']
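
Each header list above can be written as the first line of a delimited text file, followed by the data rows. The sketch below uses the standard csv module with a tab delimiter. The delimiter, the write_table helper, and the in-memory buffer are assumptions chosen for illustration, since the page does not say how the output .txt files are formatted.

```python
import csv
import io

row_headers4 = ['nct_id', 'MeSH term']


def write_table(handle, headers, rows):
    """Write a header line and data rows as tab-delimited text to an open file handle."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(headers)
    writer.writerows(rows)


# An in-memory buffer stands in for a real output file here.
buffer = io.StringIO()
write_table(buffer, row_headers4, [["NCT00000102", "Nifedipine"]])
output = buffer.getvalue()
```

For a real file, open it with open(path, "w", newline="", encoding="utf-8") and pass the handle to write_table. The csv module quotes any field that contains the delimiter, so long text fields such as summaries remain intact.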