Changes

188 bytes removed ,  17:56, 14 November 2017
2016-10-21: Continued to download data for the Moroccan Parliament Written and Oral Questions. Looked over [http://mcnair.bakerinstitute.org/wiki/Christy_Warden_(Twitter_Crawler_Application_1) Christy's Twitter Crawler] to see how I could help. Dr. Egan asked me to think about how to build multiple tools to collect cohorts and other data from accelerator sites; see the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator List]. He also asked me to look at the [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] for ideas on how to bring this project to fruition.
2016-11-01: 15:00-18:00: Continued to download Moroccan data in the background. Went over the code for the [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] and continued learning Perl. Began the Kuwait Web Crawler/Driver.
2016-11-03: 13:00-18:00: Continued to download Moroccan data in the background. Dr. Egan fixed the system requirements needed to run the GovTrack Web Crawler. Made significant progress on the Kuwait Web Crawler/Driver for the Middle East Studies Department.
2016-11-04: 12:00-14:00: Continued to download Moroccan data in the background. Finished writing the initial Kuwait Web Crawler/Driver for the Middle East Studies Department. The department then asked for the additional embedded files on the Kuwait website to be retrieved as well. See [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver].
2016-11-08: 15:00-18:00: Continued to download Moroccan data in the background. Finished writing code for the embedded files on the Kuwait site. Spent time debugging the frame errors caused by the dynamically generated content. Never found the cause of the bug; instead found a workaround that works at the cost of a longer run time. See [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver].
2016-11-10: 13:00-18:00: Continued to download Moroccan data and Kuwait data in the background. Began work on [http://mcnair.bakerinstitute.org/wiki/Google_Scholar_Crawler Google Scholar Crawler]. Wrote a crawler for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] to get the HTML files of hundreds of accelerators. The crawler ended up failing; it appears to have been due to HTTPS.
2016-11-11: 12:00-14:00: Continued to download Moroccan data in the background. Attempted to find bug fixes for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] crawler.
2016-11-15: 15:00-18:00: Finished downloading the Moroccan Written Question PDFs. Wrote a parser with Christy to be used for parsing bills from Congress and, eventually, executive orders. Found a bug in the system Python; it was worked out and the system was rebooted.
2016-11-17: 13:00-18:00: Wrote a crawler to retrieve information about executive orders and their corresponding PDFs. They can be found [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report here]. The next step is to run code to convert the PDFs to text files, then use the parser fixed by Christy.
2016-11-18: 12:00-14:00: Converted the Executive Order PDFs to text files using Adobe Acrobat DC. See the [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report wiki page] for details.
2016-11-22: 15:00-18:00: Transferred the downloaded Morocco Written Bills to the provided Seagate drive. Made a "gentle" F6S crawler to retrieve the HTML of possible accelerator pages, documented [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here].
2016-11-29: 15:00-18:00: Began pulling data from the accelerators listed [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here]. Made text files for about 18 accelerators.
2016-12-01: 13:00-18:00: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Built a tool for the [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report E&I Governance Report Project] with Christy. The tool adds a column of data that shows whether or not the bill has been passed.
2016-12-02: 12:00-14:00: Built and ran a web crawler on Kuwait for the Center for Middle East Studies. Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].
2016-12-06: 15:00-18:00: Learned how to use git. Committed software projects from the semester to the McNair git repository. Projects can be found at: [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Executive Order Crawler], [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Foreign Government Web Crawlers], [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) F6S Crawler and Parser].
2016-12-07: 15:00-18:00: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].
2016-12-08: 14:00-18:00: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].
'''Notes'''
