===Fall 2016===
2016-12-08: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].
2016-12-07: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].
2016-12-06: Learned how to use git. Committed software projects from the semester to the McNair git repository. Projects can be found at: [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Executive Order Crawler], [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Foreign Government Web Crawlers], [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) F6S Crawler and Parser].
2016-12-02: Built and ran web crawler for Center for Middle East Studies on Kuwait. Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].
2016-12-01: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Built a tool for the [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report E&I Governance Report Project] with Christy. The tool adds a column of data that shows whether or not each bill has been passed.
2016-11-29: Began pulling data from the accelerators listed [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here]. Made text files for about 18 accelerators.
2016-11-22: Transferred downloaded Morocco Written Bills to provided SeaGate Drive. Made a "gentle" F6S crawler to retrieve HTMLs of possible accelerator pages documented [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here].
2016-11-18: Converted Executive Order PDFs to text files using Adobe Acrobat DC. See [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Wikipage] for details.
2016-11-17: Wrote a crawler to retrieve information about executive orders and their corresponding PDFs. They can be found [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report here]. Next step is to run code to convert the PDFs to text files, then use the parser fixed by Christy.
2016-11-15: Finished download of Moroccan Written Question PDFs. Wrote a parser with Christy to be used for parsing bills from Congress and eventually executive orders. Found a bug in the system Python; it was worked out and the system was rebooted.
2016-11-11: Continued to download Moroccan data in the background. Attempted to find bug fixes for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] crawler.
2016-11-10: Continued to download Moroccan data and Kuwait data in the background. Began work on [http://mcnair.bakerinstitute.org/wiki/Google_Scholar_Crawler Google Scholar Crawler]. Wrote a crawler for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] to get the HTML files of hundreds of accelerators. The crawler ended up failing; it appears to have been due to HTTPS.
2016-11-08: Continued to download Moroccan data in the background. Finished writing code for the embedded files on the Kuwait Site. Spent time debugging the frame errors due to the dynamically generated content. Never found an answer to the bug, and instead found a workaround that sacrificed run time for the ability to work. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
2016-11-04: Continued to download Moroccan data in the background. Finished writing initial Kuwait Web Crawler/Driver for the Middle East Studies Department. Middle East Studies Department asked for additional embedded files in the Kuwait website. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
2016-11-03: Continued to download Moroccan data in the background. Dr. Egan fixed systems requirements to run the GovTrack Web Crawler. Made significant progress on the Kuwait Web Crawler/Driver for the Middle East Studies Department.
2016-11-01: Continued to download Moroccan data in the background. Went over code for GovTracker Web Crawler, continued learning Perl. [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] Began Kuwait Web Crawler/Driver.
2016-10-21: Continued to download data for the Moroccan Parliament Written and Oral Questions. Looked over [http://mcnair.bakerinstitute.org/wiki/Christy_Warden_(Twitter_Crawler_Application_1) Christy's Twitter Crawler] to see how I can be helpful. Dr. Egan asked me to think about how to potentially make multiple tools to get cohorts and other sorts of data from accelerator sites. See [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator List]. He also asked me to look at the [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] for potential ideas on how to bring this project to fruition.
2016-10-20: Continued to download data for the Moroccan Parliament Written and Oral Questions. Updated Wiki page. Started working on Twitter project with Christy. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
2016-10-18: Finished code for Oral Questions web driver and Written Questions web driver using Selenium. Now, the data for the dates of questions can be found using the crawler, and the PDFs of the questions will be downloaded using Selenium. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
2016-10-14: Finished Oral Questions crawler. Finished Written Questions crawler. Waiting for further details on whether that data needs to be tweaked in any way. Updated the Moroccan Web Driver/Web Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
2016-10-13: Completed download of Moroccan Bills. Working on either a web driver screenshot approach or a webcrawler approach to download the Moroccan oral and written questions data. Began building Web Crawler for Oral and Written Questions site. Edited Moroccan Web Driver/Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
2016-10-11: Fixed Arabic bug; files can now be saved with Arabic titles. Monarchy bills downloaded and ready for shipment. House of Representatives Bill mostly downloaded, ratified bills prepared for download. Started learning the scrapy library in Python for web scraping. Discussed idea of screenshot-ing questions instead of scraping.
2016-10-07: Learned Unicode and UTF-8 encoding and decoding in Arabic. Still working on transforming an ASCII URL into printable Unicode.
2016-10-06: Discussed with Dr. Elbadawy the desired file names for the Moroccan data download. The consensus was that the bill programs are ready to launch once the files can be named properly, and the questions data must be retrieved using a web crawler, which I need to learn how to implement. The naming of files is currently drawing errors in going from Arabic, to URL, to download, to filename. Debugging in process. Also built a demo Selenium program for Dr. Egan that drives the McNair blog site on an infinite loop.
2016-10-03: Moroccan Web Driver projects completed for driving of the Monarchy proposed bills, the House of Representatives proposed bills, and the Ratified bills sites. Began process of devising a naming system for the files that does not require scraping. Tinkered with naming through regular expression parsing of the URL. Structure for the oral questions and written questions drivers is set up, but needs fixes due to the differences in the sites. Fixed bug on McNair wiki for women's biz team where email was plain text instead of an email link. Took a glimpse at Kuwait Parliament website, and it appears to be very different from the Moroccan setup.
2016-09-30: Selenium program selects view PDF option from the website, and goes to the PDF webpage. Program then switches handle to the new page. CTRL S is sent to the page to launch save dialog window. Text cannot be sent to this window. Brainstormed ways around this issue. Explored Chrome Options for saving automatically without a dialog window. Looking into other libraries besides Selenium that may help.
2016-09-29: Re-enrolled in Microsoft Remote Desktop with proper authentication, set up Selenium environment and Komodo IDE on Remote Desktop, wrote program using Selenium that goes to a link and opens up the print dialog box. Developed computational recipe for a different approach to the problem.
2016-09-26: Set up Staff wiki page, work log page; registered for Slack, Microsoft Remote Desktop; downloaded Selenium on personal computer, read Selenium docs. Created wiki page for Moroccan Web Driver Project.
'''Notes'''
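
The early October entries describe naming downloaded files from the URL and turning an ASCII (percent-encoded) URL into printable Arabic. The following is a minimal sketch of that idea, not the code actually used in the project. It assumes Python 3 and that the URLs are percent-encoded UTF-8; the function name <code>filename_from_url</code>, the fallback name, and the example URL are all hypothetical.

```python
import re
from urllib.parse import unquote, urlsplit


def filename_from_url(url: str, default: str = "download.pdf") -> str:
    """Derive a readable file name from the last path segment of a URL."""
    path = urlsplit(url).path
    # Percent-escapes such as %D9%85 are UTF-8 bytes; unquote decodes them
    # back into the original (here Arabic) characters.
    last_segment = unquote(path.rsplit("/", 1)[-1], encoding="utf-8")
    # Collapse whitespace and characters that are unsafe in file names.
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", last_segment).strip("_")
    return cleaned or default


# Hypothetical URL, for illustration only (not a real parliament link).
example = (
    "http://example.org/docs/"
    "%D9%85%D8%B4%D8%B1%D9%88%D8%B9%20%D9%82%D8%A7%D9%86%D9%88%D9%86.pdf"
)
print(filename_from_url(example))
```

For the example URL this yields the two Arabic words joined by an underscore, with the <code>.pdf</code> extension kept. A URL with an empty last segment falls back to the default name.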
