1,136 bytes removed ,  18:51, 20 May 2019
[[Peter Jalbert]] [[Work Logs]] [[Peter Jalbert (Work Log)|(log page)]]
2017-12-21: Last minute adjustments to the Moroccan Data. Continued working on [http://mcnair.bakerinstitute.org/wiki/Selenium_Documentation Selenium Documentation].
2017-12-20: Working on Selenium Documentation. Wrote 2 demo files. Wiki Page is available [http://www.edegan.com/wiki/Selenium_Documentation here]. Created 3 spreadsheets for the Moroccan data.
2017-12-19: Finished fixing the Demo Day Crawler. Changed files and installed as appropriate to make the LinkedIn crawler compatible with the RDP. Removed some of the bells and whistles.
2017-11-27: Rerunning Morocco Parliament Crawler. Fixed KeyTerms.py and running it again. Continued downloading for TIGER geocoder.
2017-11-20: Continued running [http://www.edegan.com/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.
2017-11-16: Continued running [http://www.edegan.com/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.
2017-11-15: Continued running [http://www.edegan.com/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote a script to extract counts that were greater than 2 from Keyword Matcher. Continued downloading for [http://www.edegan.com/wiki/Tiger_Geocoder TIGER Geocoder]. Finished re-formatting work logs.
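A minimal sketch of the count filter mentioned in the 2017-11-15 entry. The Keyword Matcher output format is not described in this log, so the sketch assumes it can be read as (keyword, count) pairs; `filter_counts` and the sample data are hypothetical.

```python
def filter_counts(pairs, threshold=2):
    """Keep only (keyword, count) pairs whose count is strictly greater than threshold."""
    return [(keyword, count) for keyword, count in pairs if count > threshold]


# Hypothetical sample data, not taken from the real Keyword Matcher output.
sample = [("demo day", 5), ("cohort", 2), ("accelerator", 3)]
print(filter_counts(sample))
```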
2017-11-14: Continued running [http://www.edegan.com/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote an HTML to Text parser. See the Demo Day Page Parser page for file location. Continued downloading for [http://www.edegan.com/wiki/Tiger_Geocoder TIGER Geocoder].
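A minimal sketch of an HTML to text conversion like the one mentioned in the 2017-11-14 entry, using only the Python standard library. This is an illustration, not the project's parser: the names `TextExtractor` and `html_to_text` are hypothetical, and skipping `<script>` and `<style>` contents is an assumption about what counts as page text.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep non-empty text found outside skipped elements.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the text content of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Calling `html_to_text(page_source)` on a downloaded page returns its text with one text node per line, which is usually enough for keyword matching.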
2017-11-13: Built [http://www.edegan.com/wiki/Demo_Day_Page_Parser Demo Day Page Parser].
2017-11-09: Running demo version of Demo Day crawler (Accelerator Google Crawler). Fixing work log format.
2017-11-07: Created file with 0s and 1s detailing whether crunchbase has the founder information for an accelerator. Details posted as a TODO on [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List] page. Still waiting for feedback on the PostGIS installation from [http://www.edegan.com/wiki/Tiger_Geocoder Tiger Geocoder]. Continued working on Accelerator Google Crawler.
2017-11-06: Contacted Geography Center for the US Census Bureau, [https://www.census.gov/geo/about/contact.html here], and began email exchange on PostGIS installation problems. Began working on the [http://www.edegan.com/wiki/Selenium_Documentation Selenium Documentation]. Also began working on an Accelerator Google Crawler that will be used with Yang and ML to find Demo Days for cohort companies.
2017-11-01: Attempted to continue downloading, but ran into HTTP Forbidden errors. Listed the errors on the [http://www.edegan.com/wiki/Tiger_Geocoder Tiger Geocoder Page].
2017-10-31: Began downloading blocks of data for individual states for the [http://www.edegan.com/wiki/Tiger_Geocoder Tiger Geocoder] project. Wrote out the new wiki page for installation, and beginning to write documentation on usage.
2017-10-30: With Ed's help, was able to get the national data from Tiger installed onto a database server. The process required much jumping around and changing users, and all the things we learned are outlined in [http://www.edegan.com/wiki/Database_Server_Documentation#Editing_Users the database server documentation] under "Editing Users".
2017-10-25: Continued working on the [http://www.edegan.com/wiki/PostGIS_Installation TigerCoder Installation].
2017-10-24: Throw some addresses into a database, use address normalizer and geocoder. May need to install things. Details on the installation process can be found on the [http://www.edegan.com/wiki/PostGIS_Installation PostGIS Installation page].
2017-10-23: Finished Yelp crawler for [http://www.edegan.com/wiki/Houston_Innovation_District Houston Innovation District Project].
2017-10-19: Continued work on Yelp crawler for [http://www.edegan.com/wiki/Houston_Innovation_District Houston Innovation District Project].
2017-10-18: Continued work on Yelp crawler for [http://www.edegan.com/wiki/Houston_Innovation_District Houston Innovation District Project].
2017-10-17: Constructed ArcGIS maps for the agglomeration project. Finished maps of points for every year in the state of California. Finished maps of Route 128. Began working on selenium Yelp crawler to get cafe locations within the 610-loop.
2017-09-25: New task -- Create text file with company, description, and company type.
#[http://www.edegan.com/wiki/VC_Database_Rebuild VC Database Rebuild]
#psql vcdb2
#table name, sdccompanybasecore2
2017-09-13: Implemented LinkedIn Crawler for main portion of profiles. Began working on crawling Experience section of profiles.
2017-09-12: Continued working on the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. Added to the wiki on this topic.
2017-09-11: Continued working on the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data].
2017-09-06: Combined founders data retrieved with the Crunchbase API with the crunchbasebulk data to get linkedin urls for different accelerator founders. For more information, see [http://www.edegan.com/wiki/Crunchbase_Data here].
2017-09-05: Post Harvey. Finished retrieving names from the Crunchbase API on founders. Next step is to query crunchbase bulk database to get linkedin urls. For more information, see [http://www.edegan.com/wiki/Crunchbase_Data here].
2017-08-24: Began using the Crunchbase API to retrieve founder information for accelerators. Halfway through compiling a dictionary that translates accelerator names into proper Crunchbase API URLs.
2017-08-23: Decided with Ed to abandon LinkedIn crawling to retrieve accelerator founder data, and instead use crunchbase. Spent the day navigating the crunchbasebulk database, and seeing what useful information was contained in it.
2017-08-22: Discovered that LinkedIn Profiles cannot be viewed through LinkedIn if the target is 3rd degree or further. However, if entering LinkedIn through a Google search, the profile can still be viewed if the user has previously logged into LinkedIn. Devising a workaround crawler that utilizes Google search. Continued blog post [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) here] under Section 4.
2017-08-21: Began work on extracting founders for accelerators through LinkedIn Crawler. Discovered that Python3 is not installed on RDP, so the virtual environment for the project cannot be fired up. Continued working on Ubuntu machine.
2017-05-01: Continued work on HTML Parser. Uploaded all semester projects to git server.
2017-04-20: Finished the HTML Parser for the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Ran HTML parser on accelerator founders. Data is stored in projects/accelerators/LinkedIn Founder Data.
2017-04-19: Made updates to the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler] Wikipage. Ran LinkedIn Crawler on accelerator data. Working on an html parser for the results from the LinkedIn Crawler.
2017-04-18: Ran LinkedIn Crawler on matches between Crunchbase Snapshot and the accelerator data.
2017-04-17: Worked on ways to get correct search results from the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Worked on an HTML Parser for the results from the LinkedIn Crawler.
2017-04-13: Worked on debugging the logout procedure for the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Began formulation of process to search for founders of startups using a combination of the LinkedIn Crawler with the data resources from the [http://www.edegan.com/wiki/Crunchbase_2013_Snapshot CrunchBase Snapshot].
2017-04-12: Worked on bugs with the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].
2017-04-11: Completed functional [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) crawler of LinkedIn Recruiter Pro]. Basic search functions work and download profile information for a given person.
2017-04-10: Began writing functioning crawler of LinkedIn.
2017-04-06: Continued working on debugging and documenting the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Wrote a test program that logs in, searches for a query, navigates through search pages, and logs out. Recruiter program can now log in and search.
2017-04-05: Began work on the LinkedIn Crawler. Researched on launching Python Virtual Environment.
2017-03-27: Worked on debugging the Enclosing Circle Algorithm. Implemented a way to remove interior circles, and determined that translation to latitude and longitude coordinates resulted in slightly off center circles.
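A minimal sketch of the interior-circle removal described in the 2017-03-27 entry, assuming circles are (x, y, radius) tuples in planar coordinates. The function names are hypothetical, and the containment rule used here (centre distance plus inner radius no larger than the outer radius) is a standard geometric test rather than a detail taken from the project code.

```python
import math


def contains(outer, inner, eps=1e-9):
    """True if circle `inner` lies entirely inside circle `outer`."""
    (ox, oy, outer_r), (ix, iy, inner_r) = outer, inner
    return math.hypot(ox - ix, oy - iy) + inner_r <= outer_r + eps


def remove_interior_circles(circles):
    """Drop every circle that is fully contained in another circle."""
    kept = []
    for i, circle in enumerate(circles):
        interior = False
        for j, other in enumerate(circles):
            if i == j or not contains(other, circle):
                continue
            # Identical circles contain each other: keep only the first copy.
            if contains(circle, other) and i < j:
                continue
            interior = True
            break
        if not interior:
            kept.append(circle)
    return kept
```

The test is only exact in the plane, which is consistent with the entry's observation that translating to latitude and longitude shifted the circles slightly off center.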
2017-03-23: Finished debugging the brute force algorithm for [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm]. Implemented a method to plot the points and circles on a graph. Analyzed runtime of the brute force algorithm.
2017-03-21: Coded a brute force algorithm for the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].
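This log does not spell out the project's Enclosing Circle Algorithm, so the sketch below is only a stand-in: a brute-force search for the smallest circle enclosing a single set of planar points, which tries every pair (as a diameter) and every triple (as a circumcircle). All function names are hypothetical.

```python
import itertools
import math


def circle_from_two(p, q):
    """Circle having segment pq as its diameter."""
    cx, cy = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2
    return (cx, cy, math.hypot(p[0] - q[0], p[1] - q[1]) / 2)


def circle_from_three(p, q, r):
    """Circumcircle of three points, or None if they are collinear."""
    (ax, ay), (bx, by), (cx, cy) = p, q, r
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy, math.hypot(ux - ax, uy - ay))


def encloses(circle, points, eps=1e-9):
    """True if every point lies inside or on the circle."""
    cx, cy, radius = circle
    return all(math.hypot(x - cx, y - cy) <= radius + eps for x, y in points)


def brute_force_enclosing_circle(points):
    """Smallest (x, y, radius) circle covering all points, by exhaustive search."""
    points = list(points)
    if not points:
        return None
    if len(points) == 1:
        return (points[0][0], points[0][1], 0.0)
    candidates = [circle_from_two(p, q) for p, q in itertools.combinations(points, 2)]
    for triple in itertools.combinations(points, 3):
        circle = circle_from_three(*triple)
        if circle is not None:
            candidates.append(circle)
    valid = [c for c in candidates if encloses(c, points)]
    return min(valid, key=lambda c: c[2])
```

With O(n^3) candidate circles and an O(n) check for each, this sketch runs in O(n^4) time, so it is only practical for small point sets.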
2017-03-20: Worked on debugging the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].
2017-03-09: Continued running [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Finished script to draw Enclosing Circles on a Google Map.
2017-03-08: Continued running [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Created script to draw outcome of the Enclosing Circle Algorithm on Google Maps.
2017-03-07: Redetermined the top 50 cities which Enclosing Circle should be run on. Data on the [http://www.edegan.com/wiki/Top_Cities_for_VC_Backed_Companies Top 50 Cities for VC Backed Companies can be found here.] Ran [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities.
2017-03-06: Ran script to determine the top 50 cities which Enclosing Circle should be run on. Fixed the VC Circles script to take in a new data format.
2017-03-02: Cleaned up data for the VC Circles Project. Created histogram of data in Excel. See [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Began work on the [http://www.edegan.com/wiki/LinkedInCrawlerPython LinkedIn Crawler].
2017-03-01: Created statistics for the VC Circles Project.
2017-02-28: Finished downloading geocoded data for VC Data as part of the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Found bug in Enclosing Circle Algorithm.
2017-02-27: Continued to download geocoded data for VC Data as part of the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://www.edegan.com/wiki/Industry_Classifier Industry Classifier].
2017-02-23: Continued to download geocoded data for VC Data as part of the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Installed C++ Compiler for Python. Ran tests on difference between Python and C wrapped Python.
2017-02-22: Continued to download geocoded data for VC Data as part of the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Helped out with [http://www.edegan.com/wiki/Industry_Classifier Industry Classifier Project].
2017-02-21: Continued to download geocoded data for VC Data as part of the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Researched C++ Compilers for Python so that the Enclosing Circle Algorithm could be wrapped in C. Found a recommended one [https://www.microsoft.com/en-us/download/details.aspx?id=44266 here].
2017-02-20: Continued to download geocoded data for VC Data as part of the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://www.edegan.com/wiki/Industry_Classifier Industry Classifier].
2017-02-16: Reworked [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] to create a file of geocoded data. Began work on wrapping the algorithm in C to improve speed.
2017-02-15: Finished [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] applied to the VC study. Enclosing Circle algorithm still needs adjustment, but the program runs with the temporary fixes.
2017-02-14: Worked on the application of the Enclosing Circle algorithm to the VC study. Working on bug fixes in the Enclosing Circle algorithm. Created wiki page for the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].
2017-02-13: Worked on Neural Net for the [http://www.edegan.com/wiki/Industry_Classifier Industry Classifier Project].
2017-02-08: Worked on Neural Net for the [http://www.edegan.com/wiki/Industry_Classifier Industry Classifier Project].
2017-02-07: Fixed bugs in parse_cohort_data.py, the script for parsing the cohort data from the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Added descriptive statistics to cohort data excel file.
2017-02-02: Out sick, independent research and work from RDP. Brief research into the [http://jorgeg.scripts.mit.edu/homepage/wp-content/uploads/2016/03/Guzman-Stern-State-of-American-Entrepreneurship-FINAL.pdf Stern-Guzman algorithm]. Research into [http://www.edegan.com/wiki/interactive_maps Interactive Maps]. No helpful additions to map embedding problem.
2017-02-01: Notes from Session with Ed: Project on US university patenting and entrepreneurship programs (writing code to identify universities in assignees), search Wikipedia (XML then bulk download), student pop, faculty pop, etc.
2017-01-30: Optimized enclosing circle algorithm through memoization. Developed script to read addresses from accelerator data and return latitude and longitude coordinates.
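A minimal sketch of memoization as mentioned in the 2017-01-30 entry, using `functools.lru_cache` from the standard library. The cached function `bounding_radius` is a hypothetical stand-in for whatever sub-computation the enclosing circle code repeats; the log does not say which results were cached.

```python
from functools import lru_cache
import math


@lru_cache(maxsize=None)
def bounding_radius(points):
    """Radius of the circle centred on the centroid that covers all points.

    `points` must be hashable, e.g. a tuple of (x, y) tuples, so that
    repeated calls with the same cluster are answered from the cache.
    """
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return max(math.hypot(x - cx, y - cy) for x, y in points)


# Hypothetical cluster; the second call is served from the cache.
cluster = ((0.0, 0.0), (2.0, 0.0), (1.0, 1.0))
bounding_radius(cluster)
bounding_radius(cluster)
print(bounding_radius.cache_info())
```

Memoization only pays off when the same inputs recur, so the arguments need a canonical, hashable form such as a sorted tuple of points.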
2017-01-26: Continued working on Google sitesearch project. Discovered crunchbase, changed project priority. Priority 1: split accelerator data up by flag; priority 2: use crunchbase to get web urls for cohorts; priority 3: make internet archive wayback machine driver. Located [http://www.edegan.com/wiki/Whois_Parser Whois Parser].
2017-01-25: Finished parser for cohort data of the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Some data files still need proofreading as they are not in an acceptable format. Began working on Google sitesearch project.
2017-01-24: Worked on parser for cohort data of the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Cohort data file created, debugging is almost complete. Will begin work on the google accelerator search soon.
2017-01-23: Worked on parser for cohort data of the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Preliminary code is written, working on debugging.
2017-01-19: Downloaded pdfs in the background for the [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project]. Created parser for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project], completed creation of final data set (yay!). Began working on cohort parser.
2017-01-18: Downloaded pdfs in the background for the [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].
2017-01-13: Continued making text files for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].
2017-01-12: Continued making text files for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].
2017-01-11: Continued making text files for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].
2017-01-10: Continued making text files for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].
===Fall 2016===
2016-12-08: Continued making text files for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].
2016-12-07: Continued making text files for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].
2016-12-06: Learned how to use git. Committed software projects from the semester to the McNair git repository. Projects can be found at: [http://www.edegan.com/wiki/E%26I_Governance_Policy_Report Executive Order Crawler], [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Foreign Government Web Crawlers], [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) F6S Crawler and Parser].
2016-12-02: Built and ran web crawler for Center for Middle East Studies on Kuwait. Continued making text files for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].
2016-12-01: Continued making text files for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Built tool for the [http://www.edegan.com/wiki/E%26I_Governance_Policy_Report E&I Governance Report Project] with Christy. Adds a column of data that shows whether or not the bill has been passed.
2016-11-29: Began pulling data from the accelerators listed [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) here]. Made text files for about 18 accelerators.
2016-11-22: Transferred downloaded Morocco Written Bills to provided SeaGate Drive. Made a "gentle" F6S crawler to retrieve HTMLs of possible accelerator pages documented [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) here].
2016-11-18: Converted Executive Order PDFs to text files using adobe acrobat DC. See [http://www.edegan.com/wiki/E%26I_Governance_Policy_Report Wikipage] for details.
2016-11-17: Wrote a crawler to retrieve information about executive orders, and their corresponding pdfs. They can be found [http://www.edegan.com/wiki/E%26I_Governance_Policy_Report here.] Next step is to run code to convert the pdfs to text files, then use the parser fixed by Christy.
2016-11-15: Finished download of Moroccan Written Question pdfs. Wrote a parser with Christy to be used for parsing bills from Congress and eventually executive orders. Found bug in the system Python that was worked out and rebooted.
2016-11-11: Continued to download Moroccan data in the background. Attempted to find bug fixes for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Project] crawler.
2016-11-10: Continued to download Moroccan data and Kuwait data in the background. Began work on [http://www.edegan.com/wiki/Google_Scholar_Crawler Google Scholar Crawler]. Wrote a crawler for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Project] to get the HTML files of hundreds of accelerators. The crawler ended up failing; it appears to have been due to HTTPS.
2016-11-08: Continued to download Moroccan data in the background. Finished writing code for the embedded files on the Kuwait Site. Spent time debugging the frame errors due to the dynamically generated content. Never found an answer to the bug, and instead found a workaround that sacrificed run time for the ability to work. [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
2016-11-04: Continued to download Moroccan data in the background. Finished writing initial Kuwait Web Crawler/Driver for the Middle East Studies Department. Middle East Studies Department asked for additional embedded files in the Kuwait website. [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
2016-11-03: Continued to download Moroccan data in the background. Dr. Egan fixed systems requirements to run the GovTrack Web Crawler. Made significant progress on the Kuwait Web Crawler/Driver for the Middle East Studies Department.
2016-11-01: Continued to download Moroccan data in the background. Went over code for GovTracker Web Crawler, continued learning Perl. [http://www.edegan.com/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler]. Began Kuwait Web Crawler/Driver.
2016-10-21: Continued to download data for the Moroccan Parliament Written and Oral Questions. Looked over [http://www.edegan.com/wiki/Christy_Warden_(Twitter_Crawler_Application_1) Christy's Twitter Crawler] to see how I can be helpful. Dr. Egan asked me to think about how to potentially make multiple tools to get cohorts and other sorts of data from accelerator sites. See [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator List]. He also asked me to look at the [http://www.edegan.com/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] for potential ideas on how to bring this project to fruition.
2016-10-20: Continued to download data for the Moroccan Parliament Written and Oral Questions. Updated Wiki page. Started working on Twitter project with Christy. [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
2016-10-18: Finished code for Oral Questions web driver and Written Questions web driver using selenium. Now, the data for the dates of questions can be found using the crawler, and the pdfs of the questions will be downloaded using selenium. [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
2016-10-14: Finished Oral Questions crawler. Finished Written Questions crawler. Waiting for further details on whether that data needs to be tweaked in any way. Updated the Moroccan Web Driver/Web Crawler wiki page. [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
2016-10-13: Completed download of Moroccan Bills. Working on either a web driver screenshot approach or a webcrawler approach to download the Moroccan oral and written questions data. Began building Web Crawler for Oral and Written Questions site. Edited Moroccan Web Driver/Crawler wiki page. [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
2016-10-11: Fixed Arabic bug; files can now be saved with Arabic titles. Monarchy bills downloaded and ready for shipment. House of Representatives Bill mostly downloaded, ratified bills prepared for download. Started learning scrapy library in python for web scraping. Discussed idea of screenshotting questions instead of scraping.