===Fall 2017===<onlyinclude>
[[Peter Jalbert]] [[Work Logs]] [[Peter Jalbert (Work Log)|(log page)]]
2017-12-21: Last minute adjustments to the Moroccan Data. Continued working on [[Selenium Documentation]].
2017-12-20: Working on Selenium Documentation. Wrote 2 demo files. Wiki page is available [http://www.edegan.com/wiki/Selenium_Documentation here]. Created 3 spreadsheets for the Moroccan data.
2017-12-19: Finished fixing the Demo Day Crawler. Changed installed files as appropriate to make the LinkedIn crawler compatible with the RDP. Removed some of the bells and whistles.
2017-12-18: Continued finding errors with the Demo Day Crawler analysis. Rewrote the parser to remove any search terms that were in the top 10,000 most common English words according to Google. Finished uploading and submitting the Moroccan data.
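The stopword filtering described for the Demo Day parser can be sketched as below. This is a minimal illustration, not the project's parser: the real filter used Google's 10,000 most common English words, while `COMMON_WORDS` here is a tiny stand-in set.

```python
# Sketch: drop any search term that is a common English word.
# COMMON_WORDS is an illustrative stand-in for Google's 10,000-word list.
COMMON_WORDS = {"the", "and", "company", "new", "day", "about", "first"}

def filter_terms(terms, common_words=COMMON_WORDS):
    """Keep only terms that are not common English words (case-insensitive)."""
    return [t for t in terms if t.lower() not in common_words]

print(filter_terms(["Techstars", "company", "demo", "Capital Factory"]))
# -> ['Techstars', 'demo', 'Capital Factory']
```

In practice the common-word list would be loaded from a file rather than hard-coded.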
2017-12-15: Found errors with the Demo Day Crawler. Fixed scripts to download Moroccan Law Data.
2017-12-14: Uploading Morocco Parliament Written Questions. Creating script for the next Morocco Parliament download. Began writing Selenium documentation. Continuing to download TIGER data.
2017-12-06: Running Morocco Parliament Written Questions script. Analyzing Demo Day Crawler results. Continued downloading for the TIGER geocoder.
2017-11-28: Debugging Morocco Parliament Crawler. Running Demo Day Crawler for all accelerators and 10 pages per accelerator. TIGER geocoder is back to a Forbidden Error.
2017-11-27: Rerunning Morocco Parliament Crawler. Fixed KeyTerms.py and running it again. Continued downloading for the TIGER geocoder.
2017-11-20: Continued running the [http://www.edegan.com/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began image download for Image Classification on cohort pages. Clarifying specs for the Morocco Parliament crawler.
2017-11-16: Continued running the [http://www.edegan.com/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began image download for Image Classification on cohort pages. Clarifying specs for the Morocco Parliament crawler.
2017-11-15: Continued running the [http://www.edegan.com/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote a script to extract counts that were greater than 2 from the Keyword Matcher. Continued downloading for the [http://www.edegan.com/wiki/Tiger_Geocoder TIGER Geocoder]. Finished re-formatting work logs.
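The count-threshold script mentioned above could look like the sketch below. It assumes the Keyword Matcher wrote tab-separated "keyword, count" lines; the real output format may differ.

```python
# Sketch: keep only Keyword Matcher counts strictly greater than a threshold.
# Assumes lines of the form "keyword<TAB>count"; the real format may differ.
def counts_over_threshold(lines, threshold=2):
    """Return {keyword: count} for counts strictly greater than threshold."""
    result = {}
    for line in lines:
        keyword, count = line.rsplit("\t", 1)
        if int(count) > threshold:
            result[keyword] = int(count)
    return result

sample = ["incubator\t5", "demo day\t2", "cohort\t7"]
print(counts_over_threshold(sample))  # -> {'incubator': 5, 'cohort': 7}
```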
2017-11-14: Continued running the [http://www.edegan.com/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote an HTML-to-Text parser; see the Demo Day Page Parser page for the file location. Continued downloading for the [http://www.edegan.com/wiki/Tiger_Geocoder TIGER Geocoder].
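An HTML-to-text parser of the kind described can be built on the standard library's `html.parser`; this is a minimal sketch, and the project's actual parser likely handled more tags and whitespace rules.

```python
# Minimal HTML-to-text extraction using the stdlib html.parser.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the text content of an HTML document, skipping tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.parts.append(text)

    def get_text(self):
        return " ".join(self.parts)

extractor = TextExtractor()
extractor.feed("<html><body><h1>Demo Day</h1><p>Cohort list</p></body></html>")
print(extractor.get_text())  # -> Demo Day Cohort list
```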
2017-11-13: Built [http://www.edegan.com/wiki/Demo_Day_Page_Parser Demo Day Page Parser].
2017-11-09: Running demo version of Demo Day crawler (Accelerator Google Crawler). Fixing worklog format.
2017-11-07: Created file with 0s and 1s detailing whether crunchbase has the founder information for an accelerator. Details posted as a TODO on the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List] page. Still waiting for feedback on the PostGIS installation from [http://www.edegan.com/wiki/Tiger_Geocoder Tiger Geocoder]. Continued working on the Accelerator Google Crawler.
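The 0/1 indicator file could be produced along the lines below. This is a hedged sketch: the accelerator names and the founder lookup table are illustrative placeholders, not the project's actual crunchbase query.

```python
# Sketch: flag each accelerator 1 if crunchbase lists founder info, else 0.
# founders_in_crunchbase is an illustrative stand-in for the real lookup.
founders_in_crunchbase = {"Y Combinator": ["founder listed"], "Techstars": []}

def founder_indicator(accelerators, lookup):
    """Yield (accelerator, 0/1) flags for founder data availability."""
    for name in accelerators:
        has_founders = 1 if lookup.get(name) else 0
        yield name, has_founders

rows = list(founder_indicator(["Y Combinator", "Techstars"], founders_in_crunchbase))
print(rows)  # -> [('Y Combinator', 1), ('Techstars', 0)]
```

Writing `rows` out as a CSV would give the 0s-and-1s file described.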
2017-11-06: Contacted the Geography Center for the US Census Bureau ([https://www.census.gov/geo/about/contact.html here]) and began an email exchange on PostGIS installation problems. Began working on the [http://www.edegan.com/wiki/Selenium_Documentation Selenium Documentation]. Also began working on an Accelerator Google Crawler that will be used with Yang and ML to find Demo Days for cohort companies.
2017-11-01: Attempted to continue downloading, but ran into HTTP Forbidden errors. Listed the errors on the [http://www.edegan.com/wiki/Tiger_Geocoder Tiger Geocoder page].
2017-10-31: Began downloading blocks of data for individual states for the [http://www.edegan.com/wiki/Tiger_Geocoder Tiger Geocoder] project. Wrote out the new wiki page for installation, and began writing documentation on usage.
2017-10-30: With Ed's help, got the national data from TIGER installed onto a database server. The process required much jumping around and changing users; everything we learned is outlined in [http://www.edegan.com/wiki/Database_Server_Documentation#Editing_Users the database server documentation] under "Editing Users".
2017-10-25: Continued working on the [http://www.edegan.com/wiki/PostGIS_Installation Tiger Geocoder installation].
2017-10-24: Threw some addresses into a database and used the address normalizer and geocoder. May need to install things. Details on the installation process can be found on the [http://www.edegan.com/wiki/PostGIS_Installation PostGIS Installation page].
2017-10-23: Finished the Yelp crawler for the [http://www.edegan.com/wiki/Houston_Innovation_District Houston Innovation District Project].
2017-10-19: Continued work on the Yelp crawler for the [http://www.edegan.com/wiki/Houston_Innovation_District Houston Innovation District Project].
2017-10-18: Continued work on the Yelp crawler for the [http://www.edegan.com/wiki/Houston_Innovation_District Houston Innovation District Project].
2017-10-17: Constructed ArcGIS maps for the agglomeration project. Finished maps of points for every year in the state of California. Finished maps of Route 128. Began working on a Selenium Yelp crawler to get cafe locations within the 610 Loop.
2017-10-16: Assisted Harrison on the USITC project. Looked for natural language processing tools to extract complainants and defendants along with their locations from case files. Experimented with pulling based on part-of-speech tags, as well as using geotext or geograpy to pull locations from a case segment.
2017-10-13: Updated various project wiki pages.
2017-10-12: Continued work on Patent Thicket project, awaiting further project specs.
2017-10-05: Emergency ArcGIS creation for Agglomeration project.
2017-10-04: Emergency ArcGIS creation for the Agglomeration project.
2017-10-02: Worked on ArcGIS data. See Harrison's Work Log for details.
2017-09-28: Added collaborative editing feature to PyCharm.
2017-09-27: Worked on big database file.
2017-09-25: New task -- create a text file with company, description, and company type.
#[http://www.edegan.com/wiki/VC_Database_Rebuild VC Database Rebuild]
#psql vcdb2
#table name: sdccompanybasecore2
#Combine with crunchbasebulk
#TODO: Write wiki on the LinkedIn crawler; write wiki on creating accounts.
2017-09-21: Wrote wiki on the LinkedIn crawler; met with Laura about the patents project.
2017-09-20: Finished running the LinkedIn crawler. Transferred data to the RDP. Will write wikis next.
2017-09-19: Began running the LinkedIn crawler. Helped Yang create an RDP account, get permissions, and get wiki setup.
2017-09-18: Finished implementation of the Experience Crawler; continued working on the Education Crawler for LinkedIn.
2017-09-14: Continued implementing the LinkedIn Crawler for profiles.
2017-09-13: Implemented the LinkedIn Crawler for the main portion of profiles. Began working on crawling the Experience section of profiles.
2017-09-12: Continued working on the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. Added to the wiki on this topic.
2017-09-11: Continued working on the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data].
2017-09-06: Combined the founders data retrieved with the Crunchbase API with the crunchbasebulk data to get LinkedIn URLs for different accelerator founders. For more information, see [http://www.edegan.com/wiki/Crunchbase_Data here].
2017-09-05: Post-Harvey. Finished retrieving names of founders from the Crunchbase API. Next step is to query the crunchbase bulk database to get LinkedIn URLs. For more information, see [http://www.edegan.com/wiki/Crunchbase_Data here].
2017-08-24: Began using the Crunchbase API to retrieve founder information for accelerators. Halfway through compiling a dictionary that translates accelerator names into proper Crunchbase API URLs.
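The name-to-URL dictionary mentioned above can be sketched as follows. Crunchbase organization endpoints are keyed by a "permalink" slug; the slugging rule and the API version/base URL shown here are assumptions for illustration, not the project's verified scheme.

```python
# Sketch: map accelerator names to Crunchbase API organization URLs.
# The base URL and slug rule are illustrative assumptions.
def to_permalink(name):
    """Lowercase the accelerator name and join its words with hyphens."""
    return "-".join(name.lower().split())

def api_url_map(names, base="https://api.crunchbase.com/v3.1/organizations/"):
    return {name: base + to_permalink(name) for name in names}

urls = api_url_map(["Y Combinator", "500 Startups"])
print(urls["Y Combinator"])
```

Real accelerator names often need hand-corrected permalinks, which is why a curated dictionary was being compiled rather than pure slugging.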
2017-08-23: Decided with Ed to abandon LinkedIn crawling to retrieve accelerator founder data, and instead use crunchbase. Spent the day navigating the crunchbasebulk database and seeing what useful information it contained.
2017-08-22: Discovered that LinkedIn profiles cannot be viewed through LinkedIn if the target is 3rd degree or further. However, if entering LinkedIn through a Google search, the profile can still be viewed if the user has previously logged into LinkedIn. Devising a workaround crawler that utilizes Google search. Continued blog post [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) here] under Section 4.
2017-08-21: Began work on extracting founders for accelerators through the LinkedIn Crawler. Discovered that Python 3 is not installed on the RDP, so the virtual environment for the project cannot be fired up. Continued working on the Ubuntu machine.</onlyinclude>
===Spring 2017===
2017-05-01: Continued work on the HTML Parser. Uploaded all semester projects to the git server.
2017-04-20: Finished the HTML Parser for the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Ran the HTML parser on the founders. Data is stored in projects/accelerators/LinkedIn Founder Data.
2017-04-19: Made updates to the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler] wiki page. Ran the LinkedIn Crawler on accelerator data. Working on an HTML parser for the results from the LinkedIn Crawler.
2017-04-18: Ran the LinkedIn Crawler on matches between the Crunchbase Snapshot and the accelerator data.
2017-04-17: Worked on ways to get correct search results from the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Worked on an HTML parser for the results from the LinkedIn Crawler.
2017-04-13: Worked on debugging the logout procedure for the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Began formulating a process to search for founders of startups using a combination of the LinkedIn Crawler and the data resources from the [http://www.edegan.com/wiki/Crunchbase_2013_Snapshot CrunchBase Snapshot].
2017-04-12: Worked on bugs with the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].
2017-04-11: Completed a functional [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) crawler of LinkedIn Recruiter Pro]. Basic search functions work and download profile information for a given person.
2017-04-10: Began writing a functioning crawler of LinkedIn.
2017-04-06: Continued working on debugging and documenting the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Wrote a test program that logs in, searches for a query, navigates through search pages, and logs out. The Recruiter program can now log in and search.
2017-04-05: Began work on the LinkedIn Crawler. Researched launching a Python virtual environment.
2017-04-03: Finished debugging points for the Enclosing Circle Algorithm. Added command-line functionality to the Industry Classifier.
2017-03-29: Worked on debugging points for the Enclosing Circle Algorithm.
2017-03-28: Finished running the Enclosing Circle Algorithm. Worked on removing incorrect points from the data set (see above).
2017-03-27: Worked on debugging the Enclosing Circle Algorithm. Implemented a way to remove interior circles, and determined that translation to latitude and longitude coordinates resulted in slightly off-center circles.
2017-03-23: Finished debugging the brute force algorithm for the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm]. Implemented a method to plot the points and circles on a graph. Analyzed the runtime of the brute force algorithm.
2017-03-21: Coded a brute force algorithm for the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].
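A brute-force enclosing circle of the kind described can be sketched as below. The minimal enclosing circle of a point set is determined by two or three of its points, so trying every pair (as a diameter) and every triple (circumcircle) and keeping the smallest circle that covers all points is correct, if slow (roughly O(n^4)). This is a hedged sketch with illustrative data, not the project's actual implementation.

```python
# Brute-force smallest enclosing circle: try all pairs and triples,
# keep the smallest circle that covers every point. O(n^4), fine for small n.
from itertools import combinations
import math

def covers(circle, pts, eps=1e-9):
    (cx, cy), r = circle
    return all(math.hypot(x - cx, y - cy) <= r + eps for x, y in pts)

def circle_from_two(p, q):
    """Circle with segment pq as its diameter."""
    cx, cy = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2
    return (cx, cy), math.hypot(p[0] - cx, p[1] - cy)

def circle_from_three(p, q, r):
    """Circumcircle of three points, or None if they are collinear."""
    ax, ay = p; bx, by = q; cx, cy = r
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    return (ux, uy), math.hypot(ax - ux, ay - uy)

def min_enclosing_circle(pts):
    best = None
    for a, b in combinations(pts, 2):
        c = circle_from_two(a, b)
        if covers(c, pts) and (best is None or c[1] < best[1]):
            best = c
    for a, b, c3 in combinations(pts, 3):
        c = circle_from_three(a, b, c3)
        if c and covers(c, pts) and (best is None or c[1] < best[1]):
            best = c
    return best

center, radius = min_enclosing_circle([(0, 0), (2, 0), (1, 1)])
print(center, radius)  # -> (1.0, 0.0) 1.0
```

Wrapping exactly this kind of inner loop in C is what the later speed-up entries refer to; Welzl's algorithm would reduce the expected cost to O(n).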
2017-03-20: Worked on debugging the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm]. Researched C++ compilers for Python so that the Enclosing Circle Algorithm could be wrapped in C. Found a recommended one [https://www.microsoft.com/en-us/download/details.aspx?id=44266 here].
2017-03-09: Continued running the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Finished the script to draw Enclosing Circles on a Google Map.
2017-03-08: Continued running the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Created a script to draw the outcome of the Enclosing Circle Algorithm on Google Maps.
2017-03-07: Redetermined the top 50 cities on which Enclosing Circle should be run. Data on the [http://www.edegan.com/wiki/Top_Cities_for_VC_Backed_Companies Top 50 Cities for VC Backed Companies can be found here]. Ran the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities.
2017-03-06: Ran a script to determine the top 50 cities on which Enclosing Circle should be run. Fixed the VC Circles script to take in a new data format.
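The top-cities selection can be sketched as a frequency count over the geocoded VC records: count companies per city and keep the N most frequent. The input rows below are illustrative; the real script read the project's geocoded data files.

```python
# Sketch: pick the N cities with the most VC-backed companies.
# Rows are illustrative (company, city) pairs, not the real data.
from collections import Counter

def top_cities(rows, n=50):
    """rows: iterable of (company, city); return the n most common cities."""
    counts = Counter(city for _company, city in rows)
    return [city for city, _count in counts.most_common(n)]

rows = [("A", "San Francisco"), ("B", "Boston"), ("C", "San Francisco")]
print(top_cities(rows, n=2))  # -> ['San Francisco', 'Boston']
```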
3/1/2017 10:00-12:0003-02: Created statistics Cleaned up data for the VC Circles Project. Created histogram of data in Excel. See [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Began work on the [http://www.edegan.com/wiki/LinkedInCrawlerPython LinkedIn Crawler].
3/2/2017 14:30-1703-01:45: Cleaned up data Created statistics for the VC Circles Project. Created histogram of data in Excel. See [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Began work on the [http://mcnair.bakerinstitute.org/wiki/LinkedInCrawlerPython LinkedIn Crawler].
3/6/2017 10:00-1202-28:00Finished downloading geocoded data for VC Data as part of the [http: Ran script to determine the top 50 cities which //www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle should be run onAlgorithm] Project. Fixed the VC Circles script to take Found bug in a new data formatEnclosing Circle Algorithm.
3/7/2017 14:30-1702-27:15: Redetermined the top 50 cities which Enclosing Circle should be run on. Continued to download geocoded data for VC Data on as part of the [http://mcnairwww.bakerinstituteedegan.orgcom/wiki/Top_Cities_for_VC_Backed_Companies Top 50 Cities for VC Backed Companies can be found hereEnclosing_Circle_Algorithm Enclosing Circle Algorithm] Project.] Ran Assisted work on the [http://mcnairwww.bakerinstituteedegan.orgcom/wiki/Enclosing_Circle_Algorithm Enclosing Circle AlgorithmIndustry_Classifier Industry Classifier] on the Top 50 Cities.
3/8/2017 10:00-12:0002-23: Continued running to download geocoded data for VC Data as part of the [http://mcnairwww.bakerinstituteedegan.orgcom/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 CitiesProject. Installed C++ Compiler for Python. Created script to draw outcome of the Enclosing Circle Algorithm Ran tests on Google Mapsdifference between Python and C wrapped Python.
3/9/2017 14:30-17:4502-22: Continued running to download geocoded data for VC Data as part of the [http://mcnairwww.bakerinstituteedegan.orgcom/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 CitiesProject. Helped out with [http://www.edegan. Finished script to draw Enclosing Circles on a Google Mapcom/wiki/Industry_Classifier Industry Classifier Project].
3/20/2017 10:00-1202-21:00: Worked on debugging Continued to download geocoded data for VC Data as part of the [http://mcnairwww.bakerinstituteedegan.orgcom/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Researched into C++ Compilers for Python so that the Enclosing Circle Algorithm could be wrapped in C. Found a recommended one [https://www.microsoft.com/en-us/download/details.aspx?id=44266 here].
3/21/2017 14:30-1702-20:15: Coded a brute force algorithm Continued to download geocoded data for VC Data as part of the [http://mcnairwww.bakerinstituteedegan.orgcom/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://www.edegan.com/wiki/Industry_Classifier Industry Classifier].
3/23/2017 14:30- 17:4502-16: Finished debugging the brute force algorithm for Reworked [http://mcnairwww.bakerinstituteedegan.orgcom/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm]. Implemented a method to plot the points and circles on create a graphfile of geocoded data. Analyzed runtime of Began work on wrapping the brute force algorithmin C to improve speed.
2017-02-15: Finished the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] applied to the VC study. The Enclosing Circle algorithm still needs adjustment, but the program runs with the temporary fixes.
2017-02-14: Worked on the application of the Enclosing Circle algorithm to the VC study. Working on bug fixes in the Enclosing Circle algorithm. Created wiki page for the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].
2017-02-13: Worked on Neural Net for the [http://www.edegan.com/wiki/Industry_Classifier Industry Classifier Project].
2017-02-08: Worked on Neural Net for the [http://www.edegan.com/wiki/Industry_Classifier Industry Classifier Project].
2017-02-07: Fixed bugs in parse_cohort_data.py, the script for parsing the cohort data from the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Added descriptive statistics to the cohort data excel file.
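The descriptive statistics added to the cohort file are simple aggregates. A minimal sketch with a hypothetical row layout (the real file's columns aren't reproduced here):

```python
from collections import Counter

# Hypothetical rows parsed from the cohort data: (accelerator, year, company).
rows = [
    ("Y Combinator", 2012, "CompanyA"),
    ("Y Combinator", 2013, "CompanyB"),
    ("Techstars", 2012, "CompanyC"),
]

# Cohort companies per accelerator -- the kind of descriptive statistic
# one would append to the cohort spreadsheet.
companies_per_accelerator = Counter(acc for acc, _, _ in rows)
```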
2017-02-02: Out sick; independent research and documenting work from the RDP. Brief research into the [http://jorgeg.scripts.mit.edu/homepage/wp-content/uploads/2016/03/Guzman-Stern-State-of-American-Entrepreneurship-FINAL.pdf Stern-Guzman algorithm]. Research into [http://www.edegan.com/wiki/interactive_maps Interactive Maps]. No helpful additions to the map embedding problem.
2017-02-01: Notes from session with Ed: Project on US university patenting and entrepreneurship programs (identify universities in assignees, search Wikipedia (XML then bulk download), student pop, faculty pop, etc.). Circle project for VC data will end up being a joint project to join accelerator data. Pull descriptions for VC. Founders of accelerators in LinkedIn. LinkedIn cannot be caught (pretend to not be a bot). Can eventually get academic backgrounds through LinkedIn. Pull business registration data, Stern/Guzman Algorithm. GIS on top of geocoded data. Maps that work on wiki or blog (CartoDB), Maps API and R. NLP projects, Description Classifier.
2017-01-31: Built WayBack Machine Crawler. Updated documentation for the coordinates script. Updated profile page to include locations of code.
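The Internet Archive exposes a public availability API for the WayBack Machine; whether this crawler used that endpoint isn't recorded here, but building its query URL is a one-liner:

```python
from urllib.parse import urlencode

def wayback_api_url(page_url, timestamp=None):
    # Availability endpoint of the Internet Archive's Wayback Machine;
    # timestamp is an optional YYYYMMDD string meaning "closest snapshot to".
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return "https://archive.org/wayback/available?" + urlencode(params)
```

The JSON response lists the closest archived snapshot, including the URL of the stored page.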
2017-01-30: Optimized the enclosing circle algorithm through memoization. Developed script to read addresses from the accelerator data and return latitude and longitude coordinates.
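The memoization can be sketched as caching on an order-insensitive key, so the same subset of points is never solved twice; the circle routine below is a cheap stand-in for the real computation, not the project's code:

```python
calls = {"n": 0}

def bounding_circle(points):
    # Cheap stand-in for the expensive enclosing-circle routine:
    # centroid plus the radius to the farthest point.
    calls["n"] += 1
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    r = max(((x - cx) ** 2 + (y - cy) ** 2) ** 0.5 for x, y in points)
    return (cx, cy, r)

_cache = {}

def bounding_circle_cached(points):
    # frozenset keys mean any permutation of the same points hits the cache
    key = frozenset(points)
    if key not in _cache:
        _cache[key] = bounding_circle(key)
    return _cache[key]
```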
2017-01-26: Continued working on the Google sitesearch project. Discovered crunchbase, changed project priority: priority 1, split accelerator data up by flag; priority 2, use crunchbase to get urls for cohorts; priority 3, make an internet archive wayback machine driver. Located the [http://www.edegan.com/wiki/Whois_Parser Whois Parser].
2017-01-25: Finished parser for cohort data of the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Some data files still need proofreading as they are not in an acceptable format. Began working on the Google sitesearch project.
2017-01-24: Worked on parser for cohort data of the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Cohort data file created; debugging is almost complete. Will begin work on the google search soon.
2017-01-23: Worked on parser for cohort data of the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Preliminary code is written; working on debugging.
2017-01-19: Downloaded pdfs in the background for the [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project]. Created parser for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project], completed creation of the final data set (yay!). Began working on the cohort parser.
2017-01-18: Downloaded pdfs in the background for the [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].
2017-01-13: Continued making text files for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].
2017-01-12: Continued making text files for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].
2017-01-11: Continued making text files for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].
2017-01-10: Continued making text files for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].
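A background pdf-download loop of this kind reduces to fetching, naming, and throttling. In this sketch the fetch and save callables are injected — a hypothetical structure chosen so the loop is testable offline; the real script presumably called urllib or selenium directly:

```python
import time

def download_pdfs(urls, fetch, save, delay=0.0):
    """Download each PDF, pausing between requests to be gentle on the server.

    fetch(url) -> bytes and save(filename, data) are supplied by the caller;
    in a real run, fetch would wrap urllib.request.urlopen and save would
    write the bytes to disk.
    """
    saved = []
    for url in urls:
        name = url.rstrip("/").rsplit("/", 1)[-1] or "index.pdf"
        save(name, fetch(url))
        saved.append(name)
        time.sleep(delay)  # throttle so the crawl can run politely in the background
    return saved
```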
===Fall 2016===
2016-12-08: Continued making text files for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].
2016-12-07: Continued making text files for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].
2016-12-06: Learned how to use git. Committed software projects from the semester to the McNair git repository. Projects can be found at: [http://www.edegan.com/wiki/E%26I_Governance_Policy_Report Executive Order Crawler], [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Foreign Government Web Crawlers], [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) F6S Crawler and Parser].
2016-12-02: Built and ran web crawler for the Center for Middle East Studies on Kuwait. Continued making text files for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].
2016-12-01: Continued making text files for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Built tool for the [http://www.edegan.com/wiki/E%26I_Governance_Policy_Report E&I Governance Report Project] with Christy. It adds a column of data that shows whether or not the bill has been passed.
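Deriving the passed/not-passed column is a small mapping from each bill's status field to a boolean. A sketch — the status strings here are illustrative, not the exact vocabulary the tool consumed:

```python
# Illustrative status values; the real tool read status text from its
# bill-tracking source and would use that vocabulary instead.
PASSED_STATUSES = {"ENACTED:SIGNED", "ENACTED:VETO_OVERRIDE"}

def add_passed_column(bills):
    """Return bill rows (dicts with a 'status' field) plus a boolean 'passed'."""
    return [dict(bill, passed=bill["status"] in PASSED_STATUSES) for bill in bills]
```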
2016-11-29: Began pulling data from the accelerators listed [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) here]. Made text files for about 18 accelerators.
2016-11-22: Transferred downloaded Morocco Written Bills to the provided SeaGate Drive. Made a "gentle" F6S crawler to retrieve HTMLs of possible accelerator pages, documented [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) here].
2016-11-18: Converted Executive Order PDFs to text files using Adobe Acrobat DC. See the [http://www.edegan.com/wiki/E%26I_Governance_Policy_Report Wikipage] for details.
2016-11-17: Wrote a crawler to retrieve information about executive orders and their corresponding pdfs. They can be found [http://www.edegan.com/wiki/E%26I_Governance_Policy_Report here]. Next step is to run code to convert the pdfs to text files, then use the parser fixed by Christy.
2016-11-15: Finished download of Moroccan Written Question pdfs. Wrote a parser with Christy to be used for parsing bills from Congress and eventually executive orders. Found a bug in the system Python that was worked out and rebooted.
2016-11-11: Continued to download Moroccan data in the background. Attempted to find bug fixes for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Project] crawler.
2016-11-10: Continued to download Moroccan data and Kuwait data in the background. Began work on the [http://www.edegan.com/wiki/Google_Scholar_Crawler Google Scholar Crawler]. Wrote a crawler for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Project] to get the HTML files of hundreds of accelerators. The crawler ended up failing; it appears to have been due to HTTPS.
2016-11-08: Continued to download Moroccan data in the background. Finished writing code for the embedded files on the Kuwait site. Spent time debugging the frame errors due to the dynamically generated content. Never found an answer to the bug, and instead found a workaround that sacrificed run time for the ability to work. [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
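The run-time-for-reliability workaround mentioned above amounts to polling: instead of assuming a dynamically generated frame exists on page load, retry the lookup until it succeeds. A sketch with the Selenium lookup abstracted into a callable:

```python
import time

def wait_for(find, attempts=20, pause=0.5):
    """Poll find() until it stops raising, up to `attempts` tries.

    In the real driver, find would be a Selenium frame or element lookup
    (which raises e.g. NoSuchFrameException until the content renders).
    Trades run time for reliability on dynamically generated pages.
    """
    error = None
    for _ in range(attempts):
        try:
            return find()
        except Exception as exc:
            error = exc
            time.sleep(pause)
    raise error
```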
2016-11-04: Continued to download Moroccan data in the background. Finished writing the initial Kuwait Web Crawler/Driver for the Middle East Studies Department. The Middle East Studies Department asked for additional embedded files in the Kuwait website. [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
2016-11-03: Continued to download Moroccan data in the background. Dr. Egan fixed system requirements to run the GovTrack Web Crawler. Made significant progress on the Kuwait Web Crawler/Driver for the Middle East Studies Department.
2016-11-01: Continued to download Moroccan data in the background. Went over code for the GovTrack Web Crawler, continued learning Perl. [http://www.edegan.com/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] Began the Kuwait Web Crawler/Driver.
2016-10-21: Continued to download data for the Moroccan Parliament Written and Oral Questions. Looked over [http://www.edegan.com/wiki/Christy_Warden_(Twitter_Crawler_Application_1) Christy's Twitter Crawler] to see how I can be helpful. Dr. Egan asked me to think about how to get cohorts and other sorts of data from accelerator sites; see the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator List]. He also asked me to look at the [http://www.edegan.com/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] for potential ideas on how to bring this project to fruition.
2016-10-20: Continued to download data for the Moroccan Parliament Written and Oral Questions. Updated the Wiki page. Started working on the Twitter project with Christy. [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
2016-10-18: Finished code for the Oral Questions web driver and Written Questions web driver using selenium. Now the dates of the questions can be found using the crawler, and the pdfs of the questions will be downloaded using selenium. [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
2016-10-14: Finished the Oral Questions crawler. Finished the Written Questions crawler. Waiting for further details on whether that data needs to be tweaked in any way. Updated the Moroccan Web Driver/Web Crawler wiki page. [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
2016-10-13: Completed download of the Moroccan Bills. Working on either a web driver screenshot approach or a webcrawler approach to download the Moroccan oral and written questions data. Began building the Web Crawler for the Oral and Written Questions site. Edited the Moroccan Web Driver/Crawler wiki page. [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
2016-10-11: Fixed the arabic bug; files can now be saved with arabic titles. Monarchy bills downloaded and ready for shipment. House of Representatives bills mostly downloaded, ratified bills prepared for download. Started learning the scrapy library in python for web scraping. Discussed the idea of screenshot-ing questions instead of scraping.
2016-10-07: Learned unicode and utf8 encoding and decoding in arabic. Still working on transforming an ascii url into printable unicode.
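The ascii-url-to-unicode transformation is what urllib.parse.unquote does: hrefs scraped from the site are percent-encoded UTF-8, and unquoting them recovers printable Arabic. For example:

```python
from urllib.parse import unquote

# A percent-encoded href is plain ASCII; unquote() decodes the UTF-8
# bytes back into Arabic text usable for display or as a filename.
encoded = "%D8%B3%D8%A4%D8%A7%D9%84"  # UTF-8 bytes of the Arabic word for "question"
decoded = unquote(encoded)            # printable unicode: 'سؤال'
```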
2016-10-06: Discussed with Dr. Elbadawy the desired file names for the Moroccan data download. The consensus was that the bill programs are ready to launch once the files can be named properly, and that the national questions data must be retrieved using a web crawler, which I need to learn how to implement. The naming of files is currently drawing errors in going from arabic, to url, to download, to filename. Debugging in process. Also built a demo selenium program for Dr. Egan that drives the McNair blog site on an infinite loop.
2016-10-03: Moroccan Web Driver projects completed for driving of the Monarchy proposed bills, the House of Representatives proposed bills, and the Ratified bills sites. Began devising a naming system for the files that does not require scraping. Tinkered with naming through regular expression parsing of the URL. Structure for the oral questions and written questions drivers is set up, but they need fixes due to the differences in the sites. Fixed bug on the McNair wiki for the women's biz team where an email was plain text instead of an email link. Took a glimpse at the Kuwait Parliament website, and it appears to be very different from the Moroccan setup.
2016-09-30: Selenium program selects the view pdf option from the website and goes to the pdf webpage. The program then switches handle to the new page. CTRL+S is sent to the page to launch the save dialog window. Text cannot be sent to this window. Brainstormed ways around this issue. Explored Chrome Options for saving automatically without a dialog window. Looking into other libraries besides selenium that may help.
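One Chrome Options approach to skipping the save dialog entirely is to set Chrome profile preferences so pdfs download straight to disk instead of opening in the viewer. The preference keys below are real Chrome prefs, though whether the project ultimately settled on this route is not recorded here:

```python
def chrome_pdf_prefs(download_dir):
    # Profile preferences that make Chrome write PDFs directly to
    # download_dir, bypassing both the built-in viewer and the save dialog.
    return {
        "download.default_directory": download_dir,
        "download.prompt_for_download": False,
        "plugins.always_open_pdf_externally": True,
    }

# With selenium (not imported here), the dict would be attached via:
#   opts = webdriver.ChromeOptions()
#   opts.add_experimental_option("prefs", chrome_pdf_prefs(r"C:\pdfs"))
#   driver = webdriver.Chrome(options=opts)
```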
2016-09-29: Re-enrolled in Microsoft Remote Desktop with proper authentication, set up the Selenium environment and Komodo IDE on the Remote Desktop, and wrote a program using Selenium that goes to a link and opens up the print dialog box. Developed a computational recipe for a different approach to the problem.
2016-09-26: Set up Staff wiki page and work log page; registered for Slack and Microsoft Remote Desktop; downloaded Selenium on personal computer and read the Selenium docs. Created wiki page for the Moroccan Web Driver Project.
=='''Notes'''==
*Ed moved the Morocco Data to E:\McNair\Projects from C:\Users\PeterJ\Documents
