[[Peter Jalbert]] [[Work Logs]] [[Peter Jalbert (Work Log)|(log page)]]
2017-12-21: Last minute adjustments to the Moroccan Data. Continued working on [[Selenium Documentation]].
2017-12-20: Working on Selenium Documentation. Wrote 2 demo files. Wiki page is available [http://www.edegan.com/wiki/Selenium_Documentation here]. Created 3 spreadsheets for the Moroccan data.
2017-12-19: Finished fixing the Demo Day Crawler. Changed and installed files as appropriate to make the LinkedIn crawler compatible with the RDP. Removed some of the bells and whistles.
2017-12-18: Continued finding errors with the Demo Day Crawler analysis. Rewrote the parser to remove any search terms that were in the top 10,000 most common English words according to Google. Finished uploading and submitting Moroccan data.
2017-12-15: Found errors with the Demo Day Crawler. Fixed scripts to download Moroccan Law Data.
2017-12-14: Uploaded Morocco Parliament Written Questions. Created script for the next Morocco Parliament download. Began writing Selenium documentation. Continued downloading TIGER data.
2017-12-06: Ran Morocco Parliament Written Questions script. Analyzed Demo Day Crawler results. Continued downloading for TIGER geocoder.
2017-11-28: Debugged Morocco Parliament Crawler. Ran Demo Day Crawler for all accelerators with 10 pages per accelerator. TIGER geocoder is back to a Forbidden Error.
2017-11-27: Reran Morocco Parliament Crawler. Fixed KeyTerms.py and ran it again. Continued downloading for TIGER geocoder.
2017-11-20: Continued running [http://www.edegan.com/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and tried to run it again. Forbidden Error continues with the TIGER Geocoder. Began image download for Image Classification on cohort pages. Clarified specs for Morocco Parliament crawler.
2017-11-16: Continued running [http://www.edegan.com/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and tried to run it again. Forbidden Error continues with the TIGER Geocoder. Began image download for Image Classification on cohort pages. Clarified specs for Morocco Parliament crawler.
2017-11-15: Continued running [http://www.edegan.com/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote a script to extract counts greater than 2 from Keyword Matcher. Continued downloading for [http://www.edegan.com/wiki/Tiger_Geocoder TIGER Geocoder]. Finished re-formatting work logs.
2017-11-14: Continued running [http://www.edegan.com/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote an HTML to Text parser; see the Demo Day Page Parser page for the file location. Continued downloading for [http://www.edegan.com/wiki/Tiger_Geocoder TIGER Geocoder].
2017-11-13: Built the [http://www.edegan.com/wiki/Demo_Day_Page_Parser Demo Day Page Parser].
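The 2017-12-18 parser rewrite above drops search terms that appear in Google's top-10,000 common English words. A minimal sketch of that filtering step; the tiny stand-in word set and the function name are illustrative, not the project's actual code (the real list would be loaded from a file):

```python
# Sketch: drop search terms too common to be informative.
# COMMON_WORDS stands in for Google's top-10,000 English words list.
COMMON_WORDS = {"the", "and", "company", "new", "day", "start"}

def filter_search_terms(terms, common=COMMON_WORDS):
    """Keep only terms that are not common English words."""
    return [t for t in terms if t.lower() not in common]

filter_search_terms(["The", "Techstars", "day", "cohort"])
# -> ["Techstars", "cohort"]
```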
2017-11-09: Running demo version of Demo Day crawler (Accelerator Google Crawler). Fixing work log format.
2017-11-07: Created file with 0s and 1s detailing whether Crunchbase has the founder information for an accelerator. Details posted as a TODO on the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List] page. Still waiting for feedback on the PostGIS installation from [http://www.edegan.com/wiki/Tiger_Geocoder Tiger Geocoder]. Continued working on Accelerator Google Crawler.
2017-11-06: Contacted Geography Center for the US Census Bureau, [https://www.census.gov/geo/about/contact.html here], and began email exchange on PostGIS installation problems. Began working on the [http://www.edegan.com/wiki/Selenium_Documentation Selenium Documentation]. Also began working on an Accelerator Google Crawler that will be used with Yang and ML to find Demo Days for cohort companies.
2017-11-01: Attempted to continue downloading, but ran into HTTP Forbidden errors. Listed the errors on the [http://www.edegan.com/wiki/Tiger_Geocoder Tiger Geocoder] page.
2017-10-31: Began downloading blocks of data for individual states for the [http://www.edegan.com/wiki/Tiger_Geocoder Tiger Geocoder] project. Wrote out the new wiki page for installation, and began writing documentation on usage.
2017-10-30: With Ed's help, was able to get the national data from Tiger installed onto a database server. The process required much jumping around and changing users, and all the things we learned are outlined in [http://www.edegan.com/wiki/Database_Server_Documentation#Editing_Users the database server documentation] under "Editing Users".
2017-10-25: Continued working on the [http://www.edegan.com/wiki/PostGIS_Installation Tiger Geocoder installation].
2017-10-24: Threw some addresses into a database and used the address normalizer and geocoder; some components may still need installing. Details on the installation process can be found on the [http://www.edegan.com/wiki/PostGIS_Installation PostGIS Installation page].
2017-10-23: Finished Yelp crawler for [http://www.edegan.com/wiki/Houston_Innovation_District Houston Innovation District Project].
2017-10-19: Continued work on Yelp crawler for [http://www.edegan.com/wiki/Houston_Innovation_District Houston Innovation District Project].
2017-10-18: Continued work on Yelp crawler for [http://www.edegan.com/wiki/Houston_Innovation_District Houston Innovation District Project].
2017-10-17: Constructed ArcGIS maps for the agglomeration project. Finished maps of points for every year in the state of California. Finished maps of Route 128. Began working on Selenium Yelp crawler to get cafe locations within the 610 Loop.
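A distance filter is the natural companion to the cafe-location crawl above. A hedged sketch using the haversine formula; the loop center coordinate and radius are rough assumptions for illustration, not surveyed values:

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_MI = 3958.8

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MI * asin(sqrt(a))

# Assumed center point and radius roughly approximating the 610 Loop.
LOOP_CENTER = (29.76, -95.38)

def inside_loop(lat, lon, radius_mi=6.0):
    """Crude test: is a crawled location within the assumed loop radius?"""
    return haversine_miles(lat, lon, *LOOP_CENTER) <= radius_mi
```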
2017-09-25: New task -- Create text file with company, description, and company type.
#[http://www.edegan.com/wiki/VC_Database_Rebuild VC Database Rebuild]
#psql vcdb2
#table name, sdccompanybasecore2
2017-09-13: Implemented LinkedIn Crawler for main portion of profiles. Began working on crawling Experience section of profiles.
2017-09-12: Continued working on the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. Added to the wiki on this topic.
2017-09-11: Continued working on the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data].
2017-09-06: Combined founders data retrieved with the Crunchbase API with the crunchbasebulk data to get LinkedIn URLs for different accelerator founders. For more information, see [http://www.edegan.com/wiki/Crunchbase_Data here].
2017-09-05: Post Harvey. Finished retrieving names from the Crunchbase API on founders. Next step is to query the crunchbasebulk database to get LinkedIn URLs. For more information, see [http://www.edegan.com/wiki/Crunchbase_Data here].
2017-08-24: Began using the Crunchbase API to retrieve founder information for accelerators. Halfway through compiling a dictionary that translates accelerator names into proper Crunchbase API URLs.
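The name-to-URL dictionary mentioned above mostly reduces to a slug rule plus hand-coded exceptions. A sketch of that idea; the override entries are illustrative, not verified Crunchbase permalinks, and the real permalink rules are Crunchbase's own:

```python
import re

def crunchbase_slug(name):
    """Lowercase, hyphenate, and strip a name into a permalink-style slug."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")

# Hand-built overrides for names whose permalink doesn't follow the rule
# (illustrative entries only).
OVERRIDES = {"MassChallenge Boston": "masschallenge"}

def accelerator_permalink(name):
    """Translate an accelerator name into a Crunchbase-style permalink."""
    return OVERRIDES.get(name, crunchbase_slug(name))
```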
2017-08-23: Decided with Ed to abandon LinkedIn crawling to retrieve accelerator founder data, and instead use crunchbase. Spent the day navigating the crunchbasebulk database, and seeing what useful information was contained in it.
2017-08-22: Discovered that LinkedIn profiles cannot be viewed through LinkedIn if the target is 3rd degree or further. However, if entering LinkedIn through a Google search, the profile can still be viewed if the user has previously logged into LinkedIn. Devising a workaround crawler that utilizes Google search. Continued blog post [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) here] under Section 4.
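The Google-entry workaround described above amounts to building a search URL scoped to LinkedIn profile pages; the crawler would then open that URL in the browser. A minimal sketch of the query construction (the helper name is mine):

```python
from urllib.parse import quote_plus

def google_profile_search_url(name, extra=""):
    """Build a Google search URL scoped to LinkedIn profile pages."""
    parts = [name, extra, "site:linkedin.com/in"]
    query = " ".join(p for p in parts if p)
    return "https://www.google.com/search?q=" + quote_plus(query)
```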
2017-08-21: Began work on extracting founders for accelerators through LinkedIn Crawler. Discovered that Python3 is not installed on RDP, so the virtual environment for the project cannot be fired up. Continued working on Ubuntu machine.
</onlyinclude>
===Spring 2017===
2017-01-10: Continued making text files for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].
2017-01-11: Continued making text files for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].
2017-01-12: Continued making text files for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].
2017-01-13: Continued making text files for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].
2017-01-18: Downloaded pdfs in the background for the [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].
2017-01-19: Downloaded pdfs in the background for the [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project]. Created parser for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project], completed creation of final data set (yay!). Began working on cohort parser.
2017-01-23: Worked on parser for cohort data of the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Preliminary code is written, working on debugging.
2017-01-24: Worked on parser for cohort data of the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Cohort data file created, debugging is almost complete. Will begin work on the Google accelerator search soon.
2017-01-25: Finished parser for cohort data of the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Some data files still need proofreading as they are not in an acceptable format. Began working on Google sitesearch project.
2017-01-26: Continued working on Google sitesearch project. Discovered Crunchbase, changed project priorities: (1) split accelerator data up by flag; (2) use Crunchbase to get web URLs for cohorts; (3) make an Internet Archive Wayback Machine driver. Located [http://www.edegan.com/wiki/Whois_Parser Whois Parser].
2017-01-30: Optimized enclosing circle algorithm through memoization. Developed script to read addresses from the accelerator data and return latitude and longitude coordinates.
2017-01-31: Built WayBack Machine Crawler. Updated documentation for coordinates script. Updated profile page to include locations of code.
2017-02-01: Notes from session with Ed: Project on US university patenting and entrepreneurship programs (writing code to identify universities in assignees); search Wikipedia (XML then bulk download); student pop, faculty pop, etc. Circle project for VC data will end up being a joint project to join accelerator data. Pull descriptions for VC. Founders of accelerators on LinkedIn; the LinkedIn crawler cannot be caught (pretend to not be a bot). Can eventually get academic backgrounds through LinkedIn. Pull business registration data, Stern/Guzman algorithm. GIS on top of geocoded data. Maps that work on the wiki or blog (CartoDB), Maps API and R. NLP projects, Description Classifier.
2017-02-02: Out sick; independent research and work from RDP. Brief research into the [http://jorgeg.scripts.mit.edu/homepage/wp-content/uploads/2016/03/Guzman-Stern-State-of-American-Entrepreneurship-FINAL.pdf Stern-Guzman algorithm]. Research into [http://www.edegan.com/wiki/interactive_maps Interactive Maps]. No helpful additions to map embedding problem.
2017-02-07: Fixed bugs in parse_cohort_data.py, the script for parsing the cohort data from the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Added descriptive statistics to the cohort data Excel file.
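The descriptive statistics added to the cohort file were simple summaries. A stand-alone sketch with the statistics module; the example cohort sizes are made up:

```python
import statistics

def describe(values):
    """Basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }

stats = describe([10, 12, 8, 10])  # hypothetical cohort sizes
```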
2017-02-08: Worked on Neural Net for the [http://www.edegan.com/wiki/Industry_Classifier Industry Classifier Project].
2017-02-13: Worked on Neural Net for the [http://www.edegan.com/wiki/Industry_Classifier Industry Classifier Project].
2017-02-14: Worked on the application of the Enclosing Circle algorithm to the VC study. Working on bug fixes in the Enclosing Circle algorithm. Created wiki page for the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].
2017-02-15: Finished [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] applied to the VC study. Enclosing Circle algorithm still needs adjustment, but the program runs with the temporary fixes.
2017-02-16: Reworked [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] to create a file of geocoded data. Began work on wrapping the algorithm in C to improve speed.
2017-02-20: Continued to download geocoded data for VC Data as part of the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://www.edegan.com/wiki/Industry_Classifier Industry Classifier].
2017-02-21: Continued to download geocoded data for VC Data as part of the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Researched C++ compilers for Python so that the Enclosing Circle Algorithm could be wrapped in C. Found a recommended one [https://www.microsoft.com/en-us/download/details.aspx?id=44266 here].
2017-02-22: Continued to download geocoded data for VC Data as part of the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Helped out with [http://www.edegan.com/wiki/Industry_Classifier Industry Classifier Project].
2017-02-23: Continued to download geocoded data for VC Data as part of the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Installed C++ compiler for Python. Ran tests on the difference between Python and C-wrapped Python.
2017-02-27: Continued to download geocoded data for VC Data as part of the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://www.edegan.com/wiki/Industry_Classifier Industry Classifier].
2017-02-28: Finished downloading geocoded data for VC Data as part of the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Found bug in Enclosing Circle Algorithm.
2017-03-01: Created statistics for the VC Circles Project.
2017-03-02: Cleaned up data for the VC Circles Project. Created histogram of data in Excel. See the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Began work on the [http://www.edegan.com/wiki/LinkedInCrawlerPython LinkedIn Crawler].
2017-03-06: Ran script to determine the top 50 cities which Enclosing Circle should be run on. Fixed the VC Circles script to take in a new data format.
2017-03-07: Redetermined the top 50 cities which Enclosing Circle should be run on. Data on the Top 50 Cities for VC Backed Companies can be found [http://www.edegan.com/wiki/Top_Cities_for_VC_Backed_Companies here]. Ran [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities.
2017-03-08: Continued running [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Created script to draw outcome of the Enclosing Circle Algorithm on Google Maps.
2017-03-09: Continued running [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Finished script to draw Enclosing Circles on a Google Map.
2017-03-20: Worked on debugging the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].
2017-03-21: Coded a brute force algorithm for the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].
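The brute-force approach mentioned here follows a standard formulation: every candidate circle is determined either by two points (as a diameter) or by three points (their circumcircle); test each candidate for containment and keep the smallest. A generic O(n^4) sketch of that formulation, not the project's actual code:

```python
from itertools import combinations
from math import hypot

def _contains_all(cx, cy, r, pts, eps=1e-9):
    return all(hypot(px - cx, py - cy) <= r + eps for px, py in pts)

def _circle_two(a, b):
    # Circle with segment ab as diameter.
    cx, cy = (a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0
    return cx, cy, hypot(a[0] - b[0], a[1] - b[1]) / 2.0

def _circle_three(a, b, c):
    # Circumcircle via the standard determinant formula; None if collinear.
    d = 2.0 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < 1e-12:
        return None
    ux = ((a[0]**2 + a[1]**2) * (b[1] - c[1]) + (b[0]**2 + b[1]**2) * (c[1] - a[1])
          + (c[0]**2 + c[1]**2) * (a[1] - b[1])) / d
    uy = ((a[0]**2 + a[1]**2) * (c[0] - b[0]) + (b[0]**2 + b[1]**2) * (a[0] - c[0])
          + (c[0]**2 + c[1]**2) * (b[0] - a[0])) / d
    return ux, uy, hypot(a[0] - ux, a[1] - uy)

def smallest_enclosing_circle(points):
    """Smallest circle containing all points, by exhaustive search."""
    candidates = [_circle_two(a, b) for a, b in combinations(points, 2)]
    for t in combinations(points, 3):
        c = _circle_three(*t)
        if c is not None:
            candidates.append(c)
    best = None
    for cx, cy, r in candidates:
        if _contains_all(cx, cy, r, points) and (best is None or r < best[2]):
            best = (cx, cy, r)
    return best
```

Welzl's algorithm solves the same problem in expected linear time; the brute force is only practical for small point sets, which is consistent with the later entries about memoization and C wrapping.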
2017-03-23: Finished debugging the brute force algorithm for the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm]. Implemented a method to plot the points and circles on a graph. Analyzed runtime of the brute force algorithm.
2017-03-27: Worked on debugging the Enclosing Circle Algorithm. Implemented a way to remove interior circles, and determined that translation to latitude and longitude coordinates resulted in slightly off-center circles.
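Removing interior circles reduces to a containment test: circle A lies inside circle B when the distance between centers plus A's radius is at most B's radius. A small sketch of that step (the function names are mine, not the project's):

```python
from math import hypot

def circle_inside(inner, outer, eps=1e-9):
    """True if circle inner = (x, y, r) lies entirely within outer."""
    (x1, y1, r1), (x2, y2, r2) = inner, outer
    return hypot(x1 - x2, y1 - y2) + r1 <= r2 + eps

def drop_interior_circles(circles):
    """Keep only circles not contained in any other circle in the list."""
    return [c for c in circles
            if not any(d is not c and circle_inside(c, d) for d in circles)]
```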
2017-03-28: Finished running the Enclosing Circle Algorithm. Worked on removing incorrect points from the data set (see above).
2017-03-29: Worked on debugging points for the Enclosing Circle Algorithm.
2017-04-03: Finished debugging points for the Enclosing Circle Algorithm. Added Command Line functionality to the Industry Classifier.
2017-04-05: Began work on the LinkedIn Crawler. Researched launching the Python virtual environment.
2017-04-06: Continued working on debugging and documenting the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Wrote a test program that logs in, searches for a query, navigates through search pages, and logs out. Recruiter program can now log in and search.
2017-04-10: Began writing a functioning crawler of LinkedIn.
2017-04-11: Completed functional [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) crawler of LinkedIn Recruiter Pro]. Basic search functions work and download profile information for a given person.
2017-04-12: Worked on bugs with the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].
2017-04-13: Worked on debugging the logout procedure for the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Began formulating a process to search for founders of startups using a combination of the LinkedIn Crawler and the data resources from the [http://www.edegan.com/wiki/Crunchbase_2013_Snapshot CrunchBase Snapshot].
2017-04-17: Worked on ways to get correct search results from the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Worked on an HTML parser for the results from the LinkedIn Crawler.
2017-04-18: Ran LinkedIn Crawler on matches between the Crunchbase Snapshot and the accelerator data.
2017-04-19: Made updates to the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler] wiki page. Ran LinkedIn Crawler on accelerator data. Working on an HTML parser for the results from the LinkedIn Crawler.
2017-04-20: Finished the HTML parser for the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Ran HTML parser on accelerator founders. Data is stored in projects/accelerators/LinkedIn Founder Data.
2017-05-01: Continued work on HTML Parser. Uploaded all semester projects to git server.
===Fall 2016===
2016-09-26: Set up Staff wiki page and work log page; registered for Slack and Microsoft Remote Desktop; downloaded Selenium on personal computer and read Selenium docs. Created wiki page for Moroccan Web Driver Project.
2016-09-29: Re-enrolled in Microsoft Remote Desktop with proper authentication, set up Selenium environment and Komodo IDE on Remote Desktop, and wrote a program using Selenium that goes to a link and opens up the print dialog box. Developed computational recipe for a different approach to the problem.
2016-09-30: Selenium program selects the view pdf option from the website and goes to the pdf webpage. The program then switches handle to the new page. CTRL+S is sent to the page to launch the save dialog window. Text cannot be sent to this window. Brainstormed ways around this issue. Explored Chrome options for saving automatically without a dialog window. Looking into other libraries besides Selenium that may help.
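The save-dialog dead end described above is usually avoided by configuring Chrome to download without prompting rather than by scripting the dialog. A sketch of the relevant browser preferences; the download path is a placeholder, and the Selenium wiring is shown only in comments since it needs a live browser:

```python
# Chrome preferences that make PDF links download silently to a fixed
# directory instead of opening a viewer or a save dialog.
CHROME_DOWNLOAD_PREFS = {
    "download.default_directory": "/path/to/downloads",  # placeholder path
    "download.prompt_for_download": False,               # no save dialog
    "plugins.always_open_pdf_externally": True,          # download, don't view
}

# Typical wiring (not executed here):
# from selenium import webdriver
# opts = webdriver.ChromeOptions()
# opts.add_experimental_option("prefs", CHROME_DOWNLOAD_PREFS)
# driver = webdriver.Chrome(options=opts)
```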
2016-10-03: Moroccan Web Driver projects completed for driving the Monarchy proposed bills, the House of Representatives proposed bills, and the Ratified bills sites. Began devising a naming system for the files that does not require scraping. Tinkered with naming through regular expression parsing of the URL. Structure for the oral questions and written questions drivers is set up, but fixes are needed due to the differences in the sites. Fixed bug on McNair wiki for women's biz team where an email was plain text instead of an email link. Took a glimpse at the Kuwait Parliament website, and it appears to be very different from the Moroccan setup.
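Naming files by regular-expression parsing of the URL, as tried above, can be sketched like this; the URL shape and the `bill_` prefix are invented for illustration:

```python
import re

def filename_from_url(url):
    """Derive a pdf filename from the numeric id in a bill URL, if any."""
    tail = url.rstrip("/").rsplit("/", 1)[-1]
    match = re.search(r"(\d+)", tail)
    return "bill_{}.pdf".format(match.group(1)) if match else tail
```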
2016-10-06: Discussed with Dr. Elbadawy the desired file names for the Moroccan data download. The consensus was that the bill programs are ready to launch once the files can be named properly, and that the questions data must be retrieved using a web crawler, which I need to learn how to implement. The naming of files is currently drawing errors in going from Arabic, to URL, to download, to filename. Debugging in process. Also built a demo Selenium program for Dr. Egan that drives the McNair blog site on an infinite loop.
2016-10-07: Learned Unicode and UTF-8 encoding and decoding in Arabic. Still working on transforming an ASCII URL into printable Unicode.
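The ASCII-URL-to-printable-Unicode step is what urllib's percent-decoding does: percent-escaped UTF-8 bytes decode back to Arabic text. A minimal sketch:

```python
from urllib.parse import unquote

def readable_title(url_component):
    """Decode a percent-encoded URL component into printable text."""
    return unquote(url_component, encoding="utf-8")

# "%D9%85%D8%B1%D8%AD%D8%A8%D8%A7" is the UTF-8 percent-encoding of "مرحبا".
```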
2016-11-18: Converted Executive Order PDFs to text files using Adobe Acrobat DC. See [http://www.edegan.com/wiki/E%26I_Governance_Policy_Report Wikipage] for details.
2016-11-17: Wrote a crawler to retrieve information about executive orders and their corresponding PDFs. They can be found [http://www.edegan.com/wiki/E%26I_Governance_Policy_Report here]. Next step is to run code to convert the PDFs to text files, then use the parser fixed by Christy.
2016-11-15: Finished download of Moroccan Written Question PDFs. Wrote a parser with Christy to be used for parsing bills from Congress and eventually executive orders. Found bug in the system Python that was worked out and rebooted.
2016-11-11: Continued to download Moroccan data in the background. Attempted to find bug fixes for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Project] crawler.
2016-11-10: Continued to download Moroccan data in the background. Began work on [http://www.edegan.com/wiki/Google_Scholar_Crawler Google Scholar Crawler]. Wrote a crawler for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Project] to get the HTML files of hundreds of accelerators. The crawler ended up failing; it appears to have been due to HTTPS.
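The log does not record the exact HTTPS error, but the usual culprit for a crawler dying on HTTPS is TLS certificate verification. One way to confirm the diagnosis, sketched with the standard library (the helper names are mine, and the unverified context is for diagnosis only, not production crawling):

```python
import ssl
import urllib.request

def tls_context(verify=True):
    """Verified TLS context for normal use; an unverified one only to diagnose cert errors."""
    if verify:
        return ssl.create_default_context()      # normal: verify certificate and hostname
    return ssl._create_unverified_context()       # diagnosis only: skip all cert checks

def fetch(url, verify=True):
    # If this succeeds with verify=False but fails with verify=True,
    # the failure was certificate verification rather than the crawl logic.
    with urllib.request.urlopen(url, context=tls_context(verify)) as resp:
        return resp.read()
```

If verification is the problem, the proper fix is updating the machine's CA certificates rather than leaving verification off.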
2016-11-08: Continued to download Moroccan data in the background. Finished writing code for the embedded files on the Kuwait site. Spent time debugging the frame errors due to the dynamically generated content. Never found an answer to the bug, and instead found a workaround that sacrificed run time for the ability to work. [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
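The entry does not say what the workaround was, but a common way to trade run time for reliability on dynamically generated frames is simply to retry the flaky step after a pause. A generic sketch of that pattern (all names are mine):

```python
import time

def retry(action, attempts=5, pause=2.0, exceptions=(Exception,), sleep=time.sleep):
    """Call `action` until it succeeds, pausing between tries.

    Slower than a single attempt, but survives pages whose frames
    are not ready the first time they are touched.
    """
    for i in range(attempts):
        try:
            return action()
        except exceptions:
            if i == attempts - 1:
                raise           # out of attempts: surface the real error
            sleep(pause)        # wait for the dynamic content to settle
```

For example, `retry(lambda: driver.switch_to.frame("content"))` would keep re-trying a frame switch that intermittently fails.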
2016-11-04: Continued to download Moroccan data in the background. Finished writing initial Kuwait Web Crawler/Driver for the Middle East Studies Department. Middle East Studies Department asked for additional embedded files in the Kuwait website. [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
2016-11-03: Continued to download Moroccan data in the background. Dr. Egan fixed systems requirements to run the GovTrack Web Crawler. Made significant progress on the Kuwait Web Crawler/Driver for the Middle East Studies Department.
2016-11-01: Continued to download Moroccan data in the background. Went over code for GovTracker Web Crawler, continued learning Perl. [http://www.edegan.com/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] Began Kuwait Web Crawler/Driver.
2016-10-21: Continued to download Moroccan data for the Moroccan Parliament Written and Oral Questions. Looked over [http://www.edegan.com/wiki/Christy_Warden_(Twitter_Crawler_Application_1) Christy's Twitter Crawler] to see how I can be helpful. Dr. Egan asked me to think about how to potentially make multiple tools to get cohorts and other sorts of data from accelerator sites. See [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator List]. He also asked me to look at the [http://www.edegan.com/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] for potential ideas on how to bring this project to fruition.
2016-10-20: Continued to download Moroccan data and Kuwait data for the Moroccan Parliament Written and Oral Questions. Updated Wiki page. Started working on Twitter project with Christy. [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
2016-10-18: Finished code for Oral Questions web driver and Written Questions web driver using Selenium. Now the data for the dates of questions can be found using the crawler, and the PDFs of the questions will be downloaded using Selenium. [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
2016-10-14: Finished Oral Questions crawler. Finished Written Questions crawler. Waiting for further details on whether that data needs to be tweaked in any way. Updated the Moroccan Web Driver/Web Crawler wiki page. [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
2016-10-13: Completed download of Moroccan Bills. Working on either a web driver screenshot approach or a webcrawler approach to download the Moroccan oral and written questions data. Began building Web Crawler for Oral and Written Questions site. Edited Moroccan Web Driver/Crawler wiki page. [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
2016-10-11: Fixed Arabic bug; files can now be saved with Arabic titles. Monarchy bills downloaded and ready for shipment. House of Representatives bills mostly downloaded, ratified bills prepared for download. Started learning the Scrapy library in Python for web scraping. Discussed idea of screenshot-ing questions instead of scraping.
2016-10-07: Learned Unicode and UTF-8 encoding and decoding in Arabic. Still working on transforming an ASCII URL into printable Unicode.
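The "ASCII URL into printable Unicode" step is percent-decoding: the Arabic titles arrive percent-encoded in the URL, and `urllib.parse.unquote` recovers the readable text. A sketch (the example URL and helper name are made up for illustration):

```python
from urllib.parse import unquote

def filename_from_url(url):
    """Take the last path segment of a URL and percent-decode it into readable Unicode."""
    encoded = url.rsplit("/", 1)[-1]          # e.g. "%D9%82%D8%A7%D9%86%D9%88%D9%86.pdf"
    return unquote(encoded, encoding="utf-8")  # decodes to the Arabic title plus ".pdf"
```

Going the other direction (Arabic filename back to a URL) is `urllib.parse.quote`.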
2016-10-06: Discussed with Dr. Elbadawy the desired file names for the Moroccan data download. The consensus was that the bill programs are ready to launch once the files can be named properly, and the questions data must be retrieved using a web crawler, which I need to learn how to implement. The naming of files is currently drawing errors in going from Arabic, to URL, to download, to filename. Debugging in process. Also built a demo Selenium program for Dr. Egan that drives the McNair blog site on an infinite loop.
2016-10-03: Moroccan Web Driver projects completed for driving of the Monarchy proposed bills, the House of Representatives proposed bills, and the Ratified bills sites. Begun process of devising a naming system for the files that does not require scraping. Tinkered with naming through regular expression parsing of the URL. Structure for the oral questions and written questions drivers is set up, but needs fixes due to the differences in the sites. Fixed bug on McNair wiki for women's biz team where email was plain text instead of an email link. Took a glimpse at Kuwait Parliament website, and it appears to be very different from the Moroccan setup.
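Naming files by regular-expression parsing of the URL can look like the following sketch. The URL shape and capture groups here are hypothetical, since the real Moroccan Parliament URL format is not recorded in this log:

```python
import re

# Hypothetical URL pattern: .../bills/<chamber>/<year>/<bill_id>.pdf
BILL_URL = re.compile(r"/bills/(?P<chamber>[^/]+)/(?P<year>\d{4})/(?P<bill_id>[^/]+)\.pdf$")

def name_from_url(url):
    """Derive a stable filename from the URL itself, with no page scraping needed."""
    m = BILL_URL.search(url)
    if m is None:
        raise ValueError("unrecognized URL: " + url)
    return "{chamber}_{year}_{bill_id}.pdf".format(**m.groupdict())
```

The advantage over scraping a title is that the URL is already ASCII-safe, so the Arabic-encoding problems mentioned above never arise.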
2016-09-30: Selenium program selects the view PDF option from the website and goes to the PDF webpage. Program then switches handle to the new page. CTRL+S is sent to the page to launch the save dialog window. Text cannot be sent to this window. Brainstormed ways around this issue. Explored Chrome options for saving automatically without a dialog window. Looking into other libraries besides Selenium that may help.
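Chrome can be told to download PDFs straight to a directory, bypassing both the built-in PDF viewer and the save dialog, via experimental preferences. A sketch of that approach (the pref keys are standard Chromium preference names; the helper functions are mine):

```python
def pdf_download_prefs(download_dir):
    """Chrome prefs that save PDFs to `download_dir` without opening any dialog."""
    return {
        "download.default_directory": download_dir,   # where files land
        "download.prompt_for_download": False,        # suppress the Save As window
        "plugins.always_open_pdf_externally": True,   # download PDFs instead of viewing
    }

def make_driver(download_dir):
    # Selenium is imported lazily so the prefs helper works even without it installed.
    from selenium import webdriver
    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", pdf_download_prefs(download_dir))
    return webdriver.Chrome(options=options)
```

With these prefs set, clicking the PDF link downloads the file directly, so no CTRL+S or dialog handling is needed.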
2016-09-29: Re-enrolled in Microsoft Remote Desktop with proper authentication, set up Selenium environment and Komodo IDE on Remote Desktop, wrote program using Selenium that goes to a link and opens up the print dialog box. Developed computational recipe for a different approach to the problem.
2016-09-26: Set up Staff wiki page and work log page; registered for Slack and Microsoft Remote Desktop; downloaded Selenium on personal computer, read Selenium docs. Created wiki page for the Moroccan Web Driver Project.
'''Notes'''