
[[Peter Jalbert]] [[Work Logs]] [[Peter Jalbert (Work Log)|(log page)]]
2017-12-21: Last minute adjustments to the Moroccan Data. Continued working on the [http://www.edegan.com/wiki/Selenium_Documentation Selenium Documentation] page.
2017-12-20: Working on Selenium Documentation. Wrote 2 demo files. Wiki page is available [http://www.edegan.com/wiki/Selenium_Documentation here]. Created 3 spreadsheets for the Moroccan data.
2017-12-19: Finished fixing the Demo Day Crawler. Changed files and installed as appropriate to make the LinkedIn crawler compatible with the RDP. Removed some of the bells and whistles.
2017-12-18: Continued finding errors with the Demo Day Crawler analysis. Rewrote the parser to remove any search terms that were in the top 10000 most common English words according to Google. Finished uploading and submitting Moroccan data.
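The stop-term filtering in the 2017-12-18 entry can be sketched as below. The word-list file name and the exact matching rule (case-insensitive, whole-term) are assumptions; the log only says that terms appearing in Google's top-10,000 English words list were removed.

```python
def load_common_words(path="google-10000-english.txt"):
    """Load one word per line into a set for O(1) membership tests.

    The file name is an assumption; any newline-delimited word list works.
    """
    with open(path, encoding="utf-8") as fh:
        return {line.strip().lower() for line in fh}


def filter_terms(terms, common_words):
    """Keep only search terms that are not common English words."""
    return [t for t in terms if t.lower() not in common_words]
```

Rare terms such as accelerator or company names survive the filter, while generic words that would flood the Demo Day search results are dropped.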
2017-12-15: Found errors with the Demo Day Crawler. Fixed scripts to download Moroccan Law Data.
2017-12-14: Uploading Morocco Parliament Written Questions. Creating script for next Morocco Parliament download. Began writing Selenium documentation. Continuing to download TIGER data.
2017-12-06: Running Morocco Parliament Written Questions script. Analyzing Demo Day Crawler results. Continued downloading for TIGER geocoder.
2017-11-28: Debugging Morocco Parliament Crawler. Running Demo Day Crawler for all accelerators and 10 pages per accelerator. TIGER geocoder is back to Forbidden Error.
2017-11-27: Rerunning Morocco Parliament Crawler. Fixed KeyTerms.py and running it again. Continued downloading for TIGER geocoder.
2017-11-20: Continued running [http://www.edegan.com/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.
2017-11-16: Continued running [http://www.edegan.com/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.
2017-11-15: Continued running [http://www.edegan.com/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote a script to extract counts that were greater than 2 from Keyword Matcher. Continued downloading for [http://www.edegan.com/wiki/Tiger_Geocoder TIGER Geocoder]. Finished re-formatting work logs.
2017-11-14: Continued running [http://www.edegan.com/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote an HTML to Text parser. See Parser Demo Day Page for file location. Continued downloading for [http://www.edegan.com/wiki/Tiger_Geocoder TIGER Geocoder].
2017-11-13: Built [http://www.edegan.com/wiki/Demo_Day_Page_Parser Demo Day Page Parser].
2017-11-09: Running demo version of Demo Day crawler (Accelerator Google Crawler). Fixing work log format.
2017-11-07: Created file with 0s and 1s detailing whether crunchbase has the founder information for an accelerator. Details posted as a TODO on [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List] page. Still waiting for feedback on the PostGIS installation from [http://www.edegan.com/wiki/Tiger_Geocoder Tiger Geocoder]. Continued working on Accelerator Google Crawler.
2017-11-06: Contacted Geography Center for the US Census Bureau, [https://www.census.gov/geo/about/contact.html here], and began email exchange on PostGIS installation problems.
Began working on the [http://www.edegan.com/wiki/Selenium_Documentation Selenium Documentation]. Also began working on an Accelerator Google Crawler that will be used with Yang and ML to find Demo Days for cohort companies.
2017-11-01: Attempted to continue downloading, however ran into HTTP Forbidden errors. Listed the errors on the [http://www.edegan.com/wiki/Tiger_Geocoder Tiger Geocoder Page].
2017-10-31: Began downloading blocks of data for individual states for the [http://www.edegan.com/wiki/Tiger_Geocoder Tiger Geocoder] project. Wrote out the new wiki page for installation, and beginning to write documentation on usage.
2017-10-30: With Ed's help, was able to get the national data from Tiger installed onto a database server. The process required much jumping around and changing users, and all the things we learned are outlined in [http://www.edegan.com/wiki/Database_Server_Documentation#Editing_Users the database server documentation] under "Editing Users".
2017-10-25: Continued working on the [http://www.edegan.com/wiki/PostGIS_Installation TigerCoder Installation].
2017-10-24: Throw some addresses into a database, use address normalizer and geocoder. May need to install things. Details on the installation process can be found on the [http://www.edegan.com/wiki/PostGIS_Installation PostGIS Installation page].
2017-10-23: Finished Yelp crawler for [http://www.edegan.com/wiki/Houston_Innovation_District Houston Innovation District Project].
2017-10-19: Continued work on Yelp crawler for [http://www.edegan.com/wiki/Houston_Innovation_District Houston Innovation District Project].
2017-10-18: Continued work on Yelp crawler for [http://www.edegan.com/wiki/Houston_Innovation_District Houston Innovation District Project].
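An HTML-to-text pass like the one mentioned in the 2017-11-14 entry can be done with the standard library alone. This is a minimal sketch, not the project's actual parser: it skips <script> and <style> contents and collapses the remaining visible text.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # >0 while inside a script/style element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML document as one spaced string."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser._chunks)
```

The output could then be fed to a keyword matcher such as KeyTerms.py.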
2017-10-17: Constructed ArcGIS maps for the agglomeration project. Finished maps of points for every year in the state of California. Finished maps of Route 128. Began working on selenium Yelp crawler to get cafe locations within the 610-loop.
2017-09-25: New task -- Create text file with company, description, and company type.
#[http://www.edegan.com/wiki/VC_Database_Rebuild VC Database Rebuild]
#psql vcdb2
#table name, sdccompanybasecore2
2017-09-13: Implemented LinkedIn Crawler for main portion of profiles. Began working on crawling Experience section of profiles.
2017-09-12: Continued working on the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. Added to the wiki on this topic.
2017-09-11: Continued working on the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data].
2017-09-06: Combined founders data retrieved with the Crunchbase API with the crunchbasebulk data to get LinkedIn URLs for different accelerator founders. For more information, see [http://www.edegan.com/wiki/Crunchbase_Data here].
2017-09-05: Post Harvey. Finished retrieving names from the Crunchbase API on founders. Next step is to query the crunchbase bulk database to get LinkedIn URLs. For more information, see [http://www.edegan.com/wiki/Crunchbase_Data here].
2017-08-24: Began using the Crunchbase API to retrieve founder information for accelerators. Halfway through compiling a dictionary that translates accelerator names into proper Crunchbase API URLs.
2017-08-23: Decided with Ed to abandon LinkedIn crawling to retrieve accelerator founder data, and instead use crunchbase. Spent the day navigating the crunchbasebulk database, and seeing what useful information was contained in it.
2017-08-22: Discovered that LinkedIn profiles cannot be viewed through LinkedIn if the target is 3rd degree or further. However, if entering LinkedIn through a Google search, the profile can still be viewed if the user has previously logged into LinkedIn. Devising a workaround crawler that utilizes Google search. Continued blog post [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) here] under Section 4.
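The Google-entry workaround above starts from a search URL that restricts results to LinkedIn profile pages. A minimal sketch (the exact query format is an assumption; the crawler would then open the first result in an already logged-in browser session):

```python
from urllib.parse import urlencode


def google_linkedin_query(person, extra_terms=""):
    """Build a Google search URL restricted to LinkedIn profile pages.

    Entering a profile via Google sidesteps the 3rd-degree visibility
    limit for a logged-in user, per the log entry above. The
    site:linkedin.com/in restriction is an illustrative choice.
    """
    query = f'site:linkedin.com/in "{person}" {extra_terms}'.strip()
    return "https://www.google.com/search?" + urlencode({"q": query})
```

The returned URL would be handed to the Selenium driver rather than fetched directly, since the point is to reuse the browser's logged-in LinkedIn session.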
2017-08-21: Began work on extracting founders for accelerators through LinkedIn Crawler. Discovered that Python3 is not installed on RDP, so the virtual environment for the project cannot be fired up. Continued working on Ubuntu machine.
</onlyinclude>
===Spring 2017===
2017-05-01: Continued work on HTML Parser. Uploaded all semester projects to git server.
2017-04-20: Finished the HTML Parser for the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Ran HTML parser on accelerator founders. Data is stored in projects/accelerators/LinkedIn Founder Data.
2017-04-19: Made updates to the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler] Wikipage. Ran LinkedIn Crawler on accelerator data. Working on an HTML parser for the results from the LinkedIn Crawler.
2017-04-18: Ran LinkedIn Crawler on matches between Crunchbase Snapshot and the accelerator data.
2017-04-17: Worked on ways to get correct search results from the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Worked on an HTML Parser for the results from the LinkedIn Crawler.
2017-04-13: Worked on debugging the logout procedure for the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Began formulation of process to search for founders of startups using a combination of the LinkedIn Crawler with the data resources from the [http://www.edegan.com/wiki/Crunchbase_2013_Snapshot CrunchBase Snapshot].
2017-04-12: Worked on bugs with the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].
2017-04-11: Completed functional [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) crawler of LinkedIn Recruiter Pro]. Basic search functions work and download profile information for a given person.
2017-04-10: Began writing functioning crawler of LinkedIn.
2017-04-06: Continued working on debugging and documenting the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Wrote a test program that logs in, searches for a query, navigates through search pages, and logs out. Recruiter program can now login and search.
2017-04-05: Began work on the LinkedIn Crawler. Researched launching a Python Virtual Environment.
2017-04-03: Finished debugging points for the Enclosing Circle Algorithm. Added Command Line functionality to the Industry Classifier.
2017-03-29: Worked on debugging points for the Enclosing Circle Algorithm.
2017-03-28: Finished running the Enclosing Circle Algorithm. Worked on removing incorrect points from the data set (see above).
2017-03-27: Worked on debugging the Enclosing Circle Algorithm. Implemented a way to remove interior circles, and determined that translation to latitude and longitude coordinates resulted in slightly off-center circles.
2017-03-23: Finished debugging the brute force algorithm for the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm]. Implemented a method to plot the points and circles on a graph. Analyzed runtime of the brute force algorithm.
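The brute-force approach in these Enclosing Circle entries can be sketched as follows: try every circle defined by a pair of points (as diameter) or a triple (circumcircle) and keep the smallest one containing all points. This is an O(n^4) illustration of the idea, not the project's actual code.

```python
from itertools import combinations
import math


def circle_two(p, q):
    """Circle with segment pq as diameter: (cx, cy, r)."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2, math.dist(p, q) / 2)


def circle_three(p, q, r):
    """Circumcircle of three points, or None if they are collinear."""
    ax, ay = p; bx, by = q; cx, cy = r
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    ux = ((ax * ax + ay * ay) * (by - cy) + (bx * bx + by * by) * (cy - ay)
          + (cx * cx + cy * cy) * (ay - by)) / d
    uy = ((ax * ax + ay * ay) * (cx - bx) + (bx * bx + by * by) * (ax - cx)
          + (cx * cx + cy * cy) * (bx - ax)) / d
    return (ux, uy, math.dist((ux, uy), p))


def contains(circle, pts, eps=1e-9):
    cx, cy, r = circle
    return all(math.dist((cx, cy), pt) <= r + eps for pt in pts)


def enclosing_circle(pts):
    """Brute force: smallest candidate circle that contains every point."""
    candidates = [circle_two(p, q) for p, q in combinations(pts, 2)]
    candidates += [c for trio in combinations(pts, 3)
                   if (c := circle_three(*trio)) is not None]
    best = None
    for c in candidates:
        if contains(c, pts) and (best is None or c[2] < best[2]):
            best = c
    return best
```

Enumerating the pair and triple candidates and checking each against all n points is what gives the brute force its steep runtime, which is presumably why the log's later entries turn to memoization and C wrapping for speed.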
2017-03-21: Coded a brute force algorithm for the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].
2017-03-20: Worked on debugging the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].
2017-03-09: Continued running the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Finished script to draw Enclosing Circles on a Google Map.
2017-03-08: Continued running the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Created script to draw outcome of the Enclosing Circle Algorithm on Google Maps.
2017-03-07: Redetermined the top 50 cities which the Enclosing Circle algorithm should be run on. Data can be found at [http://www.edegan.com/wiki/Top_Cities_for_VC_Backed_Companies Top 50 Cities for VC Backed Companies]. Ran the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities.
2017-03-06: Ran script to determine the top 50 cities which the Enclosing Circle algorithm should be run on. Fixed the VC Circles script to take in a new data format.
2017-03-02: Cleaned up data for the VC Circles Project. Created histogram of data in Excel. See the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Began work on the [http://www.edegan.com/wiki/LinkedInCrawlerPython LinkedIn Crawler].
2017-03-01: Created statistics for the VC Data as part of the VC Circles Project. Assisted work on the [http://www.edegan.com/wiki/Industry_Classifier Industry Classifier].
2017-02-28: Finished downloading geocoded data for VC Data as part of the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Found bug in the Enclosing Circle Algorithm.
2017-02-27: Continued to download geocoded data for VC Data as part of the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://www.edegan.com/wiki/Industry_Classifier Industry Classifier].
2017-02-23: Continued to download geocoded data for VC Data as part of the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Installed C++ Compiler for Python. Ran tests on the difference between Python and C-wrapped Python.
2017-02-22: Continued to download geocoded data for VC Data as part of the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Helped out with the [http://www.edegan.com/wiki/Industry_Classifier Industry Classifier Project].
2017-02-21: Continued to download geocoded data for VC Data as part of the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Researched C++ compilers for Python so that the Enclosing Circle Algorithm could be wrapped in C. Found a recommended one [https://www.microsoft.com/en-us/download/details.aspx?id=44266 here].
2017-02-20: Continued to download geocoded data for VC Data as part of the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://www.edegan.com/wiki/Industry_Classifier Industry Classifier].
2017-02-16: Reworked the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] to create a file of geocoded data. Began work on wrapping the algorithm in C to improve speed.
2017-02-15: Finished the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] applied to the VC study. The Enclosing Circle algorithm still needs adjustment, but the program runs with the temporary fixes.
2017-02-14: Worked on the application of the Enclosing Circle algorithm to the VC study. Working on bug fixes in the Enclosing Circle algorithm. Created wiki page for the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].
2017-02-13: Worked on Neural Net for the [http://www.edegan.com/wiki/Industry_Classifier Industry Classifier Project].
2017-02-08: Worked on Neural Net for the [http://www.edegan.com/wiki/Industry_Classifier Industry Classifier Project].
2017-02-07: Fixed bugs in parse_cohort_data.py, the script for parsing the cohort data from the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Added descriptive statistics to the cohort data Excel file.
2017-02-02: Out sick; independent research and work from RDP. Brief research into the [http://jorgeg.scripts.mit.edu/homepage/wp-content/uploads/2016/03/Guzman-Stern-State-of-American-Entrepreneurship-FINAL.pdf Stern-Guzman algorithm]. Research into [http://www.edegan.com/wiki/interactive_maps Interactive Maps]. No helpful additions to the map embedding problem.
2017-02-01: Notes from session with Ed: Project on US university patenting and entrepreneurship programs (writing code to identify universities in assignees), search Wikipedia (XML then bulk download), student pop, faculty pop, etc. The Circle project for VC data will end up being a joint project to join accelerator data. Pull descriptions for VC. Founders of accelerators are on LinkedIn; the LinkedIn crawler cannot be caught (pretend to not be a bot). Can eventually get academic backgrounds through LinkedIn. Pull business registration data, Stern/Guzman. GIS on top of geocoded data. Maps that work on wiki or blog (CartoDB), Maps API and R. NLP projects, Description Classifier.
2017-01-31: Built WayBack Machine Crawler. Updated documentation for coordinates script. Updated profile page to include locations of code.
2017-01-30: Optimized the enclosing circle algorithm through memoization. Developed script to read addresses from accelerator data and return latitude and longitude coordinates.
2017-01-26: Continued working on Google sitesearch project. Discovered crunchbase, changed project priority: priority 1, split accelerator data up by flag; priority 2, use crunchbase to get web urls for cohorts; priority 3, make internet archive wayback machine driver. Located the [http://www.edegan.com/wiki/Whois_Parser Whois Parser].
2017-01-25: Finished parser for cohort data of the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Some data files still need proofreading as they are not in an acceptable format. Began working on Google sitesearch project.
2017-01-24: Worked on parser for cohort data of the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Cohort data file created; debugging is almost complete. Will begin work on the Google accelerator search soon.
2017-01-23: Worked on parser for cohort data of the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Preliminary code is written, working on debugging.
2017-01-19: Downloaded pdfs in the background for the [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project]. Created parser for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project], completed creation of final data set (yay!). Began working on cohort parser.
2017-01-18: Downloaded pdfs in the background for the [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].
2017-01-13: Continued making text files for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].
2017-01-12: Continued making text files for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].
2017-01-11: Continued making text files for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].
2017-01-10: Continued making text files for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].
===Fall 2016===
09/27/2016 15:00-1812-08:00Continued making text files for the [http: Set up Staff wiki page, work log page; registered for Slack, Microsoft Remote Desktop; downloaded Selenium on personal computer, read Selenium docs//www.edegan. Created com/wiki page for Moroccan Web Driver Project/Accelerator_Seed_List_(Data) Accelerator Seed List project].
09/29/2016 15:00-1812-07:00Continued making text files for the [http: Re-enroll in Microsoft Remote Desktop with proper authentication, set up Selenium environment and Komodo IDE on Remote Desktop, wrote program using Selenium that goes to a link and opens up the print dialog box//www.edegan. Developed computational recipe for a different approach to the problemcom/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].
09/30/2016 -12:00-1406:00: Selenium program selects view pdf option Learned how to use git. Committed software projects from the website, and goes semester to the pdf webpageMcNair git repository. Program then switches handle to the new pageProjects can be found at; [http://www.edegan. CTRL S is sent to the page to launch save dialog windowcom/wiki/E%26I_Governance_Policy_Report Executive Order Crawler], [http://www. Text cannot be sent to this windowedegan. Brainstorm ways around this issuecom/wiki/Moroccan_Parliament_Web_Crawler Foreign Government Web Crawlers], [http://www. Explored Chrome Options for saving automatically without a dialog windowedegan. Looking into other libraries besides selenium that may helpcom/wiki/Accelerator_Seed_List_(Data) F6S Crawler and Parser].
2016-12-02: Built and ran web crawler for Center for Middle East Studies on Kuwait. Continued making text files for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].
2016-12-01: Continued making text files for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Built tool for the [http://www.edegan.com/wiki/E%26I_Governance_Policy_Report E&I Governance Report Project] with Christy. Adds a column of data that shows whether or not the bill has been passed.
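The passed-bill column mentioned in this entry amounts to a small row transform. A minimal sketch follows; the field name `status`, the status labels, and the function name are all assumptions for illustration, not the project's actual schema:

```python
def add_passed_column(rows, passed_statuses=frozenset({"Enacted", "Became Law"})):
    """Append a boolean 'passed' column derived from a status field.

    The 'status' key and the labels in passed_statuses are hypothetical;
    the real E&I Governance report data may use different ones.
    """
    out = []
    for row in rows:
        row = dict(row)  # copy so the caller's rows are left unchanged
        row["passed"] = row.get("status") in passed_statuses
        out.append(row)
    return out
```

The same logic could be applied while streaming rows through `csv.DictReader`/`csv.DictWriter` rather than in memory.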
2016-11-29: Began pulling data from the accelerators listed [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) here]. Made text files for about 18 accelerators.
2016-11-22: Transferred downloaded Morocco Written Bills to provided SeaGate Drive. Made a "gentle" F6S crawler to retrieve HTMLs of possible accelerator pages documented [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) here].
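A "gentle" crawler of the kind described here just spaces its requests out. A minimal sketch, with the fetch and pause steps injectable so the pacing logic can be tested without network access (the 5-second default is an arbitrary illustration, not the project's actual setting):

```python
import time
from urllib.request import urlopen

def fetch_gently(urls, delay_seconds=5.0, fetch=None, pause=time.sleep):
    """Fetch each URL with a fixed pause between requests so the target
    site is not hammered. fetch/pause are injectable for testing."""
    fetch = fetch or (lambda url: urlopen(url).read())
    pages = {}
    for i, url in enumerate(urls):
        if i:  # no pause needed before the very first request
            pause(delay_seconds)
        pages[url] = fetch(url)
    return pages
```

A real run would also want error handling per URL so one bad page does not abort the whole crawl.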
2016-11-18: Converted Executive Order PDFs to text files using Adobe Acrobat DC. See [http://www.edegan.com/wiki/E%26I_Governance_Policy_Report Wikipage] for details.
2016-11-17: Wrote a crawler to retrieve information about executive orders, and their corresponding pdfs. They can be found [http://www.edegan.com/wiki/E%26I_Governance_Policy_Report here]. Next step is to run code to convert the pdfs to text files, then use the parser fixed by Christy.
2016-11-15: Finished download of Moroccan Written Question pdfs. Wrote a parser with Christy to be used for parsing bills from Congress and eventually executive orders. Found a bug in the system Python that was worked out with a reboot.
2016-11-11: Continued to download Moroccan data in the background. Attempted to find bug fixes for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Project] crawler.
2016-11-10: Continued to download Moroccan data and Kuwait data in the background. Began work on [http://www.edegan.com/wiki/Google_Scholar_Crawler Google Scholar Crawler]. Wrote a crawler for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Project] to get the HTML files of hundreds of accelerators. The crawler ended up failing; it appears to have been due to HTTPS.
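When an `urllib`-based crawler fails specifically on HTTPS sites, two usual suspects are certificate verification and the default User-Agent being rejected. A hedged sketch of both fixes; whether either was the actual cause of the failure described above is an assumption:

```python
import ssl
from urllib.request import Request, urlopen

# An explicit default SSL context performs certificate verification for
# HTTPS; a browser-like User-Agent is also set because some sites reject
# urllib's default one. Both details are guesses at the failure mode.
ssl_context = ssl.create_default_context()

def fetch_https(url, timeout=30.0):
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout, context=ssl_context) as resp:
        return resp.read()
```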
2016-11-08: Continued to download Moroccan data in the background. Finished writing code for the embedded files on the Kuwait Site. Spent time debugging the frame errors due to the dynamically generated content. Never found an answer to the bug, and instead found a workaround that sacrificed run time for the ability to work. [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
2016-11-04: Continued to download Moroccan data in the background. Finished writing initial Kuwait Web Crawler/Driver for the Middle East Studies Department. Middle East Studies Department asked for additional embedded files in the Kuwait website. [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
2016-11-03: Continued to download Moroccan data in the background. Dr. Egan fixed systems requirements to run the GovTrack Web Crawler. Made significant progress on the Kuwait Web Crawler/Driver for the Middle East Studies Department.
2016-11-01: Continued to download Moroccan data in the background. Went over code for the GovTracker Web Crawler, continued learning Perl. [http://www.edegan.com/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] Began Kuwait Web Crawler/Driver.
2016-10-21: Continued to download data for the Moroccan Parliament Written and Oral Questions. Looked over [http://www.edegan.com/wiki/Christy_Warden_(Twitter_Crawler_Application_1) Christy's Twitter Crawler] to see how I can be helpful. Dr. Egan asked me to think about how to potentially make multiple tools to get cohorts and other sorts of data from accelerator sites. See [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator List]. He also asked me to look at the [http://www.edegan.com/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] for potential ideas on how to bring this project to fruition.
2016-10-20: Continued to download data for the Moroccan Parliament Written and Oral Questions. Updated Wiki page. Started working on Twitter project with Christy. Attempted to find bug fixes for the [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver] crawler.
2016-10-18: Finished code for Oral Questions web driver and Written Questions web driver using Selenium. Now, the data for the dates of questions can be found using the crawler, and the pdfs of the questions will be downloaded using Selenium. [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
2016-10-14: Finished Oral Questions crawler. Finished Written Questions crawler. Waiting for further details on whether that data needs to be tweaked in any way. Updated the Moroccan Web Driver/Web Crawler wiki page. [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
2016-10-13: Completed download of Moroccan Bills. Working on either a web driver screenshot approach or a webcrawler approach to download the Moroccan oral and written questions data. Began building Web Crawler for Oral and Written Questions site. Edited Moroccan Web Driver/Crawler wiki page. See [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver] for details.
2016-10-11: Fixed Arabic bug; files can now be saved with Arabic titles. Monarchy bills downloaded and ready for shipment. House of Representatives bills mostly downloaded, ratified bills prepared for download. Started learning the Scrapy library in Python for web scraping. Discussed idea of screenshot-ing questions instead of scraping.
2016-10-07: Learned Unicode and UTF-8 encoding and decoding in Arabic. Still working on transforming an ASCII URL into printable Unicode.
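The ASCII-to-printable-Unicode transformation described here is what percent-decoding does: Arabic characters in a URL arrive as pure-ASCII `%XX` escapes and decode back through UTF-8. A minimal sketch with the standard library (the sample phrase is an arbitrary illustration):

```python
from urllib.parse import quote, unquote

def url_segment_to_title(encoded_segment):
    """Decode a percent-encoded URL path segment back to printable
    Unicode. UTF-8 is assumed, which is urllib.parse's default."""
    return unquote(encoded_segment)

# Round trip: an Arabic phrase survives quote() -> unquote() unchanged,
# and the encoded form is pure ASCII.
arabic = "مشروع قانون"
ascii_piece = quote(arabic)
assert all(ord(ch) < 128 for ch in ascii_piece)
assert url_segment_to_title(ascii_piece) == arabic
```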
2016-10-06: Discussed with Dr. Elbadawy the desired file names for the Moroccan data download. The consensus was that the bill programs are ready to launch once the files can be named properly, and that the questions data must be retrieved using a web crawler, which I need to learn how to implement. The naming of files is currently drawing errors in going from Arabic, to URL, to download, to filename. Debugging in process. Also built a demo Selenium program for Dr. Egan that drives the McNair blog site on an infinite loop.
2016-10-03: Moroccan Web Driver projects completed for driving the Monarchy proposed bills, House of Representatives proposed bills, and Ratified bills sites. Began devising a naming system for the files that does not require scraping. Tinkered with naming through regular expression parsing of the URL. Structure for the oral questions and written questions drivers is set up, but needs fixes due to the differences in the sites. Fixed bug on McNair wiki for women's biz team where email was plain text instead of an email link. Took a glimpse at the Kuwait Parliament website, and it appears to be very different from the Moroccan setup.
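Naming files through regular-expression parsing of the URL can be sketched as below; the URL shape and function name are hypothetical (the real Moroccan Parliament URLs may differ), and the regex simply scrubs characters that are unsafe in filenames:

```python
import re
from urllib.parse import urlparse, unquote

def filename_from_url(url):
    """Derive a save-name from the last path segment of a bill URL.
    Hypothetical URL shape, for illustration only."""
    last = unquote(urlparse(url).path.rstrip("/").rsplit("/", 1)[-1])
    # collapse whitespace and filesystem-unsafe characters into "_"
    return re.sub(r'[\\/:*?"<>|\s]+', "_", last) or "unnamed"
```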
2016-09-30: Selenium program selects the view pdf option from the website, and goes to the pdf webpage. Program then switches handle to the new page. CTRL+S is sent to the page to launch the save dialog window. Text cannot be sent to this window. Brainstormed ways around this issue. Explored Chrome Options for saving automatically without a dialog window. Looking into other libraries besides Selenium that may help.
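The Chrome Options route mentioned here usually comes down to a handful of Chrome preference keys that make downloads bypass the save dialog. A sketch under the assumption that this was the mechanism explored; the pref keys are real Chrome preference names, but the download directory is a placeholder:

```python
# Chrome preferences that send downloads straight to disk, with no
# save dialog. "/path/to/downloads" is a placeholder directory.
chrome_prefs = {
    "download.prompt_for_download": False,
    "download.default_directory": "/path/to/downloads",
    "plugins.always_open_pdf_externally": True,  # download PDFs rather than preview
}

# Hypothetical wiring into Selenium (needs a local chromedriver to run):
# from selenium import webdriver
# options = webdriver.ChromeOptions()
# options.add_experimental_option("prefs", chrome_prefs)
# driver = webdriver.Chrome(options=options)
```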
2016-09-29: Re-enrolled in Microsoft Remote Desktop with proper authentication, set up Selenium environment and Komodo IDE on Remote Desktop, wrote program using Selenium that goes to a link and opens up the print dialog box. Developed computational recipe for a different approach to the problem.
2016-09-26: Set up Staff wiki page, work log page; registered for Slack, Microsoft Remote Desktop; downloaded Selenium on personal computer, read Selenium docs. Created wiki page for Moroccan Web Driver Project.
'''Notes'''
