===Spring 2017===
2017-01-10: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded PDFs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].
2017-01-11: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded PDFs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].
2017-01-12: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded PDFs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].
2017-01-13: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded PDFs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].
2017-01-18: Downloaded PDFs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].
2017-01-19: Downloaded PDFs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project]. Created a parser for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project] and completed creation of the final data set (yay!). Began working on the cohort parser.
2017-01-23: Worked on the parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Preliminary code is written; working on debugging.
2017-01-24: Worked on the parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Cohort data file created; debugging is almost complete. Will begin work on the Google accelerator search soon.
2017-01-25: Finished the parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Some data files still need proofreading as they are not in an acceptable format. Began working on the Google sitesearch project.
2017-01-26: Continued working on the Google sitesearch project. Discovered Crunchbase and changed project priorities. Priority 1: split accelerator data up by flag. Priority 2: use Crunchbase to get web URLs for cohorts. Priority 3: make an Internet Archive Wayback Machine driver. Located the [http://mcnair.bakerinstitute.org/wiki/Whois_Parser Whois Parser].
2017-01-30: Optimized the enclosing circle algorithm through memoization. Developed a script to read addresses from accelerator data and return latitude and longitude coordinates.
2017-01-31: Built the Wayback Machine Crawler. Updated documentation for the coordinates script. Updated profile page to include locations of code.
2017-02-01: Notes from session with Ed: Project on US university patenting and entrepreneurship programs (writing code to identify universities in assignees), search Wikipedia (XML then bulk download), student pop, faculty pop, etc. Circle project for VC data will end up being a joint project to join accelerator data. Pull descriptions for VC. Founders of accelerators in LinkedIn. LinkedIn cannot be caught (pretend to not be a bot). Can eventually get academic backgrounds through LinkedIn. Pull business registration data, Stern/Guzman Algorithm. GIS on top of geocoded data. Maps that work on wiki or blog (CartoDB), Maps API and R. NLP Projects, Description Classifier.
2017-02-02: Out sick, independent research and work from RDP. Brief research into the [http://jorgeg.scripts.mit.edu/homepage/wp-content/uploads/2016/03/Guzman-Stern-State-of-American-Entrepreneurship-FINAL.pdf Stern-Guzman algorithm]. Research into [http://mcnair.bakerinstitute.org/wiki/interactive_maps Interactive Maps]. No helpful additions to the map embedding problem.
2017-02-07: Fixed bugs in parse_cohort_data.py, the script for parsing the cohort data from the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Added descriptive statistics to the cohort data Excel file.
2017-02-08: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].
2017-02-13: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].
2017-02-14: Worked on the application of the Enclosing Circle algorithm to the VC study. Working on bug fixes in the Enclosing Circle algorithm. Created wiki page for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].
2017-02-15: Finished the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] applied to the VC study. The Enclosing Circle algorithm still needs adjustment, but the program runs with the temporary fixes.
2017-02-16: Reworked the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] to create a file of geocoded data. Began work on wrapping the algorithm in C to improve speed.
2017-02-20: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].
2017-02-21: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Researched C++ compilers for Python so that the Enclosing Circle Algorithm could be wrapped in C. Found a recommended one [https://www.microsoft.com/en-us/download/details.aspx?id=44266 here].
2017-02-22: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Helped out with [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].
2017-02-23: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Installed C++ Compiler for Python. Ran tests on difference between Python and C wrapped Python.
2017-02-27: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].
2017-02-28: Finished downloading geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Found bug in Enclosing Circle Algorithm.
2017-03-01: Created statistics for the VC Circles Project.
2017-03-02: Cleaned up data for the VC Circles Project. Created histogram of data in Excel. See the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Began work on the [http://mcnair.bakerinstitute.org/wiki/LinkedInCrawlerPython LinkedIn Crawler].
2017-03-06: Ran script to determine the top 50 cities which Enclosing Circle should be run on. Fixed the VC Circles script to take in a new data format.
2017-03-07: Redetermined the top 50 cities which Enclosing Circle should be run on. Data on the top 50 cities can be found at [http://mcnair.bakerinstitute.org/wiki/Top_Cities_for_VC_Backed_Companies Top 50 Cities for VC Backed Companies]. Ran the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities.
2017-03-08: Continued running the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Created script to draw outcome of the Enclosing Circle Algorithm on Google Maps.
2017-03-09: Continued running the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Finished script to draw Enclosing Circles on a Google Map.
2017-03-20: Worked on debugging the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].
2017-03-21: Coded a brute force algorithm for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].
2017-03-23: Finished debugging the brute force algorithm for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm]. Implemented a method to plot the points and circles on a graph. Analyzed runtime of the brute force algorithm.
2017-03-27: Worked on debugging the Enclosing Circle Algorithm. Implemented a way to remove interior circles, and determined that translation to latitude and longitude coordinates resulted in slightly off center circles.
2017-03-28: Finished running the Enclosing Circle Algorithm. Worked on removing incorrect points from the data set (see above).
2017-03-29: Worked on debugging points for the Enclosing Circle Algorithm.
2017-04-03: Finished debugging points for the Enclosing Circle Algorithm. Added Command Line functionality to the Industry Classifier.
2017-04-05: Began work on the LinkedIn Crawler. Researched how to launch a Python virtual environment.
2017-04-06: Continued debugging and documenting the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Wrote a test program that logs in, searches for a query, navigates through search pages, and logs out. Recruiter program can now log in and search.
2017-04-10: Began writing functioning crawler of LinkedIn.
2017-04-11: Completed functional [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) crawler of LinkedIn Recruiter Pro]. Basic search functions work and download profile information for a given person.
2017-04-12: Worked on bugs with the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].
2017-04-13: Worked on debugging the logout procedure for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Began formulation of process to search for founders of startups using a combination of the LinkedIn Crawler with the data resources from the [http://mcnair.bakerinstitute.org/wiki/Crunchbase_2013_Snapshot CrunchBase Snapshot].
2017-04-17: Worked on ways to get correct search results from the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Worked on an HTML Parser for the results from the LinkedIn Crawler.
2017-04-18: Ran LinkedIn Crawler on matches between Crunchbase Snapshot and the accelerator data.
2017-04-19: Made updates to the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler] wiki page. Ran LinkedIn Crawler on accelerator data. Working on an HTML parser for the results from the LinkedIn Crawler.
2017-04-20: Finished the HTML Parser for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Ran HTML parser on accelerator founders. Data is stored in projects/accelerators/LinkedIn Founder Data.
2017-05-01: Continued work on HTML Parser. Uploaded all semester projects to git server.
===Fall 2016===
