Changes

09/27/2016 15:00-18:00: Set up Staff wiki page, work log page; registered for Slack, Microsoft Remote Desktop; downloaded Selenium on personal computer, read Selenium docs. Created wiki page for Moroccan Web Driver Project.
===Fall 2017===
<onlyinclude>
09/29/2016 15:00-18:00: Re-enroll in Microsoft Remote Desktop with proper authentication, set up Selenium environment and Komodo IDE on Remote Desktop, wrote program using Selenium that goes to a link and opens up the print dialog box. Developed computational recipe for a different approach to the problem.
[[Peter Jalbert]] [[Work Logs]] [[Peter Jalbert (Work Log)|(log page)]]
09/30/2016 12:00-14:00: Selenium program selects view pdf option from the website, and goes to the pdf webpage. Program then switches handle to the new page. CTRL S is sent to the page to launch save dialog window. Text cannot be sent to this window. Brainstorm ways around this issue. Explored Chrome Options for saving automatically without a dialog window. Looking into other libraries besides selenium that may help.
2017-12-21: Last minute adjustments to the Moroccan Data. Continued working on [[Selenium Documentation]].
10/3/2016 13:00 - 16:00: Moroccan Web Driver projects completed for driving of the Monarchy proposed bills, the House of Representatives proposed bills, and the Ratified bills sites. Begun process of devising a naming system for the files that does not require scraping. Tinkered with naming through regular expression parsing of the URL. Structure for the oral questions and written questions drivers is set up, but need fixes due to the differences in the sites. Fixed bug on McNair wiki for women's biz team where email was plain text instead of an email link. Took a glimpse at Kuwait Parliament website, and it appears to be very different from the Moroccan setup.
2017-12-20: Working on Selenium Documentation. Wrote 2 demo files. Wiki Page is available [http://www.edegan.com/wiki/Selenium_Documentation here]. Created 3 spreadsheets for the Moroccan data.
10/6/2016 13:30 - 18:00: Discussed with Dr. Elbadawy about the desired file names for Moroccan data download. The consensus was that the bill programs are ready to launch once the files can be named properly, and the questions data must be retrieved using a web crawler which I need to learn how to implement. The naming of files is currently drawing errors in going from arabic, to url, to download, to filename. Debugging in process. Also built a demo selenium program for Dr. Egan that drives the McNair blog site on an infinite loop.
2017-12-19: Finished fixing the Demo Day Crawler. Changed files and installed as appropriate to make LinkedIn crawler compatible with the RDP. Removed some of the bells and whistles.
10/7/2016 12:00 - 14:00: Learned unicode and utf8 encoding and decoding in arabic. Still working on transforming an ascii url into printable unicode.
2017-12-18: Continued finding errors with the Demo Day Crawler analysis. Rewrote the parser to remove any search terms that were in the top 10000 most common English words according to Google. Finished uploading and submitting Moroccan data.
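The common-word filtering described above can be sketched as follows. This is a minimal illustration, not the parser from the project: the function name, the sample terms, and the tiny stand-in word list are all hypothetical, and the real 10,000-word list is assumed to be available as one word per line.

```python
def remove_common_terms(search_terms, common_words):
    """Return the search terms that do not appear in the common-word list.

    Matching is case-insensitive; the original casing of kept terms is preserved.
    """
    common = {word.strip().lower() for word in common_words if word.strip()}
    return [term for term in search_terms if term.strip().lower() not in common]


# Hypothetical stand-in for the 10,000-word list, which would normally be
# read from a text file with one word per line.
sample_common_words = ["the", "and", "day", "company"]
sample_terms = ["Demo", "Day", "accelerator", "the", "cohort"]

filtered_terms = remove_common_terms(sample_terms, sample_common_words)
print(filtered_terms)
```

Building a set once keeps each lookup constant-time, which matters when many candidate terms are checked against a list of this size.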
10/11/2016 15:00 - 18:00: Fixed arabic bug, files can now be saved with arabic titles. Monarchy bills downloaded and ready for shipment. House of Representatives Bill mostly downloaded, ratified bills prepared for download. Started learning scrapy library in python for web scraping. Discussed idea of screenshot-ing questions instead of scraping.
2017-12-15: Found errors with the Demo Day Crawler. Fixed scripts to download Moroccan Law Data.
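The Arabic file-naming problem from the October 2016 entries (an ASCII, percent-encoded URL that has to become a printable Unicode file name) can be illustrated with the standard library. This is only a sketch under assumptions: the URL, the sample title, and the helper name are hypothetical, and the log does not say how the original bug was actually fixed.

```python
from urllib.parse import quote, unquote


def filename_from_url(url):
    """Decode the last path segment of a percent-encoded URL into Unicode."""
    last_segment = url.rstrip("/").rsplit("/", 1)[-1]
    # Percent-escapes are interpreted as UTF-8 bytes, the usual encoding for URLs.
    return unquote(last_segment, encoding="utf-8")


# Hypothetical example: an Arabic title as it would appear inside an ASCII URL.
arabic_title = "قانون"
encoded_url = "http://example.com/bills/" + quote(arabic_title) + ".pdf"
decoded_name = filename_from_url(encoded_url)

print(encoded_url)   # ASCII only, with %XX escapes
print(decoded_name)  # printable Unicode file name
```

When writing a file under such a name, opening it with an explicit `encoding="utf-8"` avoids depending on the platform default.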
10/13/2016 13:00-18:00: Completed download of Moroccan Bills. Working on either a web driver screenshot approach or a webcrawler approach to download the Moroccan oral and written questions data. Began building Web Crawler for Oral and Written Questions site. Edited Moroccan Web Driver/Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
2017-12-14: Uploading Morocco Parliament Written Questions. Creating script for next Morocco Parliament download. Begin writing Selenium documentation. Continuing to download TIGER data.
10/14/2016 12:00-14:00: Finished Oral Questions crawler. Finished Written Questions crawler. Waiting for further details on whether that data needs to be tweaked in any way. Updated the Moroccan Web Driver/Web Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
2017-12-06: Running Morocco Parliament Written Questions script. Analyzing Demo Day Crawler results. Continued downloading for TIGER geocoder.
10/18/2016 15:00-18:30: Finished code for Oral Questions web driver and Written Questions web driver using selenium. Now, the data for the dates of questions can be found using the crawler, and the pdfs of the questions will be downloaded using selenium. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
2017-11-28: Debugging Morocco Parliament Crawler. Running Demo Day Crawler for all accelerators and 10 pages per accelerator. TIGER geocoder is back to Forbidden Error.
10/20/2016 13:00-18:00: Continued to download data for the Moroccan Parliament Written and Oral Questions. Updated Wiki page. Started working on Twitter project with Christy. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
2017-11-27: Rerunning Morocco Parliament Crawler. Fixed KeyTerms.py and running it again. Continued downloading for TIGER geocoder.
10/21/2016 12:00-14:00: Continued to download data for the Moroccan Parliament Written and Oral Questions. Looked over [http://mcnair.bakerinstitute.org/wiki/Christy_Warden_(Twitter_Crawler_Application_1) Christy's Twitter Crawler] to see how I can be helpful. Dr. Egan asked me to think about how to potentially make multiple tools to get cohorts and other sorts of data from accelerator sites. See [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator List]. He also asked me to look at the [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] for potential ideas on how to bring this project to fruition.
2017-11-20: Continued running [http://www.edegan.com/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.
11/1/2016 15:00-18:00: Continued to download Moroccan data in the background. Went over code for GovTracker Web Crawler, continued learning Perl. [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler]. Began Kuwait Web Crawler/Driver.
2017-11-16: Continued running [http://www.edegan.com/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.
11/3/2016 13:00-18:00: Continued to download Moroccan data in the background. Dr. Egan fixed systems requirements to run the GovTrack Web Crawler. Made significant progress on the Kuwait Web Crawler/Driver for the Middle East Studies Department.
2017-11-15: Continued running [http://www.edegan.com/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote a script to extract counts that were greater than 2 from Keyword Matcher. Continued downloading for [http://www.edegan.com/wiki/Tiger_Geocoder TIGER Geocoder]. Finished re-formatting work logs.
11/4/2016 12:00-14:00: Continued to download Moroccan data in the background. Finished writing initial Kuwait Web Crawler/Driver for the Middle East Studies Department. Middle East Studies Department asked for additional embedded files in the Kuwait website. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
2017-11-14: Continued running [http://www.edegan.com/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote an HTML to Text parser. See Parser Demo Day Page for file location. Continued downloading for [http://www.edegan.com/wiki/Tiger_Geocoder TIGER Geocoder].
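An HTML to text parser of the kind mentioned above can be built on the standard library's `html.parser`. The following is a minimal sketch, not the project's parser (whose location is given on the Parser Demo Day Page); the class name, helper name, and sample page are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Hypothetical sample page.
sample_html = (
    "<html><head><style>p {color: red;}</style></head>"
    "<body><h1>Demo Day</h1><p>Join us &amp; meet the cohort.</p>"
    "<script>var x = 1;</script></body></html>"
)
print(html_to_text(sample_html))
```

Dropping script and style contents keeps keyword counts from being inflated by code and CSS, which matters if the text is later fed to a keyword matcher.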
11/8/2016 15:00-18:00: Continued to download Moroccan data in the background. Finished writing code for the embedded files on the Kuwait Site. Spent time debugging the frame errors due to the dynamically generated content. Never found an answer to the bug, and instead found a workaround that sacrificed run time for the ability to work. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
2017-11-13: Built [http://www.edegan.com/wiki/Demo_Day_Page_Parser Demo Day Page Parser].
11/10/2016 13:00-18:00: Continued to download Moroccan data and Kuwait data in the background. Began work on [http://mcnair.bakerinstitute.org/wiki/Google_Scholar_Crawler Google Scholar Crawler]. Wrote a crawler for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] to get the HTML files of hundreds of accelerators. The crawler ended up failing; it appears to have been due to HTTPS.
2017-11-09: Running demo version of Demo Day crawler (Accelerator Google Crawler). Fixing work log format.
11/11/2016 12:00-2:00: Continued to download Moroccan data in the background. Attempted to find bug fixes for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] crawler.
2017-11-07: Created file with 0s and 1s detailing whether crunchbase has the founder information for an accelerator. Details posted as a TODO on [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List] page. Still waiting for feedback on the PostGIS installation from [http://www.edegan.com/wiki/Tiger_Geocoder Tiger Geocoder]. Continued working on Accelerator Google Crawler.
11/15/2016 15:00-18:00: Finished download of Moroccan Written Question pdfs. Wrote a parser with Christy to be used for parsing bills from Congress and eventually executive orders. Found bug in the system Python that was worked out and rebooted.
2017-11-06: Contacted Geography Center for the US Census Bureau, [https://www.census.gov/geo/about/contact.html here], and began email exchange on PostGIS installation problems. Began working on the [http://www.edegan.com/wiki/Selenium_Documentation Selenium Documentation]. Also began working on an Accelerator Google Crawler that will be used with Yang and ML to find Demo Days for cohort companies.
11/17/2016 13:00-18:00: Wrote a crawler to retrieve information about executive orders, and their corresponding pdfs. They can be found [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report here]. Next step is to run code to convert the pdfs to text files, then use the parser fixed by Christy.
2017-11-01: Attempted to continue downloading, however ran into HTTP Forbidden errors. Listed the errors on the [http://www.edegan.com/wiki/Tiger_Geocoder Tiger Geocoder Page].
2017-10-31: Began downloading blocks of data for individual states for the [http://www.edegan.com/wiki/Tiger_Geocoder Tiger Geocoder] project. Wrote out the new wiki page for installation, and beginning to write documentation on usage.
2017-10-30: With Ed's help, was able to get the national data from Tiger installed onto a database server. The process required much jumping around and changing users, and all the things we learned are outlined in [http://www.edegan.com/wiki/Database_Server_Documentation#Editing_Users the database server documentation] under "Editing Users".
2017-10-25: Continued working on the [http://www.edegan.com/wiki/PostGIS_Installation TigerCoder Installation].
2017-10-24: Throw some addresses into a database, use address normalizer and geocoder. May need to install things. Details on the installation process can be found on the [http://www.edegan.com/wiki/PostGIS_Installation PostGIS Installation page].
2017-10-23: Finished Yelp crawler for the [http://www.edegan.com/wiki/Houston_Innovation_District Houston Innovation District Project].
2017-10-19: Continued work on Yelp crawler for the [http://www.edegan.com/wiki/Houston_Innovation_District Houston Innovation District Project].
2017-10-18: Continued work on Yelp crawler for the [http://www.edegan.com/wiki/Houston_Innovation_District Houston Innovation District Project].
2017-10-17: Constructed ArcGIS maps for the agglomeration project. Finished maps of points for every year in the state of California. Finished maps of Route 128. Began working on selenium Yelp crawler to get cafe locations within the 610-loop.
2017-10-16: Assisted Harrison on the USITC project. Looked for natural language processing tools to extract complainants and defendants along with their location from case files. Experimented with pulling based on parts of speech tags, as well as using geotext or geograpy to pull locations from a case segment.
2017-10-13: Updated various project pages.
2017-10-12: Continued work on Patent Thicket project, awaiting further project specs.
2017-10-05: Emergency ArcGIS creation for the Agglomeration project.
2017-10-04: Emergency ArcGIS creation for the Agglomeration project.
2017-10-02: Worked on ArcGIS data. See Harrison's Work Log for the details.
2017-09-28: Added collaborative editing feature to PyCharm.
2017-09-27: Worked on big database file.
2017-09-25: New task -- Create text file with company, description, and company type.
#[http://www.edegan.com/wiki/VC_Database_Rebuild VC Database Rebuild]
#psql vcdb2
#table name, sdccompanybasecore2
#Combine with Crunchbasebulk
#TODO: Write wiki on linkedin crawler, write wiki on creating accounts.
2017-09-21: Wrote wiki on Linkedin crawler, met with Laura about patents project.
2017-09-20: Finished running linkedin crawler. Transferred data to RDP. Will write wikis next.
2017-09-19: Began running linkedin crawler. Helped Yang create RDP account, get permissions, and get wiki setup.
2017-09-18: Finished implementation of Experience Crawler, continued working on Education Crawler for LinkedIn.
2017-09-14: Continued implementing LinkedIn Crawler for profiles.
2017-09-13: Implemented LinkedIn Crawler for main portion of profiles. Began working on crawling Experience section of profiles.
2017-09-12: Continued working on the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. Added to the wiki on this topic.
2017-09-11: Continued working on the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data].
2017-09-06: Combined founders data retrieved with the Crunchbase API with the crunchbasebulk data to get linkedin urls for different accelerator founders. For more information, see [http://www.edegan.com/wiki/Crunchbase_Data here].
2017-09-05: Post Harvey. Finished retrieving names from the Crunchbase API on founders. Next step is to query crunchbase bulk database to get linkedin urls. For more information, see [http://www.edegan.com/wiki/Crunchbase_Data here].
2017-08-24: Began using the Crunchbase API to retrieve founder information for accelerators. Halfway through compiling a dictionary that translates accelerator names into proper Crunchbase API URLs.
2017-08-23: Decided with Ed to abandon LinkedIn crawling to retrieve accelerator founder data, and instead use crunchbase. Spent the day navigating the crunchbasebulk database, and seeing what useful information was contained in it.
2017-08-22: Discovered that LinkedIn Profiles cannot be viewed through LinkedIn if the target is 3rd degree or further. However, if entering LinkedIn through a Google search, the profile can still be viewed if the user has previously logged into LinkedIn. Devising a workaround crawler that utilizes Google search. Continued blog post [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) here] under Section 4.
2017-08-21: Began work on extracting founders for accelerators through LinkedIn Crawler. Discovered that Python3 is not installed on RDP, so the virtual environment for the project cannot be fired up. Continued working on Ubuntu machine.
</onlyinclude>
===Spring 2017===
2017-05-01: Continued work on HTML Parser. Uploaded all semester projects to git server.
2017-04-20: Finished the HTML Parser for the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Ran HTML parser on accelerator founders. Data is stored in projects/accelerators/LinkedIn Founder Data.
2017-04-19: Made updates to the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler] Wikipage. Ran LinkedIn Crawler on accelerator data. Working on an html parser for the results from the LinkedIn Crawler.
2017-04-18: Ran LinkedIn Crawler on matches between Crunchbase Snapshot and the accelerator data.
2017-04-17: Worked on ways to get correct search results from the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Worked on an HTML Parser for the results from the LinkedIn Crawler.
2017-04-13: Worked on debugging the logout procedure for the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Began formulation of process to search for founders of startups using a combination of the LinkedIn Crawler with the data resources from the [http://www.edegan.com/wiki/Crunchbase_2013_Snapshot CrunchBase Snapshot].
2017-04-12: Work on bugs with the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].
2017-04-11: Completed functional [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) crawler of LinkedIn Recruiter Pro]. Basic search functions work and download profile information for a given person.
2017-04-10: Began writing functioning crawler of LinkedIn.
2017-04-06: Continued working on debugging and documenting the [http://www.edegan.com/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Wrote a test program that logs in, searches for a query, navigates through search pages, and logs out. Recruiter program can now login and search.
2017-04-05: Began work on the LinkedIn Crawler. Researched on launching Python Virtual Environment.
2017-04-03: Finished debugging points for the Enclosing Circle Algorithm. Added Command Line functionality to the Industry Classifier.
2017-03-29: Worked on debugging points for the Enclosing Circle Algorithm.
2017-03-28: Finished running the Enclosing Circle Algorithm. Worked on removing incorrect points from the data set (see above).
2017-03-27: Worked on debugging the Enclosing Circle Algorithm. Implemented a way to remove interior circles, and determined that translation to latitude and longitude coordinates resulted in slightly off center circles.
2017-03-23: Finished debugging the brute force algorithm for [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm]. Implemented a method to plot the points and circles on a graph. Analyzed runtime of the brute force algorithm.
2017-03-21: Coded a brute force algorithm for the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].
2017-03-20: Worked on debugging the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].
2017-03-09: Continued running [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Finished script to draw Enclosing Circles on a Google Map.
2017-03-08: Continued running [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Created script to draw outcome of the Enclosing Circle Algorithm on Google Maps.
2017-03-07: Redetermined the top 50 cities which Enclosing Circle should be run on. Data on the [http://www.edegan.com/wiki/Top_Cities_for_VC_Backed_Companies Top 50 Cities for VC Backed Companies can be found here.] Ran [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities.
2017-03-06: Ran script to determine the top 50 cities which Enclosing Circle should be run on. Fixed the VC Circles script to take in a new data format.
2017-03-02: Cleaned up data for the VC Circles Project. Created histogram of data in Excel. See [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Began work on the [http://www.edegan.com/wiki/LinkedInCrawlerPython LinkedIn Crawler].
2017-03-01: Created statistics for the VC Circles Project.
2017-02-28: Finished downloading geocoded data for VC Data as part of the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Found bug in Enclosing Circle Algorithm.
2017-02-27: Continued to download geocoded data for VC Data as part of the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://www.edegan.com/wiki/Industry_Classifier Industry Classifier].
2017-02-23: Continued to download geocoded data for VC Data as part of the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Installed C++ Compiler for Python. Ran tests on difference between Python and C wrapped Python.
2017-02-22: Continued to download geocoded data for VC Data as part of the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Helped out with [http://www.edegan.com/wiki/Industry_Classifier Industry Classifier Project].
2017-02-21: Continued to download geocoded data for VC Data as part of the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Researched into C++ Compilers for Python so that the Enclosing Circle Algorithm could be wrapped in C. Found a recommended one [https://www.microsoft.com/en-us/download/details.aspx?id=44266 here].
2017-02-20: Continued to download geocoded data for VC Data as part of the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://www.edegan.com/wiki/Industry_Classifier Industry Classifier].
2017-02-16: Reworked [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] to create a file of geocoded data. Began work on wrapping the algorithm in C to improve speed.
2017-02-15: Finished [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] applied to the VC study. Enclosing Circle algorithm still needs adjustment, but the program runs with the temporary fixes.
2017-02-14: Worked on the application of the Enclosing Circle algorithm to the VC study. Working on bug fixes in the Enclosing Circle algorithm. Created wiki page for the [http://www.edegan.com/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].
2017-02-13: Worked on Neural Net for the [http://www.edegan.com/wiki/Industry_Classifier Industry Classifier Project].
2017-02-08: Worked on Neural Net for the [http://www.edegan.com/wiki/Industry_Classifier Industry Classifier Project].
2017-02-07: Fixed bugs in parse_cohort_data.py, the script for parsing the cohort data from the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Added descriptive statistics to cohort data excel file.
2017-02-02: Out sick, independent research and work from RDP. Brief research into the [http://jorgeg.scripts.mit.edu/homepage/wp-content/uploads/2016/03/Guzman-Stern-State-of-American-Entrepreneurship-FINAL.pdf Stern-Guzman algorithm]. Research into [http://www.edegan.com/wiki/interactive_maps Interactive Maps]. No helpful additions to map embedding problem.
2017-02-01: Notes from Session with Ed: Project on US university patenting and entrepreneurship programs (writing code to identify universities in assignees), search Wikipedia (XML then bulk download), student pop, faculty pop, etc.
Circle project for VC data will end up being a joint project to join accelerator data.
Pull descriptions for VC. Founders of accelerators in LinkedIn. LinkedIn cannot be caught (pretend to not be a bot). Can eventually get academic backgrounds through LinkedIn.
NLP Projects, Description Classifier.
2017-01-31: Built WayBack Machine Crawler. Updated documentation for coordinates script. Updated profile page to include locations of code.
2017-01-30: Optimized enclosing circle algorithm through memoization. Developed script to read addresses from accelerator data and return latitude and longitude coordinates.
2017-01-26: Continued working on Google sitesearch project. Discovered crunchbase, changed project priority. Priority 1, split accelerator data up by flag, priority 2, use crunchbase to get web urls for cohorts, priority 3, make internet archive wayback machine driver. Located [http://www.edegan.com/wiki/Whois_Parser Whois Parser].
2017-01-25: Finished parser for cohort data of the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Some data files still need proofreading as they are not in an acceptable format. Began working on Google sitesearch project.
2017-01-24: Worked on parser for cohort data of the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Cohort data file created, debugging is almost complete. Will begin work on the google accelerator search soon.
2017-01-23: Worked on parser for cohort data of the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Preliminary code is written, working on debugging.
2017-01-19: Downloaded pdfs in the background for the [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project]. Created parser for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project], completed creation of final data set (yay!). Began working on cohort parser.
2017-01-18: Downloaded pdfs in the background for the [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].
2017-01-13: Continued making text files for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].
2017-01-12: Continued making text files for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].
2017-01-11: Continued making text files for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].
2017-01-10: Continued making text files for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].
===Fall 2016===
2016-12-08: Continued making text files for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].
2016-12-07: Continued making text files for the [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].
2016-12-06: Learned how to use git. Committed software projects from the semester to the McNair git repository. Projects can be found at: [http://www.edegan.com/wiki/E%26I_Governance_Policy_Report Executive Order Crawler], [http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Foreign Government Web Crawlers], [http://www.edegan.com/wiki/Accelerator_Seed_List_(Data) F6S Crawler and Parser].
2/7/2017 14:302016-12-1702:15: Fixed bugs in parse_cohort_dataBuilt and ran web crawler for Center for Middle East Studies on Kuwait.py, the script Continued making text files for parsing the cohort data from the [http://mcnairwww.bakerinstituteedegan.orgcom/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Added descriptive statistics to cohort data excel file.
2/8/2017 10:002016-12-01: Continued making text files for the [http:00 Worked on Neural Net //www.edegan.com/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Built tool for the [http://mcnairwww.bakerinstituteedegan.orgcom/wiki/Industry_Classifier Industry Classifier E%26I_Governance_Policy_Report E&I Governance Report Project]with Christy. Adds a column of data that shows whether or not the bill has been passed.
2/13/2017 10:002016-11-1229:00 Worked on Neural Net for Began pulling data from the accelerators listed [http://mcnairwww.bakerinstituteedegan.orgcom/wiki/Industry_Classifier Industry Classifier ProjectAccelerator_Seed_List_(Data) here]. Made text files for about 18 accelerators.
2/14/2017 14:302016-11-17:1522: Worked on the application of the Enclosing Circle algorithm Transferred downloaded Morocco Written Bills to the VC studyprovided SeaGate Drive. Working on bug fixes in the Enclosing Circle algorithm. Created wiki page for the Made a "gentle" F6S crawler to retrieve HTMLs of possible accelerator pages documented [http://mcnairwww.bakerinstituteedegan.orgcom/wiki/Enclosing_Circle_Algorithm Enclosing Circle AlgorithmAccelerator_Seed_List_(Data) here].
2/15/2017: 10:002016-11-1218:00: Finished Converted Executive Order PDFs to text files using adobe acrobat DC. See [http://mcnairwww.bakerinstituteedegan.orgcom/wiki/Enclosing_Circle_Algorithm Enclosing Circle AlgorithmE%26I_Governance_Policy_Report Wikipage] applied to the VC study. Enclosing Circle algorithm still needs adjustment, but the program runs with the temporary fixesfor details.
2/16/2017 14:302016-11-17:45: Reworked Wrote a crawler to retrieve information about executive orders, and their corresponding pdfs. They can be found [http://mcnairwww.bakerinstituteedegan.orgcom/wiki/Enclosing_Circle_Algorithm Enclosing Circle AlgorithmE%26I_Governance_Policy_Report here.] Next step is to run code to create a file of geocoded data. Began work on wrapping convert the algorithm in C pdfs to improve speedtext files, then use the parser fixed by Christy.
2/20/2017 10:002016-11-12:0015: Continued to Finished download geocoded data for VC Data as part of the [http://mcnairMoroccan Written Question pdfs.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] ProjectWrote a parser with Christy to be used for parsing bills from Congress and eventually executive orders. Assisted work on Found bug in the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier]system Python that was worked out and rebooted.
2/21/2017 14:302016-11- 17:1511: Continued to download geocoded Moroccan data for VC Data as part of in the [http://mcnairbackground.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Researched into C++ Compilers Attempted to find bug fixes for Python so that the Enclosing Circle Algorithm could be wrapped in C. Found a recommended one [httpshttp://www.microsoftedegan.com/en-uswiki/download/details.aspx?id=44266 hereAccelerator_Seed_List_(Data) Accelerator Project]crawler.
2/22/2017 2016-11-10:00-12:00: Continued to download geocoded Moroccan data for VC Data as part of and Kuwait data in the background. Began work on [http://mcnairwww.bakerinstituteedegan.orgcom/wiki/Enclosing_Circle_Algorithm Enclosing Circle AlgorithmGoogle_Scholar_Crawler Google Scholar Crawler] Project. Helped out with Wrote a crawler for the [http://mcnairwww.bakerinstituteedegan.orgcom/wiki/Industry_Classifier Industry Classifier Accelerator_Seed_List_(Data) Accelerator Project]to get the HTML files of hundreds of accelerators. The crawler ended up failing; it appears to have been due to HTTPS.
2/23/2017 14:302016-11-17:4508: Continued to download geocoded Moroccan data in the background. Finished writing code for VC Data as part of the embedded files on the Kuwait Site. Spent time debugging the frame errors due to the dynamically generated content. Never found an answer to the bug, and instead found a workaround that sacrificed run time for the ability to work. [http://mcnairwww.bakerinstituteedegan.orgcom/wiki/Enclosing_Circle_Algorithm Enclosing Circle AlgorithmMoroccan_Parliament_Web_Crawler Moroccan Web Driver] Project. Installed C++ Compiler for Python. Ran tests on difference between Python and C wrapped Python.
2/27/2017 10:002016-11-12:0004: Continued to download geocoded Moroccan data for VC Data as part of in the [http://mcnairbackground.bakerinstitute.orgFinished writing initial Kuwait Web Crawler/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] ProjectDriver for the Middle East Studies Department. Assisted work on Middle East Studies Department asked for additional embedded files in the Kuwait website. [http://mcnairwww.bakerinstituteedegan.orgcom/wiki/Industry_Classifier Industry ClassifierMoroccan_Parliament_Web_Crawler Moroccan Web Driver].
2/28/2017 14:302016-11-1703:15: Finished downloading geocoded Continued to download Moroccan data for VC Data as part of in the [http://mcnairbackground. Dr.bakerinstituteEgan fixed systems requirements to run the GovTrack Web Crawler.orgMade significant progress on the Kuwait Web Crawler/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Found bug in Enclosing Circle AlgorithmDriver for the Middle East Studies Department.
3/1/2017 10:002016-11-1201:00Continued to download Moroccan data in the background. Went over code for GovTracker Web Crawler, continued learning Perl. [http: Created statistics for the VC Circles Project//www.edegan.com/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] Began Kuwait Web Crawler/Driver.
3/2/2017 14:302016-10-1721:45: Cleaned up Continued to download data for the VC Circles ProjectMoroccan Parliament Written and Oral Questions. Looked over [http://www. Created histogram edegan.com/wiki/Christy_Warden_(Twitter_Crawler_Application_1) Christy's Twitter Crawler] to see how I can be helpful. Dr. Egan asked me to think about how to potentially make multiple tools to get cohorts and other sorts of data in Excelfrom accelerator sites. See [http://mcnairwww.bakerinstituteedegan.orgcom/wiki/Enclosing_Circle_Algorithm Enclosing Circle AlgorithmAccelerator_Seed_List_(Data) Accelerator List] Project. Began work on He also asked me to look at the [http://mcnairwww.bakerinstituteedegan.orgcom/wiki/LinkedInCrawlerPython LinkedIn Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler]for potential ideas on how to bring this project to fruition.
3/6/2017 2016-10:00-1220:00: Ran script Continued to determine download data for the top 50 cities which Enclosing Circle should be run Moroccan Parliament Written and Oral Questions. Updated Wiki page. Started working onTwitter project with Christy. Fixed the VC Circles script to take in a new data format[http://www.edegan.com/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]
3/7/2017 14:302016-10-1718:15: Redetermined the top 50 cities which Enclosing Circle should be run onFinished code for Oral Questions web driver and Written Questions web driver using selenium. Data on Now, the [http://mcnair.bakerinstitute.org/wiki/Top_Cities_for_VC_Backed_Companies Top 50 Cities data for VC Backed Companies the dates of questions can be found hereusing the crawler, and the pdfs of the questions will be downloaded using selenium.] Ran [http://mcnairwww.bakerinstituteedegan.orgcom/wiki/Enclosing_Circle_Algorithm Enclosing Circle AlgorithmMoroccan_Parliament_Web_Crawler Moroccan Web Driver] on the Top 50 Cities.
3/8/2017 2016-10:00-1214:00: Continued running Finished Oral Questions crawler. Finished Written Questions crawler. Waiting for further details on whether that data needs to be tweaked in any way. Updated the Moroccan Web Driver/Web Crawler wiki page. [http://mcnairwww.bakerinstituteedegan.orgcom/wiki/Enclosing_Circle_Algorithm Enclosing Circle AlgorithmMoroccan_Parliament_Web_Crawler Moroccan Web Driver] on the Top 50 Cities. Created script to draw outcome of the Enclosing Circle Algorithm on Google Maps.
3/9/2017 14:302016-10-1713:45: Continued running Completed download of Moroccan Bills. Working on either a web driver screenshot approach or a webcrawler approach to download the Moroccan oral and written questions data. Began building Web Crawler for Oral and Written Questions site. Edited Moroccan Web Driver/Crawler wiki page. [http://mcnairwww.bakerinstituteedegan.orgcom/wiki/Enclosing_Circle_Algorithm Enclosing Circle AlgorithmMoroccan_Parliament_Web_Crawler Moroccan Web Driver] on the Top 50 Cities. Finished script to draw Enclosing Circles on a Google Map.
3/20/2017 2016-10:00-1211:00: Worked on debugging the [http://mcnairFixed arabic bug, files can now be saved with arabic titles. Monarchy bills downloaded and ready for shipment. House of Representatives Bill mostly downloaded, ratified bills prepared for download.bakerinstituteStarted learning scrapy library in python for web scraping.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm]Discussed idea of screenshot-ing questions instead of scraping.
3/21/2017 14:302016-10-1707:15: Coded a brute force algorithm for the [http://mcnair.bakerinstituteLearned unicode and utf8 encoding and decoding in arabic.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm]Still working on transforming an ascii url into printable unicode.
3/23/2017 14:302016-10- 1706:45: Finished debugging Discussed with Dr. Elbadawy about the brute force algorithm desired file names for [http://mcnairMoroccan data download.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm]. Implemented a method The consensus was that the bill programs are ready to plot launch once the points files can be named properly, and circles on the questions data must be retrieved using a graphweb crawler which I need to learn how to implement. Analyzed runtime The naming of files is currently drawing errors in going from arabic, to url, to download, to filename. Debugging in process. Also built a demo selenium program for Dr. Egan that drives the brute force algorithmMcNair blog site on an infinite loop.
3/27/2017 2016-10:00-1203:00: Worked on debugging Moroccan Web Driver projects completed for driving of the Monarchy proposed bills, the House of Representatives proposed bills, and the Enclosing Circle AlgorithmRatified bills sites. Implemented Begun process of devising a way naming system for the files that does not require scraping. Tinkered with naming through regular expression parsing of the URL. Structure for the oral questions and written questions drivers is set up, but need fixes due to remove interior circlesthe differences in the sites. Fixed bug on McNair wiki for women's biz team where email was plain text instead of an email link. Took a glimpse at Kuwait Parliament website, and determined that translation it appears to latitude and longitude coordinates resulted in slightly off center circlesbe very different from the Moroccan setup.
3/28/2017 14:2016-09-30- 17:15: Finished running Selenium program selects view pdf option from the website, and goes to the pdf webpage. Program then switches handle to the Enclosing Circle Algorithmnew page. Worked on removing incorrect points from CTRL S is sent to the data set(see above)page to launch save dialog window. Text cannot be sent to this window. Brainstorm ways around this issue. Explored Chrome Options for saving automatically without a dialog window. Looking into other libraries besides selenium that may help.
3/2016-09-29/2017 10:00Re-12:00: Worked enroll in Microsoft Remote Desktop with proper authentication, set up Selenium environment and Komodo IDE on debugging points Remote Desktop, wrote program using Selenium that goes to a link and opens up the print dialog box. Developed computational recipe for a different approach to the Enclosing Circle Algorithmproblem.
4/3/2017 10:002016-09-12:0026: Finished debugging points Set up Staff wiki page, work log page; registered for the Enclosing Circle AlgorithmSlack, Microsoft Remote Desktop; downloaded Selenium on personal computer, read Selenium docs. Added Command Line functionality to the Industry ClassifierCreated wiki page for Moroccan Web Driver Project.
4/5/2017 9:45-11:45: Began work on the LinkedIn Crawler. Researched on launching Python Virtual Environment.'''Notes'''
4/6/2017 14*Ed moved the Morocco Data to E:00-17\McNair\Projects from C:15: Continued working on debugging and documenting the [http\Users\PeterJ\Documents* C Drive files moved to E://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Wrote a test program that logs in, searches for a query, navigates through search pages, and logs out. Recruiter program can now login and search.\McNair\Users\PeterJ
4/10/2017 10:00-12:00: Began writing functioning crawler of LinkedIn.
[[Category:Work Log]]
