5,119 bytes added ,  18:40, 27 February 2018
===Spring 2018===
<onlyinclude>
[[Shrey Agarwal]] [[Work Logs]] [[Shrey Agarwal (Work Log)|(log page)]]

1/23/18 15:00 - 17:00
*Became reacclimatized with the project, spoke with Ed about the direction for the rest of the semester
1/25/18 15:00 - 17:00
*Began examining the data on pulled webpages relating to demo days
1/26/18 13:00 - 17:00
*Began categorizing demo day pages based on: 1) relevance to accelerators, 2) relevance to the particular accelerator (got to 200)
1/30/18 15:00 - 17:00
*Continued working through the demo day pages, spoke with Ed about using the data to work a better set (got to 450)
2/01/18 15:00 - 17:00
*Finished the match and created pivot tables to count the number of repetitions (companies going through more than one accelerator)
2/06/18 15:00 - 17:00
*Discussed with Matthew the best way to collect the VC data from the repetitions. We tried different matches through our SDC data to no avail
2/08/18 15:00 - 18:00
*Continued attempting to match with SDC the different columns. Didn't work without separating the data into individual files, a very tedious process.
2/13/18 15:00 - 17:00
*Spoke with Ed about incubators project, will begin as soon as we can time the accelerator startup investments. Ed is expecting us to begin sometime in the next two months, using a similar process as we did for incubators. The process should be handled by a new worker.
2/15/18 15:00 - 17:00
*Talked to Ed about next steps for the project. Practiced accessing the CrunchBase database on SQL and brushed up on SQL code.
2/16/18 13:00 - 17:00
*Sifted through the database for Crunchbase investment information.
2/20/18 15:00 - 17:00
*Pulled the funding rounds table from SQL and matched it with the companies that have received VC funding in order to gather round dates.
2/22/18 15:00 - 18:00
*Went through the matched data. Brainstormed ways to get the dates for cohort companies going through accelerators.
2/27/18 15:00 - 17:00
*Looked into using the WhoIs Parser in order to find when the companies went through their accelerators.
</onlyinclude>

===Fall 2017===
<onlyinclude>
9/19/17 15:00 - 17:00
*Became reacclimatized with the project, spoke with Ed about the direction for the rest of the semester
9/20/17 15:00 - 17:00
*Worked on setting up a new pull for the updated SDC data
9/21/17 15:00 - 17:00
*Finished the pull and sorted the data from the updated accelerator list
9/22/17 15:00 - 17:00
*Tried to set up the matcher with Matthew; ran into some difficulties on Power Shell, returning a blank file in the output
9/26/17 15:00 - 17:00
*Finished the match and created pivot tables to count the number of repetitions (companies going through more than one accelerator)
9/27/17 15:00 - 17:00
*Discussed with Matthew the best way to collect the VC data from the repetitions. We tried different matches through our SDC data to no avail
9/28/17 16:00 - 17:00
*Continued attempting to match with SDC the different columns. Didn't work without separating the data into individual files, a very tedious process.
9/29/17 15:00 - 17:00
*Spoke with Ed about incubators project, will begin as soon as we can time the accelerator startup investments. Ed is expecting us to begin sometime in the next two months, using a similar process as we did for incubators. The process should be handled by a new worker.
10/02/17 15:00 - 17:00
*Talked to Ed about next steps for the project. Practiced accessing the CrunchBase database on SQL and brushed up on SQL code.
10/03/17 15:00 - 17:00
*Sifted through the database for Crunchbase investment information.
10/04/17 15:00 - 17:00
*Pulled the funding rounds table from SQL and matched it with the companies that have received VC funding in order to gather round dates.
10/06/17 15:00 - 17:00
*Went through the matched data. Brainstormed ways to get the dates for cohort companies going through accelerators.
10/11/17 15:00 - 17:00
*Looked into using the WhoIs Parser in order to find when the companies went through their accelerators.
10/12/17 15:00 - 17:00
*Discovered that the Wayback Machine will not be a good option for identifying the time when a company went through the accelerator. Created a list of VC Companies and their earliest round date. Included a column for the date they went through their accelerators and will fill it in when we find a good method of finding this date.
10/16/17 15:00 - 17:00
*Continued working on sorting VCCompanies by their earliest round date.
10/17/17 15:00 - 17:00
*Worked with Ben to find a solution to our problem of data acquisition. Finalized earliest round date for VCCompanies.
10/18/17 15:00 - 17:00
*Updated our VC data with Ed's help in order to increase the accuracy and completion of our data.
10/19/17 15:00 - 17:00
*Organized all of our matched data and updated it in order to reflect the most recent SDC pull with Ed. Matched Crunchbase data with our cohort companies.
10/20/17 15:00 - 17:00
*Generated the new list of VCCompanies as well as their earliest round dates.
10/23/17 15:00 - 17:00
*Worked on sorting out the discrepancies in our matched data.
10/24/17 15:00 - 17:00
*Went through list of VCCompanies and began adding respective accelerators in order to proceed with VCPercentage table.
10/25/17 15:00 - 17:00
*Continued going through list of VCCompanies and adding accelerators.
10/26/17 15:00 - 17:00
*Continued going through list of VCCompanies and adding accelerators. Will have this completed on Monday.
10/30/17 15:00 - 17:00
*Finished adding all of the accelerators to the list of VCCompanies. Added a column indicating whether or not the company went through two or more accelerators.
10/31/17 15:00 - 17:00
*Began compiling data in the column for dates that a specific company went through an Accelerator.
11/01/17 15:00 - 17:00
*Finalized entering dates for Y Combinator cohort companies.
11/02/17 15:00 - 17:00
*Continued entering cohort company dates into Excel file.
11/06/17 15:00 - 17:00
*Began looking at keywords for identifying the cohort class dates for each company
11/07/17 15:00 - 17:00
*Received list from Peter with the founders matched from the Crunchbase LinkedIn URLs and proceeded to find the links for those founders without a match on Crunchbase. Data found in "Unfound Founders List" in the Fall 2017 folder
</onlyinclude>

===Spring 2017===
01/17/17 14:00 - 16:00
*Finished up "accelerating" from [[Accelerator Seed List (Data)]], numbers 341-351
4/11/17 14:00 - 16:00
*Finished compiling the accelerator and cohort information for the few we found from SARP, will consult Ed to figure out how to approach the missing accelerators and what to do for the preliminary report

===Fall 2016===
09/27/2016 14:00 - 17:00:
*Set up personal and work log pages, accessed Remote Desktop.
*Compiled list of accelerators from Wiki
09/29/2016 14:00 - 16:15; 16:45 - 17:30:
*Created new project: [[Accelerator Seed List (Data)]] and worked with Dr. Egan to create schematic for data entry.
*Evaluated 3 sources and logged data. Sources were taken from [[List of Accelerators]]. Logged each step onto project page and identified categories that would be suitable for web crawling sometime in the future.
10/11/2016 14:00 - 17:30;
*Explored how to use regular expressions in TextPad to aid with data sorting (need to review expressions with Dr. Egan in future)
*Continued evaluating sources from [[List of Accelerators]] and recorded steps onto project page, as before. Finished evaluating the six sources from initial list. (All work done in [[Accelerator Seed List (Data)]])
10/13/2016 14:00 - 17:00;
*All work done in [[Accelerator Seed List (Data)]]
*Talked to Dr. Egan about project going forward. Need to pick out 10-15 accelerators from the sources listed on my project page and identify a reliable method for obtaining cohort information, as well as other variables
*Used google searches to identify more sources, and evaluated three databases with the help of TextPad
*Began working on more generic google searches. Was able to go through "Location+accelerator"-type searches today. Will continue next time.
10/18/2016 14:00 - 17:30;
*Work continued in [[Accelerator Seed List (Data)]]
*Took a sample size of 10 accelerators and detailed how to extract cohort information, as well as what other information is readily available from the accelerator URLs.
*Brought Matthew up to speed on accelerator project, added summaries to each section so they became easier to follow, and worked with him to finish up extracting cohort information
10/20/16 14:30 - 17:30:
*Work continued in [[Accelerator Seed List (Data)]]
*Finished up the list of instructions for finding the cohort. Continued compiling the list of variables for each of the accelerators within the sample size.
*Consulted Peter on prospects of creating a web crawler with the information we currently have compiled. Determined it was possible, although beyond the scope of Peter's knowledge.
10/25/16 14:00 - 17:00
*Consulted Ed with next step for project.
*Began listing the E-R diagram onto the accelerator database page where entities were potential categories and each entity had its associated attributes
10/27/16 14:00 - 17:00
*Continued working with Matthew to identify elements in the E-R diagram for pulling information on accelerators.
*Found sources to obtain/cross-reference information (ie. Angel List)
11/08/16 14:00 - 18:00
*Identified possible keywords to filter results through for accelerators
*Began compiling a comprehensive list of accelerators based on the data we have already sifted through.
*Learned how to use regular expressions from Ben to sort names individually and alphabetically.
11/10/16 14:00 - 18:00
*Began sorting through accelerator list and removing duplicates, as well as identifying more places to pull names from.
*Worked with Peter to create a crawl for f6s because the website does not return only accelerators.
11/15/16 14:00 - 18:00
*Took a break from f6s to locate more lists based on individual google searches such as "city+accelerator+list"
*Put Seed DB information into an excel file on the remote desktop
11/17/16 14:00 - 16:00
*Continued filling out information for the random Google Searches
*Organized TextPad files on the RDP into coherent excel spreadsheets with proper headers on the table
*Noticed problem with f6s: it seems although all of the html coding was protected by a captcha so the crawler did not actually extract any information; it was all blocked.
11/22/16 14:00 - 17:00
*Worked to fix f6s crawler with Peter
*Finished and compiled master list of accelerators
12/01/16 14:00 - 18:00
*Caught up on project with Ed and Carlin
*Took 20 accelerators (241-260) from the list and filled out text.html files for them; finished the 20
12/05/16 13:00 - 16:00
*After finishing first 20 accelerators, continued working down the list, beginning at 321
*Work noted in [[Accelerator Seed List (Data)]], but mostly stored on McNair RDP
12/06/16 14:00 - 18:00
*Continued "Accelerating" down the list in [[Accelerator Seed List (Data)]], finished up until 340
12/08/16 14:00 - 17:00
*Continued working on accelerator list on the same page

[[Category:Work Log]]