===Fall 2017===
<onlyinclude>
[[Matthew Ringheanu]] [[Work Logs]] [[Matthew Ringheanu (Work Log)|(log page)]]
9/11/2017 2:00-5:00 pm
*Spoke to Ed about the project going forward. Organized the current updated data for our project.
9/12/2017 3:00-5:00 pm
*Began going through the Cleaned Cohort Data Excel file and found a few problems with it. Will continue the cleaning process for the rest of the week.
9/13/2017 2:00-5:00 pm
*Sorted through Cleaned Cohort Data and finalized our List of Accelerators. We can begin the process of creating our PercentVC table.
9/14/2017 3:00-5:00 pm
*Completely finalized our dataset of accelerators and startups. Met with Michelle Passo to discuss objectives of the research for credit course.
9/18/2017 2:00-4:00 pm
*Talked with Peter about the LinkedIn crawler data. Went through VC page that Meghana sent me.
9/19/2017 3:00-5:00 pm
*Completed SDC pull of updated VC Data.
9/20/2017 2:00-5:00 pm
*Attempted several times to run the Matcher. Cleaned our pulled data.
9/21/2017 3:00-5:00 pm
*Came extremely close to running the Matcher correctly. Reviewed the final LinkedIn data from Peter.
9/25/2017 2:00-5:00 pm
*Finalized the matched file of accelerator companies with VC portfolio companies. Gave Ben the data on Georgia accelerators.
9/26/2017 3:00-5:00 pm
*Worked on finding the duplicates in our Matched file in order to have the most accurate data.
9/27/2017 2:00-5:00 pm
*Attempted to find a way to organize the duplicate matches.
9/28/2017 4:00-5:00 pm
*Continued running through matched data in order to organize it effectively.
10/2/2017 2:00-5:00 pm
*Talked to Ed about next steps for the project. Practiced accessing the crunchbase database on SQL. Brushed up on SQL code.
10/3/2017 3:00-5:00 pm
*Searched the database for crunchbase investment information.
10/4/2017 2:00-5:00 pm
*Pulled the funding rounds table from SQL and matched it with the companies that have received VC funding in order to gather round dates.
10/6/2017 3:00-5:00 pm
*Went through the matched data. Brainstormed ways to get the dates for cohort companies going through accelerators.
10/11/2017 2:00-3:30 pm
*Looked into using the WhoIs Parser in order to find when the companies went through their accelerators.
10/12/2017 3:00-5:00 pm
*Discovered that the Wayback Machine will not be a good option for finding when companies went through their accelerators. Created a list of VCCompanies and their earliest round date. Included a column for the date they went through their accelerators and will fill it in when we find a good method of finding this date.
10/16/2017 2:00-3:30 pm
*Continued working on sorting VCCompanies by their earliest round date.
10/17/2017 3:00-5:00 pm
*Worked with Ben to find a solution to our problem of data acquisition. Finalized earliest round date for VCCompanies.
10/18/2017 2:00-5:00 pm
*Updated our VC data with Ed's help in order to increase the accuracy and completeness of our data.
10/19/2017 3:00-5:00 pm
*Organized all of our matched data and updated it in order to reflect the most recent SDC pull with Ed. Matched Crunchbase data with our cohort companies.
10/20/2017 2:00-3:30 pm
*Generated the new list of VCCompanies as well as their earliest round dates.
10/23/2017 2:00-3:30 pm
*Worked on sorting out the discrepancies in our matched data.
10/24/2017 3:00-5:00 pm
*Went through list of VCCompanies and began adding respective accelerators in order to proceed with VCPercentage table.
10/25/2017 2:00-5:00 pm
*Continued going through list of VCCompanies and adding accelerators.
10/26/2017 3:30-5:30 pm
*Continued going through list of VCCompanies and adding accelerators. Will have this completed on Monday.
10/30/2017 2:00-3:30 pm
*Finished adding all of the accelerators to the list of VCCompanies. Added a column indicating whether or not the company went through two or more accelerators.
10/31/2017 3:00-5:00 pm
*Began compiling data in the column for Date Company went through Accelerator.
11/1/2017 2:00-4:00 pm
*Finalized entering dates for Y Combinator cohort companies.
11/2/2017 4:00-5:30 pm
*Continued entering cohort company dates into Excel file.
11/6/2017 2:00-4:00 pm
*Continued entering cohort company dates into Excel file. Began compiling a list of keywords for demo day press releases.
11/7/2017 3:00-5:00 pm
*Finished coming up with keywords for demo day crawler. Sent the final list to Peter.
11/8/2017 2:00-3:30 pm
*Spoke to Ed and organized all of our current data.
11/9/2017 3:00-5:00 pm
*Created a new project page called Accelerator Data and listed all relevant files as well as descriptions.
11/14/2017 3:00-5:00 pm
*Looked up URLs and decided whether or not the website was relevant.
11/15/2017 2:00-5:00 pm
*Created SQL database entitled "acceleratordata" and began creating tables from folder of All Relevant Files.
11/16/2017 3:00-5:00 pm
*Continued to input tables into SQL database.
11/20/2017 2:00-5:00 pm
*Cleaned text files in order to import tables into SQL database.
11/27/2017 2:00-5:00 pm
*Worked with Peter to find and exclude irrelevant keywords on HTML pages. Began categorizing relevant demo day pages.
11/28/2017 3:00-5:00 pm
*Finished inputting tables of relevant files into SQL database.
11/29/2017 2:00-5:00 pm
*Went through accelerator HTML URLs. Spoke with Ed about going through HTMLs and classifying based on overall and specific relevance.
12/1/2017 3:00-5:00 pm
*Worked through accelerator links and classified pages based on whether or not they provided relevant information about startup timing.
12/4/2017 10:00-12:00 pm
*Continued running through demo day crawl URLs and scoring them based on relevance.
12/7/2017 1:00-4:30 pm
*Finalized scoring of demo day URLs for the original crawl. Last day of work for this semester.
</onlyinclude>
===Spring 2017===
1/18/2017 1:00-5:00 pm
*Continued collecting data for accelerator project. Helped Catherine draft tweets for the McNair Center twitter account.
1/20/2017 1:00-3:00 pm
*Continued collecting data on accelerators. Attended McNair Center team meeting.
1/23/2017 1:00-5:00 pm
*Began combing through accelerator list, determining which accelerators are still missing data and documenting these in a TextPad file. Finished through #115.
1/25/2017 1:00-5:00 pm
*Continued looking through accelerator list.
1/27/2017 1:00-3:00 pm
*Continued going through accelerator list. Left off on #226 with Shrey.
1/20/2017 1:00-5:00 pm
*Continued going through accelerator list. Finished through #440.
2/1/2017 1:00-5:00 pm
*Finished going through the list of accelerators looking for incomplete files. Began completing the files that were not done.
2/3/2017 1:00-3:00 pm
*Continued working on completing accelerator files.
2/6/2017 1:00-4:30 pm
*Finished data set of accelerators. Began going through and making sure that all text files and cohort files are of the same format so Peter can easily pull the information. Left for 30 minutes for an interview from 2:30-3:00 pm.
2/8/2017 1:00-5:00 pm
*Finished formatting through #137. Spoke with Ed about project.
2/13/2017 1:00-5:00 pm
*Completed formatting for all accelerator text files.
2/15/2017 3:00-5:00 pm
*Made copy of the completed data set. Spoke to Ed about future steps to take for project including gathering founder data and obtaining the crunchbase api.
2/17/2017 1:00-3:00 pm
*Went through final Excel spreadsheet for cohort information. Still need to run the crawler one more time after the completion of the editing process. Found the application for the crunchbase api which will hopefully allow us to gain access.
2/20/2017 1:00-5:00 pm
*Filled out another application for Crunchbase research access; Found the first source for the incubator project on angel.co, will hopefully work with Peter to make a crawler similar to f6s.
2/22/2017 1:00-5:00 pm
*Pulled data from SDC for Ed and normalized it. Learned how to use SDC and the normalizer.
2/24/2017 1:00-3:00 pm
*Finished cleaning up the cohort data for Y-combinator on the Final Cohort Excel Spreadsheet.
2/27/2017 1:00-5:00 pm
*Continued cleaning up the cohort data in the Excel file. Finished Cohort Number and Year.
3/1/2017 2:00-5:00 pm
*Worked with Ben and Shrey to pull data from SDC for all VC funded companies and normalized it to put it in an Excel document.
3/3/2017 1:00-2:30 pm
*Worked with Ben to try and repeat down the VC data without it going too far.
3/6/2017 1:00-4:00 pm
*Worked with Shrey to finish cleaning the cohort data. It is ready to be run through the matcher with Ben.
3/8/2017 1:00-5:00 pm
*Matched the VC Data with the list of Cohort Companies and got one list of all cohort companies that have received VC funding.
3/10/2017 12:00-2:00 pm
*Put a write-up on the top of the Accelerator wiki page detailing where we are in the project currently as well as what data we have accumulated on the RDP.
3/20/2017 1:00-5:00 pm
*Began gathering the URLs of all accelerators in a TextPad file called Accelerator URLs. Participated in the SQL training session.
3/22/2017 1:00-5:00 pm
*Made tables in Terminal for Accelerator companies matched with VC companies and for Cohort Data.
3/27/2017 1:00-4:00 pm
*Compiled all URLs of accelerator into a TextPad file.
3/29/2017 1:00-5:00 pm
*Worked on the matched data with Ben. Next time I will run the RegEx code that will filter the URLs, and I will look through the duplicates where two different VC backed company names matched to one cohort company name.
3/31/2017 1:00-2:00 pm
*Ran the code for accelerator urls which are ready to be run through the wayback machine in order to get the start dates. Also began looking through vc backed company names.
4/3/2017 1:00-5:00 pm
*Continued looking through double matched VC companies. Learned more SQL from Ed.
4/5/2017 1:00-5:00 pm
*Made the final vc percentage table on terminal and for next time I will collect missing accelerator data.
4/7/2017 1:00-3:00 pm
*Began collecting cohort data for big accelerators that were missing from our list in order to add it to our final list of cohort companies.
4/10/2017 1:00-5:00 pm
*Finished gathering cohort company names for big accelerators that we were missing and put them into the Cleaned Cohort Companies Excel file. Ben is looking through Crunchbase data in order to possibly find more missing accelerators.
4/14/2017 1:00-4:00 pm
*Began working through "Crunchbase Potential Accelerators" textpad that may contain missing accelerators and wrote notes on the ones that I was able to go through. Need to finish this textpad before moving forward.
4/17/2017 1:00-4:00 pm
*Continued going through potential Crunchbase accelerators that we may have missed. Talked to Ed about getting a more comprehensive list from Excel file and by the end of the semester have the tables and data collected and done.
4/19/2017 1:00-4:00 pm
*Worked with Jeemin to generate an entire list of potential US accelerators from crunchbase. Worked to find a way to classify accelerators just based on their descriptions.
4/21/2017 1:00-4:00 pm
*Continued working through the list identifying accelerators that we do not have. Ramee and Juliette are now helping us gather cohort data for those missing accelerators.
4/24/2017 9:00-1:00 pm
*Updated Veeral on current state of project. Typed up a to-do list on the discussion wiki for Veeral. Got new cohort data on an accelerator and added it to Excel file.
5/3/2017 11:00-1:00 pm
*Talked to Ed and Anne about future report. Continued working through list of crunchbase potential accelerators. Last day of work for this semester.
===Fall 2016===
10/17/2016 2:00-5:00 pm
*Created personal wiki page as well as work log; Read about the research project to which I have been assigned; Wrote a short summary of what I believe it is and included some helpful links.
10/18/2016 4:00-6:00 pm
*Met with research partner Shrey who filled me in on where we are with the project; Began looking on websites of certain accelerators for how to determine their cohorts and listed these steps on the wiki.
10/19/2016 2:00-5:00 pm
*Finished looking on the remaining accelerator websites and wrote the steps on determining how to manually locate the cohorts.
10/20/2016 4:00-6:00 pm
*Met with Peter and Christy to discuss the possibility of creating a web crawler that will pull data from individual accelerator sites.
10/24/2016 2:00-5:00 pm
*Brainstormed with Albert and Julia about changes to the category name for SBDE. Spoke to Ed about full scope of accelerator project.
10/25/2016 4:00-6:00 pm
*Brainstormed with Shrey about different potential industry focuses within accelerators, as well as different variables to search for in terms of accelerators, startups, cohorts, etc.
10/26/2016 2:00-5:00 pm
*Began searching for more databases including lists of accelerators as well as some characteristics of those accelerators; Began searching for characteristics that identify accelerators on their websites.
10/27/2016 4:00-6:00 pm
*Continued searching for relevant lists of accelerators to include on our page. Added some links that have high potential under the tab (Obtained from List of Accelerators or various Google searches).
10/31/2016 2:00-5:00 pm
*Began constructing a list of variables that clearly distinguish an accelerator on its website. This is in an effort to allow a crawler to crawl through many Google searches and identify accelerators.
11/1/2016 4:00-6:00 pm
*Continued looking for variables that could identify accelerators from their websites. Searched through numerous different websites of accelerators obtained from our current databases.
11/2/2016 2:00-4:00 pm
*Continued combing through websites of numerous accelerators, well-known and other, in the hopes of finding identifying variables.
11/3/2016 4:00-6:00 pm
*Finalized my list of variables that could be used to distinguish the websites of accelerators. Slightly re-arranged our list of accelerator databases in order of relevance.
11/7/2016 2:00-5:00 pm
*Began compiling the list of all accelerators. Created a new TextPad document with information from a new database.
11/8/2016 4:00-6:00 pm
*Worked with Shrey and Ben in order to compile all of our accelerator databases into one long list on Textpad.
11/9/2016 2:00-5:00 pm
*Continued formulating a database for all accelerators and all of the available info given.
11/10/2016 4:00-6:00 pm
*Worked with Shrey and Peter in order to develop a crawler for f6s.
11/14/2016 2:00-5:00 pm
*Began sorting the Seed-DB database in an Excel document.
11/15/2016 4:00-6:00 pm
*Conducted some Google searches in an attempt to find more accelerator databases. Began looking through Executive Orders searching for keywords.
11/16/2016 2:00-5:00 pm
*Completed searching through Executive Orders.
11/17/2016 4:00-6:00 pm
*Continued working on Google searches for state accelerator list. Looked through f6s for common words that can be used to distinguish accelerators once we have finalized the crawler.
11/21/2016 2:00-5:00 pm
*Randomly chose 10 accelerators from Excel list of accelerators on the RDP. Went through each website and listed the steps that I took in order to determine whether or not the website belonged to an accelerator. Will continue extracting cohort information tomorrow.
11/22/2016 4:00-6:00 pm
*Listed out all steps for extracting cohort information from the ten randomly chosen accelerators. Worked with Peter in order to build a tool that will search all of the HTMLs and attempt to identify each one as an accelerator as well as extract some basic information.
11/28/2016 2:00-5:00 pm
*Merged the F6S accelerator list with our other list, then posted it on the project page. Learned process for accelerator data extraction from Ed.
11/29/2016 4:00-6:00 pm
*Began process of collecting data from the 20 accelerators that I am responsible for.
11/30/2016 2:00-5:00 pm
*Continued collecting data from accelerators. Finished 15/20.
12/1/2016 4:00-6:00 pm
*Continued collecting data from accelerators. Finished original 20, picked up a new set of 20.
12/2/2016 2:00-5:00 pm
*Continued collecting data from accelerators. Finished next 20.
12/8/2016 1:00-3:00 pm
*Completed collecting data from accelerators for the semester.
[[Matthew Ringheanu]] [[Work Logs]] [[Matthew Ringheanu (Work Log)|(log page)]]
[[Category:Work Log]]
