===Fall 2017===
<onlyinclude>
[[Matthew Ringheanu]] [[Work Logs]] [[Matthew Ringheanu (Work Log)|(log page)]]

9/11/2017 2:00-5:00 pm
*Spoke to Ed about the project going forward. Organized the current updated data for our project.
9/12/2017 3:00-5:00 pm
*Began going through the Cleaned Cohort Data Excel file and found a few problems with it. Will continue the cleaning process for the rest of the week.
9/13/2017 2:00-5:00 pm
*Sorted through Cleaned Cohort Data and finalized our List of Accelerators. We can begin the process of creating our PercentVC table.
9/14/2017 3:00-5:00 pm
*Completely finalized our dataset of accelerators and startups. Met with Michelle Passo to discuss objectives of the research for credit course.
9/18/2017 2:00-4:00 pm
*Talked with Peter about the LinkedIn crawler data. Went through VC page that Meghana sent me.
9/19/2017 3:00-5:00 pm
*Completed SDC pull of updated VC Data.
9/20/2017 2:00-5:00 pm
*Attempted several times to run the Matcher. Cleaned our pulled data.
9/21/2017 3:00-5:00 pm
*Came extremely close to running the Matcher correctly. Reviewed the final LinkedIn data from Peter.
9/25/2017 2:00-5:00 pm
*Finalized the matched file of accelerator companies with VC portfolio companies. Gave Ben the data on Georgia accelerators.
9/26/2017 3:00-5:00 pm
*Worked on finding the duplicates in our Matched file in order to have the most accurate data.
9/27/2017 2:00-5:00 pm
*Attempted to find a way to organize the duplicate matches.
9/28/2017 4:00-5:00 pm
*Continued running through matched data in order to organize it effectively.
10/2/2017 2:00-5:00 pm
*Talked to Ed about next steps for the project. Practiced accessing the crunchbase database on SQL. Brushed up on SQL code.
10/3/2017 3:00-5:00 pm
*Searched the database for crunchbase investment information.
10/4/2017 2:00-5:00 pm
*Pulled the funding rounds table from SQL and matched it with the companies that have received VC funding in order to gather round dates.
10/6/2017 3:00-5:00 pm
*Went through the matched data. Brainstormed ways to get the dates for cohort companies going through accelerators.
10/11/2017 2:00-3:30 pm
*Looked into using the WhoIs Parser in order to find when the companies went through their accelerators.
10/12/2017 3:00-5:00 pm
*Discovered that the Wayback Machine will not be a good option for finding when companies went through their accelerators. Created a list of VCCompanies and their earliest round date. Included a column for the date they went through their accelerators and will fill it in when we find a good method of finding this date.
10/16/2017 2:00-3:30 pm
*Continued working on sorting VCCompanies by their earliest round date.
10/17/2017 3:00-5:00 pm
*Worked with Ben to find a solution to our problem of data acquisition. Finalized earliest round date for VCCompanies.
10/18/2017 2:00-5:00 pm
*Updated our VC data with Ed's help in order to increase the accuracy and completion of our data.
10/19/2017 3:00-5:00 pm
*Organized all of our matched data and updated it in order to reflect the most recent SDC pull with Ed. Matched Crunchbase data with our cohort companies.
10/20/2017 2:00-3:30 pm
*Generated the new list of VCCompanies as well as their earliest round dates.
10/23/2017 2:00-3:30 pm
*Worked on sorting out the discrepancies in our matched data.
10/24/2017 3:00-5:00 pm
*Went through list of VCCompanies and began adding respective accelerators in order to proceed with VCPercentage table.
10/25/2017 2:00-5:00 pm
*Continued going through list of VCCompanies and adding accelerators.
10/26/2017 3:30-5:30 pm
*Continued going through list of VCCompanies and adding accelerators. Will have this completed on Monday.
10/30/2017 2:00-3:30 pm
*Finished adding all of the accelerators to the list of VCCompanies. Added a column indicating whether or not the company went through two or more accelerators.
10/31/2017 3:00-5:00 pm
*Began compiling data in the column for Date Company went through Accelerator.
11/1/2017 2:00-4:00 pm
*Finalized entering dates for Y Combinator cohort companies.
11/2/2017 4:00-5:30 pm
*Continued entering cohort company dates into Excel file.
11/6/2017 2:00-4:00 pm
*Continued entering cohort company dates into Excel file. Began compiling a list of keywords for demo day press releases.
11/7/2017 3:00-5:00 pm
*Finished coming up with keywords for demo day crawler. Sent the final list to Peter.
11/8/2017 2:00-3:30 pm
*Spoke to Ed and organized all of our current data.
11/9/2017 3:00-5:00 pm
*Created a new project page called Accelerator Data and listed all relevant files as well as descriptions.
11/14/2017 3:00-5:00 pm
*Looked up URLs and decided whether or not the website was relevant.
11/15/2017 2:00-5:00 pm
*Created SQL database entitled "acceleratordata" and began creating tables from folder of All Relevant Files.
11/16/2017 3:00-5:00 pm
*Continued to input tables into SQL database.
11/20/2017 2:00-5:00 pm
*Cleaned text files in order to import tables into SQL database.
11/27/2017 2:00-5:00 pm
*Worked with Peter to find and exclude irrelevant keywords on HTML pages. Began categorizing relevant demo day pages.
11/28/2017 3:00-5:00 pm
*Finished inputting tables of relevant files into SQL database.
11/29/2017 2:00-5:00 pm
*Went through accelerator HTML URLs. Spoke with Ed about going through HTMLs and classifying based on overall and specific relevance.
12/1/2017 3:00-5:00 pm
*Worked through accelerator links and classified pages based on whether or not they provided relevant information about startup timing.
12/4/2017 10:00-12:00 pm
*Continued running through demo day crawl URLs and scoring them based on relevance.
12/7/2017 1:00-4:30 pm
*Finalized scoring of demo day URLs for the original crawl. Last day of work for this semester.
</onlyinclude>

===Spring 2017===
1/18/2017 1:00-5:00 pm
*Continued collecting data for accelerator project. Helped Catherine draft tweets for the McNair Center twitter account.
1/20/2017 1:00-3:00 pm
*Continued collecting data on accelerators. Attended McNair Center team meeting.
1/23/2017 1:00-5:00 pm
*Began combing through accelerator list, determining which accelerators are still missing data and documenting these in a TextPad file. Finished through #115.
1/25/2017 1:00-5:00 pm
*Continued looking through accelerator list.
1/27/2017 1:00-3:00 pm
*Continued going through accelerator list. Left off on #226 with Shrey.
1/20/2017 1:00-5:00 pm
*Continued going through accelerator list. Finished through #440.
2/1/2017 1:00-5:00 pm
*Finished going through the list of accelerators looking for incomplete files. Began completing the files that were not done.
2/3/2017 1:00-3:00 pm
*Continued working on completing accelerator files.
2/6/2017 1:00-4:30 pm
*Finished data set of accelerators. Began going through and making sure that all text files and cohort files are of the same format so Peter can easily pull the information. Left for 30 minutes for an interview from 2:30-3:00 pm.
2/8/2017 1:00-5:00 pm
*Finished formatting through #137. Spoke with Ed about project.
2/13/2017 1:00-5:00 pm
*Completed formatting for all accelerator text files.
2/15/2017 3:00-5:00 pm
*Made copy of the completed data set. Spoke to Ed about future steps to take for project including gathering founder data and obtaining the crunchbase api.
2/17/2017 1:00-3:00 pm
*Went through final Excel spreadsheet for cohort information. Still need to run the crawler one more time after the completion of the editing process. Found the application for the crunchbase api which will hopefully allow us to gain access.
2/20/2017 1:00-5:00 pm
*Filled out another application for Crunchbase research access; Found the first source for the incubator project on angel.co, will hopefully work with Peter to make a crawler similar to f6s.
2/22/2017 1:00-5:00 pm
*Pulled data from SDC for Ed and normalized it. Learned how to use SDC and the normalizer.
2/24/2017 1:00-3:00 pm
*Finished cleaning up the cohort data for Y-combinator on the Final Cohort Excel Spreadsheet.
2/27/2017 1:00-5:00 pm
*Continued cleaning up the cohort data in the Excel file. Finished Cohort Number and Year.
3/1/2017 2:00-5:00 pm
*Worked with Ben and Shrey to pull data from SDC for all VC funded companies and normalized it to put it in an Excel document.
3/3/2017 1:00-2:30 pm
*Worked with Ben to try and repeat down the VC data without it going too far.
3/6/2017 1:00-4:00 pm
*Worked with Shrey to finish cleaning the cohort data. It is ready to be run through the matcher with Ben.
3/8/2017 1:00-5:00 pm
*Matched the VC Data with the list of Cohort Companies and got one list of all cohort companies that have received VC funding.
3/10/2017 12:00-2:00 pm
*Put a write-up on the top of the Accelerator wiki page detailing where we are in the project currently as well as what data we have accumulated on the RDP.
3/20/2017 1:00-5:00 pm
*Began gathering the URLs of all accelerators in a TextPad file called Accelerator URLs. Participated in the SQL training session.
3/22/2017 1:00-5:00 pm
*Made tables in Terminal for Accelerator companies matched with VC companies and for Cohort Data.
3/27/2017 1:00-4:00 pm
*Compiled all URLs of accelerators into a TextPad file.
3/29/2017 1:00-5:00 pm
*Worked on the matched data with Ben. Next time I will run the RegEx code that will filter the URLs, and I will look through the duplicates where two different VC backed company names matched to one cohort company name.
3/31/2017 1:00-2:00 pm
*Ran the code for accelerator urls which are ready to be run through the wayback machine in order to get the start dates. Also began looking through vc backed company names.
4/3/2017 1:00-5:00 pm
*Continued looking through double matched VC companies. Learned more SQL from Ed.
4/5/2017 1:00-5:00 pm
*Made the final vc percentage table on terminal and for next time I will collect missing accelerator data.
4/7/2017 1:00-3:00 pm
*Began collecting cohort data for big accelerators that were missing from our list in order to add it to our final list of cohort companies.
4/10/2017 1:00-5:00 pm
*Finished gathering cohort company names for big accelerators that we were missing and put them into the Cleaned Cohort Companies Excel file. Ben is looking through Crunchbase data in order to possibly find more missing accelerators.
4/14/2017 1:00-4:00 pm
*Began working through "Crunchbase Potential Accelerators" textpad that may contain missing accelerators and wrote notes on the ones that I was able to go through. Need to finish this textpad before moving forward.
4/17/2017 1:00-4:00 pm
*Continued going through potential Crunchbase accelerators that we may have missed. Talked to Ed about getting a more comprehensive list from Excel file and by the end of the semester have the tables and data collected and done.
4/19/2017 1:00-4:00 pm
*Worked with Jeemin to generate an entire list of potential US accelerators from crunchbase. Worked to find a way to classify accelerators just based on their descriptions.
4/21/2017 1:00-4:00 pm
*Continued working through the list identifying accelerators that we do not have. Ramee and Juliette are now helping us gather cohort data for those missing accelerators.
4/24/2017 9:00-1:00 pm
*Updated Veeral on current state of project. Typed up a to-do list on the discussion wiki for Veeral. Got new cohort data on an accelerator and added it to Excel file.
5/3/2017 11:00-1:00 pm
*Talked to Ed and Anne about future report. Continued working through list of crunchbase potential accelerators. Last day of work for this semester.
===Fall 2016===
10/17/2016 2:00-5:00 pm
*Created personal wiki page as well as work log; Read about the research project to which I have been assigned; Wrote a short summary of what I believe it is and included some helpful links.
10/18/2016 4:00-6:00 pm
*Met with research partner Shrey who filled me in on where we are with the project; Began looking on websites of certain accelerators for how to determine their cohorts and listed these steps on the wiki.
10/19/2016 2:00-5:00 pm
*Finished looking on the remaining accelerator websites and wrote the steps on determining how to manually locate the cohorts.
10/20/2016 4:00-6:00 pm
*Met with Peter and Christy to discuss the possibility of creating a web crawler that will pull data from individual accelerator sites.
10/24/2016 2:00-5:00 pm
*Brainstormed with Albert and Julia about changes to the category name for SBDE. Spoke to Ed about full scope of accelerator project.
10/25/2016 4:00-6:00 pm
*Brainstormed with Shrey about different potential industry focuses within accelerators, as well as different variables to search for in terms of accelerators, startups, cohorts, etc.
10/26/2016 2:00-5:00 pm
*Began searching for more databases including lists of accelerators as well as some characteristics of those accelerators; Began searching for characteristics that identify accelerators on their websites.
10/27/2016 4:00-6:00 pm
*Continued searching for relevant lists of accelerators to include on our page. Added some links that have high potential under the tab (Obtained from List of Accelerators or various Google searches).
10/31/2016 2:00-5:00 pm
*Began constructing a list of variables that clearly distinguish an accelerator on its website. This is in an effort to allow a crawler to crawl through many Google searches and identify accelerators.
11/1/2016 4:00-6:00 pm
*Continued looking for variables that could identify accelerators from their websites. Searched through numerous different websites of accelerators obtained from our current databases.
11/2/2016 2:00-4:00 pm
*Continued combing through websites of numerous accelerators, well-known and other, in the hopes of finding identifying variables.
11/3/2016 4:00-6:00 pm
*Finalized my list of variables that could be used to distinguish the websites of accelerators. Slightly re-arranged our list of accelerator databases in order of relevance.
11/7/2016 2:00-5:00 pm
*Began compiling the list of all accelerators. Created a new TextPad document with information from a new database.
11/8/2016 4:00-6:00 pm
*Worked with Shrey and Ben in order to compile all of our accelerator databases into one long list on Textpad.
11/9/2016 2:00-5:00 pm
*Continued formulating a database for all accelerators and all of the available info given.
11/10/2016 4:00-6:00 pm
*Worked with Shrey and Peter in order to develop a crawler for f6s.
11/14/2016 2:00-5:00 pm
*Began sorting the Seed-DB database in an Excel document.
11/15/2016 4:00-6:00 pm
*Conducted some Google searches in an attempt to find more accelerator databases. Began looking through Executive Orders searching for keywords.
11/16/2016 2:00-5:00 pm
*Completed searching through Executive Orders.
11/17/2016 4:00-6:00 pm
*Continued working on Google searches for state accelerator list. Looked through f6s for common words that can be used to distinguish accelerators once we have finalized the crawler.
11/21/2016 2:00-5:00 pm
*Randomly chose 10 accelerators from Excel list of accelerators on the RDP. Went through each website and listed the steps that I took in order to determine whether or not the website belonged to an accelerator. Will continue extracting cohort information tomorrow.
11/22/2016 4:00-6:00 pm
*Listed out all steps for extracting cohort information from the ten randomly chosen accelerators. Worked with Peter in order to build a tool that will search all of the HTMLs and attempt to identify each one as an accelerator as well as extract some basic information.
11/28/2016 2:00-5:00 pm
*Merged the F6S accelerator list with our other list, then posted it on the project page. Learned process for accelerator data extraction from Ed.
11/29/2016 4:00-6:00 pm
*Began process of collecting data from the 20 accelerators that I am responsible for.
11/30/2016 2:00-5:00 pm
*Continued collecting data from accelerators. Finished 15/20.
12/1/2016 4:00-6:00 pm
*Continued collecting data from accelerators. Finished original 20, picked up a new set of 20.
12/2/2016 2:00-5:00 pm
*Continued collecting data from accelerators. Finished next 20.
12/8/2016 1:00-3:00 pm
*Completed collecting data from accelerators for the semester.
[[Category:Work Log]]
