Changes

2,312 bytes added , 17:28, 3 August 2018

Grace:

*Process and join in new timing data - new data located in Z:/accelerator/Formatted Timing Info.txt

*Make a category group to minor code lookup - in E:/McNair/Projects/Accelerators/Summer 2018/Cohorts Final - minor code priority Grace.xlsx

*Run WHOIS crawler on all valid URLs (not Facebook pages, etc.) - Maxine did this

*Founders Experience: Match Employers to VC funds/firms, VC backed startups (requires data from Augi) - added 2 columns to The File to Rule Them All (VC and VC start up)

==Accelerator Data Assembly Progress (Hira) ==

*All data files are in Z:/accelerator

*The SQL file that loads all data is: LoadAccData.sql. It is located in E:\McNair\Projects\Accelerators\Summer 2018.

=== Data assembly details ===

6) founders_experience - source file: "The File to Rule Them All - Founders experience sheet"

*Accelerator (text)
*First_Name (text)
*Last_Name (text)
*Full_Name (text)
*Employer (text)
*VC (varchar(5))
*VC_backed_startup (varchar(5))
*OLD_Job_Title (text)
*NEW_Job_Title (text)
*Dates_Employed (text)
*Time_Employed (text)
*Location (text)
*Extra_Description (text)

7) additional_timing_info - source file: "merging_work.xlxs" located in: E:\Projects\McNair\Seed DB

8) additional_timing_info2 - source file: "formatted timing info2.txt" located in E:\Projects\McNair\Accelerators\Summer 2018. This was collected through MTurk.

Tables 7 and 8 include columns:

*coname
*acceleratorname
*cohort_name
*date
*month
*year
*season

9) timing_combined - This table combines all the timing information we have by appending tables 4, 7 and 8.

10) cohortcompanies_wtiming - merges data in tables cohortcompany and timing_combined

==Grace's Code==

===format_timing.py===

E:/McNair/Projects/Accelerators/Summer 2018/format_timing.py

Input: a txt file with each accelerator mapped to multiple companies (in a single cell separated by commas, or in separate rows)

Output: a txt file with each company mapped to its accelerator, along with all the other information in the original file

===prioritycodecategory.py===

E:/McNair/Projects/Accelerators/Summer 2018/prioritycodecategory.py

Input: txt file with a list of category groups (column Y of Cohorts Final in The File to Rule Them All)

Output: txt file with line number and minor code

Final output: I took the txt file, copied the codes and pasted them into the added column Z of the Cohorts Final sheet from The File to Rule Them All.

E:/McNair/Projects/Accelerators/Summer 2018/Cohorts Final - minor code priority Grace.xlsx

Chooses the code based on importance, using a priority ranking dictionary, before choosing arbitrarily.
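The priority rule described for prioritycodecategory.py can be sketched roughly as follows. This is a minimal illustration, not the script itself: the `GROUP_TO_CODE` lookup, the `CODE_PRIORITY` ranking, the code strings and both function names are hypothetical placeholders, and it assumes each input line holds one or more comma-separated category groups.

```python
# Hypothetical lookup from category group to minor code (placeholder values).
GROUP_TO_CODE = {
    "software": "SOFT",
    "health care": "HLTH",
    "media and entertainment": "MEDIA",
}

# Hypothetical priority ranking: a lower number means a more important code.
# Codes missing from this dictionary are treated as unranked.
CODE_PRIORITY = {"HLTH": 1, "SOFT": 2}


def pick_minor_code(cell):
    """Pick a single minor code for one cell of category groups."""
    groups = (part.strip().lower() for part in cell.split(","))
    codes = [GROUP_TO_CODE[g] for g in groups if g in GROUP_TO_CODE]
    if not codes:
        return ""  # nothing in the cell could be mapped
    ranked = [c for c in codes if c in CODE_PRIORITY]
    if ranked:
        # Prefer the most important ranked code.
        return min(ranked, key=CODE_PRIORITY.get)
    # No ranked code available: fall back to an arbitrary choice (the first).
    return codes[0]


def number_codes(lines):
    """Yield (line number, minor code) pairs, numbering lines from 1."""
    for number, line in enumerate(lines, start=1):
        yield number, pick_minor_code(line)
```

With these placeholder dictionaries, a cell such as "Software, Health Care" resolves to the ranked code with the smallest priority number, and a cell containing only unranked groups falls back to its first mapped code.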
===codecategory.py===

E:/McNair/Projects/Accelerators/Summer 2018/codecategory.py

Input: txt file with a list of category groups (column Y of Cohorts Final in The File to Rule Them All)

Output: txt file with line number and multiple minor codes

Final output: I took the minor codes and copied them into column Z of this sheet (a copy of The File to Rule Them All with this added column):

E:/McNair/Projects/Accelerators/Summer 2018/Cohorts Final - minor code (no priority) Grace.xlsx

I arbitrarily chose the first code when multiple were given. I did this in Excel by separating on commas. I also did a lot of them manually, which is why there are more values in this file than in the one with priority.
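For the reshaping that format_timing.py is described as doing (one accelerator row with several companies becomes one row per company), a rough sketch is given here. It is not the actual script. It assumes a tab-delimited input with a header row and a hypothetical `companies` column holding comma-separated names; `coname` and `acceleratorname` are taken from the column list for tables 7 and 8, while the function names and sample values are made up for illustration.

```python
import csv
import io


def explode_companies(rows, company_field="companies"):
    """Yield one output row per company, copying every other field."""
    for row in rows:
        for name in (row.get(company_field) or "").split(","):
            name = name.strip()
            if not name:
                continue  # skip empty entries such as a trailing comma
            out = {k: v for k, v in row.items() if k != company_field}
            out["coname"] = name
            yield out


def reshape(in_file, out_file, company_field="companies"):
    """Read tab-delimited accelerator rows and write company-level rows."""
    reader = csv.DictReader(in_file, delimiter="\t")
    fields = ["coname"] + [f for f in reader.fieldnames if f != company_field]
    writer = csv.DictWriter(out_file, fieldnames=fields,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in explode_companies(reader, company_field):
        writer.writerow(row)


# Small in-memory demonstration with made-up values.
demo_input = io.StringIO(
    "acceleratorname\tcompanies\tyear\n"
    "AccelA\tFoo, Bar\t2015\n"
    "AccelB\tBaz\t2016\n"
)
demo_output = io.StringIO()
reshape(demo_input, demo_output)
```

Input that already has one company per row passes through unchanged apart from the column rename, so the same function covers both input layouts mentioned for the script.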

GraceTan


Seed Accelerator Data Assembly (view source)

Revision as of 17:28, 3 August 2018
