Changes

Jump to navigation Jump to search
no edit summary
{{Project|Has project output=Content|Has sponsor=McNair ProjectsCenter|Project TitleHas title=Women in Entrepreneurship (Issue Brief)|Topic Area=Social Factors in Entrepreneurship|OwnerHas owner=Carlin Cherry|Start TermHas start date=Spring 2016|End TermHas keywords=Women, Entrepreneurship|StatusHas project status=Tabled|Deliverable}}=Timeline and plan for summer=*note: I am using "hopethisworks8.txt" to make my graphs*4-5 artifacts, 4-5 specifications, last artifact a regression table (tables, charts)**something about data itself. pie chart (how many companies do we have, how many do we have ceo information for, how many do we have founder) and then relatedly many can we classify as men or women***this will influence all remaining graphs - use this info to guide the rest of your graphs***classified drs using their first name matched to common first name list, that's why we were able to classify some drs as women, some as men, and some drs but gender unknown**graph with time on x axis, % on y axis, women founders, women ceos, women management positions (management position being anything that is vp and above) over time***will write up these results-ie most of the women in "womens management positions" **conditional on having (ceo identified) how many women, how many men, drs, dr men, dr women, dr unknown. **bar graph with women in various industries, y axis will have percentage***should also make note of total percentage of VC dollars in each industry**regression table***variable names, with n and r^2, industry/year fixed effects Y/N***each one of these will correspond to a regression***variables: IPO, acquisition, exit all correspond to 0,1***rounds, dollars invested =Spring 2017=Issue Brief|Audience=General Public=Current plan to complete project==#Lit Review (which will be compiled here [[Women in Entrepreneurship Lit Review]]). Keep track of the things that have already been done in other research reports, so that we can do something different/original or so we can prove their data wrong. Done 3/28.#Brainstorm ways to fix Dr. problem in data (assigning gender to those who are doctors), Policymakersrecode them. reran new tables with new coded doctors. done 4/4.#Think of new variables to add depending on the guidance from the lit review (might include adding different variables than solely womenceo/womenfounder but will depend on litreview). ##add control variable for the people in data who have no gender#I think from there I can do analysis and start to write report! ==For analysis==
|KeywordsNote that:*finaldatasetcode.sql (in 181/Women) has been updated*hopethisworks5.txt (in both 181/Women and E/Women) has been created**Use this as your dataset now**It contains IPO, IPOyear, IPOAmount, MA, MAyear, MAAmount, and Exit**IPO, MA, and Exit are 0/1 variables. Year is an int, and Amount is a real.**Many MAs will not have amounts. Build graphs of:*Over time (last ten yrs)**percent of co's with woman ceo (binary)**percent of co's with woman founder (binary)**percent of co's with woman clevel (binary)**percent of co's with women in top management (vp and above)**average fraction of women founders**average fraction of women clevel**average fraction of women in top management (vp and above)*Some of the above by industry*Some of the above by state Regresions:*rounds womanvar w/controls*inv womanvar w/controls*exit womanvar w/controls*ipo womanvar w/controls*ma womanvar w/controls*ipoamount womanvar w/controls*maamount womanvar w/controls ==WomenCall with SDC Platinum to determine information about the data==#How does Thompson get their data?##Why is coverage better for some firms than others?##Is the data self-reported by the companies?#How does Thompson upload executive data?##How often – after last round of funding?##After it’s uploaded, is the data updated continuously?#Better coverage for companies that get more rounds of financing/IPO/M&A?  #VC data sourced from government filings, public press releases, and quarterly surveys of private equity firms. If company does not participate in survey, then SDC does not have the data, which is why coverage for some firms is better than others. #Deals team is the one that uploads all the data. As soon as SDC gets an update, or has a source that is updated, they automatically upload that to system as well. It typically takes 24-48 hours for new info to be reflected in database. #Yes, there is better coverage for companies that have IPO/M&A/get more rounds of financing. ==TO DO after 2/28==#Check cleancos and verify that it is actually clean (yes)#Make Dr. a control variable (yes)#Sum all c-level, etc.#aggregate to company level We need to build:*IPO 1/0*M&A 1/0*Number of rounds*Total invested*Is the CEO a woman 1/0*Are any of the founders women 1/0*Is CEO doctor 1/0*Number of founders*Number of founders that are doctors*Number of women founders*State - coded*Industry -code them!*Year of First investment -extracted Ed is going to add IPO and Acq to rc1andcp3 Here's the code to add zeros to cleanpeople: DROP TABLE cleanpeople2; CREATE TABLE cleanpeople2 AS SELECT prefix, firstname, lastname, jobtitle, fulltitle, cname, CASE WHEN genval IS NOT NULL THEN genval ELSE 0::int END AS women, CASE WHEN doctor IS NOT NULL THEN doctor ELSE 0::int END AS doctor, CASE WHEN charman IS NOT NULL THEN charman ELSE 0::int END AS charman, CASE WHEN ceo IS NOT NULL THEN ceo ELSE 0::int END AS ceo, CASE WHEN cfo IS NOT NULL THEN cfo ELSE 0::int END AS cfo, CASE WHEN coo IS NOT NULL THEN coo ELSE 0::int END AS coo, CASE WHEN cio IS NOT NULL THEN cio ELSE 0::int END AS cio, CASE WHEN cto IS NOT NULL THEN cto ELSE 0::int END AS cto, CASE WHEN otherclvl IS NOT NULL THEN otherclvl ELSE 0::int END AS otherclvl, CASE WHEN boardmember IS NOT NULL THEN boardmember ELSE 0::int END AS boardmember, CASE WHEN president IS NOT NULL THEN president ELSE 0::int END AS president, CASE WHEN vp IS NOT NULL THEN vp ELSE 0::int END AS vp, CASE WHEN founder IS NOT NULL THEN founder ELSE 0::int END AS founder, CASE WHEN director IS NOT NULL THEN director ELSE 0::int END AS director FROM cleanpeople WHERE jobtitle IS NOT NULL; ==TO DO up to 2/28== #Load##company##people##title lookup##state lookup#Refine people##gender 0/1##Join title lookup##Dr.'s?#Aggregate to company level (left join)##Agg. People##Join state lookup#Export! Done! Here is the code for this part of project: DROP TABLE titlelookup;CREATE TABLE titlelookup( fulltitle varchar(150), charman int, ceo int, cfo int, coo int, cio int, cto int, otherclvl int, boardmember int, president int, vp int, founder int, director int);\COPY titlelookup FROM 'Important Titles in Women2017 dataset.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV--628  DROP TABLE statelookup;CREATE TABLE statelookup( statename varchar(100), uniquecode int);\COPY statelookup FROM 'uniquestateval.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV--50  DROP TABLE people;CREATE TABLE people( prefix varchar(5), firstname varchar(50), lastname varchar(50), jobtitle varchar(150), cname varchar(150) );\COPY people FROM 'pull5-normal.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV--186181  DROP TABLE cleancos;CREATE TABLE cleancos ASSELECT pull3info.*FROM pull3infoJOIN Cleancosbase ON pull3info.companyname=cleancosbase.companyname AND minfirstdate=firstinvestdate;--43534  DROP TABLE genvalpeople;CREATE TABLE genvalpeople ASSELECT *,CASE WHEN prefix='Ms' THEN 1::intWHEN prefix='Mrs' THEN 1::intWHEN prefix='Mr' THEN 0::intELSE Null::int END AS genval FROM people;--186181 DROP TABLE cleanpeople;CREATE TABLE cleanpeople AS SELECT genvalpeople.*, titlelookup.*FROM genvalpeople LEFT JOIN titlelookup ON genvalpeople.jobtitle=fulltitle;--186181 DROP TABLE uniquevalstate;CREATE TABLE uniquevalstate ASSELECT cleancos.*, statelookup.*FROM cleancos LEFT JOIN statelookup ON cleancos.companystate=statelookup.statename;--43534  DROP TABLE dataset;CREATE TABLE dataset ASSELECT cleanpeople.*, Entrepreneurshipuniquevalstate.*|Primary BillingFROM cleanpeople LEFT JOIN uniquevalstate ON cleanpeople.cname=AccMcNair01uniquevalstate.companyname;--186181 \COPY people TO 'finaldataset.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV}}--186181
[[Women in Entrepreneurship Lit Review]]==Paper==https://docs.google.com/document/d/1RHC0Namt9VpmUxINmRvHw4U34dDmgYPcUMbwhhthcCA/edit
=Spring 2017===To-doTimeline==
*Prior
**review/update lit review
**state fix-effects
*Policy
**Trump's proposed policy changes that will impact vc-backed companies, entrepreneurship/innovation in general, women in the workplace: research can be found at [[Trump and Women in Entrepreneurship]]***crowd-funding impact
==Timeline==
*1/24 - finish outline, reread tables/data collection process, update lit review
*1/25 - SDC pull,  *1/30 - rerun through tables w new added variables (i.e. industry, etc)*1/27 31 - read over new data and refine into something presentable, find appropriate parameters for issue brief, etc. maybe start making charts and graphs if time. @ this point check in w ed*2/1 - finish making graphs and charts, research trump policy and write it up*2/3 - start finalizing paper, putting all the pieces of the puzzle together, any additions or time to fix things that went off schedule, last edits, submit paper to anne for review ==SDC Search=====pull 1===*Database Selection**VentureXpert**Portfolio Companies *Search Items:**Date that received company received first investment: 01/01/1980 to 01/01/2017**venture-related deals**Company Nation: U.S. *Custom Report**Company Founding Date**Company IPO Date**Date Received Last Investment**Date Received First Investment**Company Name**Company State**Company City**Company VE Primary Industry Class**Company VE Primary Industry Major Group**Company 6-digit CUSIP**Total amt invested in company**No. Financing Rounds Company Received**Executive's First Name**Executive's Last Name**Executive's Job Title**Executive's Prefix**VE Primary Industry Sub-Group 3**VE Primary Industry Minor Group**VE Primary Industry Sub-Group 1**Company IPO**Company Website  *Saved in 182 in mcnair/projects/women2017 as "pull1" ===pull 3===*Database Selection**VentureXpert**Portfolio Companies *Search Items:**Date that received company received first investment: 01/01/1980 to 01/01/2017**Venture related deals - select all venture related deals**Company Nation: U.S. *custom report*company founding date*company ipo date*date company received last investment*date company received first investment*company staet*company name*company industry class*company industry major group*company industry subgroup 1*company industry subgroup 3*Company industry minor group*company cusip*total known amt invested in company*no rouds company received investment*company ipo status*company website *saved in 182 in mcnair/projects/women2017 as "pull3" ===pull 4===*Database Selection**VentureXpert**Portfolio Companies *Search Items:**Date that received company received first investment: 01/01/1980 to 01/01/2017**Venture-related deals**Company Nation: U.S. *custom report*company name*executive's name prefix*executive's first name*executive's last name*executive's job title *saved in 182 in mcnair/projects/women2017 as "pull4" ===pull 5===*Database Selection**VentureXpert**Portfolio Companies *Search Items:**Date that received company received first investment: 01/01/1980 to 01/01/2017**Venture-related deals**Company Nation: U.S. *custom report*executive's name prefix*executive's first name*executive's last name*executive's job title*company name*saved in 182 in mcnair/projects/women2017 as "pull5" ===pull 6===*Database Selection**VentureXpert**Portfolio Companies *Search Items:**Date that received company received first investment: 01/01/1980 to 01/01/2017**venture-related deals**Company Nation: U.S. *Custom Report*executive's name prefix*executive's first name*executive's last name*executive's job title*company name*company city*company state ===pull 7===*Database Selection**VentureXpert**Portfolio Companies *Search Items:**Date that received company received first investment: 01/01/1980 to 01/01/2017**venture-related deals**Company Nation: U.S. *Custom Report*executive's name prefix*executive's first name*executive's last name*executive's job title*company name*Company ve primary industry major group*Company ve primary industry minor group*Company ve primary industry sub group 1*Company ve primary industry sub group 2*Company ve primary industry sub group 3  ==What I have done up to 2/7/17 (in english, not sql)== I have done many SDC pulls with many different variables. The most important tables are nowomenstate, nowomencity, nowomenminorind, and nowomensub2. nowomenstate shows the number of women in "power positions" in each state as a percentage of all power position workers. nowomencity does the same thing at a city level. nowomenminorind shows the number of women in power positions grouped by their company's minor industry group. nowomensub2 does the same except grouped by the company's industry sub group 2. ==Tables built up to 02/07/17== SDC pull #1company founding datecompany ipo datedate company received last investmentdate company received first investmentcompany staetcompany namecompany industry classcompany industry major groupcompany industry subgroup 1company industry subgroup 3Company industry minor groupcompany cusipttoal known amt invested in companyno rouds company received investmentcompany ipo statuscompany website SDC pull #2company nameexecutive's name prefixexecutive's first nameexecutive's last nameexecutive's job title   logging on:researcher@128.42.44.181password: 9million cd /bulkcd \Women2017psql Women2017 DROP TABLE pull3info;CREATE TABLE pull3info(/*taken from pull3.txt*/ ipodate date, lastinvestdate date, firstinvestdate date, companyfoundingdate date, ipostatus varchar(10), companystate varchar(50), companyname varchar(200), industryclass varchar(200), majorgroup varchar(200), minorgroup varchar(200), subgroup1 varchar(200), subgroup3 varchar(200), cusip varchar(6), totalinvested float, numrounds int, website varchar(200)); \COPY pull3info FROM 'pull3-normal.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV --COPY 43606 DROP TABLE cleancosbase;CREATE TABLE cleancosbase ASSELECT companyname, min(firstinvestdate) as minfirstdate FROM pull3info WHERE companyname !='Undisclosed Company' OR companyname !='New York Digital Health LLC' GROUP BY companyname;--43534 DROP TABLE cleancos;CREATE TABLE cleancos ASSELECT pull3info.*FROM pull3infoJOIN Cleancosbase ON pull3info.companyname=cleancosbase.companyname AND minfirstdate=firstinvestdate;--43534  DROP TABLE pull4info;CREATE TABLE pull4info(/*taken from pull4.txt*/ execfirstname varchar(50), jobtitle varchar(100), execlastname varchar(50), execprefix varchar(5), companyname varchar(200));
\COPY pull4info FROM 'pull4-normal.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV /*the job title column of the exec is for some reason combined with the first nameeven after normalizing the data. I will repull the data from SDC to see if it wassimply an error in the pull or something. If that doesn't work, will go to ed.*/ SDC pull #5executive's name prefixexecutive's first nameexecutive's last nameexecutive's job titlecompany name DROP TABLE pull5info;CREATE TABLE pull5info(/*taken from pull5.txt*/ execprefix varchar(5), execfirstname varchar(50), execlastname varchar(50), jobtitle varchar(100), companyname varchar(200)); \COPY pull5info FROM 'pull5-normal.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV --COPY 186181 DROP TABLE distinctnocompanies;CREATE TABLE distinctcompanies ASSELECT DISTINCT companyname FROM pull3info;/*created this per ed's advice to see how many companies are distinct from each otherso we dont get duplicates in the dataset - only really creating a table to get the specific number*/ --COPY 43536/*this means there are 70 duplicates in the data. could mean that some companies in diffstates have the same name, or it could be a for real duplicate. can go back and checkon this later but rn seems like a relatively nominal amt.*/  DROP TABLE totaltable;CREATE TABLE totaltable ASSELECT * FROM pull5info LEFT JOIN pull3info ON pull5info.CompanyName = pull3info.companyname;/*represents the table combining pull3info and pull5info, aka all of the info aboutthe companies and the executives' info matched on company name*/ /*WHERE IM AT: START GROUPING BY WOMEN IN pull3infoed says to rename pull3info but idc like i know what it is */ DROP TABLE commonnames;CREATE TABLE commonnames(/*common male and female names with corresponding 0 or 1*/ name varchar(100), mf int); DROP TABLE commonnamesclean;CREATE TABLE commonnamesclean AS SELECT name, sum(mf) AS mf FROM commonnames GROUP BY name HAVING count(mf) = 1; \COPY commonnames FROM 'realcommonnames.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV--4654 DROP TABLE impoppl;CREATE TABLE impoppl AS/30 *selecting the people with important job titles*/SELECT * FROM pull5info WHERE jobtitle IN ('Board Member', 'CEO', 'Founder', 'CFO', 'COO', 'Chairman', 'Chief Executive Officer', 'Chief Financial Officer', 'President', 'Treasurer', 'Vice President');- finish making graphs -67733 DROP TABLE binarymanwoman;CREATE TABLE binarymanwoman AS/*creates table with a column populated with 1 if prefix is Ms or Mrs and charts0 if mr*/SELECT*,CASE WHEN execprefix='Ms' THEN 1::intWHEN execprefix='Mrs' THEN 1::intWHEN execprefix='Mr' THEN 0::intELSE Null::int END AS binaryval FROM pull5info;--186181 DROP TABLE binarycommonmatch;CREATE TABLE binarycommonmatch AS/*matching common names to list of execs who have dr prefix/are gender unidentified*/SELECT execprefix, execfirstname, execlastname, jobtitle, companyname,CASE WHEN binaryval IS NULL AND mf IS NOT NULL THEN mfWHEN binaryval IS NOT NULL THEN binaryvalELSE NULL::int END AS binaryvalFROM binarymanwoman LEFT JOIN commonnamesclean ON execfirstname = nameWHERE execfirstname IS NOT NULL;--171307 DROP TABLE doctors;CREATE TABLE doctors AS/*represents the leaders from companies who haven't been assigned a gender*/SELECT * FROM binarycommonmatch WHERE binaryval IS NULL;--7016  DROP TABLE women;CREATE TABLE women ASSELECT * FROM binarycommonmatch WHERE binaryval=1;--18101/*this means that 18101/171307 of the people in "important" positions are womenwhich is equal to 0.105. Could be +/31 - research trump policy 0.04 bc of the doctors/unidentified gendersin the data*/ /*next steps: redo SDC pull, group by  statecitycompanyexec prefix, first name, last name, job title also check in w ed about the no of doctos and whether getting within a 4% margin isok*/ SDC pull #6 company statecompany citycompany nameexec first nameexec last nameexec prefixexec job title DROP TABLE execbylocation;CREATE TABLE execbylocation( state varchar(75), city varchar(75), companyname varchar(100), execjob varchar(100), execprefix varchar(5), firstname varchar(50), lastname varchar(50)); \COPY execbylocation FROM 'pull6-normal.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV--186242  DROP TABLE impopplbylocation;CREATE TABLE impopplbylocation AS/*selecting the people with important job titles*/SELECT * FROM execbylocation WHERE execjob IN ('Board Member', 'CEO', 'Founder', 'CFO', 'COO', 'Chairman', 'Chief Executive Officer', 'Chief Financial Officer', 'President', 'Treasurer', 'Vice President');--67750  DROP TABLE impoppllocname;CREATE TABLE impoppllocname ASSELECT * FROM impopplbylocation WHERE execprefix IN ('Ms', 'Mrs');--5646 /*here's what I want to do: I have a table with all the (identifiable) women, and aseparate table with both men and women. I want to group by state in both, thendivide the no of females in state (impoppllocname)/total in state (impopplbylocation)where state=state (i.e. females in california in impoppllocname divided by total no of ppl from california in impopplbylocation) to get the percentage of femalesin each state with "important" positions. however, sql is a turd and write it upwon't let me groupby state for some odd reason in either of these tables. is there a more efficient way to do this than with two tables? perhaps. But idk! help!*/ DROP TABLE binpplbylocation;CREATE TABLE binpplbylocation AS/*creates table with a column populated with 1 if prefix is Ms or Mrs and 0 if mr*2/SELECT*,CASE WHEN execprefix='Ms' THEN 1::floatWHEN execprefix='Mrs' THEN 1 ::floatWHEN execprefix='Mr' THEN 0::floatELSE Null::int END AS binaryval FROM impopplbylocation;- start finalizing paper-67750 DROP TABLE nowomenstate;CREATE TABLE nowomenstate ASSELECT binpplbylocation.state, putting sum(binaryval)/count(binaryval) AS percentageFROM binpplbylocation GROUP BY binpplbylocation.state ORDER BY percentage ASC;/*this gives the no of women in each state/all the pieces ppl in a state to yield the percentage of women in each state who are females in power*/--54 (includes Guam, D.C., etc.)/*Wyoming the lowest, at 0, and Hawaii the highest, at 16.9%, Texas at 8.1%*/ DROP TABLE nowomencity;CREATE TABLE nowomencity ASSELECT binpplbylocation.city, sum(binaryval)/count(binaryval) AS percentageFROM binpplbylocation GROUP BY binpplbylocation.city ORDER BY percentage DESC;--2049 SELECT * FROM nowomencity WHERE city='San Antonio';--Houston has 7.8% of women in leadership positions in all of its companies--Austin has 9.1%--Dallas has 9.9%--San Antonio has 7.4% /*next do sdc pull to group by industry minor group 3, industry major group*/ SDC pull #7 Custom Reportexecutive's name prefixexecutive's first nameexecutive's last nameexecutive's job titlecompany nameCompany major groupcompany minor groupcompany sub group 1,2,3 perl NormalizeFixedWidth.pl -file="pull7.txt" DROP TABLE execbyindustry;CREATE TABLE execbyindustry( execprefix varchar(5), firstname varchar(50), lastname varchar(50), jobtitle varchar(50), companyname varchar(100), majorgroup varchar(100), minorgroup varchar(100), subgroup1 varchar(100), subgroup2 varchar(100), subgroup3 varchar(100)); \COPY execbyindustry FROM 'pull7-normal.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV--186272 DROP TABLE impopplbylocation2;CREATE TABLE impopplbylocation2 AS/*selecting the puzzle togetherpeople with important job titles*/SELECT * FROM execbyindustry WHERE jobtitle IN ('Board Member', 'CEO', 'Founder', 'CFO', 'COO', 'Chairman', 'Chief Executive Officer', 'Chief Financial Officer', 'President', 'Treasurer', any additions 'Vice President');--67764 DROP TABLE binpplbyindustry;CREATE TABLE binpplbyindustry AS/*creates table with a column populated with 1 if prefix is Ms or time Mrs and 0 if mr*/SELECT*,CASE WHEN execprefix='Ms' THEN 1::floatWHEN execprefix='Mrs' THEN 1::floatWHEN execprefix='Mr' THEN 0::floatELSE Null::int END AS binaryval FROM impopplbylocation2;--67764 DROP TABLE nowomenmajorind;CREATE TABLE nowomenmajorind ASSELECT binpplbyindustry.majorgroup, sum(binaryval)/count(binaryval) AS percentageFROM binpplbyindustry GROUP BY binpplbyindustry.majorgroupORDER BY percentage DESC;--6/*highest is biotechnology, with 11% of company founders being women. lowest issemiconductors/other electric, with 4%*/ DROP TABLE nowomenminorind;CREATE TABLE nowomenminorind ASSELECT binpplbyindustry.minorgroup, sum(binaryval)/count(binaryval) AS percentageFROM binpplbyindustry GROUP BY binpplbyindustry.minorgroupORDER BY percentage DESC;--10/*consumer related higher than biotechnology at 11.9% vs 11.5%, with lowest stillsemiconductors/other electric at 4%)*/ DROP TABLE nowomensub1;CREATE TABLE nowomensub1 ASSELECT binpplbyindustry.subgroup1, sum(binaryval)/count(binaryval) AS percentageFROM binpplbyindustry GROUP BY binpplbyindustry.subgroup1ORDER BY percentage DESC;--18/*pretty similar to fix minorind; not much difference/nothing revolutionary to add*/ DROP TABLE nowomensub2;CREATE TABLE nowomensub2 ASSELECT binpplbyindustry.subgroup2, sum(binaryval)/count(binaryval) AS percentageFROM binpplbyindustry GROUP BY binpplbyindustry.subgroup2ORDER BY percentage DESC;--69/*consumer products = 15%, consumer services = 14%, energy enhanced recovery = 14%,med/health services 13%, then biotech research at 13%. Lowest: industrial services at 2.5%, idustrial equipment at 2.9%, power supplies at 3.1%, laser related at 3.3%,optoelectronics at 3.6%*/ DROP TABLE nowomensub3;CREATE TABLE nowomensub3 ASSELECT binpplbyindustry.subgroup3, sum(binaryval)/count(binaryval) AS percentageFROM binpplbyindustry GROUP BY binpplbyindustry.subgroup3ORDER BY percentage ASC;--560/*industrial products=50%!, agriculture=33%, ecommerce selling products = 29%,biotech food enzymes and cultures = 29%, computers and hardware=28%. lowest: there are straight up 0 in about 75 different things that went off schedule.*/  /*up next: build table with coname, women ceo&founder (binary 0/1), women in power (0or 1), #power positions, #women in power, state, year co received first investment,no rounds, total$invested*/ ==STATA regression info== name text file "big.do"press play - will run in stata  commands in stata:summarize numrounds -> gives mean, skewness, kurtosissummarize numrounds, det -> same as above except w higher level of detailtab numrounds -> frequency countgen -> generate a variablegen invested = totalinvested*1000 = new variablegen abovemeaninv =0 if totalinvested <. (where . is equal to positive infinity, it's what stata puts in when there's a missing value)replace abovemeaninv=1 if totalivnested > mean & totalinvested <. reg some measure of performance on some measure of women reg numrounds womeninpower  if 2/3 (std. err.) < coef, you're goodThe relationship between numrounds and womeninpower is highly statistically significant - R-squared - last editssays of all the variation in the number of rounds, submit paper women in power only explains less than one percent of that variation 0.66%   if you use .log you can automatically replace it whenever you run the script  rebuild data set to anne for reviewcontain menneed year variable - extract year from date    multivariate regressiony should be perfromance measurex should be the explanatory variablesif you run multiple variables with , robust it will be fine i.year = indicator that takes value 1 if it's the year it's invested
=Spring 2016=
[[Category: McNair Projects]]
</includeonly><!-- flush --><!-- flush --><!-- flush -->
 
[[Category: Internal]]
[[Internal Classification: Issue Brief| ]]

Navigation menu