===Fall 2017===
<onlyinclude>[[Christy Warden]] [[Work Logs]] [[Christy Warden (Work Log)|(log page)]]
2017-12-12: [[Scholar Crawler Main Program]] [[Accelerator Website Images]]
2017-11-28: [[PTLR Webcrawler]] [[Internal Link Parser]]
2017-11-21: [[PTLR Webcrawler]]
2017-09-21: [[PTLR Webcrawler]]
2017-09-14: Ran into some problems with the scholar crawler. Cannot download pdfs easily, since a lot of the links are not to PDFs but to paid websites. Trying to adjust the crawler to pick up as many pdfs as it can without having to do anything manually. Adjusted code so that it outputs tab-delimited text rather than CSV and practiced on several articles.
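Skipping the links that don't actually point at PDFs can be done with a cheap heuristic before downloading anything. A minimal sketch of the idea (the function name and rules here are my own illustration, not the crawler's actual code):

```python
def is_probable_pdf(url, content_type=""):
    """Guess whether a link points at an actual PDF rather than a paywall page.

    Checks the URL extension first, then an HTTP Content-Type header if one
    has already been fetched (e.g. via a HEAD request).
    """
    if url.lower().split("?")[0].endswith(".pdf"):
        return True
    # Paywalled article pages typically report text/html, so they get skipped.
    return "application/pdf" in content_type.lower()
```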
2017-09-12: Got started on Google Scholar Crawling. Found Harsh's code from last year and figured out how to run it on scholar queries. Adjusted the provided code to save the results of the query in a tab-delimited text file named after the query itself, so that it can be found again in the future.
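Naming the output file after the query takes a little sanitising, since queries contain characters that are illegal in filenames. A sketch of the idea (helper names are illustrative, not from Harsh's code):

```python
import csv
import os
import re

def query_to_filename(query):
    """Turn a scholar query into a safe .txt filename."""
    safe = re.sub(r'[^A-Za-z0-9]+', '_', query).strip('_')
    return safe + '.txt'

def save_results(query, rows, out_dir='.'):
    """Write one result per line as tab-delimited text, named after the query."""
    path = os.path.join(out_dir, query_to_filename(query))
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f, delimiter='\t', lineterminator='\n')
        writer.writerows(rows)
    return path
```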
2017-09-11: Barely started [[Ideas for CS Mentorship]] before getting introduced to my new project for the semester. Began by finding old code for pdf ripping, implementing it and trying it out on a file.
2017-09-07: Reoriented myself with the Wiki and my previous projects. Met new team members. Began tracking down my former Wikis (they all seem pretty clear to me thus far about where to get my code for everything). Looking through my C drive to figure out where the pieces of code I have in my personal directory belong in the real world (luckily I am a third degree offender only).</onlyinclude>
===Spring 2017===
'''1/18/17'''
''10-12:45'' Started running old twitter programs and reviewing how they work. Automate.py is currently running and AutoFollower is in the process of being fixed.

'''1/20/17'''
''10-11'' Worked on twitter programs. Added error handling for Automate.py and it appears to be working, but I will check on Monday.

''11-11:15'' Talked with Ed about projects that will be done this semester and what I'll be working on.

''11:15 - 12'' Went through our code repository and made a second Wiki page documenting the changes since it was last completed. http://mcnair.bakerinstitute.org/wiki/Software_Repository_Listing_2

''12-12:45'' Worked on the smallest enclosing circle problem for locations of startups.

'''1/23/17'''
''10-12:45'' Worked on the enclosing circle problem. Wrote and completed a program which guarantees a perfect outcome but takes forever to run, because it checks all possible outcomes. I would like to rewrite or improve it so that it outputs a good solution, though not necessarily a perfect one, so that we can run the program on larger quantities of data. Also today I discussed the cohort data breakdown with Peter and checked through the twitter code. Automate.py seems to be working perfectly now, and I would like someone to go through the content with me so that I can filter it more effectively. Autofollower appears to be failing without returning any sort of error code; I've run it a few different times and it always bottlenecks somewhere new, so I suspect some sort of data limiting on twitter is preventing this algorithm from working. Need to think of a new one.

'''1/25/17'''
''10-12:45'' Simultaneously worked on twitter and enclosing circle because they both have a long run time. I realized there was an error in my enclosing circle code, which I have corrected and tested on several practice examples.
I have some ideas for how to speed up the algorithm when we run it on a really large input, but I need more info about what the actual data will look like. Also, the program runs much more quickly now that I corrected the error. For twitter, I discovered that the issue I am having lies somewhere in the follow API, so for now I've commented it out and am running the program minus the follow component to assure that everything else is working. So far I have not seen any unusual behavior, but the program has a long wait period, so it is taking a while to test.

'''1/27/17'''
''10-12:45'' So much twitter. Finally found the bug that has plagued the program (sleep_on_rate_limit should have been False). Program is now running on my dummy account, and I am going to check its progress on Monday. YAY.
[[Category:McNair Staff]]
'''2/3/17'''
# Patent Data (more people) and VC Data (build dataset for paper classifier)
# US Universities patenting and entrepreneurship programs (help w code for identifying Universities and assigning to patents)
# Matching tool in Perl (fix, run??)
# Collect details on Universities (look on wikipedia, download xml and process)
# Maps issue
(note - this was moved here by Ed from a page called "New Projects" that was deleted)
'''2/6/17'''
Worked on the classification-based-on-description algorithm the whole time I was here. I was able to break down the new data so that the key words are all found and accounted for on a given set of data, and so that I can go through a description, tag the words and output a matrix. Now I am trying to develop a way to generate the output I anticipate from the input matrix of tagged words. Tried MATLAB, but I would have to buy a neural network package and I didn't realize it until the end of the day. Now I am looking into writing my own neural network or finding a good python library to run.
http://scikit-learn.org/stable/modules/svm.html#svm
Going to try this on Wednesday.
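The tagging step described above (turning each description into a row of keyword counts) can be sketched in plain python before reaching for scikit-learn. The vocabulary handling here is my own illustration, not the project's code:

```python
def build_vocab(descriptions):
    """Collect the set of words seen across all descriptions, in sorted order."""
    return sorted({w for d in descriptions for w in d.lower().split()})

def tag_matrix(descriptions, vocab):
    """One row per description, one column per vocab word, entries are counts."""
    index = {w: i for i, w in enumerate(vocab)}
    matrix = []
    for d in descriptions:
        row = [0] * len(vocab)
        for w in d.lower().split():
            if w in index:
                row[index[w]] += 1
        matrix.append(row)
    return matrix
```

scikit-learn's CountVectorizer plus an SVM from the page linked above does the same thing at scale.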
'''2/17/17'''
Comment section of Industry Classifier wiki page.
'''2/20/17'''
Worked on building a data table of long descriptions rather than short ones and started using this as the input to the industry classifier.
'''2/22/17'''
Finished code from above, ran it numerous times with mild changes to data types (which takes forever), talked to Ed and built an aggregation model.
'''2/24/17'''
About to be done with the industry classifier. Got 76% accuracy now; working on a file that can be used by non-comp sci people, where you just type in the name of a file with a Company [tab] Description format and it will output Company [tab] Industry. Worked on allowing this program to run without needing to rebuild the classification matrix every single time, since I already know exactly what I'm training it on. Will be done today or Monday, I anticipate.
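The file interface above boils down to reading tab-split lines and writing tagged ones back out. A minimal sketch (function names and the classify callback are my own, not the actual program):

```python
def classify_file(in_path, out_path, classify):
    """Read Company<TAB>Description lines, write Company<TAB>Industry lines."""
    with open(in_path) as fin, open(out_path, 'w') as fout:
        for line in fin:
            if not line.strip():
                continue  # skip blank lines
            company, description = line.rstrip('\n').split('\t', 1)
            fout.write(company + '\t' + classify(description) + '\n')
```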
''12:15 - 2:'' Unfollowed the non responders, followed about 100 people using the crawler. Updated my data sheets about how people have responded and added all the new followers to the log on [[Christy Warden (Social Media)]] twitter crawler page.
'''2/27/17'''
Classifier is done whooo! It runs much more quickly than anticipated due to the use of the python Pickle library (discovered by Peter) and I will document its use on the industry classifier page. (Done: http://mcnair.bakerinstitute.org/wiki/Industry_Classifier).
I also looked through changes to Enclosing Circle and realized a stupid mistake which I corrected and debugged and now a circle run that used to take ten minutes takes seven seconds. It is ready to run as soon as Peter is done collecting data, although I'd like to think of a better way to test to make sure that these really are the optimal circles.
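Skipping the rebuild works by serialising the trained objects once with pickle and reloading them on later runs. A minimal sketch of the pattern (names are illustrative, not from the classifier itself):

```python
import os
import pickle

def load_or_build(path, build):
    """Load a pickled object if it exists, otherwise build it and cache it."""
    if os.path.exists(path):
        with open(path, 'rb') as f:
            return pickle.load(f)
    obj = build()  # e.g. train the classifier / build the matrix
    with open(path, 'wb') as f:
        pickle.dump(obj, f)
    return obj
```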
'''3/01/17'''
Plotted some of the geocoded data with Peter and troubleshot remaining bugs. Met with Ed and discussed errors in the geodata, which I need to go through and figure out how to fix. Worked on updating documentation of enclosing circles and related projects.
'''3/06/17'''
Worked on Enclosing Circle data and started the geocoder, which is running and should continue to run through Wednesday.
'''3/20/17'''
Tried to debug Enclosing Circle with Peter. Talked through a brute force algorithm with Ed, wrote an explanation of Enclosing Circle on the Enclosing Circle wiki page and also wrote an English language explanation of a brute force algorithm.
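For reference, the brute force idea can be written out directly: a minimal enclosing circle is determined by two or three of the points, so checking all pairs and triples and keeping the smallest circle that covers everything is guaranteed correct (and very slow). This is my own sketch, not the project's EnclosingCircle code:

```python
import itertools
import math

def circle_from_two(p, q):
    """Smallest circle with p and q on its boundary (they form a diameter)."""
    cx, cy = (p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0
    return (cx, cy, math.dist(p, q) / 2.0)

def circle_from_three(a, b, c):
    """Circumcircle of three points, or None if they are collinear."""
    ax, ay = a; bx, by = b; cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    ux = ((ax*ax + ay*ay) * (by - cy) + (bx*bx + by*by) * (cy - ay)
          + (cx*cx + cy*cy) * (ay - by)) / d
    uy = ((ax*ax + ay*ay) * (cx - bx) + (bx*bx + by*by) * (ax - cx)
          + (cx*cx + cy*cy) * (bx - ax)) / d
    return (ux, uy, math.dist((ux, uy), a))

def covers(circle, points, eps=1e-9):
    cx, cy, r = circle
    return all(math.dist((cx, cy), p) <= r + eps for p in points)

def brute_force_enclosing_circle(points):
    """Check every pair and triple of points; keep the smallest covering circle."""
    candidates = [circle_from_two(p, q)
                  for p, q in itertools.combinations(points, 2)]
    for trio in itertools.combinations(points, 3):
        c = circle_from_three(*trio)
        if c is not None:
            candidates.append(c)
    best = None
    for c in candidates:
        if covers(c, points) and (best is None or c[2] < best[2]):
            best = c
    return best
```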
'''3/27/17'''
More debugging with Peter. Wrote code to remove subsumed circles and tested it. Discovered that we were including many duplicate points, which was throwing off our results.
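The two fixes from today, dropping duplicate points and removing circles wholly contained in another, can be sketched like so (circles as (x, y, r) tuples; this is my own illustration, not the project code):

```python
import math

def dedupe_points(points):
    """Remove duplicate points while keeping first-seen order."""
    return list(dict.fromkeys(points))

def remove_subsumed(circles, eps=1e-9):
    """Drop any circle that lies entirely inside another circle in the list."""
    unique = list(dict.fromkeys(circles))
    kept = []
    for i, (x1, y1, r1) in enumerate(unique):
        inside = any(
            j != i and math.hypot(x1 - x2, y1 - y2) + r1 <= r2 + eps
            for j, (x2, y2, r2) in enumerate(unique))
        if not inside:
            kept.append((x1, y1, r1))
    return kept
```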
'''3/29/17'''
Worked on setting up an IDE for rewriting enclosing circle in C.
'''3/31/17'''
Finally got the IDE set up after many youtube tutorials and sacrifices to the computer gods. It is a 30 day trial, so I need to check with Ed about whether a student license is a thing we can use after that. Spent time familiarizing myself with the IDE and writing some toy programs. Tried to start writing my circle algorithm in C and realized that this is an overwhelming endeavor, because I used many data structures that are not supported by C at all. I think that I could eventually get it working given a ton of time, but the odds are slim on it happening in the near future. Because of this, I started reading about some programs that take in python code and optimize parts of it using C, which might be helpful (Psyco is the one I was looking at). Will talk to Ed and Peter on Monday.
'''04/03/17'''
[[Matching Entrepreneurs to VCs]]
'''04/10/17'''
Same as above.
'''04/12/17'''
Same as above.
'''04/17/17'''
Same as above + back to the Enclosing Circle algorithm. I am trying to make it so that the next point chosen for any given circle is the point closest to its center, not to the original point that we cast the circle from. I am running into some issues with debugging that I should be able to solve soon.
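Choosing the next point by distance to the circle's current centre (rather than to the original seed point) is worth pinning down precisely. A sketch with made-up names:

```python
import math

def next_point(center, remaining):
    """Pick the uncovered point closest to the circle's current centre."""
    return min(remaining, key=lambda p: math.dist(center, p))
```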
'''04/26/17'''
In response to the "start communicating with the comp people" talk, I updated my wiki pages and work log, on which I have been heavily slacking.
Debugged new enclosing circle algorithm. I think that it works but I will be testing and plotting with it tomorrow. Took notes in the enclosing circle page.
'''04/27/17'''
PROBLEM! In fixing the enclosing circle algorithm, I discovered a problem in one of the ways Peter and I had sped up the program, which led the algorithm to the wrong computations and a completely false runtime. The new algorithm runs for an extremely long time and does not seem feasible to use for our previous application. I am looking into ways to speed it up, but it does not look good.

'''04/28/17'''
Posted thoughts and updates on the enclosing circle page.

'''05/01/17'''
Implemented concurrent enclosing circle EnclosingCircleRemake2.py. Documented on the enclosing circle page.

===Fall 2016===
'''11/29/16'''
''12:15- 1:45'' Fixed code and reran it for the gov track project, documented on E&I governance.

''1:45- 2'' Had the accelerator project explained to me.

''2 - 2:30'' Built histograms of govtrack data with Ed and Albert, reran data for Albert.

''2:30-4:45'' Completed first 5 reports (40-45) on accelerators (accidentally did number 20 as well).

'''12/1/16'''
''12:15- 3'' Fixed the perl code that gets a list of all Bills that have been passed, then composed new data of Bills with relevant buzzword info as well as whether or not they were enacted.

''3 - 4:45'' Worked on Accelerators data collection.
'''09/15/16''': Was introduced to the Wiki, built my page and was added to the RDP and Slack. Practiced basic Linux with Harsh and was introduced to the researchers.
'''09/20/16''': Was introduced to the DB server and how to access it/mount the bulk drive in the RDP. 2:30-3 Tried (and failed) to help Will upload his file to his database. Learned from Harsh how to transfer Will's file between machines so that he could access it for his table (FileZilla/Putty, but really we should've just put it in the RDP mounted bulk drive we built at the beginning).
'''09/22/16''': Labeled new supplies (USB ports). Looked online for a solution to labeling the black ports, sent a link with potentially useful supplies to Dr. Dayton. Went through all of the new supplies (plus monitors, desktops and mice) and created an Excel sheet to keep track of them (Name, Quantity, SN, Link etc.). Added my hours to the wiki Work Hours page, updated my Work Log.
'''09/27/16''': Read through the wiki page for the existing twitter crawler/example. Worked on adjusting our feeds for HootSuite and making the content on it relevant to the people writing the tweets/blogs.
[[Christy Warden (Social Media)]]
This is a link to all of the things I did to the HootSuite and brainstorming about how to up our twitter/social media/blog presence.
'''09/29/16'''
Everything I did is inside of my social media research page http://mcnair.bakerinstitute.org/wiki/Christy_Warden_(Social_Media). I got the twitter crawler running and have created a plan for how to generate a list of potential followers/people worth following, to increase our twitter interactions and improve our feed to find stuff to retweet.
'''10/4/16'''
''11-12:30:'' Directed people to the ambassador event.
''12:30-3:'' Worked on my crawler (can be read about on my social media page).
''3-4:45:'' Donald Trump twitter data crawl.
'''10/6/16'''
''12:15-4:45:'' Worked on the Twitter Crawler. It currently takes as input the name of a twitter user and returns the active twitter followers on their page most likely to engage with our content. I think my metric for what constitutes a potential follower needs adjusting, and the code needs to be made cleaner and more helpful. The project is in Documents/Projects/Twitter Crawler in the RDP. More information and a link to the page about the current project is on my social media page [[Christy Warden (Social Media)]].
'''10/18/16'''
''1-2:30:'' Updated the information we have for the Donald Trump tweets. The data is in the Trump Tweets project in the bulk folder and should have his tweets up until this afternoon, when I started working.

''2:30-5:'' Continued (and completed a version of) the twitter crawler. I have run numerous example users through the crawler and checked the outputs to see if the people I return are users that would be relevant to @BakerMcNair, and generally they are. See [[Christy Warden (Social Media)]] for more information.
''5 - 5:30:'' Started reading about the existing eventbrite crawler and am brainstorming ideas for how we could use it. (Maybe incorporate both twitter and eventbrite into one application?)
'''10/25/16'''
''12:15-4:45:'' Worked on the Twitter Crawler. I am currently collecting data by following around 70-80 people while I am at work and measuring the success of the follows, so that I can adjust my program to make optimal following decisions based on historical follow response. More info at [[Christy Warden (Social Media)]].
'''10/27/16'''
''12:15-3:'' First I ran a program that unfollowed all of the non-responders from my last follow spree, and then I updated my data about who followed us back. I cannot seem to see a pattern yet in the probability of someone following us back based on the parameters I am keeping track of, but hopefully we will be able to see something with more data. Last week we had 151 followers, at the beginning of today we had 175 followers, and by the time that I am leaving (4:45) we have 190 followers. I think the program is working, but I hope the rate of growth increases.
''3-4'' SQL Learning with Ed
''4-4:45'' Found a starter list of people to crawl for Tuesday, checked our stats and ran one more starting position through the crawler. Updated data sheets and worklog. The log of who I've followed (and if they've followed back) is on the twitter crawler page.
'''11/1/16'''
''12:15 - 2:'' Unfollowed the non-responders, followed about 100 people using the crawler. Updated my data sheets about how people have responded and added all the new followers to the log on the [[Christy Warden (Social Media)]] twitter crawler page.
''2-4:45'' Prepped the next application of my twitter crawling abilities, which is going to be a constantly running program on a dummy account which follows a bunch of new sources and DMs the McNair account when something related to us shows up.
'''11/3/16'''
''12:15-12:30:'' I made a mistake today! I intended to fix a bug that occurred in my DM program, but accidentally started running a program before copying the program's report about what went wrong, so I could no longer access the error report. I am running the program again between now and Thursday and hoping to run into the same error so I can actually address it. (I believe it was something to do with a bad link.) I did some research about catching and fixing exceptions in a program while still allowing it to continue, but I can't really fix the program until I have a good example of what is going wrong.
''12:30 - 2:30:'' Unfollowed the non-responders, followed about 100 people using the crawler. Updated my data sheets about how people have responded and added all the new followers to the log on the [[Christy Warden (Social Media)]] twitter crawler page. I've noticed that the ratios of successful returns on our follows are improving; I am unsure whether I am getting better at picking node accounts or whether our account is gaining legitimacy.
''2-4:15'' I had the idea, after my DM program which runs constantly had (some) success, that I could make the follow crawler run constantly too. I started implementing a way to do this, but haven't had a chance to run or test it yet. This will present serious difficulties, because I don't want to do anything that could potentially get us kicked off twitter or lose my developer rights on our real account. It is hard to use a dummy acct for this purpose though, because nobody will follow back an empty account, so it'll be hard to see if the program succeeds in that base case. I will contemplate tonight and work on it Thursday.
''4:15-4:30'' Started adding comments and print statements and some level of organization in my code, in case other/future interns use it and I am not at work to explain how it functions. The code could definitely do with some cleanup, but I think that should probably come later, after everything is functional and all of our twitter needs are met.
''4:30-4:45'' Updated work log and put my thoughts on my social media project page.
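The keep-running-despite-errors behaviour researched above boils down to a try/except around each unit of work, logging the failure instead of dying. A minimal sketch (names are illustrative, not from the DM program):

```python
def process_all(items, handle, log=print):
    """Run handle() on every item; log failures (e.g. bad links) and continue."""
    failures = []
    for item in items:
        try:
            handle(item)
        except Exception as exc:
            log("failed on %r: %s" % (item, exc))
            failures.append(item)
    return failures
```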
'''11/8/16'''
''12:15-1'' Talked to Ed about my project and worked out a plan for the future of the twitter crawler. I will explain all of it on the social media page.
''1- 4:45'' Worked on updating the crawler. It is going to take awhile but I made a lot of progress today and expect that it should be working (iffily) by next Thursday.
'''11/10/16'''
''12:15 - 4:45'' Tried to fix bug in my retweeting crawler, but still haven't found it. I am going to keep running the program until the error comes up and then log into the RDP as soon as I notice and copy down the error. Worked on changes to the crawler which will allow for automation.
'''11/15/16'''
''12:15 - 1:30'' Changing twitter crawler.
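The speedup mentioned in the 2/27 entry comes from pickling the trained classifier once, instead of rebuilding the classification matrix on every run. A minimal sketch of that pattern, assuming a `TinyClassifier` stand-in (the real model and file names differ):

```python
import os
import pickle

class TinyClassifier:
    """Hypothetical stand-in for the real trained classifier."""
    def __init__(self, labeled_examples):
        # "Training": remember one keyword per industry label.
        self.keywords = {word: label for word, label in labeled_examples}

    def predict(self, description):
        for word, label in self.keywords.items():
            if word in description.lower():
                return label
        return "Unknown"

def load_or_train(path, train_fn):
    """Unpickle a saved model if one exists; otherwise train it and save it."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    model = train_fn()
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return model
```

After the first run pays the training cost, every later run is just a disk read, which matches the "much more quickly than anticipated" observation.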
''1:30 - 4:45'' Worked on pulling all the data for the executive orders and bills with Peter (we built a script, in anticipation of Harsh gathering the data from GovTrack, which will build a tsv of the data).
'''3/01/17'''
Plotted some of the geocoded data with Peter and troubleshot the remaining bugs. Met with Ed and discussed errors in the geodata, which I need to go through and figure out how to fix. Worked on updating documentation of Enclosing Circle and related projects.
'''11/17/16'''
''12:15 - 1:30'' Changing twitter crawler.
''1:30 - 5:30'' Fixed the script Peter and I wrote, because the data Harsh gathered ended up being in a slightly different form than we anticipated. Peter built and started a crawler to pull all of the executive orders, and I debugged the tsv output. I stayed late while the program ran on Harsh's data to make sure there were no bugs, and discovered at the very end of the run that there was a minor one. Fixed it and then left.
'''3/06/17'''
Worked on Enclosing Circle. Peter built and debugged the geocoder, which is running and should continue through Wednesday.
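For tsv output like the kind being debugged in the 11/17 entry, one robust pattern is Python's `csv` module with a tab delimiter, so fields containing odd characters are quoted consistently instead of breaking the rows. A general sketch, not the actual crawler code:

```python
import csv

def write_tsv(path, rows):
    """Write an iterable of row lists as tab-delimited text."""
    # newline="" lets the csv module control line endings itself,
    # which avoids blank-line artifacts on Windows.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerows(rows)
```

Reading the file back with `csv.reader(f, delimiter="\t")` round-trips the data exactly, which makes this kind of output bug much easier to isolate.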
'''3/20/17'''
Tried to debug Enclosing Circle with Peter. Talked through a brute force algorithm with Ed, wrote an explanation of Enclosing Circle on the Enclosing Circle wiki page, and also wrote an English-language explanation of a brute force algorithm.
'''11/22/16'''
''12:15-2'' Worked on updating the crawler so that it runs automatically. Ran into some issues because we changed from Python 2.7 to Anaconda, but got those running again. Started the retweeter crawler; it seems to be working well.
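An English-language version of the brute force idea mentioned in the 3/20 entry: the smallest circle enclosing a point set is determined either by two points (as a diameter) or by three points (as a circumcircle), so trying every pair and triple and keeping the smallest candidate that covers all points is correct, if slow (roughly O(n^4)). A sketch under that assumption, for 2-D points with at least two points (this is an illustration of the general technique, not the project's actual implementation):

```python
from itertools import combinations
from math import hypot

def circle_two(p, q):
    # Circle with segment pq as its diameter: (center_x, center_y, radius).
    cx, cy = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2
    return (cx, cy, hypot(p[0] - q[0], p[1] - q[1]) / 2)

def circle_three(p, q, r):
    # Circumcircle of three points; None if they are (nearly) collinear.
    ax, ay = p; bx, by = q; cx, cy = r
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    return (ux, uy, hypot(ax - ux, ay - uy))

def covers(circle, points, eps=1e-9):
    cx, cy, r = circle
    return all(hypot(x - cx, y - cy) <= r + eps for x, y in points)

def brute_force_enclosing_circle(points):
    """Smallest covering circle, found by trying all pairs and triples."""
    candidates = [circle_two(p, q) for p, q in combinations(points, 2)]
    candidates += [c for t in combinations(points, 3)
                   if (c := circle_three(*t)) is not None]
    return min((c for c in candidates if covers(c, points)),
               key=lambda c: c[2])
```

Slow as it is, a brute force version like this is useful as ground truth for testing whether a faster enclosing-circle implementation really returns optimal circles.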
''2-2:30'' Redid the Bill.txt data for the adjusted regexes. Met with Harsh, Ed and Peter about communicating our projects and code better.
''2:30-4:30'' Back to the twitter crawler. I am now officially testing it before we use it on our main account, and have found and fixed some bugs in data collection. I realized at the very end of the day that I have a logical flaw in my code that needs to be adjusted: only one person at a time goes into the people-we-followed list, so we will only be following one person in every 24-hour period. When I get back from Thanksgiving, I need to change the unfollow-someone function. The new idea is that I will follow everyone that comes out of a source node, and then call the unfollow function for as long as it will run, while maintaining the condition that the top person on the list was followed for more than one day. I will likely need only one more day to finish this program before it can start running on our account.
''4:30 - 4:45'' In response to the "start communicating with the comp people" talk, I updated my wiki pages and work log, on which I have been heavily slacking.
'''3/27/17'''
More debugging with Peter. Wrote code to remove subsumed circles and tested it. Discovered that we were including many duplicate points, which was throwing off our results.
'''3/29/17'''
Tried to set up an IDE for rewriting Enclosing Circle in C.
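The fix described in the 11/22 entry — follow everyone coming out of a source node at once, then keep unfollowing as long as the oldest follow is more than a day old — is essentially a FIFO queue keyed by follow time. A sketch with explicit timestamps (the class and method names here are hypothetical, not the actual crawler's; a real version would call the Twitter API where this one just records handles):

```python
from collections import deque

DAY = 24 * 60 * 60  # one day, in seconds

class FollowQueue:
    """FIFO of (handle, follow_time); oldest follows are unfollowed first."""

    def __init__(self):
        self.queue = deque()

    def follow_all(self, handles, now):
        # Follow everyone that comes out of a source node in one batch.
        for handle in handles:
            self.queue.append((handle, now))

    def unfollow_due(self, now):
        # Unfollow while the person at the front was followed > 1 day ago.
        unfollowed = []
        while self.queue and now - self.queue[0][1] > DAY:
            handle, _ = self.queue.popleft()
            unfollowed.append(handle)
        return unfollowed
```

Because follows are appended in batches but removed one at a time from the front, many people can be followed per day while the one-day minimum follow duration is still respected, which is exactly the property the single-person-per-day version lacked.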
'''11/29/16'''
''12:15-1:45'' Fixed code and reran it for the GovTrack project; documented on E&I Governance.
''1:45-2'' Had the accelerator project explained to me.
''2 - 2:30'' Built histograms of GovTrack data with Ed and Albert; reran data for Albert.
''2:30-4:45'' Completed the first 5 reports (40-45) on accelerators (accidentally did number 20 as well).
'''3/31/17'''
Finally got the IDE set up, after many YouTube tutorials and sacrifices to the computer gods. It is a 30-day trial, so I need to check with Ed about whether a student license is something we can use after that. Spent time familiarizing myself with the IDE and writing some toy programs. Tried to start writing my circle algorithm in C and realized that this is an overwhelming endeavor, because I used many data structures that are not supported by C at all. I think I could eventually get it working given a ton of time, but the odds are slim of that happening in the near future. Because of this, I started reading about programs that take in Python code and optimize parts of it using C, which might be helpful (Psyco is the one I was looking at). Will talk to Ed and Peter on Monday.
'''04/03/17'''
[[Matching Entrepreneurs to VCs]]
'''04/10/17'''
Same as above.
'''4/12/17'''
Same as above.
'''12/1/16'''
''12:15-3'' Fixed the perl code that gets a list of all Bills that have been passed, then composed new data of Bills with relevant buzzword info as well as whether or not they were enacted.
''3 - 4:45'' Worked on Accelerators data collection.
'''Notes from Ed'''
I moved all of the Congress files from your documents directory to: E:\McNair\Projects\E&I Governance Policy Report\ChristyW
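The data composition described in the 12/1 entry — each passed bill tagged with whether it matches the buzzwords and whether it was enacted — reduces to a per-bill flagging pass. A Python sketch of that logic (the original was Perl, and the buzzword list here is hypothetical):

```python
BUZZWORDS = {"entrepreneur", "innovation", "startup"}  # hypothetical list

def compose_bill_rows(bills):
    """bills: iterable of (bill_id, title, enacted) tuples.
    Returns tab-delimited rows: id, title, relevant flag, enacted flag."""
    rows = []
    for bill_id, title, enacted in bills:
        words = set(title.lower().split())
        relevant = int(bool(words & BUZZWORDS))  # 1 if any buzzword appears
        rows.append(f"{bill_id}\t{title}\t{relevant}\t{int(enacted)}")
    return rows
```

Keeping relevance and enactment as separate 0/1 columns means the downstream analysis can cross-tabulate them directly.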
