===Fall 2017===<onlyinclude>[[Christy Warden]] [[Work Logs]] [[Christy Warden (Work Log)|(log page)]]

2017-12-12: [[Scholar Crawler Main Program]] [[Accelerator Website Images]]

2017-11-28: [[PTLR Webcrawler]] [[Internal Link Parser]]

2017-11-21: [[PTLR Webcrawler]]

2017-09-21: [[PTLR Webcrawler]]
2017-09-14: Ran into some problems with the scholar crawler. Cannot download pdfs easily, since many of the links are not to PDFs but to paid websites. Trying to adjust the crawler to pick up as many pdfs as it can without having to do anything manually. Adjusted the code so that it outputs tab-delimited text rather than CSV, and practiced on several articles.
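One way to tell real PDF links from paywall pages, sketched with only the Python standard library (this is an assumption about how such a check could work, not the crawler's actual logic): look at the Content-Type header, then fall back to the %PDF- magic bytes.

```python
import urllib.request

def looks_like_pdf(url, timeout=10):
    """Return True when the URL actually serves a PDF.

    Checks the Content-Type header first, then falls back to the
    %PDF- magic number at the start of the body. Illustrative only;
    the real crawler may filter links differently.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            ctype = (resp.headers.get("Content-Type") or "").lower()
            if "pdf" in ctype:
                return True
            return resp.read(5) == b"%PDF-"
    except Exception:
        # Paywalls, login redirects, dead links, etc.
        return False
```

Running this filter before downloading avoids saving HTML paywall pages with a .pdf name.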
2017-09-12: Got started on Google Scholar crawling. Found Harsh's code from last year and figured out how to run it on scholar queries. Adjusted the provided code to save the results of the query in a tab-delimited text file named after the query itself, so that it can be found again in the future.
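The save step described above could look roughly like this; the filename-sanitizing scheme and the (title, link) row format are illustrative assumptions, not the actual conventions of the crawler code.

```python
import re

def save_results(query, rows):
    """Write (title, link) rows as tab-delimited text, one per line,
    into a file named after the query so the results can be found
    again later. The naming scheme here is a stand-in."""
    # Reduce the query to filename-safe characters
    safe = re.sub(r"[^A-Za-z0-9_-]+", "_", query).strip("_")
    fname = safe + ".txt"
    with open(fname, "w", encoding="utf-8") as f:
        for title, link in rows:
            f.write(title + "\t" + link + "\n")
    return fname
```

Tab-delimited output also sidesteps the quoting problems CSV has with commas inside article titles.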
2017-09-11: Barely started [[Ideas for CS Mentorship]] before getting introduced to my new project for the semester. Began by finding old code for pdf ripping, implementing it and trying it out on a file.

2017-09-07: Reoriented myself with the Wiki and my previous projects. Met new team members. Began tracking down my former Wikis (they all seem pretty clear to me thus far about where to get my code for everything). Looking through my C drive to figure out where the pieces of code I have in my personal directory belong in the real world (luckily I am only a third-degree offender).</onlyinclude>
===Spring 2017===
'''1/18/17''' ''10-12:45'' Started running old twitter programs and reviewing how they work. Automate.py is currently running and AutoFollower is in the process of being fixed.

'''1/20/17''' ''10-11'' Worked on twitter programs. Added error handling for Automate.py and it appears to be working, but I will check on Monday. ''11-11:15'' Talked with Ed about projects that will be done this semester and what I'll be working on. ''11:15-12'' Went through our code repository and made a second Wiki page documenting the changes since it was last completed: http://mcnair.bakerinstitute.org/wiki/Software_Repository_Listing_2 ''12-12:45'' Worked on the smallest enclosing circle problem for locations of startups.

'''1/23/17''' ''10-12:45'' Worked on the enclosing circle problem. Wrote and completed a program which guarantees a perfect outcome but takes forever to run because it checks all possible outcomes. I would like to rewrite or improve it so that it outputs a good, but not necessarily perfect, solution so that we can run the program on larger quantities of data. Also today I discussed the cohort data breakdown with Peter and checked through the twitter code. Automate.py seems to be working perfectly now, and I would like someone to go through the content with me so that I can filter it more effectively. AutoFollower appears to be failing but not returning any sort of error code? I've run it a few different times and it always bottlenecks somewhere new, so I suspect some sort of data limiting on twitter is preventing this algorithm from working. Need to think of a new one.

'''1/25/17''' ''10-12:45'' Simultaneously worked on twitter and enclosing circle because they both have a long run time. I realized there was an error in my enclosing circle code, which I have corrected and tested on several practice examples.
I have some ideas for how to speed up the algorithm when we run it on a really large input, but I need more info about what the actual data will look like. Also, the program runs much more quickly now that I have corrected the error. For twitter, I discovered that the issue I am having lies somewhere in the follow API, so for now I've commented it out and am running the program minus the follow component to make sure that everything else works. So far I have not seen any unusual behavior, but the program has a long wait period, so it is taking a while to test.

'''1/27/17''' ''10-12:45'' So much twitter. Finally found the bug that has plagued the program (sleep_on_rate_limit should have been False). The program is now running on my dummy account, and I am going to check its progress on Monday, YAY.

'''2/3/17'''
# Patent Data (more people) and VC Data (build dataset for paper classifier)
# US Universities patenting and entrepreneurship programs (help w/ code for identifying universities and assigning them to patents)
# Matching tool in Perl (fix, run??)
# Collect details on Universities (look on Wikipedia, download xml and process)
# Maps issue
(Note: this was moved here by Ed from a page called "New Projects" that was deleted.)

'''2/6/17''' Worked on the classification-by-description algorithm the whole time I was here. I was able to break down the new data so that the key words are all found and accounted for on a given set of data, and so that I can go through a description, tag the words and output a matrix. Now I am trying to develop a way to generate the output I anticipate from the input matrix of tagged words. Tried MATLAB, but I didn't realize until the end of the day that I would have to buy a neural network package. Now I am looking into writing my own neural network or finding a good python library to run. http://scikit-learn.org/stable/modules/svm.html#svm - going to try this on Wednesday.

'''2/17/17''' Comment section of Industry Classifier wiki page.
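For the single-circle subproblem at the core of the 1/23 entry, the exhaustive check can be sketched as follows. This is an illustrative reimplementation, not the repository's EnclosingCircle code: a minimal enclosing circle is determined by two or three of the input points, so trying every pair and triple and keeping the smallest circle that covers all points is guaranteed optimal, at roughly O(n^4) cost.

```python
from itertools import combinations
from math import hypot

def circle_two(p, q):
    # Smallest circle through two points: the segment pq is a diameter
    cx, cy = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2
    return (cx, cy, hypot(p[0] - cx, p[1] - cy))

def circle_three(p, q, r):
    # Circumcircle of three points; None when they are collinear
    ax, ay = p
    bx, by = q
    cx, cy = r
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0:
        return None
    ux = ((ax * ax + ay * ay) * (by - cy) + (bx * bx + by * by) * (cy - ay)
          + (cx * cx + cy * cy) * (ay - by)) / d
    uy = ((ax * ax + ay * ay) * (cx - bx) + (bx * bx + by * by) * (ax - cx)
          + (cx * cx + cy * cy) * (bx - ax)) / d
    return (ux, uy, hypot(ax - ux, ay - uy))

def covers(circle, pts, eps=1e-9):
    x, y, rad = circle
    return all(hypot(px - x, py - y) <= rad + eps for px, py in pts)

def brute_force_mec(pts):
    # Try every circle defined by a pair or triple of points and keep
    # the smallest one that covers the whole set.
    best = None
    candidates = [circle_two(p, q) for p, q in combinations(pts, 2)]
    for triple in combinations(pts, 3):
        c = circle_three(*triple)
        if c is not None:
            candidates.append(c)
    for c in candidates:
        if covers(c, pts) and (best is None or c[2] < best[2]):
            best = c
    return best
```

The guaranteed-optimal property comes at the price of the "takes forever" runtime noted above, which is why a faster approximate variant was worth pursuing.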
'''2/20/17''' Worked on building a data table of long descriptions rather than short ones and started using it as the input to the industry classifier.

'''2/22/17''' Finished the code from above, ran it numerous times with mild changes to data types (which takes forever), talked to Ed and built an aggregation model.

'''2/24/17''' About to be done with the industry classifier. Got 76% accuracy now; working on a file that can be used by non-comp-sci people, where you just type in the name of a file in Company [tab] Description format and it outputs Company [tab] Industry. Worked on allowing this program to run without needing to rebuild the classification matrix every single time, since I already know exactly what I'm training it on. Will be done today or Monday, I anticipate.

'''2/27/17''' Classifier is done whooo! It runs much more quickly than anticipated due to the use of the python Pickle library (discovered by Peter) and I will document its use on the industry classifier page. (Done: http://mcnair.bakerinstitute.org/wiki/Industry_Classifier). I also looked through changes to Enclosing Circle and realized a stupid mistake, which I corrected and debugged; now a circle run that used to take ten minutes takes seven seconds. It is ready to run as soon as Peter is done collecting data, although I'd like to think of a better way to test that these really are the optimal circles.

'''3/01/17''' Plotted some of the geocoded data with Peter and troubleshot remaining bugs. Met with Ed and discussed errors in the geodata, which I need to go through and figure out how to fix. Worked on updating documentation of enclosing circles and related projects.

'''3/06/17''' Worked on Enclosing Circle data and started the geocoder, which is running and should continue to run through Wednesday.

'''3/20/17''' Tried to debug Enclosing Circle with Peter.
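The pickle-based speedup from the 2/27 entry works roughly as below; the cache filename and the toy training step are stand-ins for illustration (the real classifier and its feature matrix are documented on the Industry Classifier page).

```python
import os
import pickle

CACHE = "industry_model.pkl"  # hypothetical cache file name

def train(examples):
    # Stand-in for the real training step: map each keyword to the
    # first industry label it appears with. The actual classifier is
    # far more involved; only the caching pattern matters here.
    model = {}
    for description, industry in examples:
        for word in description.lower().split():
            model.setdefault(word, industry)
    return model

def get_classifier(examples):
    # Unpickle the trained model when a cache exists, so the slow
    # training step runs only once; otherwise train and pickle it.
    if os.path.exists(CACHE):
        with open(CACHE, "rb") as f:
            return pickle.load(f)
    model = train(examples)
    with open(CACHE, "wb") as f:
        pickle.dump(model, f)
    return model
```

Since the training set is fixed, loading the pickled model is what lets end users classify a file without rebuilding the classification matrix every run.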
Talked through a brute-force algorithm with Ed, wrote an explanation of Enclosing Circle on the Enclosing Circle wiki page, and also wrote an English-language explanation of a brute-force algorithm.

'''3/27/17''' More debugging with Peter. Wrote code to remove subsumed circles and tested it. Discovered that we were including many duplicate points, which was throwing off our results.

'''3/29/17''' Tried to set up an IDE for rewriting enclosing circle in C.

'''3/31/17''' Finally got the IDE set up after many YouTube tutorials and sacrifices to the computer gods. It is a 30-day trial, so I need to check with Ed about whether a student license is something we can use after that. Spent time familiarizing myself with the IDE and writing some toy programs. Tried to start writing my circle algorithm in C and realized that this is an overwhelming endeavor, because I used many data structures that are not supported by C at all. I think that I could eventually get it working given a ton of time, but the odds are slim on it happening in the near future. Because of this, I started reading about some programs that take in python code and optimize parts of it using C, which might be helpful (Psyco is the one I was looking at). Will talk to Ed and Peter on Monday.

'''04/03/17''' [[Matching Entrepreneurs to VCs]]

'''04/10/17''' Same as above.

'''04/12/17''' Same as above.

'''04/17/17''' Same as above + back to the Enclosing Circle algorithm. I am trying to make it so that the next point chosen for any given circle is the point closest to its center, not to the original point that we cast the circle from. I am running into some issues with debugging that I will be able to solve soon.

'''04/26/17''' Debugged the new enclosing circle algorithm. I think that it works, but I will be testing and plotting with it tomorrow. Took notes on the enclosing circle page.

'''04/27/17''' PROBLEM!
In fixing the enclosing circle algorithm, I discovered a problem in one of the ways Peter and I had sped up the program, which led the algorithm to the wrong computations and a completely false runtime. The new algorithm runs for an extremely long time and does not seem feasible to use for our previous application. I am looking into ways to speed it up, but it does not look good.

'''04/28/17''' Posted thoughts and updates on the enclosing circle page.

'''05/01/17''' Implemented concurrent enclosing circle EnclosingCircleRemake2.py. Documented in enclosing circle page.

===Fall 2016===

'''09/15/16''': Was introduced to the Wiki, built my page and was added to the RDP and Slack. Practiced basic Linux with Harsh and was introduced to the researchers.

'''09/20/16''': Was introduced to the DB server and how to access it/mount the bulk drive in the RDP. 2:30-3 Tried (and failed) to help Will upload his file to his database. Learned from Harsh how to transfer Will's file between machines so that he could access it for his table (FileZilla/PuTTY, but really we should've just put it in the RDP-mounted bulk drive we built at the beginning).

'''09/22/16''': Labeled new supplies (USB ports). Looked online for a solution to labeling the black ports; sent a link with potentially useful supplies to Dr. Dayton. Went through all of the new supplies (plus monitors, desktops and mice) and created an Excel sheet to keep track of them (Name, Quantity, SN, Link etc.). Added my hours to the wiki Work Hours page, updated my Work Log.

'''09/27/16''': Read through the wiki page for the existing twitter crawler/example. Worked on adjusting our feeds for HootSuite and making the content on it relevant to the people writing the tweets/blogs. [[Christy Warden (Social Media)]][[Category:McNair Staff]]
This is a link to all of the things I did with HootSuite, and brainstorming about how to improve our twitter/social media/blog presence.
''3 - 4:45'' Worked on Accelerators data collection.
'''Notes from Ed'''
Moved all Congress files from your documents directory to E:\McNair\Projects\E&I Governance Policy Report\ChristyW
