Revision as of 20:17, 28 February 2017
Twitter News Finder (Tool) | |
---|---|
Project Information | |
Project Title | Twitter Follower Finder (Tool) |
Owner | Christy Warden |
Start Date | Fall 2016 |
Deadline | |
Primary Billing | |
Notes | |
Has project status | |

Copyright © 2016 edegan.com. All Rights Reserved.
Finding Sources of News Crawl
Description
This is a crawler that runs on a dummy account (@testapplicat2) that follows popular news sources. Every fifteen minutes, the application analyzes its timeline and sends any tweet containing a buzzword (matched by a regex) to the BakerMcNair account as a direct message, with an @-mention of the source.
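The filtering step can be sketched in Python as follows. This is a minimal illustration, not the crawler's actual code: the buzzword list, the tweet dictionaries with `screen_name` and `text` keys, and the helpers `select_tweets` and `format_dm` are all hypothetical, and the call that actually sends the DM is left out. Building one case-insensitive pattern from the word list keeps the matching to a single `re.search` call per tweet.

```python
import re

# Hypothetical buzzword list; the page does not say which terms the real regex matched.
BUZZWORDS = ["startup", "venture capital", "entrepreneur", "funding"]

# One case-insensitive pattern. The leading \b requires each term to start a word,
# so "nonstartup" is ignored, while no trailing \b lets "startups" still match.
BUZZWORD_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(word) for word in BUZZWORDS) + r")",
    re.IGNORECASE,
)


def select_tweets(tweets):
    """Return (source, text) pairs for tweets whose text mentions a buzzword.

    Each tweet is assumed to be a dict with "screen_name" and "text" keys.
    """
    return [
        (tweet["screen_name"], tweet["text"])
        for tweet in tweets
        if BUZZWORD_RE.search(tweet["text"])
    ]


def format_dm(source, text):
    """Build the DM body, crediting the source with an @-mention."""
    return f"@{source}: {text}"


if __name__ == "__main__":
    timeline = [
        {"screen_name": "example_news", "text": "Local startup raises seed funding"},
        {"screen_name": "other_news", "text": "Weather update for Houston"},
    ]
    for source, text in select_tweets(timeline):
        print(format_dm(source, text))
```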
Development
Same as Christy Warden (Twitter Crawler Application 1)
Test Plan
Let it run for this week (sending DMs to my personal account) to see whether the crawler can run indefinitely (will Twitter shut us down?). Also, check that the tweets the regex catches are interesting enough to be worth someone's attention.
Log:
11/1
Built the application. I'm having issues with rate limiting (sigh), but I plan to have the crawler sleep for 15 minutes after every action, which might avoid the issue. Sleeping that long probably won't cause us to miss tweets, because no more than about 200 tweets are likely to be posted to the timeline within 15 minutes anyway.
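A rough sketch of the sleep-between-polls idea, assuming the timeline is fetched through a hypothetical `fetch_timeline(since_id, count)` callable (not a real library function). Tracking the newest tweet id seen (`since_id`) means each poll only asks for tweets posted since the previous one. `PAGE_SIZE = 200` reflects the assumption that a single timeline request returns at most 200 tweets, which is why more than 200 tweets in one 15-minute window could be missed.

```python
import time

POLL_INTERVAL_SECONDS = 15 * 60  # the 15-minute pause described in the log
PAGE_SIZE = 200  # assumed per-request cap on timeline tweets


def poll_timeline(fetch_timeline, handle_tweets, sleep=time.sleep, max_cycles=None):
    """Fetch new timeline tweets, hand them off, then sleep; repeat.

    fetch_timeline(since_id, count) is a hypothetical callable returning a list of
    tweet dicts (each with an integer "id") newer than since_id, or the latest
    tweets when since_id is None. Returns the newest id seen.
    """
    since_id = None
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        tweets = fetch_timeline(since_id=since_id, count=PAGE_SIZE)
        if tweets:
            # Remember the newest id so the next poll skips tweets already handled.
            since_id = max(tweet["id"] for tweet in tweets)
            handle_tweets(tweets)
        cycles += 1
        # Pause after every fetch to stay under the rate limit.
        sleep(POLL_INTERVAL_SECONDS)
    return since_id
```

Passing `sleep` and `max_cycles` as parameters is only there for testing; the real crawler would use the defaults and loop forever.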
11/22
The application is built and runs! I have changed how information is transmitted: it now goes through an account that retweets all of the good tweets and articles. The tentative plan is to mute all of the other accounts that BakerMcNair follows, so that the only things on our timeline are the tweets my crawler finds.
Another thing worth noting is that the application has not yet run continuously from one of my work sessions to the next. I suspect a few possible reasons, and I'm not sure which one applies:
1) I've only tried to run it continuously a few times, and each time something unusual happened between my work sessions: for example, the RDP was shut down or Python was uninstalled and reinstalled. The failures may simply be the result of unlucky coincidences.
2) Something is actually wrong with the tweets (maybe they are deleted before the crawler tries to retweet them). I would need to add exception handling, but I haven't yet found a concrete case of something that crashed the program.
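One way to add the exception handling from point 2, sketched under the assumption that the actual retweet call is wrapped in a hypothetical `retweet(tweet_id)` callable that raises when a tweet can no longer be retweeted (for instance, because it was deleted). Each failure is logged with its traceback and skipped, so one bad tweet does not stop the whole run.

```python
import logging

logger = logging.getLogger("news_finder")


def retweet_all(tweet_ids, retweet):
    """Retweet each id, logging and skipping any that fail.

    retweet is a hypothetical callable wrapping the real API call; it is
    expected to raise an exception when a tweet cannot be retweeted.
    Returns the ids that failed so they can be inspected later.
    """
    failed = []
    for tweet_id in tweet_ids:
        try:
            retweet(tweet_id)
        except Exception:
            # logger.exception records the full traceback, so the cause is not lost.
            logger.exception("Retweet failed for tweet %s; skipping", tweet_id)
            failed.append(tweet_id)
    return failed
```

Catching the broad `Exception` is a deliberate trade-off for a long-running crawler; once a real failure case turns up in the logs, the `except` clause can be narrowed to that specific error.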
THE PLAN:
I am having the crawler send a DM to my personal account every 15 minutes to confirm that it's still working. If the messages stop arriving, I'll log in to the RDP as soon as possible and copy the error message before anyone can do anything weird to the RDP or Python again.
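The heartbeat in this plan could be sketched as below. `send_message` is a hypothetical stand-in for whatever sends the DM to the personal account, and the `Heartbeat` class is illustrative rather than part of the existing crawler. Measuring elapsed time with `time.monotonic` means a change to the system clock on the RDP will not cause missed or doubled heartbeats.

```python
import time


class Heartbeat:
    """Calls send_message at most once per interval to show the crawler is alive.

    send_message is a hypothetical callable (e.g. a wrapper that DMs a personal
    account); clock defaults to time.monotonic and can be replaced for testing.
    """

    def __init__(self, send_message, interval=15 * 60, clock=time.monotonic):
        self.send_message = send_message
        self.interval = interval
        self.clock = clock
        self.last_sent = None

    def beat(self):
        """Send a heartbeat if none was sent within the last interval; return True if sent."""
        now = self.clock()
        if self.last_sent is None or now - self.last_sent >= self.interval:
            # Include the wall-clock time so a gap in messages is easy to spot.
            self.send_message(f"Crawler still running ({time.strftime('%Y-%m-%d %H:%M:%S')})")
            self.last_sent = now
            return True
        return False
```

The main loop would call `beat()` once per cycle; a message only goes out when at least 15 minutes have passed since the previous one.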
ALSO: I forgot to mention that I took all of the Twitter handles from http://mcnair.bakerinstitute.org/wiki/Social_Media_Entrepreneurship_Resources and http://mcnair.bakerinstitute.org/wiki/Political_news_twitter_handles and followed them on the account I am using for this project. Those are all the news sources we currently use, but I would happily add more if needed.