Twitter News Finder (Tool)
Twitter News Finder (Tool) | |
---|---|
Project Information | |
Has title | Twitter News Finder (Tool) |
Has owner | Christy Warden |
Has start date | Fall 2016 |
Has deadline date | |
Has keywords | Webcrawler, Database, Twitter, API, Python, Tool |
Has project status | |
Has sponsor | McNair Center |
Copyright © 2019 edegan.com. All Rights Reserved. |
Finding Sources of News Crawl
Description
This is a crawler that operates on a dummy account (@testapplicat2) that follows popular news sources. Every fifteen minutes, the application analyzes its timeline and sends any tweets that contain buzzwords (matched by a regex) to the BakerMcNair account in a direct message with an @-mention of the source.
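A minimal sketch of the buzzword filter described above. The real regex and buzzword list are not recorded on this page, so `BUZZWORDS`, `find_interesting`, `format_dm`, and the dictionary shape used for a tweet are all hypothetical stand-ins, and no Twitter API call is made here.

```python
import re

# Hypothetical buzzword list; the crawler's actual pattern is not given on this page.
BUZZWORDS = ["startup", "venture capital", "entrepreneur", "IPO"]

# One case-insensitive pattern that matches any buzzword as a whole word or phrase.
BUZZWORD_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(word) for word in BUZZWORDS) + r")\b",
    re.IGNORECASE,
)


def find_interesting(tweets):
    """Return the tweets whose text contains at least one buzzword."""
    return [tweet for tweet in tweets if BUZZWORD_RE.search(tweet["text"])]


def format_dm(tweet):
    """Build the direct-message body, starting with an @-mention of the source."""
    return "@{} {}".format(tweet["source"], tweet["text"])


# Stand-in timeline; a real run would read these from the dummy account.
timeline = [
    {"source": "newsdesk", "text": "New startup raises seed round"},
    {"source": "weatherbot", "text": "Rain expected this afternoon"},
]
messages = [format_dm(tweet) for tweet in find_interesting(timeline)]
```

Because of the word boundaries, this pattern matches "startup" but not "startups"; whether plurals should count is a choice the real regex would have to make.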
Development
Same as Christy Warden (Twitter Crawler Application 1)
Test Plan
Let it run for this week (sending DMs to my personal account) and see whether the crawler can run indefinitely (will Twitter shut us down?). Also, check that the tweets the regex catches are interesting and worth someone's attention.
Log:
11/1
Built the application. I am having issues with rate limiting (sigh), but I plan on letting the crawler sleep for 15 minutes every time it does anything, which might negate the issue. The long sleep probably won't matter, because it is unlikely that more than 200 tweets will be posted on the timeline within 15 minutes anyway.
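The sleep-between-checks idea could look roughly like the sketch below. It is an assumption about the loop's shape, not the crawler's actual code: `run_crawler`, `check_timeline`, and the `cycles` and `sleep` parameters are hypothetical names, and the sleep function is injectable only so that the example can be exercised without really waiting.

```python
import time

POLL_INTERVAL_SECONDS = 15 * 60  # the 15-minute pause mentioned in the log


def run_crawler(check_timeline, cycles=None, sleep=time.sleep):
    """Call check_timeline, then sleep; repeat `cycles` times (forever if None)."""
    completed = 0
    while cycles is None or completed < cycles:
        check_timeline()
        sleep(POLL_INTERVAL_SECONDS)
        completed += 1
    return completed


# Demonstration with a fake sleep that only records the requested pause.
checks = []
naps = []
completed = run_crawler(lambda: checks.append("checked"), cycles=3, sleep=naps.append)
```

In a real deployment `cycles` would be left as `None` and `sleep` as `time.sleep`, so one timeline request is made per 15-minute window.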
11/22
The application is built and runs! I have changed the method of info transmission to be via an account which retweets all of the good tweets and articles. The plan is perhaps to mute all of the other accounts that BakerMcNair follows so that the only things on our timeline are the tweets that my crawler finds.
Another thing worth noting is that the application has never kept running continuously from one of my work sessions to the next. I suspect several possible causes and I'm not sure which one applies:
1) I've only tried to run it continuously a few times, and each time something weird has happened between my sessions: among other things, the RDP was shut down and Python was uninstalled and reinstalled. The failures may simply be the result of unlucky coincidences.
2) Something is actually wrong with the tweets (maybe they are deleted before we try to retweet them). I would need to add an exception-catching mechanism, but I haven't yet found an example case that shut down the program.
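One possible shape for the exception-catching mechanism mentioned in point 2 is sketched here. It assumes the retweet call raises an exception when a tweet has been deleted, which has not been confirmed as the cause; `retweet_all` and `fake_retweet` are hypothetical names, and the fake function stands in for the real API call.

```python
import logging

log = logging.getLogger("news_finder")


def retweet_all(tweet_ids, retweet):
    """Try to retweet each id; log failures and continue instead of crashing."""
    succeeded, failed = [], []
    for tweet_id in tweet_ids:
        try:
            retweet(tweet_id)
            succeeded.append(tweet_id)
        except Exception:
            # log.exception records the full traceback for later inspection.
            log.exception("Could not retweet %s", tweet_id)
            failed.append(tweet_id)
    return succeeded, failed


def fake_retweet(tweet_id):
    """Stand-in for the real API call; pretends tweet 2 was deleted."""
    if tweet_id == 2:
        raise RuntimeError("tweet not found")


ok, bad = retweet_all([1, 2, 3], fake_retweet)
```

Catching the broad `Exception` is deliberate in this sketch, since the failing case has not been identified yet; once a real error is captured, the handler could be narrowed to that specific exception type.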
THE PLAN:
I am having the crawler send a DM to my personal account every 15 minutes to assure me that it's still working. If I notice that I stop getting messages, I'll log in to the RDP ASAP and copy the error message that was sent before anyone can do anything whack with the RDP or Python again.
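The heartbeat plan could be sketched as follows. Saving the traceback to a file is an addition beyond what the log describes (the plan there is to copy the message by hand), and `run_once_with_heartbeat`, `send_dm`, and `error_path` are hypothetical names; the DM sender is passed in as a plain function so nothing here touches the Twitter API.

```python
import os
import tempfile
import traceback
from datetime import datetime


def run_once_with_heartbeat(step, send_dm, error_path):
    """Run one crawl step, then send a heartbeat DM; on failure, save the traceback."""
    try:
        step()
    except Exception:
        # Keep a copy of the error on disk so it survives a closed RDP session.
        with open(error_path, "a") as handle:
            handle.write(traceback.format_exc())
        raise  # let the crash surface, so the heartbeat messages stop
    send_dm("still running at " + datetime.now().isoformat(timespec="seconds"))


# Demonstration: one healthy step, then one simulated failure.
sent = []
error_file = os.path.join(tempfile.mkdtemp(), "crawler_errors.log")
run_once_with_heartbeat(lambda: None, sent.append, error_file)


def broken_step():
    raise ValueError("simulated failure")


try:
    run_once_with_heartbeat(broken_step, sent.append, error_file)
except ValueError:
    pass

with open(error_file) as handle:
    saved = handle.read()
```

With this arrangement, a missing heartbeat still signals that the crawler stopped, and the saved file holds the error even if the console output is lost.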
ALSO: I forgot to mention that I took all of the Twitter handles from http://mcnair.bakerinstitute.org/wiki/Social_Media_Entrepreneurship_Resources and http://mcnair.bakerinstitute.org/wiki/Political_news_twitter_handles and followed them on the account I am using for this project. Those are all the news sources we are using now, but I could happily add more if need be.