Christy Warden (Social Media)



Christy Warden (Twitter Crawler Application 1)

Christy Warden (Twitter Crawler Application 2)


Talked with Ramee about what kind of content the Twitter account seeks to retweet/take links from.

Issues she has with HootSuite:
- Content in the feed is not relevant, often from illegitimate sources (random people's tweets that happen to contain the word entrepreneurship)

Goals for HOOTSUITE:
- improve filters to grab tweets with legitimate content
- innovation/research good from most fields, specifically life sciences/health
- preferred from around Houston/San Francisco/Boston

THINGS I DID for HOOTSUITE:
- added the filter:link to the HootSuite feeds to only include tweets which link to external sources (hopefully increases legitimate tweets)
- added geolocation (Houston) to the innovation feed to decrease scope of search
- added "patent" search to entrepreneurship field and "research" to innovation
- required both feeds to filter for tweets containing links

ANTICIPATING IMPORTANT TWEETS/BLOGPOSTS BRAINSTORMING:
- example given by Dr. Egan: We could have created a blogpost linking all of the channels that the debates would be shown on
- looking for a calendar to anticipate ??
- Potentially have people write blogposts with searchable terms/tags just before events ("10 Nobel Prize Innovators Who blah blah blah" right before the announcement of Nobel Prizes this October)

FINDING PEOPLE WHO FOLLOW PEOPLE LIKE US: I am reading about this guy's crawler which appears to do this. I will continue looking at it on Thursday.



Existing Crawlers

Spent a significant amount of time with Harsh trying to figure out how to get the existing Twitter crawler to work and download its output file to a place we can access.

HERE IS THE PLAN

Making plans for how to use the Twitter crawler to find relevant people to follow. I am changing the Twitter crawler so that it will do this:

WE PLUG IN: The Twitter handle of a person we think posts content similar to ours or whose followers are likely to overlap with people interested in us.

What the crawler will do: Crawl their tweets and count how many entrepreneur buzzwords we find in each tweet. Take the top scoring tweet and crawl the retweeters of that tweet. Rank the retweeters by selection criteria which I haven't totally decided yet but might include:
- How many of their tweets contain buzzwords
- their follower/following ratio
- how active they are

OUTPUT: a list of Twitter handles of people who are similar to us/like our kind of content/are likely to follow back and interact with us.
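The plan above can be sketched roughly as follows. This is only an illustration, not the crawler itself: the buzzword list, the `Account` record, and the function names are all made up for the example, and the only ranking criterion used is the first one listed (how many of a retweeter's tweets contain buzzwords), since the final criteria are not decided yet.

```python
from dataclasses import dataclass, field

# Hypothetical buzzword list; the real list is not given in this log.
BUZZWORDS = {"entrepreneur", "entrepreneurship", "innovation", "startup", "patent"}


def buzzword_count(text, buzzwords=BUZZWORDS):
    """Count the words in one tweet that are buzzwords (case-insensitive)."""
    words = (w.strip(".,!?:;#@\"'()").lower() for w in text.split())
    return sum(1 for w in words if w in buzzwords)


def top_tweet(tweets):
    """Return the tweet with the highest buzzword count."""
    return max(tweets, key=buzzword_count)


@dataclass
class Account:
    # Hypothetical record for one retweeter.
    handle: str
    recent_tweets: list = field(default_factory=list)


def rank_retweeters(retweeters):
    """Sort retweeters by how many of their recent tweets contain a buzzword."""
    def buzz_tweets(account):
        return sum(1 for t in account.recent_tweets if buzzword_count(t) > 0)
    return sorted(retweeters, key=buzz_tweets, reverse=True)
```

The handles in the sorted list would be the OUTPUT described above. Fetching the tweets and retweeters from Twitter is left out on purpose.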


I spent the majority of today building a function which takes as an input a username and returns a list of people who use our buzzwords and who we should potentially follow. The function is almost done and I estimate I can finish it by Thursday.

For about an hour and a half, I compiled a datasheet of Trump's Twitter activity since his nomination. I emailed this file to Ed and Anne.


Christy Warden (Twitter Crawler Application 1)


The first crawler is complete! It returns an Excel file of ranked retweeters of relevant tweets. I tested this on a bunch of users and am getting results that I think are good. The next step is to talk to someone about what exactly we want to do with this information. One issue is when someone with good tweets has no retweeters, which makes it difficult to get any information out of their page. Again, this is located in my RDP page at Documents/My Projects/Twitter Crawler/


Today I created a plan to experiment with the crawler, which is explained on Christy Warden (Twitter Crawler Application 1). I researched and followed accounts recommended by the crawler and plan to check back on Tuesday to see if they follow back. After I do this a few times, I will be able to see how to adjust my criteria for choosing someone to follow and plan on automating the system. The end goal would be for me to run a large program every time I come to work, which unfollows people we followed who didn't follow back and follows new people based on the algorithms that I am testing. I am spending time today figuring out how to automate the follow/unfollow process so that this can be achieved quickly once I get some results from this initial follow spree.
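The unfollow half of that end goal comes down to a set difference. A minimal sketch, assuming the two lists of handles have already been pulled from Twitter (the function name is made up for the example):

```python
def non_responders(we_followed, our_followers):
    """Handles we followed that are not among our followers."""
    return sorted(set(we_followed) - set(our_followers))
```

Each handle returned is a candidate to unfollow. The actual unfollow call depends on whichever Twitter client the program uses and is not shown.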


Today I came back to discover that only 2 people in approximately 30 followed us back after last week's follow spree. I wrote a program which unfollows all the people who didn't follow us back so that we don't rack up huge numbers of following. I am considering that maybe it would be better to target people who don't have a high number of followers or whose follower/following ratio is very low. I incorporated these components into my algorithm, but I am not certain that I have discovered the optimal balance for the total score of the potential follower. I used a tactic which incorporated this score concept this week. Additionally, I automated this process so that people who achieve a threshold score are automatically followed by the program, which significantly improved my efficiency in following. Because of this, I was able to follow around 70 people this week, which should provide us with more data for Thursday when I check the result of this experiment. I plan on asking some of the stat/math interns for help with calculating the significance of the scores so that I can figure out which score make-up best correlates with the probability of a follow back.
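To make the score-and-threshold idea concrete, here is one possible shape for it. The formula, the equal weights, and the threshold value are all assumptions made up for this sketch; the log does not record the actual balance, which is exactly the part still being tuned.

```python
# Hypothetical threshold; the real cut-off is not recorded in this log.
FOLLOW_THRESHOLD = 0.5


def follow_score(followers, following, buzzword_tweets, total_tweets):
    """Score in [0, 1]; a low follower/following ratio and heavy buzzword use score higher."""
    ratio = followers / max(following, 1)
    ratio_part = 1.0 / (1.0 + ratio)              # close to 1 when the ratio is low
    buzz_part = buzzword_tweets / max(total_tweets, 1)
    return 0.5 * ratio_part + 0.5 * buzz_part     # illustrative equal weights


def should_follow(followers, following, buzzword_tweets, total_tweets):
    """True when the account reaches the threshold and would be auto-followed."""
    score = follow_score(followers, following, buzzword_tweets, total_tweets)
    return score >= FOLLOW_THRESHOLD
```

With these made-up weights, an account with 100 followers, 400 following, and buzzwords in 10 of 20 tweets passes, while a 73k-follower account that rarely uses buzzwords does not. Logging each score component next to whether the account followed back would give the stat/math interns something to fit.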

Something interesting that I noticed was that when I was following large numbers of people today, we gained about 4 followers. I assume those accounts are also operating on a crawler of some kind and noticed us following mutual accounts? I wonder if we could explore this as a strategy in and of itself, gaining followers by following huge batches of people that are tracked by other crawlers? I am not sure this would be efficient, however, because I am assuming those crawlers also unfollow people who fail to respond to them which defeats the purpose.

Another thought, the only way that we will ever break out of the "follow someone and hope for a follow" 1:1 ratio is if the followers that we are gaining are people who will retweet us and interact with our content. That way they will garner attention for us in their own audience of followers and we will gain unrequited followers. So even though we only gained 2 followers out of the 30 I followed last time, they actually seemed like quality accounts and one of them even retweeted us. I think the long process of seeking accounts carefully rather than just following mass numbers of people will ultimately build an active follower base. Thus, we won't become one of those accounts with like 73k followers that get 1 favorite and 0 retweets on the vast majority of their content.

We need to start considering automated interactions with the accounts that follow us or that we follow (like maybe auto-favoriting a few of their tweets or having someone draft a DM for the people we are trying to win over?). I am definitely not the best person to come up with a framework for this, however, so I would need to talk to Ramee/Anne/any of the social sciences interns about some possible approaches.


First I ran a program that unfollowed all of the non-responders from my last follow spree and then I updated my data about who followed us back. I cannot seem to see a pattern yet in the probability of someone following us back based on the parameters I am keeping track of, but hopefully we will be able to see something with more data. Last week we had 151 followers, at the beginning of today we had 175 followers, and by the time that I am leaving (4:45) we have 190 followers. I think the program is working, but I hope the rate of growth increases.


Did the usual: followed around 100 people after purging our followed list from my previous session. I estimate that from here onward, this will take me approximately 1.5 hours to complete during every day that I work. Now, I am trying to strategize a way to keep our account informed when major news sources release anything related to entrepreneurship or innovation, which is what I was asked to do next. My current idea is to have a dummy account (I can use the account I built for the test application) follow all of those sources. I think it should run constantly, and when the timeline has a new item, the program will look at the content of the tweet and check for buzzwords. If the tweet does have buzzwords, it should direct message the BakerMcNair account a link to the tweet so that it can be considered.
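A rough sketch of the check that program would run each time it looks at the timeline. The dictionary keys (`id`, `user`, `text`) and the function name are assumptions for illustration; pulling the timeline and sending the DM are left to whatever Twitter client is used.

```python
def links_to_forward(timeline, seen_ids, buzzwords):
    """Return links to tweets that are new and mention at least one buzzword.

    timeline: list of dicts with hypothetical keys 'id', 'user', 'text'.
    seen_ids: set of tweet ids already handled; updated in place.
    """
    links = []
    for tweet in timeline:
        if tweet["id"] in seen_ids:
            continue  # already looked at this one on an earlier pass
        seen_ids.add(tweet["id"])
        text = tweet["text"].lower()
        if any(word in text for word in buzzwords):
            links.append(f"https://twitter.com/{tweet['user']}/status/{tweet['id']}")
    return links
```

Keeping `seen_ids` between passes is what stops the same tweet from being sent to the main account twice.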


Followed around 100 people. Purged the failures. I read my previous log where I said this should take around 1.5 hours for me to complete from here on and thought to myself "Why should it take any time??" I now want to change my program so that it constantly does what it does when I manually input usernames into it, without anything coming from me.

Here is the idea: At the very beginning of time, we will input one user into the Super Automatic Follower Program. It will crawl through it just how my old program did, following the good people BUT when it follows someone new, it will have to unfollow someone who didn't follow us back from a few hours ago as well (lots of checks in place). That way, we don't become one of those accounts that is following like THOUSANDS of people and is very obnoxious. If there isn't anyone from a few hours back, the program will sleep for 15 minutes and try again.

ADDITIONALLY: while we are looking for people to follow, we will also check for new starting nodes. My criteria for a start node is someone who has 20 times as many followers as people they are following and uses at least five buzzwords in 20 tweets. This approximates what I look for when I am finding new start nodes manually. It will add this person to a list of people to crawl and get to them after we finish the follow/unfollow nonsense for the current start node.

There is such a mind-bogglingly huge number of things that could go wrong with this (probably why people don't usually build programs like this on their own and just buy them instead) that I will have to handle (RATE LIMITING FOR ONE -_- and also less annoying but equally problematic things like what do we do if we can't find a new start node? and the big fear that Twitter will catch onto our nonsense and lock us out of our developer rights....) but I'm fairly certain it can be done, especially if the DM program I've got going on can work as well.
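The two rules described here (the start node criteria and the follow-one, unfollow-one swap) could look something like this. It is a sketch under assumptions: "a few hours" is taken as three hours, all names are made up, and the Twitter calls, rate limiting, and the bootstrap case where we are not following anyone yet are not handled.

```python
GRACE_SECONDS = 3 * 60 * 60   # "a few hours"; the exact value is an assumption
RETRY_SECONDS = 15 * 60       # how long the caller sleeps before trying again


def is_start_node(followers, following, recent_tweets, buzzwords):
    """20x as many followers as following, and at least 5 buzzwords in 20 tweets."""
    uses = 0
    for text in recent_tweets[:20]:
        for word in text.split():
            if word.strip(".,!?:;#@\"'()").lower() in buzzwords:
                uses += 1
    return followers >= 20 * following and uses >= 5


def pick_unfollow(followed_at, our_followers, now):
    """Oldest account followed at least GRACE_SECONDS ago that has not followed back."""
    stale = [handle for handle, when in followed_at.items()
             if now - when >= GRACE_SECONDS and handle not in our_followers]
    return min(stale, key=followed_at.get) if stale else None


def follow_with_swap(candidate, followed_at, our_followers, now):
    """Follow candidate only if a stale non-responder can be dropped in exchange.

    followed_at maps handle -> timestamp of when we followed them.
    Returns False when nobody can be dropped yet; the caller would then
    sleep RETRY_SECONDS and call again.
    """
    drop = pick_unfollow(followed_at, our_followers, now)
    if drop is None:
        return False
    del followed_at[drop]
    followed_at[candidate] = now
    return True
```

Accounts that pass `is_start_node` would go onto a list (for example a `collections.deque`) to be crawled once the current start node is finished.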

On the topic of the DM program, there is some weird error that occasionally happens where a link is bad, so the program refuses to DM us and throws an error and then everything stops, which is obviously not cool. I plan on fixing this (try and catch) and it shouldn't be a major hassle, but the issue is that I need an example of a link that is going wrong so I can build a test to check if it's fixed. I accidentally passed over the example from last week today and lost it, so now I have to let the program run until Thursday and hope that the error happens again.
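One way the fix might look, as a sketch only: `send_dm` stands in for whatever function actually sends the message (it is passed in because the real call is not shown in this log), and the broad `except Exception` is deliberate until the actual error type is known.

```python
import logging


def forward_links(links, send_dm):
    """Try to DM every link; a bad link is logged and skipped instead of stopping the run.

    send_dm: hypothetical callable that sends one link and raises on failure.
    Returns the links that failed so they can be kept as test examples.
    """
    failed = []
    for link in links:
        try:
            send_dm(link)
        except Exception as exc:  # error type unknown until a bad link shows up again
            logging.warning("Could not DM %r: %s", link, exc)
            failed.append(link)
    return failed
```

Saving the returned list to a file would also mean the next bad link does not get lost the way last week's example did.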