Changes

Twitter Webcrawler (Tool) (view source)

Revision as of 17:30, 19 July 2016

2,143 bytes added , 17:30, 19 July 2016

no edit summary

**Does not need a read.csv side function: there is no room for failure, so no need to test.

**Make ***one query*** per iteration, please.

===7/19: Application on Todd's Hub Project Pt.II===

*As documented in the <code>python-twitter</code> documentation, there is no direct way to filter timeline query results by start date/end date. So I've decided to write a support module <code>time_signature_processor</code> to help count the number of tweets posted within the past month.

**first-take with <code>from datetime import datetime</code>

**usage of the <code>datetime.datetime.strptime()</code> method to parse the (luckily) consistently formatted date strings provided by <code>twitter.Status</code> objects into datetime.datetime objects that support direct comparisons (i.e. <code>if tweet_time_obj < one_month_ago_obj:</code>)

**Does not support timezone-aware counting. In the current Python version (2.7), <code>strptime()</code> does not parse the <code>%z</code> UTC-offset directive, so my datetime.datetime objects are timezone-naive.

***'''functionality to be subsequently improved'''
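A minimal sketch of what the counting step in <code>time_signature_processor</code> might look like. The function names, the 30-day window, and the format string are my assumptions, not the module's actual contents. The format string assumes timestamps shaped like <code>Tue Jul 19 17:30:00 +0000 2016</code> and matches the <code>+0000</code> offset as literal text, so every parsed object is a naive datetime taken to be in UTC.

```python
from datetime import datetime, timedelta
import time

# Assumed layout of the created_at strings, e.g.
# "Tue Jul 19 17:30:00 +0000 2016". The "+0000" offset is matched as literal
# text, so the parsed objects are naive datetimes understood to be in UTC.
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S +0000 %Y"


def parse_created_at(created_at):
    """Turn one created_at string into a naive (UTC) datetime object."""
    return datetime.strptime(created_at, CREATED_AT_FORMAT)


def count_recent(created_at_strings, now=None, days=30):
    """Count timestamps that fall within the last `days` days before `now`."""
    if now is None:
        # Current UTC time as a naive datetime, to match the parsed values.
        now = datetime(*time.gmtime()[:6])
    cutoff = now - timedelta(days=days)
    return sum(1 for s in created_at_strings if parse_created_at(s) >= cutoff)
```

Passing <code>now</code> explicitly keeps the count reproducible when re-running the crawler on saved data. A string with any offset other than <code>+0000</code> raises <code>ValueError</code> instead of being silently miscounted.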

*To retrieve the number of accounts each shortname is following, it seems I have to call <code>twitter.Api.GetUser()</code> in addition to <code>twitter.Api.GetUserTimeline()</code>. To ration token usage, I will omit this second call for now.

**'''functionality to be subsequently improved'''
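If the second call is added later, one way to keep token usage down is to look each shortname up at most once. The sketch below is hypothetical: <code>following_counts</code> and its <code>cache</code> argument are names I made up, and <code>api</code> is assumed to behave like an authenticated <code>python-twitter</code> Api object whose <code>GetUser(screen_name=...)</code> returns a user object with a <code>friends_count</code> attribute.

```python
def following_counts(api, shortnames, cache=None):
    """Return {shortname: number of accounts followed}.

    `api` is assumed to behave like an authenticated python-twitter Api
    object. `cache` is a hypothetical dict that can be reused across runs so
    that a shortname already looked up does not cost another call.
    """
    cache = {} if cache is None else cache
    for name in shortnames:
        if name not in cache:
            # One query per shortname, and only for names not seen before.
            user = api.GetUser(screen_name=name)
            cache[name] = user.friends_count  # accounts this user follows
    return {name: cache[name] for name in shortnames}
```

Keeping the cache outside the function means repeated runs over the same list of shortnames cost no extra calls.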

*Improvements to debugging interface and practice

**Do note Komodo IDE's <code>Unexpected Indent</code> error message, which appears when indentation mixes whitespace created by tabs and by spaces. Use the editor's debugger instead of the interactive shell in this case; in the shell the problem is tedious to track down and practically impossible to fix.
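A small helper can show where a file's indentation switches between tabs and spaces before the debugger is even needed. This is only a sketch with function names of my own choosing; the standard library's <code>tabnanny</code> module performs a more thorough check.

```python
def classify_indentation(source):
    """Map each indented line's 1-based number to 'tabs', 'spaces' or 'mixed'."""
    styles = {}
    for number, line in enumerate(source.splitlines(), start=1):
        indent = line[:len(line) - len(line.lstrip(" \t"))]
        if not indent or not line.strip():
            continue  # skip unindented and whitespace-only lines
        if set(indent) == {"\t"}:
            styles[number] = "tabs"
        elif set(indent) == {" "}:
            styles[number] = "spaces"
        else:
            styles[number] = "mixed"
    return styles


def has_inconsistent_indentation(source):
    """True when the source does not stick to a single indentation style."""
    found = set(classify_indentation(source).values())
    return "mixed" in found or len(found) > 1
```

Calling <code>classify_indentation(open(path).read())</code> on a script lists the style of every indented line, so the lines that differ from the rest are easy to spot.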

*The data structure <code>pandas.DataFrame</code> can be built in a smart fashion from a dictionary whose keys become the column names and whose list values become the column contents, with the list indices serving as the row index of the df proper. This is more efficient than my past method of creating an empty table and then populating it cell by cell.

 import pandas as pd
 # Each key becomes a column name; each list holds that column's values.
 raw_data = {'first_name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
             'last_name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze'],
             'age': [42, 52, 36, 24, 73],
             'preTestScore': [4, 24, 31, 2, 3],
             'postTestScore': [25, 94, 57, 62, 70]}
 # Passing columns explicitly fixes the column order of the resulting table.
 df = pd.DataFrame(raw_data, columns=['first_name', 'last_name', 'age', 'preTestScore', 'postTestScore'])
 df

GunnyLiu

