===7/11: Project start===
----
*Dan wanted:
[[File:Capture 15.PNG|400px|none]]
**One can obtain the consumer key, consumer secret, access key and access secret by logging into the dev portal with the account and clicking <code>TOOLS > Manage Your Apps</code> in the footer bar of the portal.
**There is '''no''' direct access to the Twitter database through http://, as before, so expect to do all processing in a Python dev environment.
 
===7/12: Grasping API===
*Authentication with the python-twitter library goes through a <code>twitter.Api</code> object (placeholder strings stand in for our actual keys):
 import twitter
 api = twitter.Api(consumer_key='consumer_key',
                   consumer_secret='consumer_secret',
                   access_token_key='access_token',
                   access_token_secret='access_token_secret')
 
*Some potentially very useful query methods are:
**<code>Api.GetUserTimeline(user_id=None, screen_name=None)</code>, which returns up to 200 recent tweets of the input user. It is really nice that the Twitter database operates on something as simple as <code>screen_name</code>, the @shortname that is very public and familiar.
**<code>Api.GetFollowers(user_id=None, screen_name=None)</code> and <code>Api.GetFollowerIDs(user_id=None, screen_name=None)</code>, which seem to be a good relationship-mapping mechanism, especially for the mother-node tweeters we care about.
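As a sketch of how these query methods might be wired together (assuming an authenticated python-twitter <code>Api</code> object; the helper names are mine, not part of the build, and the api object is passed in as a parameter so the logic can be exercised with a stub):

```python
def recent_tweet_texts(api, screen_name, n=200):
    """Return the text of up to n recent tweets for @screen_name.

    `api` is expected to behave like an authenticated twitter.Api
    instance (i.e. it must provide GetUserTimeline).
    """
    statuses = api.GetUserTimeline(screen_name=screen_name, count=n)
    return [status.text for status in statuses]


def follower_ids(api, screen_name):
    """Return follower IDs for @screen_name, for relationship mapping."""
    return api.GetFollowerIDs(screen_name=screen_name)
```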
*After retrieving data objects using these query methods, we can understand and process them using instructions from the [http://python-twitter.readthedocs.io/en/latest/_modules/twitter/models.html python-twitter models source code].
**Note that tweets are expressed as <code>Status</code> objects. A <code>Status</code> holds useful parameters such as <code>'text'</code>, <code>'created_at'</code> and <code>'user'</code>, which can be retrieved by classical object expressions such as <code>Status.created_at</code>.
**Note that users are expressed as <code>User</code> objects.
**Best part? All these objects inherit Api methods such as <code>AsJsonString(self)</code> and <code>AsDict(self)</code>, so we can read and write them as JSON or dict objects in the .py environment.

===7/13: Full Dev===
'''Documented in-file, as below:'''
====Twitter Webcrawler====
*Summary: Rudimentary (and slightly generalized) webcrawler that queries the Twitter database using the Twitter API. At the current stage of development/discussion, the user shortname (in Twitter, @shortname) is used as the query key, and the script publishes the 200 recent tweets of said user in a tab-delimited, UTF-8 document, along with the details and social interactions each tweet possesses.
*Input: Twitter database; shortname string of the queried user (@shortname)
*Output: Local database of the queried user's 200 recent tweets, described by the keys "Content", "User", "Created at", "Hashtags", "User Mentions", "Retweet Count", "Retweeted By", "Favorite Count", "Favorited By"
*Version: 1.0 Alpha
*Development environment specs: JSON library, twitter-python library, pandas library, Py 2.7, ActiveState Komodo IDE 9.3

====Pseudo-code====
*function I: main driver
**generate empty table for subsequent building, with apt columns
**iterate through each status object in the obtained data, and fill up the table rows as apt, one row per status object
**main processing task: write table to output file
*function II: empty table generator
**modular because of my unfamiliarity with pandas.DataFrame; modularity enables testing
*function IV: authenticator + twitter API access interface setup
**authenticates using our granted consumer keys and access tokens
**obtains working twitter API object, post-authentication
*function V: subquery #1
**iterate through main query object in order to further query for retweeters, i.e. GetRetweeter() and ???
*function VI: raw data acquisitor
**grabs raw data of recent tweets using the master_working_api object
**makes it json so we can access it easily
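A minimal sketch of how the pseudo-code above might look in Python (assuming python-twitter-style <code>Status</code> objects and pandas; the column set is trimmed to the fields that flatten cleanly, and all function names here are illustrative rather than the actual build):

```python
import pandas as pd

COLUMNS = ["Content", "User", "Created at", "Hashtags",
           "User Mentions", "Retweet Count", "Favorite Count"]

def empty_table():
    """function II: empty table generator, modular so it can be unit tested."""
    return pd.DataFrame(columns=COLUMNS)

def status_to_row(status):
    """Flatten one Status-like object into a list of cells matching COLUMNS."""
    return [
        status.text,
        status.user.screen_name,
        status.created_at,
        " ".join(h.text for h in (status.hashtags or [])),
        " ".join(m.screen_name for m in (status.user_mentions or [])),
        status.retweet_count,
        status.favorite_count,
    ]

def write_table(statuses, path):
    """function I: main driver -- fill the table one row per status object,
    then write it out as a tab-delimited, UTF-8 document."""
    table = empty_table()
    for i, status in enumerate(statuses):
        table.loc[i] = status_to_row(status)
    table.to_csv(path, sep="\t", encoding="utf-8", index=False)
    return table
```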
====Notes:====
*Modular development and unit testing are integral to writing fast, working code. No joke.
*Problems with the GetFavorites() method: it only returns the favorited list with respect to the authenticated user (i.e. BIPPMcNair), not the input target user.
*'''Query rate limit hit while using subqueries to find the retweeters of every given tweet. Need to mitigate this problem somehow if we were to scale.'''
*A tweet looks like this in json:
[[File:Capture 16.PNG|400px|none]]
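One way to soften the rate-limit problem flagged above is to wrap each subquery in a retry-with-backoff helper. A generic sketch (the helper is mine, not part of the build; in practice the exception to catch would be <code>twitter.TwitterError</code> and <code>fn</code> would be the throttled subquery):

```python
import time

def with_backoff(fn, retries=3, base_delay=1.0, exc=Exception,
                 sleep=time.sleep):
    """Call fn(); on failure, sleep and retry with doubling delays.

    `sleep` is injectable so tests and dry runs need not actually wait.
    Re-raises the last exception once the retries are exhausted.
    """
    delay = base_delay
    for attempt in range(retries):
        try:
            return fn()
        except exc:
            if attempt == retries - 1:
                raise
            sleep(delay)
            delay *= 2
```

Note this only smooths over transient throttling; it does not raise the overall ceiling, so scaling up would still need batching, caching, or additional access tokens.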
===7/14: dev wrap-up===
----
*Black box can be pretty much sealed after a round of debugging
*All output requirements fulfilled except for the output "retweeter" list per tweet
*[https://github.com/scroungemyvibe/mcnair_center_builds Code is live]
*[https://github.com/scroungemyvibe/mcnair_center_builds Sample output is live]
*Awaiting more discussion and modifications. Ed mentioned populating a database.
*[[Social_Media_Entrepreneurship_Resources|Past Tweet-o-sphere excavation results]]
