Eventbrite Webcrawler (Tool)

From edegan.com
Revision as of 12:45, 6 July 2016 by GunnyLiu (talk | contribs)


McNair Project
Copyright © 2016 edegan.com. All Rights Reserved.


Description

Notes: The Eventbrite Webcrawler aims to automatically locate, retrieve, and store data on entrepreneurship-related events listed in the Eventbrite database, such as demo days, hackathons, open houses, startup weekends, and more. It is to be developed around Eventbrite APIv3 and Python 2.7.

Input: Eventbrite developer database

Output: A local database of entrepreneurship-related events, each recorded under the keys "organiser," "date," and "street level address," and possibly more.

Development Notes

6/30: Project start

Eventbrite APIv3

7/5:

  • Eventbrite developer account for McNair: {first name: Anne, last name: Dayton, email: admin@mcnaircenter.org, password: amount}
  • The Eventbrite API is well documented and its database is readily accessible. In the Python dev environment, I am using the requests HTTP library to query the database and obtain JSON data containing event objects, which in turn contain organizer objects, venue objects, start/end time values, and longitude/latitude values specific to each event. The requests library has a built-in .json() method on responses, which simplifies parsing the JSON.
    • To query for events organized by Techstars, one of the largest startup program organizations in the U.S., I use the following. Note that the organizer ID of Techstars is 2300226659.
import requests
response = requests.get(
    "https://www.eventbriteapi.com/v3/organizers/2300226659/events/",
    headers = {
        "Authorization": "Bearer CRAQ5MAXEGHKEXSUSWXN",
    },
    verify = True,
)
    • To search instead by keyword, such as "startup weekend," I use:
import requests
response = requests.get(
    "https://www.eventbriteapi.com/v3/events/search/",
    # Pass the keyword as the q query parameter; requests URL-encodes it.
    params = {
        "q": "startup weekend",
    },
    headers = {
        "Authorization": "Bearer CRAQ5MAXEGHKEXSUSWXN",
    },
    verify = True,
)
    • To query for events filed under the category "science and technology", I use the following. However, this query also returns scientific seminars unrelated to entrepreneurship and has yet to be refined. Note that the category ID of science and technology is 102.
import requests
response = requests.get(
    "https://www.eventbriteapi.com/v3/categories/102",
    headers = {
        "Authorization": "Bearer CRAQ5MAXEGHKEXSUSWXN",
    },
    verify = True,  
)
    • In each case, response is a requests Response object whose JSON body can be read in Python with response.json(). Each endpoint used above is an instance of a documented GET call, e.g. GET events/search/ or GET categories/:id, and each GET call accepts parameters that narrow the results. To populate a comprehensive local database, I am considering running systematic queries against different endpoints and collecting all results, without repetition, in a centralized database. To do this, I'll have to familiarize myself further with these GET calls and develop a systematic approach to automating queries to the Eventbrite server. One way to do this is
    • The dream is to systematically run
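
The idea in the notes above of collecting results from several queries "without repetition, in a centralized database" could be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the project's actual implementation. It assumes each query result has already been parsed with response.json() into a dict that holds a list under the key "events", and that each event carries a unique "id" plus nested "name", "start", "organizer", and "venue" objects. Those field names, the table layout, and the helper names event_row and store_events are hypothetical choices made for illustration. SQLite stands in for the local database, and the code uses only the standard library.

```python
import sqlite3


def event_row(event):
    """Flatten one event dict into (id, name, organizer, start, address)."""
    # ASSUMPTION: the nested field names below are illustrative guesses at
    # the event JSON layout; missing pieces simply become None.
    organizer = event.get("organizer") or {}
    venue = event.get("venue") or {}
    address = venue.get("address") or {}
    return (
        event["id"],
        (event.get("name") or {}).get("text"),
        organizer.get("name"),
        (event.get("start") or {}).get("utc"),
        address.get("address_1"),
    )


def store_events(conn, payloads):
    """Store events from several parsed query results, once per event ID.

    Returns the number of rows that were newly added to the table.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "id TEXT PRIMARY KEY, name TEXT, organizer TEXT, "
        "start_utc TEXT, street_address TEXT)"
    )
    before = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    for payload in payloads:
        for event in payload.get("events", []):
            # The primary key on id makes a repeated event a no-op.
            conn.execute(
                "INSERT OR IGNORE INTO events VALUES (?, ?, ?, ?, ?)",
                event_row(event),
            )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    return after - before


if __name__ == "__main__":
    # Hypothetical sample data standing in for two different query results.
    by_organizer = {"events": [{"id": "1", "name": {"text": "Demo Day"}}]}
    by_keyword = {"events": [{"id": "1", "name": {"text": "Demo Day"}},
                             {"id": "2", "name": {"text": "Startup Weekend"}}]}
    conn = sqlite3.connect(":memory:")
    print(store_events(conn, [by_organizer, by_keyword]))
```

Keying the table on the event ID means the same event returned by an organizer query, a keyword query, and a category query is stored only once, so queries can be rerun or added without any separate deduplication pass.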