Revision as of 15:13, 7 July 2016


McNair Project
URL Finder (Tool)
Copyright © 2016 edegan.com. All Rights Reserved.


Description

Notes: The URL Finder Tool is an automated program that locates, retrieves, and matches URLs to the corresponding startup companies using the Google API. It is developed in Python 2.7.

Input: CSV file containing a list of startup company names

Output: Matched URL for each company in the CSV file.

Development Notes

7/7: Project start

I am using the pandas library to read the input CSV files and to write results back to CSV. From there, I simplify the company names with several functions from the helper program glink, which strip company identifiers such as "Co.", "Inc.", and "LLC" and format the names so that they can be passed to the Google Search API.

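import glink  # helper program providing remCorp and gdownload

# fec: pandas DataFrame loaded from the input CSV; its "newname" column holds the company names
# define a new column with clean names (company identifiers removed)
fec['name_clean'] = fec["newname"].map(glink.remCorp)
# apply glink.gdownload to every cleaned name and store what it returns
fec['download_status'] = fec['name_clean'].map(glink.gdownload)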
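glink's own remCorp is not reproduced here. As a rough illustration of the kind of cleaning it does, a hypothetical `rem_corp` could strip trailing company identifiers with a regular expression. The suffix list below is an assumption, not glink's actual list.

```python
import re

# Hypothetical stand-in for glink.remCorp; this suffix list is an assumption.
_CORP_SUFFIXES = r"(?:co|corp|corporation|inc|incorporated|llc|ltd|company)"
# One or more suffixes at the very end of the name, each preceded by whitespace
# or a comma and optionally followed by a period.
_SUFFIX_RE = re.compile(r"(?:[\s,]+" + _CORP_SUFFIXES + r"\.?)+\s*$", re.IGNORECASE)


def rem_corp(name):
    """Return the company name with trailing company identifiers removed."""
    cleaned = _SUFFIX_RE.sub("", name.strip())
    # Collapse runs of whitespace so the name works as a search query.
    return " ".join(cleaned.split())


print(rem_corp("Widget Labs, Inc."))  # Widget Labs
```

Because a suffix must follow whitespace or a comma, a name like "Costco" is left intact. A function like this could be mapped over the name column the same way, e.g. `fec["newname"].map(rem_corp)`.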

7/5: Eventbrite API First-Take

  • Eventbrite developer account for McNair Center:
    • first name: Anne, last name: Dayton
    • Login Email: admin@mcnaircenter.org
    • Login Password: amount
  • The Eventbrite API is well documented and its database is readily accessible. In the Python dev environment, I am using the requests HTTP library to query the API and obtain JSON data containing event objects, which in turn contain organizer and venue information, start/end times, and longitude/latitude values specific to each event. Responses from the requests library have a built-in .json() method, which simplifies reading the JSON. Bang.
    • To query for events organized by Techstars, one of the biggest startup program organizations in the U.S., I use the following. Note that the organizer ID of Techstars is 2300226659.
import requests
response = requests.get(
    "https://www.eventbriteapi.com/v3/organizers/2300226659/events/",
    headers = {
        "Authorization": "Bearer CRAQ5MAXEGHKEXSUSWXN",
    },
    verify = True,
)
    • To query by keyword instead, such as "startup weekend", I use the following.
import requests
response = requests.get(
    "https://www.eventbriteapi.com/v3/events/search/",
    params = {"q": "startup weekend"},  # requests URL-encodes this as ?q=startup+weekend
    headers = {
        "Authorization": "Bearer CRAQ5MAXEGHKEXSUSWXN",
    },
    verify = True,
)
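A keyword cannot be pasted into the path as a quoted string; it has to go in the query string, URL-encoded, after a `?`. When a dict is passed as `params`, requests builds that query string with the standard library's urlencode, so the URL it sends can be previewed offline with a small helper (`build_search_url` is a hypothetical name):

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2.7

SEARCH_URL = "https://www.eventbriteapi.com/v3/events/search/"


def build_search_url(keyword):
    """Return the events/search URL with the keyword sent as the q parameter."""
    return SEARCH_URL + "?" + urlencode({"q": keyword})


print(build_search_url("startup weekend"))
# https://www.eventbriteapi.com/v3/events/search/?q=startup+weekend
```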
    • To query for events listed under the category "science and technology", I use the following. However, this query also returns scientific seminars unrelated to entrepreneurship and still needs to be refined.
    • Note that the category ID of science and technology is 102.
import requests
response = requests.get(
    "https://www.eventbriteapi.com/v3/categories/102",
    headers = {
        "Authorization": "Bearer CRAQ5MAXEGHKEXSUSWXN",
    },
    verify = True,  
)
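One possible refinement, sketched here as an assumption rather than a tested approach, is to filter returned events locally and keep only those whose name or description mentions an entrepreneurship term. It assumes each event dict has `name` and `description` objects with a `text` field, as in Eventbrite's event JSON; `STARTUP_TERMS` is a hypothetical starter list.

```python
# Hypothetical keyword list; a real list would need tuning.
STARTUP_TERMS = ("startup", "entrepreneur", "founder", "pitch", "venture", "accelerator")


def _text(event, field):
    """Return event[field]["text"] lowercased, or "" if missing or null."""
    value = event.get(field) or {}
    return (value.get("text") or "").lower()


def looks_entrepreneurial(event, terms=STARTUP_TERMS):
    """True if any term appears in the event's name or description."""
    haystack = _text(event, "name") + " " + _text(event, "description")
    return any(term in haystack for term in terms)


def filter_events(events, terms=STARTUP_TERMS):
    """Keep only the events that look related to entrepreneurship."""
    return [event for event in events if looks_entrepreneurial(event, terms)]
```

Plain substring matching is crude (it will miss events that never use these words), but it is cheap to run over every category result before anything is stored.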
    • In each case, response is a requests Response object whose JSON body can be decoded in Python with response.json(). Each endpoint used above is an instance of an Eventbrite API method such as GET events/search/ or GET categories/:id, and each GET method accepts its own parameters for narrowing down the results. To populate a comprehensive local database, the goal is to run systematic queries against different endpoints and collect all results, without duplicates, in a centralized database. To do this, I'll have to become more familiar with these GET methods and develop a systematic approach to automating queries to the Eventbrite server. One way to do this is to import lists of entrepreneurship buzzwords available on the web and iterate through them as search strings.
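The systematic-query idea could be sketched as a loop over search strings that merges results keyed on each event's `id`, so an event returned by several queries is stored only once. `fetch_events` is a hypothetical callable (for instance, a wrapper around the events/search request that returns `response.json()["events"]`); the fake results below stand in for real API responses.

```python
from collections import OrderedDict


def collect_events(keywords, fetch_events):
    """Run one search per keyword and return each event once, in first-seen order.

    fetch_events(keyword) is a hypothetical callable returning a list of event
    dicts, e.g. response.json()["events"] from one events/search request.
    """
    seen = OrderedDict()
    for keyword in keywords:
        for event in fetch_events(keyword):
            # The Eventbrite event id identifies duplicates across queries.
            if event["id"] not in seen:
                seen[event["id"]] = event
    return list(seen.values())


# Fake responses standing in for real API calls.
fake_results = {
    "startup weekend": [{"id": "1"}, {"id": "2"}],
    "pitch night": [{"id": "2"}, {"id": "3"}],
}
events = collect_events(["startup weekend", "pitch night"], lambda kw: fake_results[kw])
print([e["id"] for e in events])  # ['1', '2', '3']
```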
  • Eventbrite event objects in JSON are well organized and consistent. Apart from the name, location, organizer, and start/end time data that we want to collect initially, there are many other interesting fields, such as the longitude/latitude decimals.
    • For instance, the upcoming startup weekend event in Seville looks like the following.
    (Screenshot: Capture 12.PNG)
    • In the event object, organizer and venue are represented as IDs and have to be queried separately, since they contain many key-value pairs of their own, such as "description", "logo", and "url" in the case of organizer data. There is a huge opportunity here for more data extraction. Kudos to Eventbrite for documenting their API meticulously. Can you tell I'm impressed?
    • To produce a local database, I'm using the pandas library (import pandas as pd), the pandas.DataFrame object, and the pandas.DataFrame.to_csv() method. Currently, I initialize a DataFrame with columns for the variables I want to extract, then iterate through the event objects, and the venue/organizer objects referenced within them, to populate it with rows of event data.
    • Still debugging/writing at the moment.
    • RDP went down, major sadness.
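A minimal sketch of the event-to-CSV flow described above, assuming the event fields `id`, `name.text`, `start.utc`, `end.utc`, `organizer_id` and `venue_id`, and the organizer/venue fields `name`, `latitude` and `longitude`, as they appear in Eventbrite's v3 JSON. `fetch_json` is a hypothetical callable that takes a relative API path and returns decoded JSON (e.g. a wrapper around `requests.get(...).json()`). Organizer and venue lookups are cached so each ID is requested only once, and rows are collected in a list and turned into a DataFrame in one step rather than grown row by row.

```python
import pandas as pd

# Columns for the local events table; field names are assumptions based on
# Eventbrite v3 event, organizer and venue JSON.
COLUMNS = ["event_id", "name", "start_utc", "end_utc",
           "organizer_name", "venue_name", "latitude", "longitude"]


class CachedLookup(object):
    """Fetch each organizer/venue object once and reuse it for later events."""

    def __init__(self, fetch_json):
        # fetch_json is hypothetical: it takes a path such as "organizers/123/"
        # and returns the decoded JSON dict.
        self.fetch_json = fetch_json
        self.cache = {}

    def get(self, path):
        if path not in self.cache:
            self.cache[path] = self.fetch_json(path)
        return self.cache[path]


def event_row(event, lookup):
    """Flatten one event, plus its organizer and venue, into a dict row."""
    organizer, venue = {}, {}
    if event.get("organizer_id"):
        organizer = lookup.get("organizers/%s/" % event["organizer_id"])
    if event.get("venue_id"):
        venue = lookup.get("venues/%s/" % event["venue_id"])
    return {
        "event_id": event.get("id"),
        "name": (event.get("name") or {}).get("text"),
        "start_utc": (event.get("start") or {}).get("utc"),
        "end_utc": (event.get("end") or {}).get("utc"),
        "organizer_name": organizer.get("name"),
        "venue_name": venue.get("name"),
        "latitude": venue.get("latitude"),
        "longitude": venue.get("longitude"),
    }


def events_to_frame(events, fetch_json):
    """Build the DataFrame in one step from a list of row dicts."""
    lookup = CachedLookup(fetch_json)
    rows = [event_row(event, lookup) for event in events]
    return pd.DataFrame(rows, columns=COLUMNS)


# Usage (hypothetical file name):
# events_to_frame(events, fetch_json).to_csv("events.csv", index=False)
```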