<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>http://www.edegan.com/mediawiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Peterjalbert</id>
	<title>edegan.com - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="http://www.edegan.com/mediawiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Peterjalbert"/>
	<link rel="alternate" type="text/html" href="http://www.edegan.com/wiki/Special:Contributions/Peterjalbert"/>
	<updated>2026-05-17T10:53:54Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.34.2</generator>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Selenium_Documentation&amp;diff=22387</id>
		<title>Selenium Documentation</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Selenium_Documentation&amp;diff=22387"/>
		<updated>2017-12-22T17:37:24Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{McNair Projects&lt;br /&gt;
|Has Image=selenium.jpg&lt;br /&gt;
|Has title=Selenium Documentation&lt;br /&gt;
|Has owner=Peter Jalbert,&lt;br /&gt;
|Has project status=Active&lt;br /&gt;
}}&lt;br /&gt;
[http://www.seleniumhq.org/projects/webdriver/ Selenium Web Driver] is a framework often used for automated web application testing. It uses an API to launch a web browser and browse sites from the client's perspective. Popular Selenium bindings exist for [http://selenium-python.readthedocs.io/ Python], [http://seleniumhq.github.io/selenium/docs/api/java/index.html Java], [https://www.npmjs.com/package/selenium-webdriver JavaScript], and other languages. This documentation covers Selenium Web Driver using Python 3.&lt;br /&gt;
&lt;br /&gt;
==Installation==&lt;br /&gt;
A full list of installation documentation can be found [http://selenium-python.readthedocs.io/installation.html# here].&lt;br /&gt;
&lt;br /&gt;
This documentation assumes you have Python 3.6 or later installed. If you do not, visit the [https://www.python.org/downloads/ Python Download page].&lt;br /&gt;
&lt;br /&gt;
From the command line, enter&lt;br /&gt;
 pip install selenium&lt;br /&gt;
&lt;br /&gt;
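To confirm the install succeeded, you can print the installed version from the command line (a quick check; the selenium package exposes __version__):&lt;br /&gt;
 python -c &amp;quot;import selenium; print(selenium.__version__)&amp;quot;&lt;br /&gt;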
&lt;br /&gt;
==The Basics==&lt;br /&gt;
&lt;br /&gt;
A folder with tutorial code can be found on the RDP in:&lt;br /&gt;
 E:\McNair\Software\Selenium Tutorial&lt;br /&gt;
&lt;br /&gt;
===Launching a Driver===&lt;br /&gt;
The first step is to launch a driver. The driver is an object that holds information about the current page, including its URL and web elements, and it is the object you interact with for any sort of navigation. First, import the webdriver:&lt;br /&gt;
 from selenium import webdriver&lt;br /&gt;
&lt;br /&gt;
Then, create an instance of the web driver. The RDP has bindings for Google Chrome and Mozilla Firefox. The following will launch Google Chrome.&lt;br /&gt;
 driver = webdriver.Chrome()&lt;br /&gt;
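&lt;br /&gt;
To launch Mozilla Firefox instead (the RDP also has the Firefox driver; see the Advanced section below):&lt;br /&gt;
 driver = webdriver.Firefox()&lt;br /&gt;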
&lt;br /&gt;
The GET method is used to visit a website. The get() command in Selenium takes a URL string.&lt;br /&gt;
 driver.get(&amp;quot;http://www.google.com&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
From here, different methods can be used to interact with the page. Most interactions involve some kind of exchange with a web element, and Selenium comes with many ways to locate specific elements. To see the attributes of the element you want to work with, it is often a good idea to visit the page in your own browser, right-click the element you want your program to interact with, and select Inspect. This brings up the developer console and displays the HTML representation of that element. From there, you can use whichever of the following selectors best matches what you need.&lt;br /&gt;
&lt;br /&gt;
===Selectors===&lt;br /&gt;
These functions deal with web elements on the current page the driver is on. Any function that contains find_element_by returns a single web element, and any function that contains find_elements_by returns a list of web elements.&lt;br /&gt;
&lt;br /&gt;
 driver.find_element_by_class_name(class_name)&lt;br /&gt;
This function takes the string class name of the element you're looking for, and finds the first element on the page with that class name. If more than one web element on the page might have the same class, you are probably better off using find_elements_by_class_name.&lt;br /&gt;
&lt;br /&gt;
 driver.find_element_by_name(name)&lt;br /&gt;
This function takes a string name of the element you're looking for, and finds the first element on the page that has a name attribute matching the string. Similar to the find_element_by_class_name function, this is not your best bet if there are multiple objects with the same name attribute.&lt;br /&gt;
&lt;br /&gt;
 driver.find_element_by_id(id)&lt;br /&gt;
This function takes the string id of the element you're looking for, and finds the element on the page whose id attribute matches the string. Since ids should be unique within a page, this will usually find exactly the element you're looking for. This function is not helpful if the element you want to select does not have an id attribute.&lt;br /&gt;
&lt;br /&gt;
 driver.find_element_by_xpath(xpath)&lt;br /&gt;
This function takes an XPATH, and returns the first web element that matches the path. This should not be your first choice if many elements can share the same XPATH. Unlike the functions above, XPATH can be used to find any web element, regardless of its attributes. However, XPATH takes some time to learn and is more complex than the alternatives. Once you have invested that time, XPATH is the most reliable way to find the elements you're looking for.&lt;br /&gt;
&lt;br /&gt;
 driver.find_elements_by_class_name(class_name)&lt;br /&gt;
This is similar to find_element_by_class_name, except it returns a list of all matches with the class name. This allows you to iterate over the results or index them as needed, which is often useful for search results or other list-based queries.&lt;br /&gt;
&lt;br /&gt;
 driver.find_elements_by_name(name)&lt;br /&gt;
Same as find_element_by_name, except it returns a list of all matches.&lt;br /&gt;
&lt;br /&gt;
 driver.find_elements_by_xpath(xpath)&lt;br /&gt;
Same as find_element_by_xpath, but returns a list of all matches.&lt;br /&gt;
&lt;br /&gt;
A tutorial on XPATH can be found [https://www.w3schools.com/xml/xpath_intro.asp here].&lt;br /&gt;
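&lt;br /&gt;
As a minimal sketch of how these selectors are used in practice, the following collects the text of every element with a hypothetical class name of &amp;quot;result&amp;quot; (the class name is an assumption for illustration; substitute whatever Inspect shows for your page):&lt;br /&gt;
 # Gather all elements with the (assumed) class name &amp;quot;result&amp;quot; and print their text.&lt;br /&gt;
 results = driver.find_elements_by_class_name(&amp;quot;result&amp;quot;)&lt;br /&gt;
 for result in results:&lt;br /&gt;
     print(result.text)&lt;br /&gt;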
&lt;br /&gt;
===Javascript===&lt;br /&gt;
If you are familiar with JavaScript, you can inject JavaScript into the driver to invoke certain behaviors. Simply use:&lt;br /&gt;
 driver.execute_script(someJavascriptCode)&lt;br /&gt;
&lt;br /&gt;
For example, the following could be used to scroll to the bottom of the page:&lt;br /&gt;
 driver.execute_script(&amp;quot;window.scrollTo(0, document.body.scrollHeight);&amp;quot;)&lt;br /&gt;
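&lt;br /&gt;
execute_script can also return a value to your Python program. For example, this small sketch reads the page title via JavaScript:&lt;br /&gt;
 # The JavaScript return value is handed back to Python.&lt;br /&gt;
 page_title = driver.execute_script(&amp;quot;return document.title;&amp;quot;)&lt;br /&gt;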
&lt;br /&gt;
===Clicking===&lt;br /&gt;
If an element is clickable (such as a link or a button), you can click on that element by doing:&lt;br /&gt;
 element.click()&lt;br /&gt;
&lt;br /&gt;
This will route you to the linked page, or execute the action of the button. The driver will now be on the new page, and commands will deal with elements on the new page.&lt;br /&gt;
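&lt;br /&gt;
For example, a small sketch that combines a selector with a click (it assumes Google's &amp;quot;I'm Feeling Lucky&amp;quot; button still carries the name attribute btnI, which may change):&lt;br /&gt;
 driver.get(&amp;quot;http://www.google.com&amp;quot;)&lt;br /&gt;
 # The name attribute &amp;quot;btnI&amp;quot; is an assumption based on Google's markup.&lt;br /&gt;
 button = driver.find_element_by_name(&amp;quot;btnI&amp;quot;)&lt;br /&gt;
 button.click()&lt;br /&gt;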
&lt;br /&gt;
===Web Elements===&lt;br /&gt;
This area contains methods that can be called on a web element.&lt;br /&gt;
&lt;br /&gt;
 element.get_attribute(attr)&lt;br /&gt;
&lt;br /&gt;
This function gets an attribute of the web element by using the attribute identifier. For example, element.get_attribute(&amp;quot;href&amp;quot;) would get the URL of a button or link.&lt;br /&gt;
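&lt;br /&gt;
For instance, a sketch that combines find_elements_by_xpath with get_attribute to collect every link URL on the current page:&lt;br /&gt;
 # The XPATH //a matches every anchor (link) element on the page.&lt;br /&gt;
 links = driver.find_elements_by_xpath(&amp;quot;//a&amp;quot;)&lt;br /&gt;
 urls = [link.get_attribute(&amp;quot;href&amp;quot;) for link in links]&lt;br /&gt;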
&lt;br /&gt;
===Open a New Window===&lt;br /&gt;
OPTION 1: Shift + Click&lt;br /&gt;
-----------------------&lt;br /&gt;
&lt;br /&gt;
This method utilizes Action Chains and Keys. Action Chains queue commands, and the queued commands are executed once the perform() method is called. First, import these dependencies:&lt;br /&gt;
 from selenium.webdriver.common.action_chains import ActionChains&lt;br /&gt;
 from selenium.webdriver.common.keys import Keys&lt;br /&gt;
&lt;br /&gt;
Next, select the element using one of the selectors listed above. We will assume this web element is stored in a variable called element. Then, add a key down command to Action Chains using the Shift key to simulate the Shift button being pressed down:&lt;br /&gt;
 ActionChains(driver).key_down(Keys.SHIFT).perform()&lt;br /&gt;
&lt;br /&gt;
The Shift button is now &amp;quot;pressed&amp;quot; down. A click will now be a Shift + Click.&lt;br /&gt;
 element.click()&lt;br /&gt;
&lt;br /&gt;
This will simulate a Shift + Click on a link. A new window should open with the new link.&lt;br /&gt;
Execute a key up command with Action Chains on the Shift key so your future commands are not combined with shift:&lt;br /&gt;
 ActionChains(driver).key_up(Keys.SHIFT).perform()&lt;br /&gt;
&lt;br /&gt;
Now the browser is displaying a new window, but the driver thinks it is still on the previous window. To change this, access the window handles:&lt;br /&gt;
 handles = driver.window_handles&lt;br /&gt;
&lt;br /&gt;
And switch the driver's &amp;quot;focus&amp;quot; to the new window (the last window in this list of windows is the newest one):&lt;br /&gt;
 driver.switch_to_window(handles[-1])&lt;br /&gt;
&lt;br /&gt;
Now you can execute commands on the new window! To exit this window, use:&lt;br /&gt;
 driver.close()&lt;br /&gt;
&lt;br /&gt;
When closing a window, you will also need to switch the driver back to one of the remaining windows by repeating the code:&lt;br /&gt;
 handles = driver.window_handles&lt;br /&gt;
 driver.switch_to_window(handles[-1])&lt;br /&gt;
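&lt;br /&gt;
Putting the pieces together, here is a minimal end-to-end sketch of this option (it assumes element is a link you have already located with one of the selectors above):&lt;br /&gt;
 ActionChains(driver).key_down(Keys.SHIFT).perform()&lt;br /&gt;
 element.click()   # opens the link in a new window&lt;br /&gt;
 ActionChains(driver).key_up(Keys.SHIFT).perform()&lt;br /&gt;
 driver.switch_to_window(driver.window_handles[-1])&lt;br /&gt;
 # ... interact with the new window here ...&lt;br /&gt;
 driver.close()&lt;br /&gt;
 driver.switch_to_window(driver.window_handles[-1])&lt;br /&gt;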
&lt;br /&gt;
&lt;br /&gt;
OPTION 2: Open Blank Window + get()&lt;br /&gt;
-----------------------------------&lt;br /&gt;
&lt;br /&gt;
This method finds the URLs you wish to traverse and opens a new window for each URL you would like to visit. It utilizes JavaScript injection and the GET method.&lt;br /&gt;
&lt;br /&gt;
PROS: This method allows you to check a URL string for correctness before visiting the page, which can save you from timeout errors when someone has put a dead link on their webpage.&lt;br /&gt;
&lt;br /&gt;
This assumes you have a URL retrieved by using element.get_attribute(&amp;quot;href&amp;quot;) or some other method. We assume this is stored in the variable url.&lt;br /&gt;
&lt;br /&gt;
First, open a new window by injecting JavaScript:&lt;br /&gt;
 driver.execute_script(&amp;quot;window.open('');&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
Now the browser is displaying a new window, but the driver thinks it is still on the previous window. To change this, access the window handles:&lt;br /&gt;
 handles = driver.window_handles&lt;br /&gt;
&lt;br /&gt;
And switch the driver's &amp;quot;focus&amp;quot; to the new window (the last window in this list of windows is the newest one):&lt;br /&gt;
 driver.switch_to_window(handles[-1])&lt;br /&gt;
&lt;br /&gt;
Now, you can execute commands on the new window. To get to the webpage you want to visit:&lt;br /&gt;
 driver.get(url)&lt;br /&gt;
&lt;br /&gt;
Now you can interact with anything on the new page using the current driver. To exit this window, use:&lt;br /&gt;
 driver.close()&lt;br /&gt;
&lt;br /&gt;
When closing a window, you will also need to switch the driver back to one of the remaining windows by repeating the code:&lt;br /&gt;
 handles = driver.window_handles&lt;br /&gt;
 driver.switch_to_window(handles[-1])&lt;br /&gt;
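&lt;br /&gt;
Putting the pieces together, a minimal sketch of this option (it assumes url holds a link you have already extracted):&lt;br /&gt;
 driver.execute_script(&amp;quot;window.open('');&amp;quot;)&lt;br /&gt;
 driver.switch_to_window(driver.window_handles[-1])&lt;br /&gt;
 driver.get(url)&lt;br /&gt;
 # ... interact with the new page here ...&lt;br /&gt;
 driver.close()&lt;br /&gt;
 driver.switch_to_window(driver.window_handles[-1])&lt;br /&gt;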
&lt;br /&gt;
&lt;br /&gt;
===Downloading Files===&lt;br /&gt;
In general, I recommend using Selenium for the browsing, and another method to download the file.&lt;br /&gt;
 driver.current_url&lt;br /&gt;
&lt;br /&gt;
This returns the current URL. From there, urlretrieve from the standard library's urllib.request module (or the third-party wget package) can be used to download the file if the URL ends in .pdf, or the HTML of the page if it is a regular webpage.&lt;br /&gt;
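&lt;br /&gt;
For example, a sketch using urlretrieve from the standard library (the output filename is arbitrary):&lt;br /&gt;
 from urllib.request import urlretrieve&lt;br /&gt;
 url = driver.current_url&lt;br /&gt;
 if url.endswith(&amp;quot;.pdf&amp;quot;):&lt;br /&gt;
     # Save the PDF to the working directory.&lt;br /&gt;
     urlretrieve(url, &amp;quot;downloaded.pdf&amp;quot;)&lt;br /&gt;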
&lt;br /&gt;
If you are trying to retrieve a body of text, find the element using selectors. Then:&lt;br /&gt;
 element.text&lt;br /&gt;
&lt;br /&gt;
will retrieve the text in that element as a string. This can then be written to a file in any way you see fit.&lt;br /&gt;
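&lt;br /&gt;
For example:&lt;br /&gt;
 # Write the element's text to a file.&lt;br /&gt;
 with open(&amp;quot;output.txt&amp;quot;, &amp;quot;w&amp;quot;) as f:&lt;br /&gt;
     f.write(element.text)&lt;br /&gt;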
&lt;br /&gt;
[http://stackabuse.com/download-files-with-python/ Helpful link for downloading files in python]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Helpful Links==&lt;br /&gt;
[https://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python How to Download a file in Python with a URL]&lt;br /&gt;
&lt;br /&gt;
[https://stackoverflow.com/questions/20986631/how-can-i-scroll-a-web-page-using-selenium-webdriver-in-python Selenium: Scroll a Webpage]&lt;br /&gt;
&lt;br /&gt;
[https://stackoverflow.com/questions/24795198/selenium-python-get-all-children-elements Selenium: Get Children Elements]&lt;br /&gt;
&lt;br /&gt;
==Advanced==&lt;br /&gt;
The folder for the Web Driver executables can be found at:&lt;br /&gt;
 C:\SeleniumDriver&lt;br /&gt;
&lt;br /&gt;
chromedriver.exe is an executable to launch Google Chrome, and geckodriver.exe is an executable to launch Mozilla Firefox. Any new drivers for different web browsers should be placed in this folder.&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=File:Selenium.jpg&amp;diff=22383</id>
		<title>File:Selenium.jpg</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=File:Selenium.jpg&amp;diff=22383"/>
		<updated>2017-12-21T20:46:36Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=22382</id>
		<title>Peter Jalbert (Work Log)</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=22382"/>
		<updated>2017-12-21T15:58:46Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===Fall 2017===&lt;br /&gt;
&amp;lt;onlyinclude&amp;gt; &lt;br /&gt;
&lt;br /&gt;
[[Peter Jalbert]] [[Work Logs]] [[Peter Jalbert (Work Log)|(log page)]] &lt;br /&gt;
&lt;br /&gt;
2017-12-21: Last minute adjustments to the Moroccan Data. Continued working on [http://mcnair.bakerinstitute.org/wiki/Selenium_Documentation Selenium Documentation].&lt;br /&gt;
&lt;br /&gt;
2017-12-20: Working on Selenium Documentation. Wrote 2 demo files. Wiki Page is available [http://mcnair.bakerinstitute.org/wiki/Selenium_Documentation here]. Created 3 spreadsheets for the Moroccan data.&lt;br /&gt;
&lt;br /&gt;
2017-12-19: Finished fixing the Demo Day Crawler. Changed files and installed as appropriate to make the LinkedIn crawler compatible with the RDP. Removed some of the bells and whistles.&lt;br /&gt;
&lt;br /&gt;
2017-12-18: Continued finding errors with the Demo Day Crawler analysis. Rewrote the parser to remove any search terms that were in the top 10000 most common English words according to Google. Finished uploading and submitting Moroccan data.&lt;br /&gt;
&lt;br /&gt;
2017-12-15: Found errors with the Demo Day Crawler. Fixed scripts to download Moroccan Law Data.&lt;br /&gt;
&lt;br /&gt;
2017-12-14: Uploading Morocco Parliament Written Questions. Creating script for next Morocco Parliament download. Begin writing Selenium documentation. Continuing to download TIGER data.&lt;br /&gt;
&lt;br /&gt;
2017-12-06: Running Morocco Parliament Written Questions script. Analyzing Demo Day Crawler results. Continued downloading for TIGER geocoder.&lt;br /&gt;
&lt;br /&gt;
2017-11-28: Debugging Morocco Parliament Crawler. Running Demo Day Crawler for all accelerators and 10 pages per accelerator. TIGER geocoder is back to Forbidden Error.&lt;br /&gt;
&lt;br /&gt;
2017-11-27: Rerunning Morocco Parliament Crawler. Fixed KeyTerms.py and running it again. Continued downloading for TIGER geocoder.&lt;br /&gt;
&lt;br /&gt;
2017-11-20:  Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-16: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-15: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote a script to extract counts that were greater than 2 from Keyword Matcher. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder]. Finished re-formatting work logs.&lt;br /&gt;
&lt;br /&gt;
2017-11-14: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote an HTML to Text parser. See Parser Demo Day Page for file location. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder].&lt;br /&gt;
&lt;br /&gt;
2017-11-13: Built [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser].&lt;br /&gt;
&lt;br /&gt;
2017-11-09: Running demo version of Demo Day crawler (Accelerator Google Crawler). Fixing work log format. &lt;br /&gt;
&lt;br /&gt;
2017-11-07: Created file with 0s and 1s detailing whether crunchbase has the founder information for an accelerator. Details posted as a TODO on [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List] page. Still waiting for feedback on the PostGIS installation from [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder]. Continued working on Accelerator Google Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-06: Contacted Geography Center for the US Census Bureau, [https://www.census.gov/geo/about/contact.html here], and began email exchange on PostGIS installation problems. Began working on the [http://mcnair.bakerinstitute.org/wiki/Selenium_Documentation Selenium Documentation]. Also began working on an Accelerator Google Crawler that will be used with Yang and ML to find Demo Days for cohort companies.&lt;br /&gt;
&lt;br /&gt;
2017-11-01: Attempted to continue downloading, however ran into HTTP Forbidden errors. Listed the errors on the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder Page].&lt;br /&gt;
&lt;br /&gt;
2017-10-31: Began downloading blocks of data for individual states for the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder] project. Wrote out the new wiki page for installation, and beginning to write documentation on usage.&lt;br /&gt;
&lt;br /&gt;
2017-10-30: With Ed's help, was able to get the national data from Tiger installed onto a database server. The process required much jumping around and changing users, and all the things we learned are outlined in [http://mcnair.bakerinstitute.org/wiki/Database_Server_Documentation#Editing_Users the database server documentation] under &amp;quot;Editing Users&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
2017-10-25: Continued working on the [http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation TigerCoder Installation].&lt;br /&gt;
&lt;br /&gt;
2017-10-24: Throw some addresses into a database, use address normalizer and geocoder. May need to install things. Details on the installation process can be found on the [http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation PostGIS Installation page].&lt;br /&gt;
&lt;br /&gt;
2017-10-23: Finished Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-19: Continued work on Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-18: Continued work on Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-17: Constructed ArcGIS maps for the agglomeration project. Finished maps of points for every year in the state of California. Finished maps of Route 128. Began working on selenium Yelp crawler to get cafe locations within the 610-loop.&lt;br /&gt;
&lt;br /&gt;
2017-10-16: Assisted Harrison on the USITC project. Looked for natural language processing tools to extract complainants and defendants along with their locations from case files. Experimented with pulling based on parts-of-speech tags, as well as using geotext or geograpy to pull locations from a case segment.&lt;br /&gt;
&lt;br /&gt;
2017-10-13: Updated various project wiki pages.&lt;br /&gt;
&lt;br /&gt;
2017-10-12: Continued work on Patent Thicket project, awaiting further project specs.&lt;br /&gt;
&lt;br /&gt;
2017-10-05: Emergency ArcGIS creation for Agglomeration project.&lt;br /&gt;
&lt;br /&gt;
2017-10-04: Emergency ArcGIS creation for Agglomeration project.&lt;br /&gt;
&lt;br /&gt;
2017-10-02: Worked on ArcGIS data. See Harrison's Work Log for the details.&lt;br /&gt;
&lt;br /&gt;
2017-09-28: Added collaborative editing feature to PyCharm. &lt;br /&gt;
&lt;br /&gt;
2017-09-27: Worked on big database file.&lt;br /&gt;
&lt;br /&gt;
2017-09-25: New task -- Create text file with company, description, and company type.&lt;br /&gt;
#[http://mcnair.bakerinstitute.org/wiki/VC_Database_Rebuild VC Database Rebuild]&lt;br /&gt;
#psql vcdb2&lt;br /&gt;
#table name, sdccompanybasecore2&lt;br /&gt;
#Combine with Crunchbasebulk&lt;br /&gt;
&lt;br /&gt;
#TODO: Write wiki on linkedin crawler, write wiki on creating accounts.&lt;br /&gt;
&lt;br /&gt;
2017-09-21: Wrote wiki on Linkedin crawler, met with Laura about patents project. &lt;br /&gt;
&lt;br /&gt;
2017-09-20: Finished running linkedin crawler. Transferred data to RDP. Will write wikis next.&lt;br /&gt;
&lt;br /&gt;
2017-09-19: Began running linkedin crawler. Helped Yang create RDP account, get permissions, and get wiki setup.&lt;br /&gt;
&lt;br /&gt;
2017-09-18: Finished implementation of Experience Crawler, continued working on Education Crawler for LinkedIn.&lt;br /&gt;
&lt;br /&gt;
2017-09-14: Continued implementing LinkedIn Crawler for profiles.&lt;br /&gt;
&lt;br /&gt;
2017-09-13: Implemented LinkedIn Crawler for main portion of profiles. Began working on crawling Experience section of profiles.&lt;br /&gt;
&lt;br /&gt;
2017-09-12: Continued working on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. Added to the wiki on this topic.&lt;br /&gt;
&lt;br /&gt;
2017-09-11: Continued working on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. &lt;br /&gt;
&lt;br /&gt;
2017-09-06: Combined founders data retrieved with the Crunchbase API with the crunchbasebulk data to get linkedin urls for different accelerator founders. For more information, see [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here].&lt;br /&gt;
&lt;br /&gt;
2017-09-05: Post Harvey. Finished retrieving names from the Crunchbase API on founders. Next step is to query crunchbase bulk database to get linkedin urls. For more information, see [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here].&lt;br /&gt;
&lt;br /&gt;
2017-08-24: Began using the Crunchbase API to retrieve founder information for accelerators. Halfway through compiling a dictionary that translates accelerator names into proper Crunchbase API URLs.&lt;br /&gt;
&lt;br /&gt;
2017-08-23: Decided with Ed to abandon LinkedIn crawling to retrieve accelerator founder data, and instead use crunchbase. Spent the day navigating the crunchbasebulk database, and seeing what useful information was contained in it.&lt;br /&gt;
&lt;br /&gt;
2017-08-22: Discovered that LinkedIn Profiles cannot be viewed through LinkedIn if the target is 3rd degree or further. However, if entering LinkedIn through a Google search, the profile can still be viewed if the user has previously logged into LinkedIn. Devising a workaround crawler that utilizes Google search. Continued blog post [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) here] under Section 4.&lt;br /&gt;
&lt;br /&gt;
2017-08-21: Began work on extracting founders for accelerators through LinkedIn Crawler. Discovered that Python3 is not installed on RDP, so the virtual environment for the project cannot be fired up. Continued working on Ubuntu machine.&lt;br /&gt;
&amp;lt;/onlyinclude&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Spring 2017===&lt;br /&gt;
&lt;br /&gt;
2017-05-01: Continued work on HTML Parser. Uploaded all semester projects to git server.&lt;br /&gt;
&lt;br /&gt;
2017-04-20: Finished the HTML Parser for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Ran HTML parser on accelerator founders. Data is stored in projects/accelerators/LinkedIn Founder Data.&lt;br /&gt;
&lt;br /&gt;
2017-04-19: Made updates to the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler] Wikipage. Ran LinkedIn Crawler on accelerator data. Working on an html parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-18: Ran LinkedIn Crawler on matches between Crunchbase Snapshot and the accelerator data.&lt;br /&gt;
&lt;br /&gt;
2017-04-17: Worked on ways to get correct search results from the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Worked on an HTML Parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-13: Worked on debugging the logout procedure for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Began formulation of process to search for founders of startups using a combination of the LinkedIn Crawler with the data resources from the [http://mcnair.bakerinstitute.org/wiki/Crunchbase_2013_Snapshot CrunchBase Snapshot].&lt;br /&gt;
&lt;br /&gt;
2017-04-12: Work on bugs with the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-04-11: Completed functional [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) crawler of LinkedIn Recruiter Pro]. Basic search functions work and download profile information for a given person. &lt;br /&gt;
&lt;br /&gt;
2017-04-10: Began writing functioning crawler of LinkedIn. &lt;br /&gt;
&lt;br /&gt;
2017-04-06: Continued working on debugging and documenting the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Wrote a test program that logs in, searches for a query, navigates through search pages, and logs out. Recruiter program can now login and search.&lt;br /&gt;
&lt;br /&gt;
2017-04-05: Began work on the LinkedIn Crawler. Researched on launching Python Virtual Environment.&lt;br /&gt;
&lt;br /&gt;
2017-04-03: Finished debugging points for the Enclosing Circle Algorithm. Added Command Line functionality to the Industry Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-03-29: Worked on debugging points for the Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-28: Finished running the Enclosing Circle Algorithm. Worked on removing incorrect points from the data set (see above). &lt;br /&gt;
&lt;br /&gt;
2017-03-27: Worked on debugging the Enclosing Circle Algorithm. Implemented a way to remove interior circles, and determined that translation to latitude and longitude coordinates resulted in slightly off center circles.&lt;br /&gt;
&lt;br /&gt;
2017-03-23: Finished debugging the brute force algorithm for [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm]. Implemented a method to plot the points and circles on a graph. Analyzed runtime of the brute force algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-21: Coded a brute force algorithm for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-03-20: Worked on  debugging the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-03-09: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Finished script to draw Enclosing Circles on a Google Map.&lt;br /&gt;
&lt;br /&gt;
2017-03-08: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Created script to draw outcome of the Enclosing Circle Algorithm on Google Maps.&lt;br /&gt;
&lt;br /&gt;
2017-03-07: Redetermined the top 50 cities which Enclosing Circle should be run on. Data on the [http://mcnair.bakerinstitute.org/wiki/Top_Cities_for_VC_Backed_Companies Top 50 Cities for VC Backed Companies can be found here.] Ran [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities.&lt;br /&gt;
&lt;br /&gt;
2017-03-06: Ran script to determine the top 50 cities which Enclosing Circle should be run on. Fixed the VC Circles script to take in a new data format.&lt;br /&gt;
&lt;br /&gt;
2017-03-02: Cleaned up data for the VC Circles Project. Created histogram of data in Excel. See [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Began work on the [http://mcnair.bakerinstitute.org/wiki/LinkedInCrawlerPython LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-03-01: Created statistics for the VC Circles Project.&lt;br /&gt;
&lt;br /&gt;
2017-02-28: Finished downloading geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project.  Found bug in Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-02-27: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-23: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Installed C++ Compiler for Python. Ran tests on difference between Python and C wrapped Python.&lt;br /&gt;
&lt;br /&gt;
2017-02-22: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Helped out with [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-21: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Researched into C++ Compilers for Python so that the Enclosing Circle Algorithm could be wrapped in C. Found a recommended one [https://www.microsoft.com/en-us/download/details.aspx?id=44266 here].&lt;br /&gt;
&lt;br /&gt;
2017-02-20: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-16: Reworked [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] to create a file of geocoded data. Began work on wrapping the algorithm in C to improve speed.&lt;br /&gt;
&lt;br /&gt;
2017-02-15: Finished [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] applied to the VC study. Enclosing Circle algorithm still needs adjustment, but the program runs with the temporary fixes.&lt;br /&gt;
&lt;br /&gt;
2017-02-14: Worked on the application of the Enclosing Circle algorithm to the VC study. Working on bug fixes in the Enclosing Circle algorithm. Created wiki page for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-02-13: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-08: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-07: Fixed bugs in parse_cohort_data.py, the script for parsing the cohort data from the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Added descriptive statistics to cohort data excel file.&lt;br /&gt;
&lt;br /&gt;
2017-02-02: Out sick, independent research and work from RDP. Brief research into the [http://jorgeg.scripts.mit.edu/homepage/wp-content/uploads/2016/03/Guzman-Stern-State-of-American-Entrepreneurship-FINAL.pdf Stern-Guzman algorithm]. Research into [http://mcnair.bakerinstitute.org/wiki/interactive_maps Interactive Maps]. No helpful additions to map embedding problem.&lt;br /&gt;
&lt;br /&gt;
2017-02-01: Notes from Session with Ed: Project on US university patenting and entrepreneurship programs (writing code to identify universities in assignees), search Wikipedia (XML then bulk download), student pop, faculty pop, etc.&lt;br /&gt;
Circle project for VC data will end up being a joint project to join accelerator data. &lt;br /&gt;
Pull descriptions for VC. Founders of accelerators on LinkedIn. LinkedIn cannot be caught (pretend not to be a bot). Can eventually get academic backgrounds through LinkedIn. &lt;br /&gt;
Pull business registration data, Stern/Guzman Algorithm.  &lt;br /&gt;
GIS on top of geocoded data.&lt;br /&gt;
Maps that work on wiki or blog (CartoDB), Maps API and R.&lt;br /&gt;
NLP Projects, Description Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-01-31: Built WayBack Machine Crawler. Updated documentation for coordinates script. Updated profile page to include locations of code.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-30: Optimized enclosing circle algorithm through memoization. Developed script to read addresses from accelerator data and return latitude and longitude coordinates.&lt;br /&gt;
&lt;br /&gt;
2017-01-26: Continued working on the Google sitesearch project. Discovered Crunchbase, changed project priorities. Priority 1: split accelerator data up by flag; priority 2: use Crunchbase to get web URLs for cohorts; priority 3: make an Internet Archive Wayback Machine driver. Located [http://mcnair.bakerinstitute.org/wiki/Whois_Parser Whois Parser].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-25: Finished parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Some data files still need proofreading as they are not in an acceptable format. Began working on Google sitesearch project.&lt;br /&gt;
&lt;br /&gt;
2017-01-24: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Cohort data file created, debugging is almost complete. Will begin work on the google accelerator search soon.&lt;br /&gt;
&lt;br /&gt;
2017-01-23: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Preliminary code is written, working on debugging.&lt;br /&gt;
&lt;br /&gt;
2017-01-19: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project]. Created parser for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project], completed creation of the final data set (yay!). Began working on cohort parser.&lt;br /&gt;
&lt;br /&gt;
2017-01-18: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-13: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-12: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-11: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-10: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Fall 2016===&lt;br /&gt;
&lt;br /&gt;
2016-12-08: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-07: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-06: Learned how to use git. Committed software projects from the semester to the McNair git repository. Projects can be found at: [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Executive Order Crawler], [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Foreign Government Web Crawlers], [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) F6S Crawler and Parser].&lt;br /&gt;
&lt;br /&gt;
2016-12-02: Built and ran web crawler for Center for Middle East Studies on Kuwait. Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-01: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Built tool for the [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report E&amp;amp;I Governance Report Project] with Christy. Adds a column of data that shows whether or not the bill has been passed.&lt;br /&gt;
&lt;br /&gt;
2016-11-29: Began pulling data from the accelerators listed [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here]. Made text files for about 18 accelerators.&lt;br /&gt;
&lt;br /&gt;
2016-11-22: Transferred downloaded Morocco Written Bills to provided SeaGate Drive. Made a &amp;quot;gentle&amp;quot; F6S crawler to retrieve HTMLs of possible accelerator pages documented [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here].&lt;br /&gt;
&lt;br /&gt;
2016-11-18: Converted Executive Order PDFs to text files using Adobe Acrobat DC. See [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Wikipage] for details.&lt;br /&gt;
&lt;br /&gt;
2016-11-17: Wrote a crawler to retrieve information about executive orders, and their corresponding pdfs. They can be found [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report here.] Next step is to run code to convert the pdfs to text files, then use the parser fixed by Christy. &lt;br /&gt;
&lt;br /&gt;
2016-11-15: Finished download of Moroccan Written Question pdfs. Wrote a parser with Christy to be used for parsing bills from Congress and eventually executive orders. Found a bug in the system Python that was worked out, and the system was rebooted.&lt;br /&gt;
&lt;br /&gt;
2016-11-11: Continued to download Moroccan data in the background. Attempted to find bug fixes for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] crawler.&lt;br /&gt;
&lt;br /&gt;
2016-11-10: Continued to download Moroccan data and Kuwait data in the background. Began work on  [http://mcnair.bakerinstitute.org/wiki/Google_Scholar_Crawler Google Scholar Crawler]. Wrote a crawler for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] to get the HTML files of hundreds of accelerators. The crawler ended up failing; it appears to have been due to HTTPS. &lt;br /&gt;
&lt;br /&gt;
2016-11-08: Continued to download Moroccan data in the background. Finished writing code for the embedded files on the Kuwait site. Spent time debugging the frame errors due to the dynamically generated content. Never found an answer to the bug; instead found a workaround that sacrificed run time but worked. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-11-04: Continued to download Moroccan data in the background. Finished writing initial Kuwait Web Crawler/Driver for the Middle East Studies Department. Middle East Studies Department asked for additional embedded files in the Kuwait website. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-11-03: Continued to download Moroccan data in the background. Dr. Egan fixed systems requirements to run the GovTrack Web Crawler. Made significant progress on the Kuwait Web Crawler/Driver for the Middle East Studies Department. &lt;br /&gt;
&lt;br /&gt;
2016-11-01: Continued to download Moroccan data in the background. Went over code for the GovTrack Web Crawler, continued learning Perl. [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] Began Kuwait Web Crawler/Driver.&lt;br /&gt;
&lt;br /&gt;
2016-10-21: Continued to download data for the Moroccan Parliament Written and Oral Questions. Looked over [http://mcnair.bakerinstitute.org/wiki/Christy_Warden_(Twitter_Crawler_Application_1) Christy's Twitter Crawler] to see how I can be helpful. Dr. Egan asked me to think about how to potentially make multiple tools to get cohorts and other sorts of data from accelerator sites. See the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator List]. He also asked me to look at the [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] for potential ideas on how to bring this project to fruition.&lt;br /&gt;
&lt;br /&gt;
2016-10-20: Continued to download data for the Moroccan Parliament Written and Oral Questions. Updated Wiki page. Started working on Twitter project with Christy. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-18: Finished code for the Oral Questions web driver and Written Questions web driver using Selenium. Now, the data for the dates of questions can be found using the crawler, and the pdfs of the questions will be downloaded using Selenium. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-14: Finished Oral Questions crawler. Finished Written Questions crawler. Waiting for further details on whether that data needs to be tweaked in any way. Updated the Moroccan Web Driver/Web Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-13: Completed download of Moroccan Bills. Working on either a web driver screenshot approach or a webcrawler approach to download the  Moroccan oral and written questions data. Began building Web Crawler for Oral and Written Questions site. Edited Moroccan Web Driver/Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-11: Fixed Arabic bug; files can now be saved with Arabic titles. Monarchy bills downloaded and ready for shipment. House of Representatives bills mostly downloaded, ratified bills prepared for download. Started learning the scrapy library in Python for web scraping. Discussed idea of screenshotting questions instead of scraping. &lt;br /&gt;
&lt;br /&gt;
2016-10-07: Learned Unicode and UTF-8 encoding and decoding in Arabic. Still working on transforming an ASCII URL into printable Unicode.&lt;br /&gt;
&lt;br /&gt;
2016-10-06: Discussed with Dr. Elbadawy the desired file names for the Moroccan data download. The consensus was that the bill programs are ready to launch once the files can be named properly, and the questions data must be retrieved using a web crawler, which I need to learn how to implement. The naming of files is currently drawing errors in going from Arabic, to URL, to download, to filename. Debugging in process. Also built a demo Selenium program for Dr. Egan that drives the McNair blog site on an infinite loop.&lt;br /&gt;
&lt;br /&gt;
2016-10-03: Moroccan Web Driver projects completed for driving of the Monarchy proposed bills, the House of Representatives proposed bills, and the Ratified bills sites. Began the process of devising a naming system for the files that does not require scraping. Tinkered with naming through regular expression parsing of the URL. Structure for the oral questions and written questions drivers is set up, but needs fixes due to the differences in the sites. Fixed bug on McNair wiki for women's biz team where email was plain text instead of an email link. Took a glimpse at the Kuwait Parliament website, and it appears to be very different from the Moroccan setup.&lt;br /&gt;
&lt;br /&gt;
2016-09-30: Selenium program selects the view pdf option from the website and goes to the pdf webpage. Program then switches handle to the new page. CTRL+S is sent to the page to launch the save dialog window. Text cannot be sent to this window. Brainstormed ways around this issue. Explored Chrome Options for saving automatically without a dialog window. Looking into other libraries besides Selenium that may help.&lt;br /&gt;
&lt;br /&gt;
2016-09-29: Re-enrolled in Microsoft Remote Desktop with proper authentication, set up the Selenium environment and Komodo IDE on the Remote Desktop, and wrote a program using Selenium that goes to a link and opens the print dialog box. Developed a computational recipe for a different approach to the problem.&lt;br /&gt;
&lt;br /&gt;
2016-09-26: Set up Staff wiki page, work log page; registered for Slack, Microsoft Remote Desktop; downloaded Selenium on personal computer, read Selenium docs. Created wiki page for Moroccan Web Driver Project.&lt;br /&gt;
&lt;br /&gt;
'''Notes'''&lt;br /&gt;
&lt;br /&gt;
*Ed moved the Morocco Data to E:\McNair\Projects from C:\Users\PeterJ\Documents&lt;br /&gt;
* C Drive files moved to E:\McNair\Users\PeterJ&lt;br /&gt;
&lt;br /&gt;
[[Category:Work Log]]&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Selenium_Documentation&amp;diff=22381</id>
		<title>Selenium Documentation</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Selenium_Documentation&amp;diff=22381"/>
		<updated>2017-12-20T17:21:21Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{McNair Projects&lt;br /&gt;
|Has title=Selenium Documentation&lt;br /&gt;
|Has owner=Peter Jalbert,&lt;br /&gt;
|Has project status=Active&lt;br /&gt;
}}&lt;br /&gt;
[http://www.seleniumhq.org/projects/webdriver/ Selenium Web Driver] is a framework often used for automated web application testing. It uses an API to launch a web browser and browse sites from the client's perspective. Popular Selenium bindings exist for [http://selenium-python.readthedocs.io/ Python], [http://seleniumhq.github.io/selenium/docs/api/java/index.html Java], [https://www.npmjs.com/package/selenium-webdriver Javascript], and other languages. This documentation covers Selenium Web Driver using Python3.&lt;br /&gt;
&lt;br /&gt;
==Installation==&lt;br /&gt;
A full list of installation documentation can be found [http://selenium-python.readthedocs.io/installation.html# here].&lt;br /&gt;
&lt;br /&gt;
This documentation assumes you have Python 3.6 or later installed. If you do not, visit the [https://www.python.org/downloads/ Python Download page].&lt;br /&gt;
&lt;br /&gt;
From the command line, enter&lt;br /&gt;
 pip install selenium&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==The Basics==&lt;br /&gt;
&lt;br /&gt;
A folder with tutorial code can be found on the RDP in:&lt;br /&gt;
 E:\McNair\Software\Selenium Tutorial&lt;br /&gt;
&lt;br /&gt;
===Launching a Driver===&lt;br /&gt;
The first step is to launch a driver. This is an object that has information on the current page including its url and web elements, and is the object you interact with to do any sort of navigation. First, import the webdriver:&lt;br /&gt;
 from selenium import webdriver&lt;br /&gt;
&lt;br /&gt;
Then, create an instance of the web driver. The RDP has bindings for Google Chrome and Mozilla Firefox. The following will launch a web browser on Google Chrome.&lt;br /&gt;
 driver = webdriver.Chrome()&lt;br /&gt;
&lt;br /&gt;
The GET method is used to visit a website. The get() command in Selenium takes a string url.&lt;br /&gt;
 driver.get(&amp;quot;http://www.google.com&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
From here, different methods can be used to interact with the page. Most interactions involve some type of exchange with a web element. Selenium comes with many different ways to locate specific elements. To see the attributes of the element you want to work with, it is often a good idea to visit that page on your own browser, right click on the element you want your program to interact with, and select INSPECT. This will bring up the developer console and display the HTML representation of that element. From there, you can use one of the following selectors that best matches what you need.&lt;br /&gt;
&lt;br /&gt;
===Selectors===&lt;br /&gt;
These functions locate web elements on the page the driver is currently on. Any function that contains find_element_by returns a single web element, and any function that contains find_elements_by returns a list of web elements.&lt;br /&gt;
&lt;br /&gt;
 driver.find_element_by_class_name(class_name)&lt;br /&gt;
This function takes a string class name of the element you're looking for, and finds the first element on the page that has that class name. If there is a possibility that more than one web element on the page has the same class, you are probably better off using find_elements_by_class_name.&lt;br /&gt;
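For example, a minimal sketch that finds an element by class name and clicks it (the class name &amp;quot;result-link&amp;quot; here is hypothetical, not from a real page):&lt;br /&gt;
 element = driver.find_element_by_class_name(&amp;quot;result-link&amp;quot;)&lt;br /&gt;
 element.click()&lt;br /&gt;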
&lt;br /&gt;
 driver.find_element_by_name(name)&lt;br /&gt;
This function takes a string name of the element you're looking for, and finds the first element on the page that has a name attribute matching the string. Similar to the find_element_by_class_name function, this is not your best bet if there are multiple objects with the same name attribute.&lt;br /&gt;
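For example, Google's search box has the name attribute &amp;quot;q&amp;quot;, so after driver.get(&amp;quot;http://www.google.com&amp;quot;) the following sketch types a query and submits the form:&lt;br /&gt;
 search_box = driver.find_element_by_name(&amp;quot;q&amp;quot;)&lt;br /&gt;
 search_box.send_keys(&amp;quot;selenium&amp;quot;)&lt;br /&gt;
 search_box.submit()&lt;br /&gt;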
&lt;br /&gt;
 driver.find_element_by_id(id)&lt;br /&gt;
This function takes a string id of the element you're looking for, and finds the element on the page that has an id attribute matching the string. Since ids are guaranteed to be unique, this will always find the element you're looking for. This function is not helpful if the element you want to select does not have an id attribute.&lt;br /&gt;
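As a sketch, assuming the page has a button with the hypothetical id &amp;quot;submit-button&amp;quot;:&lt;br /&gt;
 button = driver.find_element_by_id(&amp;quot;submit-button&amp;quot;)&lt;br /&gt;
 button.click()&lt;br /&gt;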
&lt;br /&gt;
 driver.find_element_by_xpath(xpath)&lt;br /&gt;
This function takes an XPATH, and returns the first web element that matches the path. It should not be your first choice if many elements can match the same XPATH. Unlike all of the above functions, XPATH can be used to find any web element, regardless of its attributes. However, XPATH takes some time to learn, and is more complex than the other selectors. Once you have invested that time, XPATH is the most reliable way to find the elements you're looking for.&lt;br /&gt;
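For instance, the following sketch uses a hypothetical XPATH to select the first link inside a div whose class is &amp;quot;result&amp;quot;, then reads its href attribute:&lt;br /&gt;
 link = driver.find_element_by_xpath(&amp;quot;//div[@class='result']/a&amp;quot;)&lt;br /&gt;
 print(link.get_attribute(&amp;quot;href&amp;quot;))&lt;br /&gt;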
&lt;br /&gt;
 driver.find_elements_by_class_name(class_name)&lt;br /&gt;
This is similar to find_element_by_class_name, except it returns a list of all matches with the class name. This allows you to iterate over the results or index them accordingly. This is often useful for search results, or any sort of list-based queries.&lt;br /&gt;
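For example, a sketch that prints the text of every element with a hypothetical class name &amp;quot;search-result&amp;quot;:&lt;br /&gt;
 results = driver.find_elements_by_class_name(&amp;quot;search-result&amp;quot;)&lt;br /&gt;
 for result in results:&lt;br /&gt;
     print(result.text)&lt;br /&gt;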
&lt;br /&gt;
 driver.find_elements_by_name(name)&lt;br /&gt;
Same as find_element_by_name, except it returns a list of all matches.&lt;br /&gt;
&lt;br /&gt;
 driver.find_elements_by_xpath(xpath)&lt;br /&gt;
Same as find_element_by_xpath, but returns a list of all matches.&lt;br /&gt;
&lt;br /&gt;
A tutorial on XPATH can be found [https://www.w3schools.com/xml/xpath_intro.asp here].&lt;br /&gt;
&lt;br /&gt;
===Javascript===&lt;br /&gt;
If you are familiar with Javascript, you can inject Javascript into the driver to invoke certain behaviors. Simply use:&lt;br /&gt;
 driver.execute_script(someJavascriptCode)&lt;br /&gt;
&lt;br /&gt;
For example, the following could be used to scroll to the bottom of the page:&lt;br /&gt;
 driver.execute_script(&amp;quot;window.scrollTo(0, document.body.scrollHeight);&amp;quot;)&lt;br /&gt;
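execute_script can also return a value to Python when the injected Javascript contains a return statement. For example, the following retrieves the page title:&lt;br /&gt;
 title = driver.execute_script(&amp;quot;return document.title;&amp;quot;)&lt;br /&gt;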
&lt;br /&gt;
&lt;br /&gt;
==Helpful Links==&lt;br /&gt;
[https://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python How to Download a file in Python with a URL]&lt;br /&gt;
&lt;br /&gt;
[https://stackoverflow.com/questions/20986631/how-can-i-scroll-a-web-page-using-selenium-webdriver-in-python Selenium: Scroll a Webpage]&lt;br /&gt;
&lt;br /&gt;
[https://stackoverflow.com/questions/24795198/selenium-python-get-all-children-elements Selenium: Get Children Elements]&lt;br /&gt;
&lt;br /&gt;
==Advanced==&lt;br /&gt;
The folder for the Web Driver Executables can be found:&lt;br /&gt;
 C:\SeleniumDriver&lt;br /&gt;
&lt;br /&gt;
chromedriver.exe is an executable to launch Google Chrome, and geckodriver.exe is an executable to launch Mozilla Firefox. Any new drivers for different web browsers should be placed in this folder.&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Selenium_Documentation&amp;diff=22380</id>
		<title>Selenium Documentation</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Selenium_Documentation&amp;diff=22380"/>
		<updated>2017-12-20T17:19:15Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{McNair Projects&lt;br /&gt;
|Has title=Selenium Documentation&lt;br /&gt;
|Has owner=Peter Jalbert,&lt;br /&gt;
|Has project status=Active&lt;br /&gt;
}}&lt;br /&gt;
[http://www.seleniumhq.org/projects/webdriver/ Selenium Web Driver] is a framework often used for automated web application testing. It uses an API to launch a web browser and browse sites from the client's perspective. Popular Selenium bindings exist for [http://selenium-python.readthedocs.io/ Python], [http://seleniumhq.github.io/selenium/docs/api/java/index.html Java], [https://www.npmjs.com/package/selenium-webdriver Javascript], and other languages. This documentation covers Selenium Web Driver using Python3.&lt;br /&gt;
&lt;br /&gt;
==Installation==&lt;br /&gt;
A full list of installation documentation can be found [http://selenium-python.readthedocs.io/installation.html# here].&lt;br /&gt;
&lt;br /&gt;
This documentation assumes you have Python 3.6 or later installed. If you do not, visit the [https://www.python.org/downloads/ Python Download page].&lt;br /&gt;
&lt;br /&gt;
From the command line, enter&lt;br /&gt;
 pip install selenium&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==The Basics==&lt;br /&gt;
&lt;br /&gt;
A folder with tutorial code can be found on the RDP in:&lt;br /&gt;
 E:\McNair\Software\Selenium Tutorial&lt;br /&gt;
&lt;br /&gt;
===Launching a Driver===&lt;br /&gt;
The first step is to launch a driver. This is an object that has information on the current page including its url and web elements, and is the object you interact with to do any sort of navigation. First, import the webdriver:&lt;br /&gt;
 from selenium import webdriver&lt;br /&gt;
&lt;br /&gt;
Then, create an instance of the web driver. The RDP has bindings for Google Chrome and Mozilla Firefox. The following will launch a web browser on Google Chrome.&lt;br /&gt;
 driver = webdriver.Chrome()&lt;br /&gt;
&lt;br /&gt;
The GET method is used to visit a website. The get() command in Selenium takes a string url.&lt;br /&gt;
 driver.get(&amp;quot;http://www.google.com&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
From here, different methods can be used to interact with the page. Most interactions involve some type of exchange with a web element. Selenium comes with many different ways to locate specific elements. To see the attributes of the element you want to work with, it is often a good idea to visit that page on your own browser, right click on the element you want your program to interact with, and select INSPECT. This will bring up the developer console and display the HTML representation of that element. From there, you can use one of the following selectors that best matches what you need.&lt;br /&gt;
&lt;br /&gt;
===Selectors===&lt;br /&gt;
These functions locate web elements on the page the driver is currently on. Any function that contains find_element_by returns a single web element, and any function that contains find_elements_by returns a list of web elements.&lt;br /&gt;
&lt;br /&gt;
 driver.find_element_by_class_name(class_name)&lt;br /&gt;
This function takes a string class name of the element you're looking for, and finds the first element on the page that has that class name. If there is a possibility that more than one web element on the page has the same class, you are probably better off using find_elements_by_class_name.&lt;br /&gt;
&lt;br /&gt;
 driver.find_element_by_name(name)&lt;br /&gt;
This function takes a string name of the element you're looking for, and finds the first element on the page that has a name attribute matching the string. Similar to the find_element_by_class_name function, this is not your best bet if there are multiple objects with the same name attribute.&lt;br /&gt;
&lt;br /&gt;
 driver.find_element_by_id(id)&lt;br /&gt;
This function takes a string id of the element you're looking for, and finds the element on the page that has an id attribute matching the string. Since ids are guaranteed to be unique, this will always find the element you're looking for. This function is not helpful if the element you want to select does not have an id attribute.&lt;br /&gt;
&lt;br /&gt;
 driver.find_element_by_xpath(xpath)&lt;br /&gt;
This function takes an XPATH, and returns the first web element that matches the path. It should not be your first choice if many elements can match the same XPATH. Unlike all of the above functions, XPATH can be used to find any web element, regardless of its attributes. However, XPATH takes some time to learn, and is more complex than the other selectors. Once you have invested that time, XPATH is the most reliable way to find the elements you're looking for.&lt;br /&gt;
&lt;br /&gt;
 driver.find_elements_by_class_name(class_name)&lt;br /&gt;
This is similar to find_element_by_class_name, except it returns a list of all matches with the class name. This allows you to iterate over the results or index them accordingly. This is often useful for search results, or any sort of list-based queries.&lt;br /&gt;
&lt;br /&gt;
 driver.find_elements_by_name(name)&lt;br /&gt;
Same as find_element_by_name, except it returns a list of all matches.&lt;br /&gt;
&lt;br /&gt;
 driver.find_elements_by_xpath(xpath)&lt;br /&gt;
Same as find_element_by_xpath, but returns a list of all matches.&lt;br /&gt;
&lt;br /&gt;
A tutorial on XPATH can be found [https://www.w3schools.com/xml/xpath_intro.asp here].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Helpful Links==&lt;br /&gt;
[https://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python How to Download a file in Python with a URL]&lt;br /&gt;
&lt;br /&gt;
[https://stackoverflow.com/questions/20986631/how-can-i-scroll-a-web-page-using-selenium-webdriver-in-python Selenium: Scroll a Webpage]&lt;br /&gt;
&lt;br /&gt;
[https://stackoverflow.com/questions/24795198/selenium-python-get-all-children-elements Selenium: Get Children Elements]&lt;br /&gt;
&lt;br /&gt;
==Advanced==&lt;br /&gt;
The folder for the Web Driver Executables can be found:&lt;br /&gt;
 C:\SeleniumDriver&lt;br /&gt;
&lt;br /&gt;
chromedriver.exe is an executable to launch Google Chrome, and geckodriver.exe is an executable to launch Mozilla Firefox. Any new drivers for different web browsers should be placed in this folder.&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Selenium_Documentation&amp;diff=22379</id>
		<title>Selenium Documentation</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Selenium_Documentation&amp;diff=22379"/>
		<updated>2017-12-20T16:58:32Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{McNair Projects&lt;br /&gt;
|Has title=Selenium Documentation&lt;br /&gt;
|Has owner=Peter Jalbert,&lt;br /&gt;
|Has project status=Active&lt;br /&gt;
}}&lt;br /&gt;
[http://www.seleniumhq.org/projects/webdriver/ Selenium Web Driver] is a framework often used for automated web application testing. It uses an API to launch a web browser and browse sites from the client's perspective. Popular Selenium bindings exist for [http://selenium-python.readthedocs.io/ Python], [http://seleniumhq.github.io/selenium/docs/api/java/index.html Java], [https://www.npmjs.com/package/selenium-webdriver Javascript], and other languages. This documentation covers Selenium Web Driver using Python3.&lt;br /&gt;
&lt;br /&gt;
==Installation==&lt;br /&gt;
A full list of installation documentation can be found [http://selenium-python.readthedocs.io/installation.html# here].&lt;br /&gt;
&lt;br /&gt;
This documentation assumes you have Python 3.6 or later installed. If you do not, visit the [https://www.python.org/downloads/ Python Download page].&lt;br /&gt;
&lt;br /&gt;
From the command line, enter&lt;br /&gt;
 pip install selenium&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==The Basics==&lt;br /&gt;
&lt;br /&gt;
A folder with tutorial code can be found on the RDP in:&lt;br /&gt;
 E:\McNair\Software\Selenium Tutorial&lt;br /&gt;
&lt;br /&gt;
===Launching a Driver===&lt;br /&gt;
The first step is to launch a driver. This is an object that has information on the current page including its url and web elements, and is the object you interact with to do any sort of navigation. First, import the webdriver:&lt;br /&gt;
 from selenium import webdriver&lt;br /&gt;
&lt;br /&gt;
Then, create an instance of the web driver. The RDP has bindings for Google Chrome and Mozilla Firefox. The following will launch a web browser on Google Chrome.&lt;br /&gt;
 driver = webdriver.Chrome()&lt;br /&gt;
&lt;br /&gt;
The GET method is used to visit a website. The get() command in Selenium takes a string url.&lt;br /&gt;
 driver.get(&amp;quot;http://www.google.com&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
From here, different methods can be used to interact with the page. Most interactions involve some type of exchange with a web element. Selenium comes with many different ways to locate specific elements. To see the attributes of the element you want to work with, it is often a good idea to visit that page on your own browser, right click on the element you want your program to interact with, and select INSPECT. This will bring up the developer console and display the HTML representation of that element. From there, you can use one of the following selectors that best matches what you need.&lt;br /&gt;
&lt;br /&gt;
===Selectors===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Helpful Links==&lt;br /&gt;
[https://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python How to Download a file in Python with a URL]&lt;br /&gt;
&lt;br /&gt;
[https://stackoverflow.com/questions/20986631/how-can-i-scroll-a-web-page-using-selenium-webdriver-in-python Selenium: Scroll a Webpage]&lt;br /&gt;
&lt;br /&gt;
[https://stackoverflow.com/questions/24795198/selenium-python-get-all-children-elements Selenium: Get Children Elements]&lt;br /&gt;
&lt;br /&gt;
==Advanced==&lt;br /&gt;
The folder for the Web Driver Executables can be found:&lt;br /&gt;
 C:\SeleniumDriver&lt;br /&gt;
&lt;br /&gt;
chromedriver.exe is an executable to launch Google Chrome, and geckodriver.exe is an executable to launch Mozilla Firefox. Any new drivers for different web browsers should be placed in this folder.&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Selenium_Documentation&amp;diff=22378</id>
		<title>Selenium Documentation</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Selenium_Documentation&amp;diff=22378"/>
		<updated>2017-12-20T16:58:02Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{McNair Projects&lt;br /&gt;
|Has title=Selenium Documentation&lt;br /&gt;
|Has owner=Peter Jalbert,&lt;br /&gt;
}}&lt;br /&gt;
[http://www.seleniumhq.org/projects/webdriver/ Selenium Web Driver] is a framework often used for automated web application testing. It uses an API to launch a web browser and browse sites from the client's perspective. Popular Selenium bindings exist for [http://selenium-python.readthedocs.io/ Python], [http://seleniumhq.github.io/selenium/docs/api/java/index.html Java], [https://www.npmjs.com/package/selenium-webdriver Javascript], and other languages. This documentation covers Selenium Web Driver using Python3.&lt;br /&gt;
&lt;br /&gt;
==Installation==&lt;br /&gt;
A full list of installation documentation can be found [http://selenium-python.readthedocs.io/installation.html# here].&lt;br /&gt;
&lt;br /&gt;
This documentation assumes you have Python 3.6 or later installed. If you do not, visit the [https://www.python.org/downloads/ Python Download page].&lt;br /&gt;
&lt;br /&gt;
From the command line, enter&lt;br /&gt;
 pip install selenium&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==The Basics==&lt;br /&gt;
&lt;br /&gt;
A folder with tutorial code can be found on the RDP in:&lt;br /&gt;
 E:\McNair\Software\Selenium Tutorial&lt;br /&gt;
&lt;br /&gt;
===Launching a Driver===&lt;br /&gt;
The first step is to launch a driver. This is an object that has information on the current page including its url and web elements, and is the object you interact with to do any sort of navigation. First, import the webdriver:&lt;br /&gt;
 from selenium import webdriver&lt;br /&gt;
&lt;br /&gt;
Then, create an instance of the web driver. The RDP has bindings for Google Chrome and Mozilla Firefox. The following will launch a web browser on Google Chrome.&lt;br /&gt;
 driver = webdriver.Chrome()&lt;br /&gt;
&lt;br /&gt;
The GET method is used to visit a website. The get() command in Selenium takes a string url.&lt;br /&gt;
 driver.get(&amp;quot;http://www.google.com&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
From here, different methods can be used to interact with the page. Most interactions involve some type of exchange with a web element. Selenium comes with many different ways to locate specific elements. To see the attributes of the element you want to work with, it is often a good idea to visit that page on your own browser, right click on the element you want your program to interact with, and select INSPECT. This will bring up the developer console and display the HTML representation of that element. From there, you can use one of the following selectors that best matches what you need.&lt;br /&gt;
&lt;br /&gt;
===Selectors===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Helpful Links==&lt;br /&gt;
[https://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python How to Download a file in Python with a URL]&lt;br /&gt;
&lt;br /&gt;
[https://stackoverflow.com/questions/20986631/how-can-i-scroll-a-web-page-using-selenium-webdriver-in-python Selenium: Scroll a Webpage]&lt;br /&gt;
&lt;br /&gt;
[https://stackoverflow.com/questions/24795198/selenium-python-get-all-children-elements Selenium: Get Children Elements]&lt;br /&gt;
&lt;br /&gt;
==Advanced==&lt;br /&gt;
The folder for the Web Driver Executables can be found:&lt;br /&gt;
 C:\SeleniumDriver&lt;br /&gt;
&lt;br /&gt;
chromedriver.exe is an executable to launch Google Chrome, and geckodriver.exe is an executable to launch Mozilla Firefox. Any new drivers for different web browsers should be placed in this folder.&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Selenium_Documentation&amp;diff=22377</id>
		<title>Selenium Documentation</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Selenium_Documentation&amp;diff=22377"/>
		<updated>2017-12-20T16:23:54Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{McNair Projects&lt;br /&gt;
|Has title=Selenium Documentation&lt;br /&gt;
|Has owner=Peter Jalbert,&lt;br /&gt;
}}&lt;br /&gt;
[http://www.seleniumhq.org/projects/webdriver/ Selenium Web Driver] is a framework often used for automated web application testing. It uses an API to launch a web browser and browse sites from the client's perspective. Popular Selenium bindings exist for [http://selenium-python.readthedocs.io/ Python], [http://seleniumhq.github.io/selenium/docs/api/java/index.html Java], [https://www.npmjs.com/package/selenium-webdriver Javascript], and other languages. This documentation covers Selenium Web Driver using Python3.&lt;br /&gt;
&lt;br /&gt;
==Installation==&lt;br /&gt;
A full list of installation documentation can be found [http://selenium-python.readthedocs.io/installation.html# here].&lt;br /&gt;
&lt;br /&gt;
This documentation assumes you have Python 3.6 or later installed. If you do not, visit the [https://www.python.org/downloads/ Python Download page].&lt;br /&gt;
&lt;br /&gt;
From the command line, enter&lt;br /&gt;
 pip install selenium&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==The Basics==&lt;br /&gt;
&lt;br /&gt;
A folder with tutorial code can be found on the RDP in:&lt;br /&gt;
 E:\McNair\Software\Selenium Tutorial&lt;br /&gt;
&lt;br /&gt;
===Launching a Driver===&lt;br /&gt;
The first step is to launch a driver. This is an object that has information on the current page including its url and web elements, and is the object you interact with to do any sort of navigation. First, import the webdriver:&lt;br /&gt;
 from selenium import webdriver&lt;br /&gt;
&lt;br /&gt;
==Helpful Links==&lt;br /&gt;
[https://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python How to Download a file in Python with a URL]&lt;br /&gt;
&lt;br /&gt;
[https://stackoverflow.com/questions/20986631/how-can-i-scroll-a-web-page-using-selenium-webdriver-in-python Selenium: Scroll a Webpage]&lt;br /&gt;
&lt;br /&gt;
[https://stackoverflow.com/questions/24795198/selenium-python-get-all-children-elements Selenium: Get Children Elements]&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=22376</id>
		<title>Peter Jalbert (Work Log)</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=22376"/>
		<updated>2017-12-20T16:21:13Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===Fall 2017===&lt;br /&gt;
&amp;lt;onlyinclude&amp;gt; &lt;br /&gt;
&lt;br /&gt;
[[Peter Jalbert]] [[Work Logs]] [[Peter Jalbert (Work Log)|(log page)]] &lt;br /&gt;
&lt;br /&gt;
2017-12-20: Working on Selenium Documentation. Wrote 2 demo files. Wiki page is available [http://mcnair.bakerinstitute.org/wiki/Selenium_Documentation here].&lt;br /&gt;
&lt;br /&gt;
2017-12-19: Finished fixing the Demo Day Crawler. Changed files and installed them as appropriate to make the LinkedIn crawler compatible with the RDP. Removed some of the bells and whistles.&lt;br /&gt;
&lt;br /&gt;
2017-12-18: Continued finding errors with the Demo Day Crawler analysis. Rewrote the parser to remove any search terms that were in the top 10000 most common English words according to Google. Finished uploading and submitting Moroccan data.&lt;br /&gt;
&lt;br /&gt;
2017-12-15: Found errors with the Demo Day Crawler. Fixed scripts to download Moroccan Law Data.&lt;br /&gt;
&lt;br /&gt;
2017-12-14: Uploading Morocco Parliament Written Questions. Creating script for next Morocco Parliament download. Began writing Selenium documentation. Continuing to download TIGER data.&lt;br /&gt;
&lt;br /&gt;
2017-12-06: Running Morocco Parliament Written Questions script. Analyzing Demo Day Crawler results. Continued downloading for TIGER geocoder.&lt;br /&gt;
&lt;br /&gt;
2017-11-28: Debugging Morocco Parliament Crawler. Running Demo Day Crawler for all accelerators and 10 pages per accelerator. TIGER geocoder is back to Forbidden Error.&lt;br /&gt;
&lt;br /&gt;
2017-11-27: Rerunning Morocco Parliament Crawler. Fixed KeyTerms.py and running it again. Continued downloading for TIGER geocoder.&lt;br /&gt;
&lt;br /&gt;
2017-11-20:  Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-16: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-15: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote a script to extract counts that were greater than 2 from Keyword Matcher. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder]. Finished re-formatting work logs.&lt;br /&gt;
&lt;br /&gt;
2017-11-14: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote an HTML to Text parser. See Parser Demo Day Page for file location. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder].&lt;br /&gt;
&lt;br /&gt;
2017-11-13: Built [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser].&lt;br /&gt;
&lt;br /&gt;
2017-11-09: Running demo version of Demo Day crawler (Accelerator Google Crawler). Fixing work log format. &lt;br /&gt;
&lt;br /&gt;
2017-11-07: Created file with 0s and 1s detailing whether crunchbase has the founder information for an accelerator. Details posted as a TODO on [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List] page. Still waiting for feedback on the PostGIS installation from [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder]. Continued working on Accelerator Google Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-06: Contacted Geography Center for the US Census Bureau, [https://www.census.gov/geo/about/contact.html here], and began email exchange on PostGIS installation problems. Began working on the [http://mcnair.bakerinstitute.org/wiki/Selenium_Documentation Selenium Documentation]. Also began working on an Accelerator Google Crawler that will be used with Yang and ML to find Demo Days for cohort companies.&lt;br /&gt;
&lt;br /&gt;
2017-11-01: Attempted to continue downloading; however, ran into HTTP Forbidden errors. Listed the errors on the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder Page].&lt;br /&gt;
&lt;br /&gt;
2017-10-31: Began downloading blocks of data for individual states for the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder] project. Wrote out the new wiki page for installation, and beginning to write documentation on usage.&lt;br /&gt;
&lt;br /&gt;
2017-10-30: With Ed's help, was able to get the national data from Tiger installed onto a database server. The process required much jumping around and changing users, and all the things we learned are outlined in [http://mcnair.bakerinstitute.org/wiki/Database_Server_Documentation#Editing_Users the database server documentation] under &amp;quot;Editing Users&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
2017-10-25: Continued working on the [http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation TigerCoder Installation].&lt;br /&gt;
&lt;br /&gt;
2017-10-24: Throw some addresses into a database, use address normalizer and geocoder. May need to install things. Details on the installation process can be found on the [http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation PostGIS Installation page].&lt;br /&gt;
&lt;br /&gt;
2017-10-23: Finished Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-19: Continued work on Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-18: Continued work on Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-17: Constructed ArcGIS maps for the agglomeration project. Finished maps of points for every year in the state of California. Finished maps of Route 128. Began working on a Selenium Yelp crawler to get cafe locations within the 610 Loop.&lt;br /&gt;
&lt;br /&gt;
2017-10-16: Assisted Harrison on the USITC project. Looked for natural language processing tools to extract complainants and defendants along with their locations from case files. Experimented with pulling based on parts-of-speech tags, as well as using geotext or geograpy to pull locations from a case segment.&lt;br /&gt;
&lt;br /&gt;
2017-10-13: Updated various project wiki pages.&lt;br /&gt;
&lt;br /&gt;
2017-10-12: Continued work on Patent Thicket project, awaiting further project specs.&lt;br /&gt;
&lt;br /&gt;
2017-10-05: Emergency ArcGIS creation for Agglomeration project.&lt;br /&gt;
&lt;br /&gt;
2017-10-04: Emergency ArcGIS creation for Agglomeration project.&lt;br /&gt;
&lt;br /&gt;
2017-10-02: Worked on ArcGIS data. See Harrison's Work Log for the details.&lt;br /&gt;
&lt;br /&gt;
2017-09-28: Added collaborative editing feature to PyCharm. &lt;br /&gt;
&lt;br /&gt;
2017-09-27: Worked on big database file.&lt;br /&gt;
&lt;br /&gt;
2017-09-25: New task -- Create text file with company, description, and company type.&lt;br /&gt;
#[http://mcnair.bakerinstitute.org/wiki/VC_Database_Rebuild VC Database Rebuild]&lt;br /&gt;
#psql vcdb2&lt;br /&gt;
#table name, sdccompanybasecore2&lt;br /&gt;
#Combine with Crunchbasebulk&lt;br /&gt;
&lt;br /&gt;
#TODO: Write wiki on linkedin crawler, write wiki on creating accounts.&lt;br /&gt;
&lt;br /&gt;
2017-09-21: Wrote wiki on Linkedin crawler, met with Laura about patents project. &lt;br /&gt;
&lt;br /&gt;
2017-09-20: Finished running linkedin crawler. Transferred data to RDP. Will write wikis next.&lt;br /&gt;
&lt;br /&gt;
2017-09-19: Began running linkedin crawler. Helped Yang create RDP account, get permissions, and get wiki setup.&lt;br /&gt;
&lt;br /&gt;
2017-09-18: Finished implementation of Experience Crawler, continued working on Education Crawler for LinkedIn.&lt;br /&gt;
&lt;br /&gt;
2017-09-14: Continued implementing LinkedIn Crawler for profiles.&lt;br /&gt;
&lt;br /&gt;
2017-09-13: Implemented LinkedIn Crawler for main portion of profiles. Began working on crawling Experience section of profiles.&lt;br /&gt;
&lt;br /&gt;
2017-09-12: Continued working on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. Added to the wiki on this topic.&lt;br /&gt;
&lt;br /&gt;
2017-09-11: Continued working on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. &lt;br /&gt;
&lt;br /&gt;
2017-09-06: Combined founders data retrieved with the Crunchbase API with the crunchbasebulk data to get linkedin urls for different accelerator founders. For more information, see [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here].&lt;br /&gt;
&lt;br /&gt;
2017-09-05: Post Harvey. Finished retrieving names from the Crunchbase API on founders. Next step is to query crunchbase bulk database to get linkedin urls. For more information, see [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here].&lt;br /&gt;
&lt;br /&gt;
2017-08-24: Began using the Crunchbase API to retrieve founder information for accelerators. Halfway through compiling a dictionary that translates accelerator names into proper Crunchbase API URLs.&lt;br /&gt;
&lt;br /&gt;
2017-08-23: Decided with Ed to abandon LinkedIn crawling to retrieve accelerator founder data, and instead use crunchbase. Spent the day navigating the crunchbasebulk database, and seeing what useful information was contained in it.&lt;br /&gt;
&lt;br /&gt;
2017-08-22: Discovered that LinkedIn Profiles cannot be viewed through LinkedIn if the target is 3rd degree or further. However, if entering LinkedIn through a Google search, the profile can still be viewed if the user has previously logged into LinkedIn. Devising a workaround crawler that utilizes Google search. Continued blog post [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) here] under Section 4.&lt;br /&gt;
&lt;br /&gt;
2017-08-21: Began work on extracting founders for accelerators through LinkedIn Crawler. Discovered that Python3 is not installed on RDP, so the virtual environment for the project cannot be fired up. Continued working on Ubuntu machine.&lt;br /&gt;
&amp;lt;/onlyinclude&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Spring 2017===&lt;br /&gt;
&lt;br /&gt;
2017-05-01: Continued work on HTML Parser. Uploaded all semester projects to git server.&lt;br /&gt;
&lt;br /&gt;
2017-04-20: Finished the HTML Parser for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Ran HTML parser on accelerator founders. Data is stored in projects/accelerators/LinkedIn Founder Data.&lt;br /&gt;
&lt;br /&gt;
2017-04-19: Made updates to the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler] Wikipage. Ran LinkedIn Crawler on accelerator data. Working on an html parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-18: Ran LinkedIn Crawler on matches between Crunchbase Snapshot and the accelerator data.&lt;br /&gt;
&lt;br /&gt;
2017-04-17: Worked on ways to get correct search results from the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Worked on an HTML Parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-13: Worked on debugging the logout procedure for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Began formulation of process to search for founders of startups using a combination of the LinkedIn Crawler with the data resources from the [http://mcnair.bakerinstitute.org/wiki/Crunchbase_2013_Snapshot CrunchBase Snapshot].&lt;br /&gt;
&lt;br /&gt;
2017-04-12: Work on bugs with the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-04-11: Completed functional [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) crawler of LinkedIn Recruiter Pro]. Basic search functions work and download profile information for a given person. &lt;br /&gt;
&lt;br /&gt;
2017-04-10: Began writing functioning crawler of LinkedIn. &lt;br /&gt;
&lt;br /&gt;
2017-04-06: Continued working on debugging and documenting the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Wrote a test program that logs in, searches for a query, navigates through search pages, and logs out. Recruiter program can now login and search.&lt;br /&gt;
&lt;br /&gt;
2017-04-05: Began work on the LinkedIn Crawler. Researched on launching Python Virtual Environment.&lt;br /&gt;
&lt;br /&gt;
2017-04-03: Finished debugging points for the Enclosing Circle Algorithm. Added Command Line functionality to the Industry Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-03-29: Worked on debugging points for the Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-28: Finished running the Enclosing Circle Algorithm. Worked on removing incorrect points from the data set (see above). &lt;br /&gt;
&lt;br /&gt;
2017-03-27: Worked on debugging the Enclosing Circle Algorithm. Implemented a way to remove interior circles, and determined that translation to latitude and longitude coordinates resulted in slightly off center circles.&lt;br /&gt;
&lt;br /&gt;
2017-03-23: Finished debugging the brute force algorithm for [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm]. Implemented a method to plot the points and circles on a graph. Analyzed runtime of the brute force algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-21: Coded a brute force algorithm for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-03-20: Worked on  debugging the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-03-09: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Finished script to draw Enclosing Circles on a Google Map.&lt;br /&gt;
&lt;br /&gt;
2017-03-08: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Created script to draw outcome of the Enclosing Circle Algorithm on Google Maps.&lt;br /&gt;
&lt;br /&gt;
2017-03-07: Redetermined the top 50 cities on which Enclosing Circle should be run. Data on the [http://mcnair.bakerinstitute.org/wiki/Top_Cities_for_VC_Backed_Companies Top 50 Cities for VC Backed Companies can be found here.] Ran the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities.&lt;br /&gt;
&lt;br /&gt;
2017-03-06: Ran script to determine the top 50 cities on which Enclosing Circle should be run. Fixed the VC Circles script to take in a new data format.&lt;br /&gt;
&lt;br /&gt;
2017-03-02: Cleaned up data for the VC Circles Project. Created histogram of data in Excel. See [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Began work on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-03-01: Created statistics for the VC Circles Project.&lt;br /&gt;
&lt;br /&gt;
2017-02-28: Finished downloading geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Found a bug in the Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-02-27: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-23: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Installed C++ Compiler for Python. Ran tests on difference between Python and C wrapped Python.&lt;br /&gt;
&lt;br /&gt;
2017-02-22: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Helped out with [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-21: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Researched C++ compilers for Python so that the Enclosing Circle Algorithm could be wrapped in C. Found a recommended one [https://www.microsoft.com/en-us/download/details.aspx?id=44266 here].&lt;br /&gt;
&lt;br /&gt;
2017-02-20: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-16: Reworked [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] to create a file of geocoded data. Began work on wrapping the algorithm in C to improve speed.&lt;br /&gt;
&lt;br /&gt;
2017-02-15: Finished applying the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] to the VC study. The Enclosing Circle algorithm still needs adjustment, but the program runs with the temporary fixes.&lt;br /&gt;
&lt;br /&gt;
2017-02-14: Worked on the application of the Enclosing Circle algorithm to the VC study. Working on bug fixes in the Enclosing Circle algorithm. Created wiki page for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-02-13: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-08: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-07: Fixed bugs in parse_cohort_data.py, the script for parsing the cohort data from the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Added descriptive statistics to the cohort data Excel file.&lt;br /&gt;
&lt;br /&gt;
2017-02-02: Out sick, independent research and work from RDP. Brief research into the [http://jorgeg.scripts.mit.edu/homepage/wp-content/uploads/2016/03/Guzman-Stern-State-of-American-Entrepreneurship-FINAL.pdf Stern-Guzman algorithm]. Research into [http://mcnair.bakerinstitute.org/wiki/interactive_maps Interactive Maps]. No helpful additions to map embedding problem.&lt;br /&gt;
&lt;br /&gt;
2017-02-01: Notes from Session with Ed: Project on US university patenting and entrepreneurship programs (writing code to identify universities in assignees), search Wikipedia (XML then bulk download), student pop, faculty pop, etc.&lt;br /&gt;
Circle project for VC data will end up being a joint project to join accelerator data. &lt;br /&gt;
Pull descriptions for VC. Founders of accelerators in LinkedIn. The crawler must not get caught by LinkedIn (pretend to not be a bot). Can eventually get academic backgrounds through LinkedIn. &lt;br /&gt;
Pull business registration data, Stern/Guzman Algorithm.  &lt;br /&gt;
GIS on top of geocoded data.&lt;br /&gt;
Maps that work on the wiki or blog (CartoDB), Maps API and R.&lt;br /&gt;
NLP Projects, Description Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-01-31: Built WayBack Machine Crawler. Updated documentation for coordinates script. Updated profile page to include locations of code.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-30: Optimized enclosing circle algorithm through memoization. Developed script to read addresses from accelerator data and return latitude and longitude coordinates.&lt;br /&gt;
&lt;br /&gt;
2017-01-26: Continued working on the Google sitesearch project. Discovered Crunchbase, which changed project priorities: priority 1, split accelerator data up by flag; priority 2, use Crunchbase to get web URLs for cohorts; priority 3, make an Internet Archive Wayback Machine driver. Located the [http://mcnair.bakerinstitute.org/wiki/Whois_Parser Whois Parser].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-25: Finished parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Some data files still need proofreading as they are not in an acceptable format. Began working on Google sitesearch project.&lt;br /&gt;
&lt;br /&gt;
2017-01-24: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Cohort data file created, debugging is almost complete. Will begin work on the google accelerator search soon.&lt;br /&gt;
&lt;br /&gt;
2017-01-23: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Preliminary code is written, working on debugging.&lt;br /&gt;
&lt;br /&gt;
2017-01-19: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project]. Created parser for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project], completing creation of the final data set (yay!). Began working on the cohort parser.&lt;br /&gt;
&lt;br /&gt;
2017-01-18: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-13: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-12: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-11: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-10: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Fall 2016===&lt;br /&gt;
&lt;br /&gt;
2016-12-08: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-07: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-06: Learned how to use git. Committed software projects from the semester to the McNair git repository. Projects can be found at: [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Executive Order Crawler], [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Foreign Government Web Crawlers], [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) F6S Crawler and Parser].&lt;br /&gt;
&lt;br /&gt;
2016-12-02: Built and ran web crawler for Center for Middle East Studies on Kuwait. Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-01: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Built a tool for the [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report E&amp;amp;I Governance Report Project] with Christy that adds a column of data showing whether or not the bill has been passed.&lt;br /&gt;
&lt;br /&gt;
2016-11-29: Began pulling data from the accelerators listed [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here]. Made text files for about 18 accelerators.&lt;br /&gt;
&lt;br /&gt;
2016-11-22: Transferred downloaded Morocco Written Bills to provided SeaGate Drive. Made a &amp;quot;gentle&amp;quot; F6S crawler to retrieve HTMLs of possible accelerator pages documented [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here].&lt;br /&gt;
&lt;br /&gt;
2016-11-18: Converted Executive Order PDFs to text files using Adobe Acrobat DC. See the [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Wikipage] for details.&lt;br /&gt;
&lt;br /&gt;
2016-11-17: Wrote a crawler to retrieve information about executive orders and their corresponding pdfs. They can be found [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report here]. The next step is to run code to convert the pdfs to text files, then use the parser fixed by Christy. &lt;br /&gt;
&lt;br /&gt;
2016-11-15: Finished download of Moroccan Written Question pdfs. Wrote a parser with Christy to be used for parsing bills from Congress and eventually executive orders. Found a bug in the system Python; it was worked out and the system rebooted.&lt;br /&gt;
&lt;br /&gt;
2016-11-11: Continued to download Moroccan data in the background. Attempted to find bug fixes for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] crawler.&lt;br /&gt;
&lt;br /&gt;
2016-11-10: Continued to download Moroccan data and Kuwait data in the background. Began work on  [http://mcnair.bakerinstitute.org/wiki/Google_Scholar_Crawler Google Scholar Crawler]. Wrote a crawler for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] to get the HTML files of hundreds of accelerators. The crawler ended up failing; it appears to have been due to HTTPS. &lt;br /&gt;
&lt;br /&gt;
2016-11-08: Continued to download Moroccan data in the background. Finished writing code for the embedded files on the Kuwait Site. Spent time debugging the frame errors due to the dynamically generated content. Never found the cause of the bug; instead found a workaround that sacrificed run time for functionality. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-11-04: Continued to download Moroccan data in the background. Finished writing initial Kuwait Web Crawler/Driver for the Middle East Studies Department. Middle East Studies Department asked for additional embedded files in the Kuwait website. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-11-03: Continued to download Moroccan data in the background. Dr. Egan fixed systems requirements to run the GovTrack Web Crawler. Made significant progress on the Kuwait Web Crawler/Driver for the Middle East Studies Department. &lt;br /&gt;
&lt;br /&gt;
2016-11-01: Continued to download Moroccan data in the background. Went over code for GovTracker Web Crawler, continued learning Perl. [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] Began Kuwait Web Crawler/Driver.&lt;br /&gt;
&lt;br /&gt;
2016-10-21: Continued to download data for the Moroccan Parliament Written and Oral Questions. Looked over [http://mcnair.bakerinstitute.org/wiki/Christy_Warden_(Twitter_Crawler_Application_1) Christy's Twitter Crawler] to see how I can be helpful. Dr. Egan asked me to think about how to potentially make multiple tools to get cohorts and other sorts of data from accelerator sites; see the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator List]. He also asked me to look at the [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] for potential ideas on how to bring this project to fruition.&lt;br /&gt;
&lt;br /&gt;
2016-10-20: Continued to download data for the Moroccan Parliament Written and Oral Questions. Updated Wiki page. Started working on Twitter project with Christy. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-18: Finished code for the Oral Questions web driver and Written Questions web driver using Selenium. Now, the data for the dates of questions can be found using the crawler, and the pdfs of the questions will be downloaded using Selenium. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-14: Finished Oral Questions crawler. Finished Written Questions crawler. Waiting for further details on whether that data needs to be tweaked in any way. Updated the Moroccan Web Driver/Web Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-13: Completed download of Moroccan Bills. Working on either a web driver screenshot approach or a web crawler approach to download the Moroccan oral and written questions data. Began building a Web Crawler for the Oral and Written Questions site. Edited the Moroccan Web Driver/Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-11: Fixed the Arabic bug; files can now be saved with Arabic titles. Monarchy bills downloaded and ready for shipment. House of Representatives bills mostly downloaded, ratified bills prepared for download. Started learning the Scrapy library in Python for web scraping. Discussed the idea of screenshot-ing questions instead of scraping. &lt;br /&gt;
&lt;br /&gt;
2016-10-07: Learned Unicode and UTF-8 encoding and decoding for Arabic. Still working on transforming an ASCII URL into printable Unicode.&lt;br /&gt;
&lt;br /&gt;
2016-10-06: Discussed with Dr. Elbadawy the desired file names for the Moroccan data download. The consensus was that the bill programs are ready to launch once the files can be named properly, and that the questions data must be retrieved using a web crawler, which I need to learn how to implement. The naming of files is currently drawing errors in going from Arabic, to URL, to download, to filename. Debugging in progress. Also built a demo Selenium program for Dr. Egan that drives the McNair blog site on an infinite loop.&lt;br /&gt;
&lt;br /&gt;
2016-10-03: Moroccan Web Driver projects completed for driving the Monarchy proposed bills, the House of Representatives proposed bills, and the Ratified bills sites. Began the process of devising a naming system for the files that does not require scraping. Tinkered with naming through regular expression parsing of the URL. Structure for the oral questions and written questions drivers is set up, but needs fixes due to the differences in the sites. Fixed a bug on the McNair wiki for the women's biz team where an email was plain text instead of an email link. Took a glimpse at the Kuwait Parliament website, and it appears to be very different from the Moroccan setup.&lt;br /&gt;
&lt;br /&gt;
2016-09-30: The Selenium program selects the view PDF option from the website, and goes to the PDF webpage. The program then switches handles to the new page. CTRL+S is sent to the page to launch the save dialog window. Text cannot be sent to this window; brainstormed ways around this issue. Explored Chrome options for saving automatically without a dialog window. Looking into other libraries besides Selenium that may help.&lt;br /&gt;
&lt;br /&gt;
2016-09-29: Re-enrolled in Microsoft Remote Desktop with proper authentication, set up the Selenium environment and Komodo IDE on the Remote Desktop, and wrote a program using Selenium that goes to a link and opens up the print dialog box. Developed a computational recipe for a different approach to the problem.&lt;br /&gt;
&lt;br /&gt;
2016-09-26: Set up Staff wiki page, work log page; registered for Slack, Microsoft Remote Desktop; downloaded Selenium on personal computer, read Selenium docs. Created wiki page for Moroccan Web Driver Project.&lt;br /&gt;
&lt;br /&gt;
'''Notes'''&lt;br /&gt;
&lt;br /&gt;
*Ed moved the Morocco Data to E:\McNair\Projects from C:\Users\PeterJ\Documents&lt;br /&gt;
* C Drive files moved to E:\McNair\Users\PeterJ&lt;br /&gt;
&lt;br /&gt;
[[Category:Work Log]]&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Selenium_Documentation&amp;diff=22375</id>
		<title>Selenium Documentation</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Selenium_Documentation&amp;diff=22375"/>
		<updated>2017-12-20T16:20:02Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{McNair Projects&lt;br /&gt;
|Has title=Selenium Documentation&lt;br /&gt;
|Has owner=Peter Jalbert,&lt;br /&gt;
}}&lt;br /&gt;
[http://www.seleniumhq.org/projects/webdriver/ Selenium Web Driver] is a framework often used for automated web application testing. It uses an API to launch a web browser and browse sites from the client's perspective. Popular Selenium bindings exist for [http://selenium-python.readthedocs.io/ Python], [http://seleniumhq.github.io/selenium/docs/api/java/index.html Java], [https://www.npmjs.com/package/selenium-webdriver Javascript], and other languages. This documentation covers Selenium Web Driver using Python3.&lt;br /&gt;
&lt;br /&gt;
==Installation==&lt;br /&gt;
A full list of installation documentation can be found [http://selenium-python.readthedocs.io/installation.html# here].&lt;br /&gt;
&lt;br /&gt;
This documentation assumes you have Python 3.6 or later installed. If you do not, visit the [https://www.python.org/downloads/ Python Download page].&lt;br /&gt;
&lt;br /&gt;
From the command line, enter&lt;br /&gt;
 pip install selenium&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==The Basics==&lt;br /&gt;
&lt;br /&gt;
A folder with tutorial code can be found on the RDP in:&lt;br /&gt;
 E:\McNair\Software\Selenium Tutorial&lt;br /&gt;
&lt;br /&gt;
==Helpful Links==&lt;br /&gt;
[https://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python How to Download a file in Python with a URL]&lt;br /&gt;
&lt;br /&gt;
[https://stackoverflow.com/questions/20986631/how-can-i-scroll-a-web-page-using-selenium-webdriver-in-python Selenium: Scroll a Webpage]&lt;br /&gt;
&lt;br /&gt;
[https://stackoverflow.com/questions/24795198/selenium-python-get-all-children-elements Selenium: Get Children Elements]&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Selenium_Documentation&amp;diff=22374</id>
		<title>Selenium Documentation</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Selenium_Documentation&amp;diff=22374"/>
		<updated>2017-12-20T16:19:48Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{McNair Projects&lt;br /&gt;
|Has title=Selenium Documentation&lt;br /&gt;
|Has owner=Peter Jalbert,&lt;br /&gt;
}}&lt;br /&gt;
[http://www.seleniumhq.org/projects/webdriver/ Selenium Web Driver] is a framework often used for automated web application testing. It uses an API to launch a web browser and browse sites from the client's perspective. Popular Selenium bindings exist for [http://selenium-python.readthedocs.io/ Python], [http://seleniumhq.github.io/selenium/docs/api/java/index.html Java], [https://www.npmjs.com/package/selenium-webdriver Javascript], and other languages. This documentation covers Selenium Web Driver using Python3.&lt;br /&gt;
&lt;br /&gt;
==Installation==&lt;br /&gt;
A full list of installation documentation can be found [http://selenium-python.readthedocs.io/installation.html# here].&lt;br /&gt;
&lt;br /&gt;
This documentation assumes you have Python 3.6 or later installed. If you do not, visit the [https://www.python.org/downloads/ Python Download page].&lt;br /&gt;
&lt;br /&gt;
From the command line, enter&lt;br /&gt;
 pip install selenium&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==The Basics==&lt;br /&gt;
&lt;br /&gt;
A folder with tutorial code can be found on the RDP in:&lt;br /&gt;
 E:\McNair\Software\Selenium Tutorial&lt;br /&gt;
&lt;br /&gt;
==Helpful Links==&lt;br /&gt;
[https://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python How to Download a file in Python with a URL]&lt;br /&gt;
[https://stackoverflow.com/questions/20986631/how-can-i-scroll-a-web-page-using-selenium-webdriver-in-python Selenium: Scroll a Webpage]&lt;br /&gt;
[https://stackoverflow.com/questions/24795198/selenium-python-get-all-children-elements Selenium: Get Children Elements]&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Demo_Day_Page_Parser&amp;diff=22373</id>
		<title>Demo Day Page Parser</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Demo_Day_Page_Parser&amp;diff=22373"/>
		<updated>2017-12-19T19:35:13Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{McNair Projects&lt;br /&gt;
|Has title=Demo Day Page Parser&lt;br /&gt;
|Has owner=Peter Jalbert,&lt;br /&gt;
|Has project status=Active&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
==Project Specs==&lt;br /&gt;
The goal of this project is to leverage data mining with Selenium and Machine Learning to find good candidate web pages for accelerator Demo Days. Relevant information on the project can be found on the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Data Accelerator Data] page.&lt;br /&gt;
&lt;br /&gt;
==Code Location==&lt;br /&gt;
The code directory for this project can be found:&lt;br /&gt;
 E:\McNair\Software\Accelerators&lt;br /&gt;
&lt;br /&gt;
The Selenium-based crawler can be found in the file below. This script runs a Google search on accelerator names and keywords, and saves the URLs and HTML pages for future use:&lt;br /&gt;
 DemoDayCrawler.py&lt;br /&gt;
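&lt;br /&gt;
As a rough illustration of what this script does, here is a minimal sketch (the sample accelerator names, the Google result xpath, and the output naming are assumptions, not the actual code in DemoDayCrawler.py):&lt;br /&gt;
&lt;br /&gt;
 # Sketch: Google each accelerator plus 'Demo Day' and save the result pages.&lt;br /&gt;
 from urllib.parse import quote_plus&lt;br /&gt;
 from selenium import webdriver&lt;br /&gt;
 &lt;br /&gt;
 driver = webdriver.Chrome()&lt;br /&gt;
 for acc in ['Techstars', 'AngelCube']:  # hypothetical sample names&lt;br /&gt;
     driver.get('https://www.google.com/search?q=' + quote_plus(acc + ' Demo Day'))&lt;br /&gt;
     links = driver.find_elements_by_xpath('//h3/a')  # organic results, assumed xpath&lt;br /&gt;
     urls = [a.get_attribute('href') for a in links][:10]  # collect first, to avoid stale elements&lt;br /&gt;
     for i, url in enumerate(urls):&lt;br /&gt;
         driver.get(url)&lt;br /&gt;
         with open(acc + '_' + str(i) + '.html', 'w', encoding='utf-8') as f:&lt;br /&gt;
             f.write(driver.page_source)&lt;br /&gt;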
&lt;br /&gt;
&lt;br /&gt;
A script to rip from HTML to TXT can be found below. This script reads HTML files from the DemoDayHTML directory, and writes text versions to the DemoDayTxt directory:&lt;br /&gt;
 htmlToText.py&lt;br /&gt;
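&lt;br /&gt;
A minimal sketch of such an HTML-to-text rip (BeautifulSoup and the exact directory handling are assumptions, not necessarily what htmlToText.py uses):&lt;br /&gt;
&lt;br /&gt;
 # Sketch: strip tags from each saved HTML file and write plain text.&lt;br /&gt;
 import os&lt;br /&gt;
 from bs4 import BeautifulSoup&lt;br /&gt;
 &lt;br /&gt;
 for name in os.listdir('DemoDayHTML'):&lt;br /&gt;
     with open(os.path.join('DemoDayHTML', name), encoding='utf-8') as f:&lt;br /&gt;
         soup = BeautifulSoup(f.read(), 'html.parser')&lt;br /&gt;
     out = os.path.join('DemoDayTxt', name.replace('.html', '.txt'))&lt;br /&gt;
     with open(out, 'w', encoding='utf-8') as f:&lt;br /&gt;
         f.write(soup.get_text(separator='\n'))&lt;br /&gt;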
&lt;br /&gt;
A script to match keywords (accelerator and cohort names) against the resulting text pages can be found in KeyTerms.py. The script takes the keywords located in CohortAndAcceleratorsFullList.txt and the text files in DemoDayTxt, and creates a file with the number of matches of each keyword against each text file.&lt;br /&gt;
&lt;br /&gt;
The script can be found:&lt;br /&gt;
 KeyTerms.py&lt;br /&gt;
&lt;br /&gt;
The Keyword matches text file can be found:&lt;br /&gt;
 DemoDayTxt\KeyTermFile\KeyTerms.txt&lt;br /&gt;
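&lt;br /&gt;
Conceptually, the matching step is a nested count; a minimal sketch (file locations as above; the exact output format of KeyTerms.py is an assumption):&lt;br /&gt;
&lt;br /&gt;
 # Sketch: count occurrences of each keyword in each text file.&lt;br /&gt;
 import os&lt;br /&gt;
 &lt;br /&gt;
 keywords = open('CohortAndAcceleratorsFullList.txt', encoding='utf-8').read().splitlines()&lt;br /&gt;
 with open('KeyTerms.txt', 'w', encoding='utf-8') as out:&lt;br /&gt;
     for name in os.listdir('DemoDayTxt'):&lt;br /&gt;
         text = open(os.path.join('DemoDayTxt', name), encoding='utf-8').read().lower()&lt;br /&gt;
         for kw in keywords:&lt;br /&gt;
             n = text.count(kw.lower())&lt;br /&gt;
             if n:&lt;br /&gt;
                 out.write(name + '\t' + kw + '\t' + str(n) + '\n')&lt;br /&gt;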
&lt;br /&gt;
A script to determine which text files of webpages have at least one hit on these keywords can be found:&lt;br /&gt;
 DemoDayHits.py&lt;br /&gt;
&lt;br /&gt;
==Downloading HTML Files with Selenium==&lt;br /&gt;
The code for utilizing Selenium to download HTML files can be found in the DemoDayCrawler.py file.&lt;br /&gt;
&lt;br /&gt;
The initial pass over the data scraped 100 links for each of 20 sample accelerators from the list of overall accelerators. These sample pages were converted to text, and scored to remove web pages with no mention of relevant accelerators or companies.&lt;br /&gt;
&lt;br /&gt;
Once the process was tweaked in response to the initial sample testing, it was run again over all accelerators. The test determined that we needed to take no more than 10 links for each accelerator, and that 'Demo Day' was a suitable search term.&lt;br /&gt;
&lt;br /&gt;
===Complete Files===&lt;br /&gt;
These files hold data for all the accelerators, not just the test set.&lt;br /&gt;
&lt;br /&gt;
The full list of accelerators:&lt;br /&gt;
 ListOfAccs.txt&lt;br /&gt;
&lt;br /&gt;
The full list of search terms to match with the text versions of news articles:&lt;br /&gt;
 CohortAndAcceleratorsFullList.txt&lt;br /&gt;
&lt;br /&gt;
A list of accelerators, queries, and urls:&lt;br /&gt;
 demoday_crawl_full.txt&lt;br /&gt;
&lt;br /&gt;
A directory with HTML files for all accelerator demo day results:&lt;br /&gt;
 DemoDayHTMLFull&lt;br /&gt;
&lt;br /&gt;
A directory with TXT files for all accelerator demo day results:&lt;br /&gt;
 DemoDayTxtFull&lt;br /&gt;
&lt;br /&gt;
A file with the name of the results that passed keyword matching:&lt;br /&gt;
 DemoDayHitsFull.txt&lt;br /&gt;
&lt;br /&gt;
A file with an analysis of the most frequent matched words in each text file:&lt;br /&gt;
 topWordsFull.txt&lt;br /&gt;
&lt;br /&gt;
==Faulty Results==&lt;br /&gt;
The first pass through the data revealed articles that had thousands of hits for keyword matches. This seemed highly suspicious, so we dug deeper to investigate the cause of the issue. &lt;br /&gt;
&lt;br /&gt;
The following script in the same directory analyzes the keyword matches to determine the words with the highest number of hits.&lt;br /&gt;
 DemoDayAnalysis.py&lt;br /&gt;
&lt;br /&gt;
After investigation, it was found that many company names were taken from common English words. Here are some of the companies causing issues, along with their associated accelerators:&lt;br /&gt;
&lt;br /&gt;
the, L-Spark&lt;br /&gt;
&lt;br /&gt;
Matter, This., [https://matter.vc/portfolio/this/ website]&lt;br /&gt;
&lt;br /&gt;
Fledge, HERE, [http://fledge.co/fledgling/here/ website]&lt;br /&gt;
&lt;br /&gt;
StartupBootCamp, We...&lt;br /&gt;
&lt;br /&gt;
LightBank Start, Zero&lt;br /&gt;
&lt;br /&gt;
Entrepreneurs Roundtable Accelerator,  SELECT&lt;br /&gt;
&lt;br /&gt;
Y Combinator, Her&lt;br /&gt;
&lt;br /&gt;
Y Combinator, Final&lt;br /&gt;
&lt;br /&gt;
AngelCube, class&lt;br /&gt;
&lt;br /&gt;
Matter, common&lt;br /&gt;
&lt;br /&gt;
L-Spark, Company&lt;br /&gt;
&lt;br /&gt;
Techstars, Hot&lt;br /&gt;
&lt;br /&gt;
Rather than removing these companies from the list of search terms, we opted not to include as search terms any words among the top 10000 most common English words, as determined by a Google research study. The GitHub documentation of the study can be found [https://github.com/first20hours/google-10000-english here].&lt;br /&gt;
&lt;br /&gt;
The file containing the 10000 most common English words can be found:&lt;br /&gt;
 E:\McNair\Software\Accelerators\10000_common_words.txt&lt;br /&gt;
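&lt;br /&gt;
The filtering step then reduces to a set-membership test; a minimal sketch (assuming one search term per line; multi-word terms pass through untouched):&lt;br /&gt;
&lt;br /&gt;
 # Sketch: drop any single-word search term that is a top-10000 common English word.&lt;br /&gt;
 common = set(w.strip().lower() for w in open('10000_common_words.txt', encoding='utf-8'))&lt;br /&gt;
 terms = open('CohortAndAcceleratorsFullList.txt', encoding='utf-8').read().splitlines()&lt;br /&gt;
 filtered = [t for t in terms if t.strip().lower() not in common]&lt;br /&gt;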
&lt;br /&gt;
The results seemed much more plausible after removing these words. Some company words still appeared many times, but in the correct context.&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Demo_Day_Page_Parser&amp;diff=22372</id>
		<title>Demo Day Page Parser</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Demo_Day_Page_Parser&amp;diff=22372"/>
		<updated>2017-12-19T19:34:03Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{McNair Projects&lt;br /&gt;
|Has title=Demo Day Page Parser&lt;br /&gt;
|Has owner=Peter Jalbert,&lt;br /&gt;
|Has project status=Active&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
==Project Specs==&lt;br /&gt;
The goal of this project is to leverage data mining with Selenium and Machine Learning to find good candidate web pages for accelerator Demo Days. Relevant information on the project can be found on the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Data Accelerator Data] page.&lt;br /&gt;
&lt;br /&gt;
==Code Location==&lt;br /&gt;
The code directory for this project can be found:&lt;br /&gt;
 E:\McNair\Software\Accelerators&lt;br /&gt;
&lt;br /&gt;
The Selenium-based crawler can be found in the file below. This script runs a Google search on accelerator names and keywords, and saves the URLs and HTML pages for future use:&lt;br /&gt;
 DemoDayCrawler.py&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A script to rip from HTML to TXT can be found below. This script reads HTML files from the DemoDayHTML directory, and writes text versions to the DemoDayTxt directory:&lt;br /&gt;
 htmlToText.py&lt;br /&gt;
&lt;br /&gt;
A script to match keywords (accelerator and cohort names) against the resulting text pages can be found in KeyTerms.py. The script takes the keywords located in CohortAndAcceleratorsFullList.txt and the text files in DemoDayTxt, and creates a file with the number of matches of each keyword against each text file.&lt;br /&gt;
&lt;br /&gt;
The script can be found:&lt;br /&gt;
 KeyTerms.py&lt;br /&gt;
&lt;br /&gt;
The Keyword matches text file can be found:&lt;br /&gt;
 DemoDayTxt\KeyTermFile\KeyTerms.txt&lt;br /&gt;
&lt;br /&gt;
A script to determine which text files of webpages have at least one hit on these keywords can be found:&lt;br /&gt;
 DemoDayHits.py&lt;br /&gt;
&lt;br /&gt;
==Downloading HTML Files with Selenium==&lt;br /&gt;
The code for utilizing Selenium to download HTML files can be found in the DemoDayCrawler.py file.&lt;br /&gt;
&lt;br /&gt;
The initial pass over the data scraped 100 links for each of 20 sample accelerators from the list of overall accelerators. These sample pages were converted to text, and scored to remove web pages with no mention of relevant accelerators or companies.&lt;br /&gt;
&lt;br /&gt;
Once the process was tweaked in response to the initial sample testing, it was run again over all accelerators. The test determined that we needed to take no more than 10 links for each accelerator, and that 'Demo Day' was a suitable search term.&lt;br /&gt;
&lt;br /&gt;
===Complete Files===&lt;br /&gt;
These files hold data for all the accelerators, not just the test set.&lt;br /&gt;
&lt;br /&gt;
The full list of accelerators:&lt;br /&gt;
 ListOfAccs.txt&lt;br /&gt;
&lt;br /&gt;
The full list of search terms to match with the text versions of news articles:&lt;br /&gt;
 CohortAndAcceleratorsFullList.txt&lt;br /&gt;
&lt;br /&gt;
A list of accelerators, queries, and urls:&lt;br /&gt;
 demoday_crawl_full.txt&lt;br /&gt;
&lt;br /&gt;
A directory with HTML files for all accelerator demo day results:&lt;br /&gt;
 DemoDayHTMLFull&lt;br /&gt;
&lt;br /&gt;
A directory with TXT files for all accelerator demo day results:&lt;br /&gt;
 DemoDayTxtFull&lt;br /&gt;
&lt;br /&gt;
A file with the name of the results that passed keyword matching:&lt;br /&gt;
 DemoDayHitsFull.txt&lt;br /&gt;
&lt;br /&gt;
A file with an analysis of the most frequent matched words in each text file:&lt;br /&gt;
 topWordsNew.txt&lt;br /&gt;
&lt;br /&gt;
==Faulty Results==&lt;br /&gt;
The first pass through the data revealed articles that had thousands of hits for keyword matches. This seemed highly suspicious, so we dug deeper to investigate the cause of the issue. &lt;br /&gt;
&lt;br /&gt;
The following script in the same directory analyzes the keyword matches to determine the words with the highest number of hits.&lt;br /&gt;
 DemoDayAnalysis.py&lt;br /&gt;
&lt;br /&gt;
After investigation, it was found that many company names were taken from common English words. Here are some of the companies causing issues, along with their associated accelerators:&lt;br /&gt;
&lt;br /&gt;
the, L-Spark&lt;br /&gt;
&lt;br /&gt;
Matter, This., [https://matter.vc/portfolio/this/ website]&lt;br /&gt;
&lt;br /&gt;
Fledge, HERE, [http://fledge.co/fledgling/here/ website]&lt;br /&gt;
&lt;br /&gt;
StartupBootCamp, We...&lt;br /&gt;
&lt;br /&gt;
LightBank Start, Zero&lt;br /&gt;
&lt;br /&gt;
Entrepreneurs Roundtable Accelerator,  SELECT&lt;br /&gt;
&lt;br /&gt;
Y Combinator, Her&lt;br /&gt;
&lt;br /&gt;
Y Combinator, Final&lt;br /&gt;
&lt;br /&gt;
AngelCube, class&lt;br /&gt;
&lt;br /&gt;
Matter, common&lt;br /&gt;
&lt;br /&gt;
L-Spark, Company&lt;br /&gt;
&lt;br /&gt;
Techstars, Hot&lt;br /&gt;
&lt;br /&gt;
Rather than removing these companies from the list of search terms, we opted not to include as search terms any words among the top 10000 most common English words, as determined by a Google research study. The GitHub documentation of the study can be found [https://github.com/first20hours/google-10000-english here].&lt;br /&gt;
&lt;br /&gt;
The file containing the 10000 most common English words can be found:&lt;br /&gt;
 E:\McNair\Software\Accelerators\10000_common_words.txt&lt;br /&gt;
&lt;br /&gt;
The results seemed much more plausible after removing these words. Some company words still appeared many times, but in the correct context.&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Demo_Day_Page_Parser&amp;diff=22371</id>
		<title>Demo Day Page Parser</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Demo_Day_Page_Parser&amp;diff=22371"/>
		<updated>2017-12-19T19:31:32Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{McNair Projects&lt;br /&gt;
|Has title=Demo Day Page Parser&lt;br /&gt;
|Has owner=Peter Jalbert,&lt;br /&gt;
|Has project status=Active&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
==Project Specs==&lt;br /&gt;
The goal of this project is to leverage data mining with Selenium and Machine Learning to find good candidate web pages for accelerator Demo Days. Relevant information on the project can be found on the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Data Accelerator Data] page.&lt;br /&gt;
&lt;br /&gt;
==Code Location==&lt;br /&gt;
The code directory for this project can be found:&lt;br /&gt;
 E:\McNair\Software\Accelerators&lt;br /&gt;
&lt;br /&gt;
The Selenium-based crawler can be found in the file below. This script runs a Google search on accelerator names and keywords, and saves the URLs and HTML pages for future use:&lt;br /&gt;
 DemoDayCrawler.py&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A script to rip from HTML to TXT can be found below. This script reads HTML files from the DemoDayHTML directory, and writes text versions to the DemoDayTxt directory:&lt;br /&gt;
 htmlToText.py&lt;br /&gt;
&lt;br /&gt;
A script to match keywords (accelerator and cohort names) against the resulting text pages can be found in KeyTerms.py. The script takes the keywords located in CohortAndAcceleratorsFullList.txt and the text files in DemoDayTxt, and creates a file with the number of matches of each keyword against each text file.&lt;br /&gt;
&lt;br /&gt;
The script can be found:&lt;br /&gt;
 KeyTerms.py&lt;br /&gt;
&lt;br /&gt;
The Keyword matches text file can be found:&lt;br /&gt;
 DemoDayTxt\KeyTermFile\KeyTerms.txt&lt;br /&gt;
&lt;br /&gt;
A script to determine which text files of webpages have at least one hit on these keywords can be found:&lt;br /&gt;
 DemoDayHits.py&lt;br /&gt;
&lt;br /&gt;
==Downloading HTML Files with Selenium==&lt;br /&gt;
The code for utilizing Selenium to download HTML files can be found in the DemoDayCrawler.py file.&lt;br /&gt;
&lt;br /&gt;
The initial pass over the data scraped 100 links for each of 20 sample accelerators from the list of overall accelerators. These sample pages were converted to text, and scored to remove web pages with no mention of relevant accelerators or companies.&lt;br /&gt;
&lt;br /&gt;
Once the process was tweaked in response to the initial sample testing, it was run again over all accelerators. The test determined that we needed to take no more than 10 links for each accelerator, and that 'Demo Day' was a suitable search term.&lt;br /&gt;
&lt;br /&gt;
===Complete Files===&lt;br /&gt;
These files hold data for all the accelerators, not just the test set.&lt;br /&gt;
&lt;br /&gt;
The full list of accelerators:&lt;br /&gt;
 ListOfAccs.txt&lt;br /&gt;
&lt;br /&gt;
The full list of potential keywords (used for throwing out irrelevant results):&lt;br /&gt;
 Keywords.txt&lt;br /&gt;
&lt;br /&gt;
A list of accelerators, queries, and urls:&lt;br /&gt;
 demoday_crawl_full.txt&lt;br /&gt;
&lt;br /&gt;
A directory with HTML files for all accelerator demo day results:&lt;br /&gt;
 DemoDayHTMLFull&lt;br /&gt;
&lt;br /&gt;
A directory with TXT files for all accelerator demo day results:&lt;br /&gt;
 DemoDayTxtFull&lt;br /&gt;
&lt;br /&gt;
A file with the name of the results that passed keyword matching:&lt;br /&gt;
 DemoDayHitsFull.txt&lt;br /&gt;
&lt;br /&gt;
==Faulty Results==&lt;br /&gt;
The first pass through the data revealed articles that had thousands of hits for keyword matches. This seemed highly suspicious, so we dug deeper to investigate the cause of the issue. &lt;br /&gt;
&lt;br /&gt;
The following script in the same directory analyzes the keyword matches to determine the words with the highest number of hits.&lt;br /&gt;
 DemoDayAnalysis.py&lt;br /&gt;
&lt;br /&gt;
After investigation, it was found that many company names were taken from common English words. Here are some of the companies causing issues, along with their associated accelerators:&lt;br /&gt;
&lt;br /&gt;
the, L-Spark&lt;br /&gt;
&lt;br /&gt;
Matter, This., [https://matter.vc/portfolio/this/ website]&lt;br /&gt;
&lt;br /&gt;
Fledge, HERE, [http://fledge.co/fledgling/here/ website]&lt;br /&gt;
&lt;br /&gt;
StartupBootCamp, We...&lt;br /&gt;
&lt;br /&gt;
LightBank Start, Zero&lt;br /&gt;
&lt;br /&gt;
Entrepreneurs Roundtable Accelerator,  SELECT&lt;br /&gt;
&lt;br /&gt;
Y Combinator, Her&lt;br /&gt;
&lt;br /&gt;
Y Combinator, Final&lt;br /&gt;
&lt;br /&gt;
AngelCube, class&lt;br /&gt;
&lt;br /&gt;
Matter, common&lt;br /&gt;
&lt;br /&gt;
L-Spark, Company&lt;br /&gt;
&lt;br /&gt;
Techstars, Hot&lt;br /&gt;
&lt;br /&gt;
Rather than removing these companies from the list of search terms, we opted not to include as search terms any words among the top 10000 most common English words, as determined by a Google research study. The GitHub documentation of the study can be found [https://github.com/first20hours/google-10000-english here].&lt;br /&gt;
&lt;br /&gt;
The file containing the 10000 most common English words can be found:&lt;br /&gt;
 E:\McNair\Software\Accelerators\10000_common_words.txt&lt;br /&gt;
&lt;br /&gt;
The results seemed much more plausible after removing these words. Some company words still appeared many times, but in the correct context.&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=LinkedIn_Crawler_(Python)&amp;diff=22370</id>
		<title>LinkedIn Crawler (Python)</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=LinkedIn_Crawler_(Python)&amp;diff=22370"/>
		<updated>2017-12-19T19:27:50Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{McNair Projects&lt;br /&gt;
|Has Image=Web-crawler.jpg&lt;br /&gt;
|Has title=LinkedIn Crawler (Python)&lt;br /&gt;
|Has start date=April 3, 2017&lt;br /&gt;
|Has keywords=Selenium, LinkedIn, Crawler,Tool&lt;br /&gt;
}}&lt;br /&gt;
=Overview=&lt;br /&gt;
&lt;br /&gt;
Files for this project can be found on our Git Server under the directory LinkedIn_Crawler.&lt;br /&gt;
&lt;br /&gt;
This page is dedicated to a new LinkedIn Crawler built using Selenium and Python. The goal of this project is to be able to crawl LinkedIn without being caught by LinkedIn's aggressive [https://www.linkedin.com/help/linkedin/answer/56347/prohibition-of-scraping-software?lang=en anti-scraping rules.] To do this, we will use Selenium to behave like a human, and use time delays to hide bot-like tendencies.&lt;br /&gt;
&lt;br /&gt;
The documentation for Selenium Web Driver can be found [http://selenium-python.readthedocs.io/index.html here].&lt;br /&gt;
&lt;br /&gt;
Relevant scripts can be found in the following directory:&lt;br /&gt;
 E:\McNair\Projects\LinkedIn Crawler&lt;br /&gt;
&lt;br /&gt;
The resulting data for accelerator founders can be found:&lt;br /&gt;
 E:\McNair\Projects\LinkedIn Crawler\LinkedIn_Crawler\linkedin\accelerator_founders_data&lt;br /&gt;
&lt;br /&gt;
The code from the original Summer 2016 Project can be found in:&lt;br /&gt;
 web_crawler\linkedin&lt;br /&gt;
&lt;br /&gt;
The next section will provide details on the construction and functionality of the scripts located in the linkedin directory.&lt;br /&gt;
&lt;br /&gt;
The old documentation said that the programs/scripts (see details below) are located on our [[Software Repository|Bonobo Git Server]]. &lt;br /&gt;
 repository: Web_Crawler&lt;br /&gt;
 branch: researcher/linkedin&lt;br /&gt;
 directory: /linkedin&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Accounts==&lt;br /&gt;
Test Account:&lt;br /&gt;
&lt;br /&gt;
email: testapplicat6@gmail.com&lt;br /&gt;
&lt;br /&gt;
pass: McNair2017&lt;br /&gt;
&lt;br /&gt;
Real Account:&lt;br /&gt;
&lt;br /&gt;
email: ed.edgan@rice.edu&lt;br /&gt;
&lt;br /&gt;
pass: This area has intentionally been left blank.&lt;br /&gt;
&lt;br /&gt;
=LinkedIn Scripts=&lt;br /&gt;
==Overview==&lt;br /&gt;
This section provides a file by file breakdown of the contents of the folder located at:&lt;br /&gt;
 E:\McNair\Projects\LinkedIn Crawler\web_crawler\linkedin&lt;br /&gt;
The main script to run is:&lt;br /&gt;
 run_linkedin_recruiter.py&lt;br /&gt;
&lt;br /&gt;
==run_linkedin_recruiter.py==&lt;br /&gt;
This script executes the linkedin recruiter crawler. At the top of the file, just below the imports, are three fields: username, password, and query_filepath. The username and password fields are for the desired recruiter pro account you would like to log into, and query_filepath is a pathname to a text file that contains a list of properly formatted queries that can be read by the LinkedIn Crawler's simple_search method. The following are the functions listed in the script.&lt;br /&gt;
&lt;br /&gt;
===main()===&lt;br /&gt;
This function runs the LinkedIn Crawler and will automatically begin when called from the command line. If you only want to go through some of the queries, you can change the range of the slice in line 32, and if you wish to only look at a certain number of search results, you can change the range of the slice in line 40.&lt;br /&gt;
&lt;br /&gt;
===open_new_window(driver, element)===&lt;br /&gt;
This function does a shift-click on a web element to open the link in a new window. It then changes the window handle to the new window. This makes it simple to view search results and close them quickly.&lt;br /&gt;
&lt;br /&gt;
===close_window_and_return(driver)===&lt;br /&gt;
This function closes the current window, and returns to the main window. It is used in conjunction with open_new_window() to view search results and close them in an iterative manner.&lt;br /&gt;
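&lt;br /&gt;
A minimal sketch of the pattern these two functions implement, using standard Selenium calls (the exact bodies here are assumptions):&lt;br /&gt;
&lt;br /&gt;
 # Sketch: shift-click an element into a new window, then close it and return.&lt;br /&gt;
 from selenium.webdriver import ActionChains&lt;br /&gt;
 from selenium.webdriver.common.keys import Keys&lt;br /&gt;
 &lt;br /&gt;
 def open_new_window(driver, element):&lt;br /&gt;
     ActionChains(driver).key_down(Keys.SHIFT).click(element).key_up(Keys.SHIFT).perform()&lt;br /&gt;
     driver.switch_to.window(driver.window_handles[-1])  # focus the new window&lt;br /&gt;
 &lt;br /&gt;
 def close_window_and_return(driver):&lt;br /&gt;
     driver.close()&lt;br /&gt;
     driver.switch_to.window(driver.window_handles[0])  # back to the main window&lt;br /&gt;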
&lt;br /&gt;
===close_tab(driver)===&lt;br /&gt;
When necessary, this function is used to close the current tab and return to the main tab. It is similar to close_window_and_return(). This function is used to log out of the account.&lt;br /&gt;
&lt;br /&gt;
==crawlererror.py==&lt;br /&gt;
This script is a simple class construction for error messages. It is used in other scripts to raise errors to the user when the crawler fails.&lt;br /&gt;
&lt;br /&gt;
==linked_in_crawler.py==&lt;br /&gt;
This script constructs a class that provides navigation functionality around the traditional LinkedIn site. The beginning section lists some global xpaths that will be used by Selenium throughout the process. These xpaths are used to locate elements within the HTML. The following are some important functions to keep in mind when designing original programs using this code.&lt;br /&gt;
&lt;br /&gt;
=== login(self, username, password)===&lt;br /&gt;
This function takes a username and password, and logs in to LinkedIn. During the process, the function uses the MouseMove move_random() function to move the mouse randomly across the screen, mimicking human behavior.&lt;br /&gt;
&lt;br /&gt;
===logout(self)===&lt;br /&gt;
This function logs out of LinkedIn. It works by clicking on the profile picture, and then selecting logout.&lt;br /&gt;
&lt;br /&gt;
===go_back(self)===&lt;br /&gt;
This function goes back a page, should you ever need to do such a thing. Note that it does not currently appear to work.&lt;br /&gt;
&lt;br /&gt;
===simple_search(self, query)===&lt;br /&gt;
This function takes a string as a query, and searches it using the search box. At the end of the function's run, a page with search results relevant to your query will be on the screen.&lt;br /&gt;
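&lt;br /&gt;
In outline, the method body looks something like the following (SEARCH_BOX_XPATH is a hypothetical stand-in for one of the global xpaths defined at the top of the file, and the driver attribute name is an assumption):&lt;br /&gt;
&lt;br /&gt;
 # Sketch: type a query into the search box and submit it.&lt;br /&gt;
 from selenium.webdriver.common.keys import Keys&lt;br /&gt;
 &lt;br /&gt;
 def simple_search(self, query):&lt;br /&gt;
     box = self.driver.find_element_by_xpath(SEARCH_BOX_XPATH)  # assumed attribute and xpath&lt;br /&gt;
     box.clear()&lt;br /&gt;
     box.send_keys(query)&lt;br /&gt;
     box.send_keys(Keys.RETURN)&lt;br /&gt;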
&lt;br /&gt;
===advance_search(self, query)===&lt;br /&gt;
This function uses the advanced search feature of LinkedIn. Instead of a string, this function takes in a dictionary mapping predetermined keywords to their necessary values. This function has not been debugged yet.&lt;br /&gt;
&lt;br /&gt;
===get_search_results_on_page(self)===&lt;br /&gt;
This function is supposed to return all the search results on the current page. This function has not been debugged yet.&lt;br /&gt;
&lt;br /&gt;
===get_next_search_page(self)===&lt;br /&gt;
This function is supposed to click and load the next search page if one exists. This function has not been debugged yet.&lt;br /&gt;
&lt;br /&gt;
==linked_in_crawler_recruiter.py==&lt;br /&gt;
This script constructs a class called LinkedInCrawlerRecruiter that implements functionality specifically for the Recruiter Pro feature of LinkedIn. Similar to the regular linked_in_crawler, the program begins with a list of relevant xpaths. It is followed by multiple functions. Their functionalities are listed below.&lt;br /&gt;
&lt;br /&gt;
===login(self, username, password)===&lt;br /&gt;
This function logs into a normal LinkedIn account, and then launches the Recruiter Pro session from the LinkedIn home page. At the end of the function run, there will be a window with the Recruiter Pro feature open, and the Selenium web frame will be on that window.&lt;br /&gt;
&lt;br /&gt;
===simple_search(self, query)===&lt;br /&gt;
Similar to the original LinkedIn Crawler, this function implements a basic string query search for the Recruiter Pro feature. At the end of the function run, a page will be up with the relevant search results of the search query.&lt;br /&gt;
&lt;br /&gt;
===help_search_handler_stuff(self)===&lt;br /&gt;
This function performs some actions on the current page in an attempt to appear more human. As of now, the function has a notes feature that will randomly jot down notes on the current page.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==utils.py==&lt;br /&gt;
This file contains a few useful functions for waiting and moving the mouse; it provides the human-like behavior for this project.&lt;br /&gt;
&lt;br /&gt;
===sleep_secs(secs)===&lt;br /&gt;
This is a simple function that has the browser wait for a specified number of seconds.&lt;br /&gt;
&lt;br /&gt;
===sleep_rand(limit=__SLEEP_LIMIT__)===&lt;br /&gt;
This function has the browser wait for a random amount of time less than the user provided limit. If the user does not provide a limit, the browser waits for a random time less than 5 seconds.&lt;br /&gt;
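&lt;br /&gt;
Both helpers amount to one line each; a minimal sketch (the 5-second default mirrors the description above):&lt;br /&gt;
&lt;br /&gt;
 # Sketch: fixed and randomized waits used to hide bot-like timing.&lt;br /&gt;
 import random&lt;br /&gt;
 import time&lt;br /&gt;
 &lt;br /&gt;
 __SLEEP_LIMIT__ = 5&lt;br /&gt;
 &lt;br /&gt;
 def sleep_secs(secs):&lt;br /&gt;
     time.sleep(secs)&lt;br /&gt;
 &lt;br /&gt;
 def sleep_rand(limit=__SLEEP_LIMIT__):&lt;br /&gt;
     time.sleep(random.uniform(0, limit))&lt;br /&gt;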
&lt;br /&gt;
===move_strategy1(self)===&lt;br /&gt;
This is a function within the MouseMove class that moves the mouse randomly across the window. It uses autopy to move the mouse across the window, visibly to the user.&lt;br /&gt;
&lt;br /&gt;
===move_to(self, x=None, y=None)===&lt;br /&gt;
This is a function within the MouseMove class. Given x and y coordinates on the screen, this function will move the mouse to that given point. &lt;br /&gt;
&lt;br /&gt;
===move_random(self)===&lt;br /&gt;
This function chooses a random MouseMove method and executes it.&lt;br /&gt;
&lt;br /&gt;
==web_driver.py==&lt;br /&gt;
This file contains the relevant functions from the Selenium library that are used for web driving.&lt;br /&gt;
&lt;br /&gt;
=Constructing Your Query=&lt;br /&gt;
&lt;br /&gt;
Using Recruiter to search generic terms such as &amp;quot;CompanyName Founder&amp;quot; does not turn up valuable search results. For optimal performance, it is recommended that you determine through another source the exact person you are looking for. Methods to get such information will be listed below.&lt;br /&gt;
&lt;br /&gt;
==format_founders.py==&lt;br /&gt;
Script location:&lt;br /&gt;
 TBD&lt;br /&gt;
&lt;br /&gt;
This Python script takes a text file of company names, and uses the Crunchbase Snapshot to determine the founder names of each company. If Crunchbase does not have a record of the founder, it is unlikely that a generic search on LinkedIn will provide any useful results. The script returns a new text file with each company name replaced with &amp;quot;CompanyName Founder FounderName&amp;quot; for each founder of the company listed in the Crunchbase Snapshot. This new text file can then be used directly with the LinkedIn Crawler to generate accurate search results and retrieve accurate HTML pages.&lt;br /&gt;
&lt;br /&gt;
The following lists the functionality of functions in the format_founders.py script.&lt;br /&gt;
&lt;br /&gt;
===create_pickle()===&lt;br /&gt;
This function creates a pickled python dictionary of the Crunchbase Snapshot, people.csv. If a different dataset should be used in the future, one should pickle a dictionary in a similar fashion to this function, and then use that pickled result in the next function to reformat your queries.&lt;br /&gt;
&lt;br /&gt;
===reformat(pathname, output_filename)===&lt;br /&gt;
This function takes a text file pathname and an output filename, and converts each line of the text file to a searchable query using the data from the pickled Crunchbase Snapshot. The new text file with the corrected queries is saved to the output filename.&lt;br /&gt;
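&lt;br /&gt;
A minimal sketch of both functions (the people.csv column names and the pickle filename are assumptions):&lt;br /&gt;
&lt;br /&gt;
 # Sketch: pickle founder names from people.csv, then expand company names into queries.&lt;br /&gt;
 import csv&lt;br /&gt;
 import pickle&lt;br /&gt;
 &lt;br /&gt;
 def create_pickle():&lt;br /&gt;
     founders = {}&lt;br /&gt;
     with open('people.csv', encoding='utf-8') as f:&lt;br /&gt;
         for row in csv.DictReader(f):&lt;br /&gt;
             founders.setdefault(row['company'], []).append(row['name'])  # assumed columns&lt;br /&gt;
     with open('founders.pkl', 'wb') as f:&lt;br /&gt;
         pickle.dump(founders, f)&lt;br /&gt;
 &lt;br /&gt;
 def reformat(pathname, output_filename):&lt;br /&gt;
     with open('founders.pkl', 'rb') as f:&lt;br /&gt;
         founders = pickle.load(f)&lt;br /&gt;
     with open(pathname, encoding='utf-8') as fin, open(output_filename, 'w', encoding='utf-8') as fout:&lt;br /&gt;
         for company in fin.read().splitlines():&lt;br /&gt;
             for name in founders.get(company, []):&lt;br /&gt;
                 fout.write(company + ' Founder ' + name + '\n')&lt;br /&gt;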
&lt;br /&gt;
===Results with Accelerator Data===&lt;br /&gt;
Of the 265 recorded accelerators we have data on, 94 have founders listed through the Crunchbase Snapshot. Some of these companies will have multiple founders with profiles, and some of these founders will not have LinkedIn profiles.&lt;br /&gt;
&lt;br /&gt;
The final data is a text file with accelerator name, founder name, profile summary, experience, and education. It can be found at:&lt;br /&gt;
 E:\McNair\Projects\Accelerators\LinkedIn Founders Data&lt;br /&gt;
&lt;br /&gt;
=Fall 2017=&lt;br /&gt;
&lt;br /&gt;
==Accelerator Founders Search==&lt;br /&gt;
&lt;br /&gt;
'''These results are for the paper: The Jockey, The Horse, or the RaceTrack'''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Our LinkedIn Recruiter Pro account has expired. Unfortunately, it turns out that profiles cannot be viewed through LinkedIn if the target profile is a 3rd-degree connection or further. However, a Google search on such a LinkedIn profile will still let you view the profile, provided that an account has been logged into prior to the search. &lt;br /&gt;
&lt;br /&gt;
===Piggybacking Google===&lt;br /&gt;
&lt;br /&gt;
In order to get our data, we will piggyback on Google's web crawler to work around the LinkedIn protective wall. The crawler begins by logging into our test LinkedIn Account (credentials displayed at the top), and then launching a Google search for each query. By adding &amp;quot;LinkedIn&amp;quot; before the query, and &amp;quot;Founder&amp;quot; after the query, we can turn up relevant search results. The top 5 results on Google search are explored, scraped, and saved.&lt;br /&gt;
&lt;br /&gt;
We ultimately opted not to use the Google method, for various reasons.&lt;br /&gt;
&lt;br /&gt;
===Crunchbase API===&lt;br /&gt;
&lt;br /&gt;
Instead, we opted to use data from Crunchbase, which we have access to through a license. A wiki page on the Crunchbase data and how to use the API can be found [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here]. The data can be accessed either through the web API (discussed on the Crunchbase Data wiki page), or through the bulk download we have in our SQL server.&lt;br /&gt;
&lt;br /&gt;
The web API has the nice added feature of a '''Founders''' section. The API returns a JSON when a GET request is submitted using the correct company identifier. The Founders section of this JSON contains information on the founders of the accelerator, if Crunchbase has said data. Details about the data can be found on the [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data Crunchbase Data Page]. &lt;br /&gt;
&lt;br /&gt;
The script that queried the API is called '''crunchbase_founders.py''' and can be found:&lt;br /&gt;
 E:\McNair\Projects\Accelerators\crunchbase_founders.py&lt;br /&gt;
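&lt;br /&gt;
In outline, the query looks something like this (the endpoint shape, parameter name, and JSON layout are assumptions based on the v3.1 REST API, not necessarily what crunchbase_founders.py does; see the Crunchbase Data page for the authoritative details):&lt;br /&gt;
&lt;br /&gt;
 # Sketch: fetch one organization and read its founders section.&lt;br /&gt;
 import requests&lt;br /&gt;
 &lt;br /&gt;
 URL = 'https://api.crunchbase.com/v3.1/organizations/'  # assumed endpoint&lt;br /&gt;
 resp = requests.get(URL + 'techstars', params={'user_key': 'OUR_KEY'})  # placeholder key&lt;br /&gt;
 data = resp.json()&lt;br /&gt;
 founders = data['data']['relationships'].get('founders', {})  # assumed JSON layout&lt;br /&gt;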
&lt;br /&gt;
The resulting text file, called '''founders_linkedin.txt''', containing names and LinkedIn URLs of founders extracted from the database, can be found:&lt;br /&gt;
 E:\McNair\Projects\Accelerators\founders_linkedin.txt&lt;br /&gt;
&lt;br /&gt;
===Crawling LinkedIn===&lt;br /&gt;
&lt;br /&gt;
The next step of the process uses this data to get information about these founders from their LinkedIn profiles. For the founders we have LinkedIn URLs for, we will use those; for those we do not, we will do a simple LinkedIn search with their name and accelerator name. The code for this crawler, '''linkedin_founders.py''', can be found:&lt;br /&gt;
 E:\McNair\Projects\LinkedIn Crawler\LinkedIn_Crawler\linkedin\linkedin_founders.py&lt;br /&gt;
&lt;br /&gt;
NOTE: Right now, this code needs to run in a virtual environment that contains Python3. This is due to the origins of the project, and this needs to be addressed when we have a lull in the development process. The only virtual environment we have managed to get working is on the Ubuntu machine sitting in the corner of the room. &lt;br /&gt;
&lt;br /&gt;
===Using the Ubuntu Virtual Environment===&lt;br /&gt;
&lt;br /&gt;
Step 1: Login using the researcher credentials. If you don't know what these are, ask someone.&lt;br /&gt;
&lt;br /&gt;
Step 2: Open the command prompt. Type:&lt;br /&gt;
 source dev/python3_venv_linkedin/bin/activate&lt;br /&gt;
&lt;br /&gt;
Your screen should now have (python3_venv_linkedin) next to any command you write. The virtual environment has been activated.&lt;br /&gt;
&lt;br /&gt;
Step 3: Change directories to: &lt;br /&gt;
  ~/dev/web_crawler/linkedin&lt;br /&gt;
&lt;br /&gt;
Step 4: All the files for any sort of LinkedIn Crawler are here. The file for this project is:&lt;br /&gt;
 linkedin_founders.py&lt;br /&gt;
&lt;br /&gt;
This file executes the crawler on all of the information stored in the file founders_linkedin.txt. Any file with one record per line, formatted as company, first name, last name, and LinkedIn URL separated by tabs, will work.&lt;br /&gt;
The output of the data will be stored in founders_linkedin_main.txt, founders_linkedin_experience.txt, and founders_linkedin_education.txt.&lt;br /&gt;
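&lt;br /&gt;
Reading that format back is a single tab-split per line; a minimal sketch:&lt;br /&gt;
&lt;br /&gt;
 # Sketch: parse company, first name, last name, and LinkedIn URL from each line.&lt;br /&gt;
 with open('founders_linkedin.txt', encoding='utf-8') as f:&lt;br /&gt;
     for line in f:&lt;br /&gt;
         company, first, last, url = line.rstrip('\n').split('\t')&lt;br /&gt;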
&lt;br /&gt;
Step 5: To run the file, enter:&lt;br /&gt;
 python linkedin_founders.py&lt;br /&gt;
&lt;br /&gt;
The crawler will begin running automatically.&lt;br /&gt;
&lt;br /&gt;
Step 6: If you want to leave the virtual environment and return to the normal environment, simply enter the following in the command prompt:&lt;br /&gt;
 deactivate&lt;br /&gt;
&lt;br /&gt;
==LinkedIn Crawler on the RDP==&lt;br /&gt;
As of 12/18/2017, the LinkedIn crawler has been updated to be compatible with the RDP. Some of the bells and whistles have been removed from the Ubuntu version due to download failures related to a missing vcvarsall.bat. &lt;br /&gt;
&lt;br /&gt;
Relevant files are located: &lt;br /&gt;
 E:\McNair\Projects\LinkedIn Crawler\LinkedIn_Crawler\linkedin&lt;br /&gt;
&lt;br /&gt;
===Crawling Google for unknown LinkedIn accounts===&lt;br /&gt;
For accelerator founders without a recorded LinkedIn profile, a quick Google search will most likely find the correct page if the person has a LinkedIn profile. The script to run this process is in the same folder, and is called:&lt;br /&gt;
 goog_linkedin_founders.py&lt;br /&gt;
This file uses the same formatted text file for its queries.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Previous Posts about the LinkedIn Crawler=&lt;br /&gt;
== To what extent are we able to reproduce the network structure in LinkedIn (From Previous) == &lt;br /&gt;
&lt;br /&gt;
Example 1: 1st-degree contact. You are connected to his profile.&lt;br /&gt;
Albert Nabiullin (485 connections)&lt;br /&gt;
&lt;br /&gt;
Example 2: 2nd-degree contact. You are connected to someone who is connected to him.&lt;br /&gt;
Amir Kazempour Esmati (63 connections)&lt;br /&gt;
&lt;br /&gt;
Example 3: 3rd-degree contact. You are connected to someone who is connected to someone else who is connected to her. &lt;br /&gt;
Linda Szabados (500+ connections) &lt;br /&gt;
&lt;br /&gt;
Any profile with a distance greater than three is defined as outside your network. &lt;br /&gt;
&lt;br /&gt;
Summary: Individual-specific network information is not accessible, even for first-degree connections. Therefore, any plan to construct a network structure based on the connections of every individual is not feasible. &lt;br /&gt;
&lt;br /&gt;
It seems that the only possible direction would be using the advanced search feature.&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=22369</id>
		<title>Peter Jalbert (Work Log)</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=22369"/>
		<updated>2017-12-19T19:24:55Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: /* Fall 2017 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===Fall 2017===&lt;br /&gt;
&amp;lt;onlyinclude&amp;gt; &lt;br /&gt;
&lt;br /&gt;
[[Peter Jalbert]] [[Work Logs]] [[Peter Jalbert (Work Log)|(log page)]] &lt;br /&gt;
&lt;br /&gt;
2017-12-19: Finished fixing the Demo Day Crawler. Changed files and installed packages as appropriate to make the LinkedIn crawler compatible with the RDP. Removed some of the bells and whistles.&lt;br /&gt;
&lt;br /&gt;
2017-12-18: Continued finding errors with the Demo Day Crawler analysis. Rewrote the parser to remove any search terms that were in the top 10000 most common English words according to Google. Finished uploading and submitting Moroccan data.&lt;br /&gt;
&lt;br /&gt;
2017-12-15: Found errors with the Demo Day Crawler. Fixed scripts to download Moroccan Law Data.&lt;br /&gt;
&lt;br /&gt;
2017-12-14: Uploading Morocco Parliament Written Questions. Creating script for next Morocco Parliament download. Begin writing Selenium documentation. Continuing to download TIGER data.&lt;br /&gt;
&lt;br /&gt;
2017-12-06: Running Morocco Parliament Written Questions script. Analyzing Demo Day Crawler results. Continued downloading for TIGER geocoder.&lt;br /&gt;
&lt;br /&gt;
2017-11-28: Debugging Morocco Parliament Crawler. Running Demo Day Crawler for all accelerators and 10 pages per accelerator. TIGER geocoder is back to Forbidden Error.&lt;br /&gt;
&lt;br /&gt;
2017-11-27: Rerunning Morocco Parliament Crawler. Fixed KeyTerms.py and running it again. Continued downloading for TIGER geocoder.&lt;br /&gt;
&lt;br /&gt;
2017-11-20:  Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-16: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-15: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote a script to extract counts that were greater than 2 from Keyword Matcher. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder]. Finished re-formatting work logs.&lt;br /&gt;
&lt;br /&gt;
2017-11-14: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote an HTML to Text parser. See Parser Demo Day Page for file location. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder].&lt;br /&gt;
&lt;br /&gt;
2017-11-13: Built [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser].&lt;br /&gt;
&lt;br /&gt;
2017-11-09: Running demo version of Demo Day crawler (Accelerator Google Crawler). Fixing work log format. &lt;br /&gt;
&lt;br /&gt;
2017-11-07: Created file with 0s and 1s detailing whether crunchbase has the founder information for an accelerator. Details posted as a TODO on [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List] page. Still waiting for feedback on the PostGIS installation from [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder]. Continued working on Accelerator Google Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-06: Contacted Geography Center for the US Census Bureau, [https://www.census.gov/geo/about/contact.html here], and began email exchange on PostGIS installation problems. Began working on the [http://mcnair.bakerinstitute.org/wiki/Selenium_Documentation Selenium Documentation]. Also began working on an Accelerator Google Crawler that will be used with Yang and ML to find Demo Days for cohort companies.&lt;br /&gt;
&lt;br /&gt;
2017-11-01: Attempted to continue downloading, however ran into HTTP Forbidden errors. Listed the errors on the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder Page].&lt;br /&gt;
&lt;br /&gt;
2017-10-31: Began downloading blocks of data for individual states for the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder] project. Wrote out the new wiki page for installation, and beginning to write documentation on usage.&lt;br /&gt;
&lt;br /&gt;
2017-10-30: With Ed's help, was able to get the national data from Tiger installed onto a database server. The process required much jumping around and changing users, and all the things we learned are outlined in [http://mcnair.bakerinstitute.org/wiki/Database_Server_Documentation#Editing_Users the database server documentation] under &amp;quot;Editing Users&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
2017-10-25: Continued working on the [http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation Tiger Geocoder Installation].&lt;br /&gt;
&lt;br /&gt;
2017-10-24: Throw some addresses into a database, use address normalizer and geocoder. May need to install things. Details on the installation process can be found on the [http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation PostGIS Installation page].&lt;br /&gt;
&lt;br /&gt;
2017-10-23: Finished Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-19: Continued work on Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-18: Continued work on Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-17: Constructed ArcGIS maps for the agglomeration project. Finished maps of points for every year in the state of California. Finished maps of Route 128. Began working on selenium Yelp crawler to get cafe locations within the 610-loop.&lt;br /&gt;
&lt;br /&gt;
2017-10-16: Assisted Harrison on the USITC project. Looked for natural language processing tools to extract complainants and defendants along with their locations from case files. Experimented with pulling based on part-of-speech tags, as well as using geotext or geograpy to pull locations from a case segment.&lt;br /&gt;
&lt;br /&gt;
2017-10-13: Updated various project wiki pages.&lt;br /&gt;
&lt;br /&gt;
2017-10-12: Continued work on Patent Thicket project, awaiting further project specs.&lt;br /&gt;
&lt;br /&gt;
2017-10-05: Emergency ArcGIS creation for Agglomeration project.&lt;br /&gt;
&lt;br /&gt;
2017-10-04: Emergency ArcGIS creation for Agglomeration project.&lt;br /&gt;
&lt;br /&gt;
2017-10-02: Worked on ArcGIS data. See Harrison's Work Log for the details.&lt;br /&gt;
&lt;br /&gt;
2017-09-28: Added collaborative editing feature to PyCharm. &lt;br /&gt;
&lt;br /&gt;
2017-09-27: Worked on big database file.&lt;br /&gt;
&lt;br /&gt;
2017-09-25: New task -- Create text file with company, description, and company type.&lt;br /&gt;
#[http://mcnair.bakerinstitute.org/wiki/VC_Database_Rebuild VC Database Rebuild]&lt;br /&gt;
#psql vcdb2&lt;br /&gt;
#table name, sdccompanybasecore2&lt;br /&gt;
#Combine with Crunchbasebulk&lt;br /&gt;
&lt;br /&gt;
#TODO: Write wiki on linkedin crawler, write wiki on creating accounts.&lt;br /&gt;
&lt;br /&gt;
2017-09-21: Wrote wiki on Linkedin crawler, met with Laura about patents project. &lt;br /&gt;
&lt;br /&gt;
2017-09-20: Finished running linkedin crawler. Transferred data to RDP. Will write wikis next.&lt;br /&gt;
&lt;br /&gt;
2017-09-19: Began running linkedin crawler. Helped Yang create RDP account, get permissions, and get wiki setup.&lt;br /&gt;
&lt;br /&gt;
2017-09-18: Finished implementation of Experience Crawler, continued working on Education Crawler for LinkedIn.&lt;br /&gt;
&lt;br /&gt;
2017-09-14: Continued implementing LinkedIn Crawler for profiles.&lt;br /&gt;
&lt;br /&gt;
2017-09-13: Implemented LinkedIn Crawler for main portion of profiles. Began working on crawling Experience section of profiles.&lt;br /&gt;
&lt;br /&gt;
2017-09-12: Continued working on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. Added to the wiki on this topic.&lt;br /&gt;
&lt;br /&gt;
2017-09-11: Continued working on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. &lt;br /&gt;
&lt;br /&gt;
2017-09-06: Combined founders data retrieved with the Crunchbase API with the crunchbasebulk data to get linkedin urls for different accelerator founders. For more information, see [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here].&lt;br /&gt;
&lt;br /&gt;
2017-09-05: Post Harvey. Finished retrieving names from the Crunchbase API on founders. Next step is to query crunchbase bulk database to get linkedin urls. For more information, see [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here].&lt;br /&gt;
&lt;br /&gt;
2017-08-24: Began using the Crunchbase API to retrieve founder information for accelerators. Halfway through compiling a dictionary that translates accelerator names into proper Crunchbase API URLs.&lt;br /&gt;
&lt;br /&gt;
2017-08-23: Decided with Ed to abandon LinkedIn crawling to retrieve accelerator founder data, and instead use crunchbase. Spent the day navigating the crunchbasebulk database, and seeing what useful information was contained in it.&lt;br /&gt;
&lt;br /&gt;
2017-08-22: Discovered that LinkedIn Profiles cannot be viewed through LinkedIn if the target is 3rd degree or further. However, if entering LinkedIn through a Google search, the profile can still be viewed if the user has previously logged into LinkedIn. Devising a workaround crawler that utilizes Google search. Continued blog post [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) here] under Section 4.&lt;br /&gt;
&lt;br /&gt;
2017-08-21: Began work on extracting founders for accelerators through LinkedIn Crawler. Discovered that Python3 is not installed on RDP, so the virtual environment for the project cannot be fired up. Continued working on Ubuntu machine.&lt;br /&gt;
&amp;lt;/onlyinclude&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Spring 2017===&lt;br /&gt;
&lt;br /&gt;
2017-05-01: Continued work on HTML Parser. Uploaded all semester projects to git server.&lt;br /&gt;
&lt;br /&gt;
2017-04-20: Finished the HTML Parser for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Ran HTML parser on accelerator founders. Data is stored in projects/accelerators/LinkedIn Founder Data.&lt;br /&gt;
&lt;br /&gt;
2017-04-19: Made updates to the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler] Wikipage. Ran LinkedIn Crawler on accelerator data. Working on an html parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-18: Ran LinkedIn Crawler on matches between Crunchbase Snapshot and the accelerator data.&lt;br /&gt;
&lt;br /&gt;
2017-04-17: Worked on ways to get correct search results from the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Worked on an HTML Parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-13: Worked on debugging the logout procedure for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Began formulation of process to search for founders of startups using a combination of the LinkedIn Crawler with the data resources from the [http://mcnair.bakerinstitute.org/wiki/Crunchbase_2013_Snapshot CrunchBase Snapshot].&lt;br /&gt;
&lt;br /&gt;
2017-04-12: Work on bugs with the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-04-11: Completed functional [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) crawler of LinkedIn Recruiter Pro]. Basic search functions work and download profile information for a given person. &lt;br /&gt;
&lt;br /&gt;
2017-04-10: Began writing functioning crawler of LinkedIn. &lt;br /&gt;
&lt;br /&gt;
2017-04-06: Continued working on debugging and documenting the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Wrote a test program that logs in, searches for a query, navigates through search pages, and logs out. Recruiter program can now login and search.&lt;br /&gt;
&lt;br /&gt;
2017-04-05: Began work on the LinkedIn Crawler. Researched on launching Python Virtual Environment.&lt;br /&gt;
&lt;br /&gt;
2017-04-03: Finished debugging points for the Enclosing Circle Algorithm. Added Command Line functionality to the Industry Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-03-29: Worked on debugging points for the Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-28: Finished running the Enclosing Circle Algorithm. Worked on removing incorrect points from the data set (see above). &lt;br /&gt;
&lt;br /&gt;
2017-03-27: Worked on debugging the Enclosing Circle Algorithm. Implemented a way to remove interior circles, and determined that translation to latitude and longitude coordinates resulted in slightly off center circles.&lt;br /&gt;
&lt;br /&gt;
2017-03-23: Finished debugging the brute force algorithm for [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm]. Implemented a method to plot the points and circles on a graph. Analyzed runtime of the brute force algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-21: Coded a brute force algorithm for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-03-20: Worked on  debugging the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-03-09: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Finished script to draw Enclosing Circles on a Google Map.&lt;br /&gt;
&lt;br /&gt;
2017-03-08: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Created script to draw outcome of the Enclosing Circle Algorithm on Google Maps.&lt;br /&gt;
&lt;br /&gt;
2017-03-07: Redetermined the top 50 cities which Enclosing Circle should be run on. Data on the [http://mcnair.bakerinstitute.org/wiki/Top_Cities_for_VC_Backed_Companies Top 50 Cities for VC Backed Companies can be found here.] Ran [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities.&lt;br /&gt;
&lt;br /&gt;
2017-03-06: Ran script to determine the top 50 cities which Enclosing Circle should be run on. Fixed the VC Circles script to take in a new data format.&lt;br /&gt;
&lt;br /&gt;
2017-03-02: Cleaned up data for the VC Circles Project. Created histogram of data in Excel. See [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Began work on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-03-01: Created statistics for the VC Circles Project.&lt;br /&gt;
&lt;br /&gt;
2017-02-28: Finished downloading geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project.  Found bug in Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-02-27: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-23: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Installed C++ Compiler for Python. Ran tests on difference between Python and C wrapped Python.&lt;br /&gt;
&lt;br /&gt;
2017-02-22: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Helped out with [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-21: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Researched into C++ Compilers for Python so that the Enclosing Circle Algorithm could be wrapped in C. Found a recommended one [https://www.microsoft.com/en-us/download/details.aspx?id=44266 here].&lt;br /&gt;
&lt;br /&gt;
2017-02-20: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-16: Reworked [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] to create a file of geocoded data. Began work on wrapping the algorithm in C to improve speed.&lt;br /&gt;
&lt;br /&gt;
2017-02-15: Finished [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] applied to the VC study. Enclosing Circle algorithm still needs adjustment, but the program runs with the temporary fixes.&lt;br /&gt;
&lt;br /&gt;
2017-02-14: Worked on the application of the Enclosing Circle algorithm to the VC study. Working on bug fixes in the Enclosing Circle algorithm. Created wiki page for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-02-13: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-08: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-07: Fixed bugs in parse_cohort_data.py, the script for parsing the cohort data from the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Added descriptive statistics to cohort data excel file.&lt;br /&gt;
&lt;br /&gt;
2017-02-02: Out sick, independent research and work from RDP. Brief research into the [http://jorgeg.scripts.mit.edu/homepage/wp-content/uploads/2016/03/Guzman-Stern-State-of-American-Entrepreneurship-FINAL.pdf Stern-Guzman algorithm]. Research into [http://mcnair.bakerinstitute.org/wiki/interactive_maps Interactive Maps]. No helpful additions to map embedding problem.&lt;br /&gt;
&lt;br /&gt;
2017-02-01: Notes from Session with Ed: Project on US university patenting and entrepreneurship programs (writing code to identify universities in assignees), search Wikipedia (XML then bulk download), student pop, faculty pop, etc.&lt;br /&gt;
Circle project for VC data will end up being a joint project to join accelerator data. &lt;br /&gt;
Pull descriptions for VC. Founders of accelerators in LinkedIn. LinkedIn cannot be caught (pretend to not be a bot). Can eventually get academic backgrounds through LinkedIn. &lt;br /&gt;
Pull business registration data, Stern/Guzman Algorithm. &lt;br /&gt;
GIS on top of geocoded data.&lt;br /&gt;
Maps that work on wiki or blog (CartoDB), Maps API and R.&lt;br /&gt;
NLP Projects, Description Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-01-31: Built WayBack Machine Crawler. Updated documentation for coordinates script. Updated profile page to include locations of code.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-30: Optimized enclosing circle algorithm through memoization. Developed script to read addresses from accelerator data and return latitude and longitude coordinates.&lt;br /&gt;
&lt;br /&gt;
2017-01-26: Continued working on Google sitesearch project. Discovered crunchbase, changed project priority. Priority 1: split accelerator data up by flag; priority 2: use crunchbase to get web URLs for cohorts; priority 3: make Internet Archive Wayback Machine driver. Located [http://mcnair.bakerinstitute.org/wiki/Whois_Parser Whois Parser].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-25: Finished parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Some data files still need proofreading as they are not in an acceptable format. Began working on Google sitesearch project.&lt;br /&gt;
&lt;br /&gt;
2017-01-24: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Cohort data file created, debugging is almost complete. Will begin work on the google accelerator search soon.&lt;br /&gt;
&lt;br /&gt;
2017-01-23: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Preliminary code is written, working on debugging.&lt;br /&gt;
&lt;br /&gt;
2017-01-19: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project]. Created parser for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project], completed creation of final data set (yay!). Began working on cohort parser.&lt;br /&gt;
&lt;br /&gt;
2017-01-18: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-13: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-12: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-11: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-10: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Fall 2016===&lt;br /&gt;
&lt;br /&gt;
2016-12-08: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-07: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-06: Learned how to use git. Committed software projects from the semester to the McNair git repository. Projects can be found at: [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Executive Order Crawler], [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Foreign Government Web Crawlers], [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) F6S Crawler and Parser].&lt;br /&gt;
&lt;br /&gt;
2016-12-02: Built and ran web crawler for Center for Middle East Studies on Kuwait. Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-01: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Built tool for the [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report E&amp;amp;I Governance Report Project] with Christy. Adds a column of data that shows whether or not the bill has been passed.&lt;br /&gt;
&lt;br /&gt;
2016-11-29: Began pulling data from the accelerators listed [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here]. Made text files for about 18 accelerators.&lt;br /&gt;
&lt;br /&gt;
2016-11-22: Transferred downloaded Morocco Written Bills to provided SeaGate Drive. Made a &amp;quot;gentle&amp;quot; F6S crawler to retrieve HTMLs of possible accelerator pages documented [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here].&lt;br /&gt;
&lt;br /&gt;
2016-11-18: Converted Executive Order PDFs to text files using adobe acrobat DC. See [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Wikipage] for details.&lt;br /&gt;
&lt;br /&gt;
2016-11-17: Wrote a crawler to retrieve information about executive orders, and their corresponding pdfs. They can be found [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report here.] Next step is to run code to convert the pdfs to text files, then use the parser fixed by Christy. &lt;br /&gt;
&lt;br /&gt;
2016-11-15: Finished download of Moroccan Written Question pdfs. Wrote a parser with Christy to be used for parsing bills from Congress and eventually executive orders. Found a bug in the system Python that was worked out, and rebooted.&lt;br /&gt;
&lt;br /&gt;
2016-11-11: Continued to download Moroccan data in the background. Attempted to find bug fixes for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] crawler.&lt;br /&gt;
&lt;br /&gt;
2016-11-10: Continued to download Moroccan data and Kuwait data in the background. Began work on  [http://mcnair.bakerinstitute.org/wiki/Google_Scholar_Crawler Google Scholar Crawler]. Wrote a crawler for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] to get the HTML files of hundreds of accelerators. The crawler ended up failing; it appears to have been due to HTTPS. &lt;br /&gt;
&lt;br /&gt;
2016-11-08: Continued to download Moroccan data in the background. Finished writing code for the embedded files on the Kuwait site. Spent time debugging the frame errors caused by the dynamically generated content. Never found an answer to the bug; instead found a workaround that sacrificed run time but worked. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-11-04: Continued to download Moroccan data in the background. Finished writing initial Kuwait Web Crawler/Driver for the Middle East Studies Department. Middle East Studies Department asked for additional embedded files in the Kuwait website. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-11-03: Continued to download Moroccan data in the background. Dr. Egan fixed systems requirements to run the GovTrack Web Crawler. Made significant progress on the Kuwait Web Crawler/Driver for the Middle East Studies Department. &lt;br /&gt;
&lt;br /&gt;
2016-11-01: Continued to download Moroccan data in the background. Went over code for GovTracker Web Crawler, continued learning Perl. [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] Began Kuwait Web Crawler/Driver.&lt;br /&gt;
&lt;br /&gt;
2016-10-21: Continued to download data for the Moroccan Parliament Written and Oral Questions. Looked over [http://mcnair.bakerinstitute.org/wiki/Christy_Warden_(Twitter_Crawler_Application_1) Christy's Twitter Crawler] to see how I can be helpful. Dr. Egan asked me to think about how to potentially make multiple tools to get cohorts and other sorts of data from accelerator sites. See [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator List] He also asked me to look at the [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] for potential ideas on how to bring this project to fruition.&lt;br /&gt;
&lt;br /&gt;
2016-10-20: Continued to download data for the Moroccan Parliament Written and Oral Questions. Updated Wiki page. Started working on Twitter project with Christy. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-18: Finished code for Oral Questions web driver and Written Questions web driver using selenium. Now, the data for the dates of questions can be found using the crawler, and the pdfs of the questions will be downloaded using selenium. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-14: Finished Oral Questions crawler. Finished Written Questions crawler. Waiting for further details on whether that data needs to be tweaked in any way. Updated the Moroccan Web Driver/Web Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-13: Completed download of Moroccan Bills. Working on either a web driver screenshot approach or a webcrawler approach to download the  Moroccan oral and written questions data. Began building Web Crawler for Oral and Written Questions site. Edited Moroccan Web Driver/Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-11: Fixed Arabic bug; files can now be saved with Arabic titles. Monarchy bills downloaded and ready for shipment. House of Representatives bills mostly downloaded, ratified bills prepared for download. Started learning the Scrapy library in Python for web scraping. Discussed idea of screenshot-ing questions instead of scraping. &lt;br /&gt;
&lt;br /&gt;
2016-10-07: Learned Unicode and UTF-8 encoding and decoding in Arabic. Still working on transforming an ASCII URL into printable Unicode.&lt;br /&gt;
&lt;br /&gt;
2016-10-06: Discussed with Dr. Elbadawy the desired file names for the Moroccan data download. The consensus was that the bill programs are ready to launch once the files can be named properly, and that the questions data must be retrieved using a web crawler, which I need to learn how to implement. The naming of files currently draws errors in going from Arabic, to URL, to download, to filename. Debugging is in process. Also built a demo Selenium program for Dr. Egan that drives the McNair blog site on an infinite loop.&lt;br /&gt;
&lt;br /&gt;
2016-10-03: Moroccan Web Driver projects completed for driving of the Monarchy proposed bills, the House of Representatives proposed bills, and the ratified bills sites. Began the process of devising a naming system for the files that does not require scraping. Tinkered with naming through regular-expression parsing of the URL. Structure for the oral questions and written questions drivers is set up, but needs fixes due to the differences between the sites. Fixed bug on McNair wiki for women's biz team where an email was plain text instead of an email link. Took a glimpse at the Kuwait Parliament website, and it appears to be very different from the Moroccan setup.&lt;br /&gt;
&lt;br /&gt;
2016-09-30: Selenium program selects the view-pdf option from the website and goes to the pdf webpage. Program then switches handle to the new page. CTRL-S is sent to the page to launch the save dialog window. Text cannot be sent to this window. Brainstormed ways around this issue. Explored Chrome options for saving automatically without a dialog window. Looking into other libraries besides Selenium that may help.&lt;br /&gt;
&lt;br /&gt;
2016-09-29: Re-enrolled in Microsoft Remote Desktop with proper authentication, set up the Selenium environment and Komodo IDE on the Remote Desktop, and wrote a program using Selenium that goes to a link and opens the print dialog box. Developed a computational recipe for a different approach to the problem.&lt;br /&gt;
&lt;br /&gt;
2016-09-26: Set up Staff wiki page, work log page; registered for Slack, Microsoft Remote Desktop; downloaded Selenium on personal computer, read Selenium docs. Created wiki page for Moroccan Web Driver Project.&lt;br /&gt;
&lt;br /&gt;
'''Notes'''&lt;br /&gt;
&lt;br /&gt;
*Ed moved the Morocco Data to E:\McNair\Projects from C:\Users\PeterJ\Documents&lt;br /&gt;
* C Drive files moved to E:\McNair\Users\PeterJ&lt;br /&gt;
&lt;br /&gt;
[[Category:Work Log]]&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Demo_Day_Page_Parser&amp;diff=22365</id>
		<title>Demo Day Page Parser</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Demo_Day_Page_Parser&amp;diff=22365"/>
		<updated>2017-12-15T20:17:40Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{McNair Projects&lt;br /&gt;
|Has title=Demo Day Page Parser&lt;br /&gt;
|Has owner=Peter Jalbert,&lt;br /&gt;
|Has project status=Active&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
==Project Specs==&lt;br /&gt;
The goal of this project is to leverage data mining with Selenium and Machine Learning to get good candidate web pages for Demo Days for accelerators. Relevant information on the project can be found on the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Data Accelerator Data] page.&lt;br /&gt;
&lt;br /&gt;
==Code Location==&lt;br /&gt;
The code directory for this project can be found:&lt;br /&gt;
 E:\McNair\Software\Accelerators&lt;br /&gt;
&lt;br /&gt;
The Selenium-based crawler can be found in the file below. This script runs a Google search on accelerator names and keywords, and saves the URLs and HTML pages for future use:&lt;br /&gt;
 DemoDayCrawler.py&lt;br /&gt;
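&lt;br /&gt;
As a rough illustration of that flow (not the actual contents of DemoDayCrawler.py), a Selenium crawl of this kind might look like the sketch below; the result-link selector, file naming, and output directory are assumptions:&lt;br /&gt;
 # Hedged sketch: Google an accelerator plus a keyword, save result HTML.&lt;br /&gt;
 import os&lt;br /&gt;
 from selenium import webdriver&lt;br /&gt;
 def crawl(accelerator, keyword='Demo Day', max_links=10, out_dir='DemoDayHTML'):&lt;br /&gt;
     driver = webdriver.Chrome()&lt;br /&gt;
     query = (accelerator + ' ' + keyword).replace(' ', '+')&lt;br /&gt;
     driver.get('https://www.google.com/search?q=' + query)&lt;br /&gt;
     urls = [a.get_attribute('href')&lt;br /&gt;
             for a in driver.find_elements_by_css_selector('h3 a')]&lt;br /&gt;
     urls = [u for u in urls if u and u.startswith('http')][:max_links]&lt;br /&gt;
     for i, url in enumerate(urls):&lt;br /&gt;
         driver.get(url)&lt;br /&gt;
         path = os.path.join(out_dir, '{}_{}.html'.format(accelerator, i))&lt;br /&gt;
         with open(path, 'w', encoding='utf-8') as f:&lt;br /&gt;
             f.write(driver.page_source)  # save the rendered HTML&lt;br /&gt;
     driver.quit()&lt;br /&gt;
     return urls&lt;br /&gt;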
&lt;br /&gt;
&lt;br /&gt;
A script to convert HTML to TXT can be found below. This script reads HTML files from the DemoDayHTML directory and writes text versions to the DemoDayTxt directory:&lt;br /&gt;
 htmlToText.py&lt;br /&gt;
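&lt;br /&gt;
A minimal sketch of that HTML-to-text step, assuming BeautifulSoup is available (htmlToText.py itself may be implemented differently; the directory names follow the description above):&lt;br /&gt;
 # Hedged sketch: strip HTML files in one directory to text in another.&lt;br /&gt;
 import os&lt;br /&gt;
 from bs4 import BeautifulSoup&lt;br /&gt;
 def html_dir_to_text(src='DemoDayHTML', dst='DemoDayTxt'):&lt;br /&gt;
     for name in os.listdir(src):&lt;br /&gt;
         if not name.endswith('.html'):&lt;br /&gt;
             continue&lt;br /&gt;
         with open(os.path.join(src, name), encoding='utf-8') as f:&lt;br /&gt;
             soup = BeautifulSoup(f.read(), 'html.parser')&lt;br /&gt;
         out = os.path.join(dst, name[:-5] + '.txt')&lt;br /&gt;
         with open(out, 'w', encoding='utf-8') as f:&lt;br /&gt;
             f.write(soup.get_text(separator=' '))&lt;br /&gt;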
&lt;br /&gt;
A script to match keywords (accelerator and cohort names) against the resulting text pages can be found in KeyTerms.py. The script takes the keywords in CohortAndAcceleratorsFullList.txt and the text files in DemoDayTxt, and creates a file with the number of matches of each keyword against each text file.&lt;br /&gt;
&lt;br /&gt;
The script can be found:&lt;br /&gt;
 KeyTerms.py&lt;br /&gt;
&lt;br /&gt;
The Keyword matches text file can be found:&lt;br /&gt;
 DemoDayTxt\KeyTermFile\KeyTerms.txt&lt;br /&gt;
&lt;br /&gt;
A script that determines which text files have at least one hit for these keywords can be found:&lt;br /&gt;
 DemoDayHits.py&lt;br /&gt;
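&lt;br /&gt;
The counting logic behind KeyTerms.py and DemoDayHits.py can be sketched as follows; this is an assumption about the approach, not the scripts' actual code:&lt;br /&gt;
 # Hedged sketch: count keyword occurrences per text file and keep&lt;br /&gt;
 # only the files with at least one hit.&lt;br /&gt;
 import os&lt;br /&gt;
 def keyword_hits(keyword_file, txt_dir):&lt;br /&gt;
     with open(keyword_file, encoding='utf-8') as f:&lt;br /&gt;
         keywords = [line.strip() for line in f if line.strip()]&lt;br /&gt;
     hits = {}&lt;br /&gt;
     for name in os.listdir(txt_dir):&lt;br /&gt;
         if not name.endswith('.txt'):&lt;br /&gt;
             continue&lt;br /&gt;
         with open(os.path.join(txt_dir, name), encoding='utf-8') as f:&lt;br /&gt;
             text = f.read().lower()&lt;br /&gt;
         counts = {k: text.count(k.lower()) for k in keywords}&lt;br /&gt;
         if any(counts.values()):&lt;br /&gt;
             hits[name] = counts&lt;br /&gt;
     return hits&lt;br /&gt;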
&lt;br /&gt;
==Downloading HTML Files with Selenium==&lt;br /&gt;
The code for utilizing Selenium to download HTML files can be found in the DemoDayCrawler.py file.&lt;br /&gt;
&lt;br /&gt;
The initial pass over the data scraped 100 links for each of 20 sample accelerators from the overall accelerator list. These sample pages were converted to text and scored to remove web pages with no mention of relevant accelerators or companies.&lt;br /&gt;
&lt;br /&gt;
Once the process was tweaked in response to the initial sample testing, it was run again over all accelerators. The test determined that we needed to take no more than 10 links per accelerator, and that 'Demo Day' was a suitable search term.&lt;br /&gt;
&lt;br /&gt;
===Complete Files===&lt;br /&gt;
These files hold data for all the accelerators, not just the test set.&lt;br /&gt;
&lt;br /&gt;
The full list of accelerators:&lt;br /&gt;
 ListOfAccs.txt&lt;br /&gt;
&lt;br /&gt;
The full list of potential keywords (used for throwing out irrelevant results):&lt;br /&gt;
 Keywords.txt&lt;br /&gt;
&lt;br /&gt;
A list of accelerators, queries, and urls:&lt;br /&gt;
 demoday_crawl_full.txt&lt;br /&gt;
&lt;br /&gt;
A directory with HTML files for all accelerator demo day results:&lt;br /&gt;
 DemoDayHTMLFull&lt;br /&gt;
&lt;br /&gt;
A directory with TXT files for all accelerator demo day results:&lt;br /&gt;
 DemoDayTxtFull&lt;br /&gt;
&lt;br /&gt;
A file with the name of the results that passed keyword matching:&lt;br /&gt;
 DemoDayHitsFull.txt&lt;br /&gt;
&lt;br /&gt;
==Faulty Results==&lt;br /&gt;
The first pass through the data revealed articles that had thousands of hits for keyword matches. This seemed highly suspicious, so we dug deeper to investigate the cause. &lt;br /&gt;
&lt;br /&gt;
The following script in the same directory analyzes the keyword matches to determine the words with the highest number of hits.&lt;br /&gt;
 DemoDayAnalysis.py&lt;br /&gt;
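&lt;br /&gt;
The idea can be sketched as below: rank keywords by their total hit counts (using output like the keyword_hits sketch above), then drop keywords that are themselves common English words; the work log mentions filtering against Google's top-10,000 list. The file name and formats here are assumptions:&lt;br /&gt;
 # Hedged sketch: rank keywords by total hits across files, then filter&lt;br /&gt;
 # out keywords that appear in a common-English-words list.&lt;br /&gt;
 from collections import Counter&lt;br /&gt;
 def top_keywords(hits, n=20):&lt;br /&gt;
     totals = Counter()&lt;br /&gt;
     for counts in hits.values():&lt;br /&gt;
         totals.update(counts)  # add per-file counts into the running totals&lt;br /&gt;
     return totals.most_common(n)&lt;br /&gt;
 def filter_keywords(keywords, common_words_file='common_words.txt'):&lt;br /&gt;
     with open(common_words_file, encoding='utf-8') as f:&lt;br /&gt;
         common = {line.strip().lower() for line in f}&lt;br /&gt;
     return [k for k in keywords if k.lower() not in common]&lt;br /&gt;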
&lt;br /&gt;
After investigation, we found that many companies were named after common English words. Here are some of the companies causing issues, along with their associated accelerators (accelerator first, then company):&lt;br /&gt;
&lt;br /&gt;
L-Spark, the&lt;br /&gt;
&lt;br /&gt;
Matter, This., [https://matter.vc/portfolio/this/ website]&lt;br /&gt;
&lt;br /&gt;
Fledge, HERE, [http://fledge.co/fledgling/here/ website]&lt;br /&gt;
&lt;br /&gt;
StartupBootCamp, We...&lt;br /&gt;
&lt;br /&gt;
LightBank Start, Zero&lt;br /&gt;
&lt;br /&gt;
Entrepreneurs Roundtable Accelerator, SELECT&lt;br /&gt;
&lt;br /&gt;
Y Combinator, Her&lt;br /&gt;
&lt;br /&gt;
Y Combinator, Final&lt;br /&gt;
&lt;br /&gt;
AngelCube, class&lt;br /&gt;
&lt;br /&gt;
Matter, common&lt;br /&gt;
&lt;br /&gt;
L-Spark, Company&lt;br /&gt;
&lt;br /&gt;
Techstars, Hot&lt;br /&gt;
&lt;br /&gt;
After removing these companies from consideration as keywords,&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=22362</id>
		<title>Peter Jalbert (Work Log)</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=22362"/>
		<updated>2017-12-15T20:03:47Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===Fall 2017===&lt;br /&gt;
&amp;lt;onlyinclude&amp;gt; &lt;br /&gt;
&lt;br /&gt;
[[Peter Jalbert]] [[Work Logs]] [[Peter Jalbert (Work Log)|(log page)]] &lt;br /&gt;
&lt;br /&gt;
2017-12-15: Found the error! Ex: L-Spark the, Matter This., Fledge HERE, StartupBootCamp We..., LightBank Start Zero, Entrepreneurs Roundtable Accelerator SELECT, Y Combinator Her, Y Combinator Final, AngelCube class, Matter common, L-Spark Company&lt;br /&gt;
&lt;br /&gt;
2017-12-14: Uploading Morocco Parliament Written Questions. Creating script for next Morocco Parliament download. Begin writing Selenium documentation. Continuing to download TIGER data.&lt;br /&gt;
&lt;br /&gt;
2017-12-06: Running Morocco Parliament Written Questions script. Analyzing Demo Day Crawler results. Continued downloading for TIGER geocoder.&lt;br /&gt;
&lt;br /&gt;
2017-11-28: Debugging Morocco Parliament Crawler. Running Demo Day Crawler for all accelerators and 10 pages per accelerator. TIGER geocoder is back to Forbidden Error.&lt;br /&gt;
&lt;br /&gt;
2017-11-27: Rerunning Morocco Parliament Crawler. Fixed KeyTerms.py and running it again. Continued downloading for TIGER geocoder.&lt;br /&gt;
&lt;br /&gt;
2017-11-20:  Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-16: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-15: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote a script to extract counts that were greater than 2 from Keyword Matcher. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder]. Finished re-formatting work logs.&lt;br /&gt;
&lt;br /&gt;
2017-11-14: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote an HTML to Text parser. See Parser Demo Day Page for file location. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder].&lt;br /&gt;
&lt;br /&gt;
2017-11-13: Built [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser].&lt;br /&gt;
&lt;br /&gt;
2017-11-09: Running demo version of Demo Day crawler (Accelerator Google Crawler). Fixing work log format. &lt;br /&gt;
&lt;br /&gt;
2017-11-07: Created file with 0s and 1s detailing whether crunchbase has the founder information for an accelerator. Details posted as a TODO on [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List] page. Still waiting for feedback on the PostGIS installation from [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder]. Continued working on Accelerator Google Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-06: Contacted Geography Center for the US Census Bureau, [https://www.census.gov/geo/about/contact.html here], and began email exchange on PostGIS installation problems. Began working on the [http://mcnair.bakerinstitute.org/wiki/Selenium_Documentation Selenium Documentation]. Also began working on an Accelerator Google Crawler that will be used with Yang and ML to find Demo Days for cohort companies.&lt;br /&gt;
&lt;br /&gt;
2017-11-01: Attempted to continue downloading, however ran into HTTP Forbidden errors. Listed the errors on the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder Page].&lt;br /&gt;
&lt;br /&gt;
2017-10-31: Began downloading blocks of data for individual states for the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder] project. Wrote out the new wiki page for installation, and beginning to write documentation on usage.&lt;br /&gt;
&lt;br /&gt;
2017-10-30: With Ed's help, was able to get the national data from Tiger installed onto a database server. The process required much jumping around and changing users, and all the things we learned are outlined in [http://mcnair.bakerinstitute.org/wiki/Database_Server_Documentation#Editing_Users the database server documentation] under &amp;quot;Editing Users&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
2017-10-25: Continued working on the [http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation Tiger Geocoder Installation].&lt;br /&gt;
&lt;br /&gt;
2017-10-24: Throw some addresses into a database, use address normalizer and geocoder. May need to install things. Details on the installation process can be found on the [http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation PostGIS Installation page].&lt;br /&gt;
&lt;br /&gt;
2017-10-23: Finished Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-19: Continued work on Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-18: Continued work on Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-17: Constructed ArcGIS maps for the agglomeration project. Finished maps of points for every year in the state of California. Finished maps of Route 128. Began working on selenium Yelp crawler to get cafe locations within the 610-loop.&lt;br /&gt;
&lt;br /&gt;
2017-10-16: Assisted Harrison on the USITC project. Looked for natural language processing tools to extract complainants and defendants along with their locations from case files. Experimented with pulling based on part-of-speech tags, as well as using geotext or geograpy to pull locations from a case segment.&lt;br /&gt;
&lt;br /&gt;
2017-10-13: Updated various project wiki pages.&lt;br /&gt;
&lt;br /&gt;
2017-10-12: Continued work on Patent Thicket project, awaiting further project specs.&lt;br /&gt;
&lt;br /&gt;
2017-10-05: Emergency ArcGIS creation for Agglomeration project.&lt;br /&gt;
&lt;br /&gt;
2017-10-04: Emergency ArcGIS creation for Agglomeration project.&lt;br /&gt;
&lt;br /&gt;
2017-10-02: Worked on ArcGIS data. See Harrison's Work Log for the details.&lt;br /&gt;
&lt;br /&gt;
2017-09-28: Added collaborative editing feature to PyCharm. &lt;br /&gt;
&lt;br /&gt;
2017-09-27: Worked on big database file.&lt;br /&gt;
&lt;br /&gt;
2017-09-25: New task -- Create text file with company, description, and company type.&lt;br /&gt;
#[http://mcnair.bakerinstitute.org/wiki/VC_Database_Rebuild VC Database Rebuild]&lt;br /&gt;
#psql vcdb2&lt;br /&gt;
#table name, sdccompanybasecore2&lt;br /&gt;
#Combine with Crunchbasebulk&lt;br /&gt;
&lt;br /&gt;
#TODO: Write wiki on linkedin crawler, write wiki on creating accounts.&lt;br /&gt;
&lt;br /&gt;
2017-09-21: Wrote wiki on Linkedin crawler, met with Laura about patents project. &lt;br /&gt;
&lt;br /&gt;
2017-09-20: Finished running linkedin crawler. Transferred data to RDP. Will write wikis next.&lt;br /&gt;
&lt;br /&gt;
2017-09-19: Began running the LinkedIn crawler. Helped Yang create an RDP account, get permissions, and get set up on the wiki.&lt;br /&gt;
&lt;br /&gt;
2017-09-18: Finished implementation of Experience Crawler, continued working on Education Crawler for LinkedIn.&lt;br /&gt;
&lt;br /&gt;
2017-09-14: Continued implementing LinkedIn Crawler for profiles.&lt;br /&gt;
&lt;br /&gt;
2017-09-13: Implemented LinkedIn Crawler for main portion of profiles. Began working on crawling Experience section of profiles.&lt;br /&gt;
&lt;br /&gt;
2017-09-12: Continued working on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. Added to the wiki on this topic.&lt;br /&gt;
&lt;br /&gt;
2017-09-11: Continued working on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. &lt;br /&gt;
&lt;br /&gt;
2017-09-06: Combined founders data retrieved with the Crunchbase API with the crunchbasebulk data to get LinkedIn URLs for different accelerator founders. For more information, see [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here].&lt;br /&gt;
&lt;br /&gt;
2017-09-05: Post-Harvey. Finished retrieving founder names from the Crunchbase API. Next step is to query the crunchbasebulk database to get LinkedIn URLs. For more information, see [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here].&lt;br /&gt;
&lt;br /&gt;
2017-08-24: Began using the Crunchbase API to retrieve founder information for accelerators. Halfway through compiling a dictionary that translates accelerator names into proper Crunchbase API URLs.&lt;br /&gt;
&lt;br /&gt;
2017-08-23: Decided with Ed to abandon LinkedIn crawling to retrieve accelerator founder data, and instead use crunchbase. Spent the day navigating the crunchbasebulk database, and seeing what useful information was contained in it.&lt;br /&gt;
&lt;br /&gt;
2017-08-22: Discovered that LinkedIn profiles cannot be viewed through LinkedIn if the target is a 3rd-degree connection or further. However, if LinkedIn is entered through a Google search, the profile can still be viewed, provided the user has previously logged in to LinkedIn. Devising a workaround crawler that utilizes Google search (a sketch is below). Continued the blog post [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) here] under Section 4.&lt;br /&gt;
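A minimal sketch of the workaround, assuming a Chrome profile that is already logged in to LinkedIn (the query string and target name are illustrative):&lt;br /&gt;
 from selenium import webdriver&lt;br /&gt;
 driver = webdriver.Chrome()&lt;br /&gt;
 # Search Google for the target's profile rather than using LinkedIn's own search.&lt;br /&gt;
 query = 'site:linkedin.com/in Jane Founder'  # hypothetical target&lt;br /&gt;
 driver.get('https://www.google.com/search?q=' + query.replace(' ', '+'))&lt;br /&gt;
 # Collect result links that point at LinkedIn profiles.&lt;br /&gt;
 links = [a.get_attribute('href') for a in driver.find_elements_by_css_selector('a')]&lt;br /&gt;
 profiles = [url for url in links if url and 'linkedin.com/in' in url]&lt;br /&gt;
 if profiles:&lt;br /&gt;
     driver.get(profiles[0])  # view the profile via the Google referral&lt;br /&gt;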
&lt;br /&gt;
2017-08-21: Began work on extracting founders for accelerators through LinkedIn Crawler. Discovered that Python3 is not installed on RDP, so the virtual environment for the project cannot be fired up. Continued working on Ubuntu machine.&lt;br /&gt;
&amp;lt;/onlyinclude&amp;gt; &lt;br /&gt;
&lt;br /&gt;
===Spring 2017===&lt;br /&gt;
&lt;br /&gt;
2017-05-01: Continued work on HTML Parser. Uploaded all semester projects to git server.&lt;br /&gt;
&lt;br /&gt;
2017-04-20: Finished the HTML Parser for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Ran HTML parser on accelerator founders. Data is stored in projects/accelerators/LinkedIn Founder Data.&lt;br /&gt;
&lt;br /&gt;
2017-04-19: Made updates to the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler] Wikipage. Ran LinkedIn Crawler on accelerator data. Working on an html parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-18: Ran LinkedIn Crawler on matches between Crunchbase Snapshot and the accelerator data.&lt;br /&gt;
&lt;br /&gt;
2017-04-17: Worked on ways to get correct search results from the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Worked on an HTML Parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-13: Worked on debugging the logout procedure for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Began formulation of process to search for founders of startups using a combination of the LinkedIn Crawler with the data resources from the [http://mcnair.bakerinstitute.org/wiki/Crunchbase_2013_Snapshot CrunchBase Snapshot].&lt;br /&gt;
&lt;br /&gt;
2017-04-12: Work on bugs with the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-04-11: Completed functional [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) crawler of LinkedIn Recruiter Pro]. Basic search functions work and download profile information for a given person. &lt;br /&gt;
&lt;br /&gt;
2017-04-10: Began writing functioning crawler of LinkedIn. &lt;br /&gt;
&lt;br /&gt;
2017-04-06: Continued working on debugging and documenting the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Wrote a test program that logs in, searches for a query, navigates through search pages, and logs out. Recruiter program can now login and search.&lt;br /&gt;
&lt;br /&gt;
2017-04-05: Began work on the LinkedIn Crawler. Researched on launching Python Virtual Environment.&lt;br /&gt;
&lt;br /&gt;
2017-04-03: Finished debugging points for the Enclosing Circle Algorithm. Added Command Line functionality to the Industry Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-03-29: Worked on debugging points for the Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-28: Finished running the Enclosing Circle Algorithm. Worked on removing incorrect points from the data set (see above).&lt;br /&gt;
&lt;br /&gt;
2017-03-27: Worked on debugging the Enclosing Circle Algorithm. Implemented a way to remove interior circles, and determined that translation to latitude and longitude coordinates resulted in slightly off-center circles.&lt;br /&gt;
&lt;br /&gt;
2017-03-23: Finished debugging the brute force algorithm for [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm]. Implemented a method to plot the points and circles on a graph. Analyzed runtime of the brute force algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-21: Coded a brute force algorithm for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
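For reference, a minimal sketch of one brute-force approach to the minimal enclosing circle (candidate circles from every pair and triple of points; illustrative only, not the project's actual code, and it assumes at least two points):&lt;br /&gt;
 import math&lt;br /&gt;
 from itertools import combinations&lt;br /&gt;
 def covers(c, pts):&lt;br /&gt;
     x, y, r = c&lt;br /&gt;
     return all(math.hypot(px - x, py - y) &amp;lt;= r + 1e-9 for px, py in pts)&lt;br /&gt;
 def from_pair(p, q):  # circle with the pair as a diameter&lt;br /&gt;
     cx, cy = (p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0&lt;br /&gt;
     return (cx, cy, math.hypot(p[0] - cx, p[1] - cy))&lt;br /&gt;
 def from_triple(p, q, r):  # circumcircle, or None if collinear&lt;br /&gt;
     (ax, ay), (bx, by), (cx, cy) = p, q, r&lt;br /&gt;
     d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))&lt;br /&gt;
     if d == 0:&lt;br /&gt;
         return None&lt;br /&gt;
     ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay) + (cx**2 + cy**2) * (ay - by)) / d&lt;br /&gt;
     uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx) + (cx**2 + cy**2) * (bx - ax)) / d&lt;br /&gt;
     return (ux, uy, math.hypot(ax - ux, ay - uy))&lt;br /&gt;
 def enclosing_circle(pts):&lt;br /&gt;
     # The optimal circle is determined by two or three boundary points,&lt;br /&gt;
     # so checking all pairs and triples is sufficient (if slow).&lt;br /&gt;
     cands = [from_pair(p, q) for p, q in combinations(pts, 2)]&lt;br /&gt;
     cands += [c for c in (from_triple(*t) for t in combinations(pts, 3)) if c]&lt;br /&gt;
     return min((c for c in cands if covers(c, pts)), key=lambda c: c[2])&lt;br /&gt;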
&lt;br /&gt;
2017-03-20: Worked on debugging the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-03-09: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Finished script to draw Enclosing Circles on a Google Map.&lt;br /&gt;
&lt;br /&gt;
2017-03-08: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Created script to draw outcome of the Enclosing Circle Algorithm on Google Maps.&lt;br /&gt;
&lt;br /&gt;
2017-03-07: Redetermined the top 50 cities on which Enclosing Circle should be run. Data on the Top 50 Cities for VC-Backed Companies can be found [http://mcnair.bakerinstitute.org/wiki/Top_Cities_for_VC_Backed_Companies here]. Ran the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities.&lt;br /&gt;
&lt;br /&gt;
2017-03-06: Ran script to determine the top 50 cities on which Enclosing Circle should be run. Fixed the VC Circles script to take in a new data format.&lt;br /&gt;
&lt;br /&gt;
2017-03-02: Cleaned up data for the VC Circles Project. Created histogram of the data in Excel. See the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Began work on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-03-01: Created statistics for the VC Circles Project.&lt;br /&gt;
&lt;br /&gt;
2017-02-28: Finished downloading geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project.  Found bug in Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-02-27: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-23: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Installed a C++ compiler for Python. Ran tests on the difference between plain Python and C-wrapped Python.&lt;br /&gt;
&lt;br /&gt;
2017-02-22: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Helped out with [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-21: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Researched C++ compilers for Python so that the Enclosing Circle Algorithm could be wrapped in C. Found a recommended one [https://www.microsoft.com/en-us/download/details.aspx?id=44266 here].&lt;br /&gt;
&lt;br /&gt;
2017-02-20: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-16: Reworked [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] to create a file of geocoded data. Began work on wrapping the algorithm in C to improve speed.&lt;br /&gt;
&lt;br /&gt;
2017-02-15: Finished [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] applied to the VC study. Enclosing Circle algorithm still needs adjustment, but the program runs with the temporary fixes.&lt;br /&gt;
&lt;br /&gt;
2017-02-14: Worked on the application of the Enclosing Circle algorithm to the VC study. Working on bug fixes in the Enclosing Circle algorithm. Created wiki page for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-02-13: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-08: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-07: Fixed bugs in parse_cohort_data.py, the script for parsing the cohort data from the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Added descriptive statistics to the cohort data Excel file.&lt;br /&gt;
&lt;br /&gt;
2017-02-02: Out sick, independent research and work from RDP. Brief research into the [http://jorgeg.scripts.mit.edu/homepage/wp-content/uploads/2016/03/Guzman-Stern-State-of-American-Entrepreneurship-FINAL.pdf Stern-Guzman algorithm]. Research into [http://mcnair.bakerinstitute.org/wiki/interactive_maps Interactive Maps]. No helpful additions to map embedding problem.&lt;br /&gt;
&lt;br /&gt;
2017-02-01: Notes from session with Ed:&lt;br /&gt;
* Project on US university patenting and entrepreneurship programs (writing code to identify universities in assignees); search Wikipedia (XML, then bulk download) for student population, faculty population, etc.&lt;br /&gt;
* The circle project for VC data will end up being a joint project to join accelerator data.&lt;br /&gt;
* Pull descriptions for VC. Founders of accelerators are on LinkedIn. The crawler cannot be caught by LinkedIn (pretend to not be a bot; see the sketch after this list). Academic backgrounds can eventually be pulled through LinkedIn.&lt;br /&gt;
* Pull business registration data; Stern/Guzman Algorithm.&lt;br /&gt;
* GIS on top of geocoded data.&lt;br /&gt;
* Maps that work on the wiki or blog (CartoDB), Maps API and R.&lt;br /&gt;
* NLP projects, Description Classifier.&lt;br /&gt;
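For the &amp;quot;pretend to not be a bot&amp;quot; point above, a minimal sketch of one common step, overriding the reported user agent via Chrome options (the user-agent string is illustrative, and this is only one part of not looking automated):&lt;br /&gt;
 from selenium import webdriver&lt;br /&gt;
 options = webdriver.ChromeOptions()&lt;br /&gt;
 # Report an ordinary desktop browser user agent.&lt;br /&gt;
 options.add_argument('user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64)')&lt;br /&gt;
 driver = webdriver.Chrome(chrome_options=options)&lt;br /&gt;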
&lt;br /&gt;
2017-01-31: Built WayBack Machine Crawler. Updated documentation for coordinates script. Updated profile page to include locations of code.&lt;br /&gt;
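One lightweight building block for such a crawler is the Wayback Machine availability API; a minimal sketch (the target URL and timestamp are illustrative, and this may not match the crawler's actual approach):&lt;br /&gt;
 import requests&lt;br /&gt;
 resp = requests.get('https://archive.org/wayback/available',&lt;br /&gt;
                     params={'url': 'example.com', 'timestamp': '20160101'})&lt;br /&gt;
 snap = resp.json().get('archived_snapshots', {}).get('closest')&lt;br /&gt;
 if snap and snap.get('available'):&lt;br /&gt;
     print(snap['timestamp'], snap['url'])  # nearest archived capture&lt;br /&gt;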
&lt;br /&gt;
&lt;br /&gt;
2017-01-30: Optimized enclosing circle algorithm through memoization. Developed script to read addresses from accelerator data and return latitude and longitude coordinates.&lt;br /&gt;
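A minimal sketch of such an address-to-coordinates script, assuming the Google Maps Geocoding API (the log does not say which geocoder was actually used; the key is a placeholder):&lt;br /&gt;
 import requests&lt;br /&gt;
 def to_latlong(address, key='YOUR_API_KEY'):  # placeholder key&lt;br /&gt;
     resp = requests.get('https://maps.googleapis.com/maps/api/geocode/json',&lt;br /&gt;
                         params={'address': address, 'key': key})&lt;br /&gt;
     results = resp.json().get('results', [])&lt;br /&gt;
     if not results:&lt;br /&gt;
         return None&lt;br /&gt;
     loc = results[0]['geometry']['location']&lt;br /&gt;
     return (loc['lat'], loc['lng'])&lt;br /&gt;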
&lt;br /&gt;
2017-01-26: Continued working on Google sitesearch project. Discovered Crunchbase, which changed project priorities: (1) split accelerator data up by flag; (2) use Crunchbase to get web URLs for cohorts; (3) make an Internet Archive Wayback Machine driver. Located [http://mcnair.bakerinstitute.org/wiki/Whois_Parser Whois Parser].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-25: Finished parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Some data files still need proofreading as they are not in an acceptable format. Began working on Google sitesearch project.&lt;br /&gt;
&lt;br /&gt;
2017-01-24: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Cohort data file created, debugging is almost complete. Will begin work on the google accelerator search soon.&lt;br /&gt;
&lt;br /&gt;
2017-01-23: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Preliminary code is written, working on debugging.&lt;br /&gt;
&lt;br /&gt;
2017-01-19: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project]. Created parser for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project], completed creation of the final data set (yay!). Began working on the cohort parser.&lt;br /&gt;
&lt;br /&gt;
2017-01-18: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-13: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-12: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-11: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-10: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Fall 2016===&lt;br /&gt;
&lt;br /&gt;
2016-12-08: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-07: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-06: Learned how to use git. Committed software projects from the semester to the McNair git repository. Projects can be found at: [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Executive Order Crawler], [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Foreign Government Web Crawlers], and [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) F6S Crawler and Parser].&lt;br /&gt;
&lt;br /&gt;
2016-12-02: Built and ran web crawler for Center for Middle East Studies on Kuwait. Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-01: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Built tool for the [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report E&amp;amp;I Governance Report Project] with Christy. Adds a column of data that shows whether or not the bill has been passed.&lt;br /&gt;
&lt;br /&gt;
2016-11-29: Began pulling data from the accelerators listed [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here]. Made text files for about 18 accelerators.&lt;br /&gt;
&lt;br /&gt;
2016-11-22: Transferred downloaded Morocco Written Bills to provided SeaGate Drive. Made a &amp;quot;gentle&amp;quot; F6S crawler to retrieve HTMLs of possible accelerator pages documented [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here].&lt;br /&gt;
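A minimal sketch of what a &amp;quot;gentle&amp;quot; crawling loop can look like, throttled with a fixed delay between requests (the URL list, header, and delay are illustrative):&lt;br /&gt;
 import time&lt;br /&gt;
 import requests&lt;br /&gt;
 urls = ['https://www.f6s.com/some-accelerator']  # hypothetical page list&lt;br /&gt;
 for url in urls:&lt;br /&gt;
     resp = requests.get(url, headers={'User-Agent': 'McNair research crawler'})&lt;br /&gt;
     if resp.status_code == 200:&lt;br /&gt;
         fname = url.rstrip('/').split('/')[-1] + '.html'&lt;br /&gt;
         with open(fname, 'w', encoding='utf-8') as f:&lt;br /&gt;
             f.write(resp.text)  # save the raw HTML for later parsing&lt;br /&gt;
     time.sleep(10)  # wait between requests so the site is not hammered&lt;br /&gt;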
&lt;br /&gt;
2016-11-18: Converted Executive Order PDFs to text files using Adobe Acrobat DC. See [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Wikipage] for details.&lt;br /&gt;
&lt;br /&gt;
2016-11-17: Wrote a crawler to retrieve information about executive orders and their corresponding pdfs. They can be found [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report here]. Next step is to run code to convert the pdfs to text files, then use the parser fixed by Christy.&lt;br /&gt;
&lt;br /&gt;
2016-11-15: Finished download of Moroccan Written Question pdfs. Wrote a parser with Christy to be used for parsing bills from Congress and eventually executive orders. Found bug in the system Python that was worked out and rebooted.&lt;br /&gt;
&lt;br /&gt;
2016-11-11: Continued to download Moroccan data in the background. Attempted to find bug fixes for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] crawler.&lt;br /&gt;
&lt;br /&gt;
2016-11-10: Continued to download Moroccan data and Kuwait data in the background. Began work on the [http://mcnair.bakerinstitute.org/wiki/Google_Scholar_Crawler Google Scholar Crawler]. Wrote a crawler for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] to get the HTML files of hundreds of accelerators. The crawler ended up failing; it appears to have been due to HTTPS.&lt;br /&gt;
&lt;br /&gt;
2016-11-08: Continued to download Moroccan data in the background. Finished writing code for the embedded files on the Kuwait Site. Spent time debugging the frame errors due to the dynamically generated content. Never found an answer to the bug, and instead found a workaround that sacrificed run time for the ability to work. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-11-04: Continued to download Moroccan data in the background. Finished writing initial Kuwait Web Crawler/Driver for the Middle East Studies Department. Middle East Studies Department asked for additional embedded files in the Kuwait website. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-11-03: Continued to download Moroccan data in the background. Dr. Egan fixed system requirements to run the GovTrack Web Crawler. Made significant progress on the Kuwait Web Crawler/Driver for the Middle East Studies Department.&lt;br /&gt;
&lt;br /&gt;
2016-11-01: Continued to download Moroccan data in the background. Went over code for the [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] and continued learning Perl. Began Kuwait Web Crawler/Driver.&lt;br /&gt;
&lt;br /&gt;
2016-10-21: Continued to download data for the Moroccan Parliament Written and Oral Questions. Looked over [http://mcnair.bakerinstitute.org/wiki/Christy_Warden_(Twitter_Crawler_Application_1) Christy's Twitter Crawler] to see how I can be helpful. Dr. Egan asked me to think about how to potentially make multiple tools to get cohorts and other sorts of data from accelerator sites; see the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator List]. He also asked me to look at the [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] for potential ideas on how to bring this project to fruition.&lt;br /&gt;
&lt;br /&gt;
2016-10-20: Continued to download data for the Moroccan Parliament Written and Oral Questions. Updated Wiki page. Started working on Twitter project with Christy. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-18: Finished code for Oral Questions web driver and Written Questions web driver using selenium. Now, the data for the dates of questions can be found using the crawler, and the pdfs of the questions will be downloaded using selenium. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-14: Finished Oral Questions crawler. Finished Written Questions crawler. Waiting for further details on whether that data needs to be tweaked in any way. Updated the Moroccan Web Driver/Web Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-13: Completed download of Moroccan Bills. Working on either a web driver screenshot approach or a web crawler approach to download the Moroccan oral and written questions data. Began building Web Crawler for the Oral and Written Questions site. Edited Moroccan Web Driver/Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-11: Fixed Arabic bug; files can now be saved with Arabic titles. Monarchy bills downloaded and ready for shipment. House of Representatives bills mostly downloaded, ratified bills prepared for download. Started learning the Scrapy library in Python for web scraping. Discussed the idea of screenshotting questions instead of scraping.&lt;br /&gt;
&lt;br /&gt;
2016-10-07: Learned Unicode and UTF-8 encoding and decoding in Arabic. Still working on transforming an ASCII URL into printable Unicode (see the sketch below).&lt;br /&gt;
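In Python 3, percent-encoded Arabic URLs can be turned back into readable Unicode with urllib; a minimal sketch (the URL is illustrative):&lt;br /&gt;
 from urllib.parse import unquote&lt;br /&gt;
 encoded = 'http://www.chambredesrepresentants.ma/%D9%85%D8%AB%D8%A7%D9%84'  # illustrative URL&lt;br /&gt;
 print(unquote(encoded))  # the percent-escapes decode to Arabic characters&lt;br /&gt;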
&lt;br /&gt;
2016-10-06: Discussed with Dr. Elbadawy the desired file names for the Moroccan data download. The consensus was that the bill programs are ready to launch once the files can be named properly, and that the questions data must be retrieved using a web crawler, which I need to learn how to implement. The naming of files is currently producing errors in going from Arabic, to URL, to download, to filename; debugging is in progress. Also built a demo Selenium program for Dr. Egan that drives the McNair blog site on an infinite loop.&lt;br /&gt;
&lt;br /&gt;
2016-10-03: Moroccan Web Driver projects completed for driving of the Monarchy proposed bills, the House of Representatives proposed bills, and the Ratified bills sites. Began the process of devising a naming system for the files that does not require scraping; tinkered with naming through regular-expression parsing of the URL. Structure for the oral questions and written questions drivers is set up, but needs fixes due to differences between the sites. Fixed bug on the McNair wiki for the women's biz team where an email was plain text instead of an email link. Took a glimpse at the Kuwait Parliament website, and it appears to be very different from the Moroccan setup.&lt;br /&gt;
&lt;br /&gt;
2016-09-30: The Selenium program selects the view-pdf option from the website, goes to the pdf webpage, and switches its handle to the new page. CTRL+S is sent to the page to launch the save dialog window, but text cannot be sent to this window; brainstormed ways around this issue. Explored Chrome options for saving automatically without a dialog window (see the sketch below). Looking into other libraries besides Selenium that may help.&lt;br /&gt;
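A minimal sketch of the Chrome-options approach, telling Chrome to save files to a fixed directory without a dialog (the directory path is illustrative):&lt;br /&gt;
 from selenium import webdriver&lt;br /&gt;
 options = webdriver.ChromeOptions()&lt;br /&gt;
 options.add_experimental_option('prefs', {&lt;br /&gt;
     'download.default_directory': 'E:\\McNair\\Downloads',  # illustrative path&lt;br /&gt;
     'download.prompt_for_download': False,  # skip the save dialog&lt;br /&gt;
     'plugins.always_open_pdf_externally': True,  # download PDFs instead of previewing&lt;br /&gt;
 })&lt;br /&gt;
 driver = webdriver.Chrome(chrome_options=options)&lt;br /&gt;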
&lt;br /&gt;
2016-09-29: Re-enrolled in Microsoft Remote Desktop with proper authentication, set up the Selenium environment and Komodo IDE on the Remote Desktop, and wrote a program using Selenium that goes to a link and opens the print dialog box. Developed a computational recipe for a different approach to the problem.&lt;br /&gt;
&lt;br /&gt;
2016-09-26: Set up Staff wiki page, work log page; registered for Slack, Microsoft Remote Desktop; downloaded Selenium on personal computer, read Selenium docs. Created wiki page for Moroccan Web Driver Project.&lt;br /&gt;
&lt;br /&gt;
'''Notes'''&lt;br /&gt;
&lt;br /&gt;
* Ed moved the Morocco Data to E:\McNair\Projects from C:\Users\PeterJ\Documents&lt;br /&gt;
* C Drive files moved to E:\McNair\Users\PeterJ&lt;br /&gt;
&lt;br /&gt;
[[Category:Work Log]]&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=22361</id>
		<title>Peter Jalbert (Work Log)</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=22361"/>
		<updated>2017-12-15T20:01:42Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===Fall 2017===&lt;br /&gt;
&amp;lt;onlyinclude&amp;gt; &lt;br /&gt;
&lt;br /&gt;
[[Peter Jalbert]] [[Work Logs]] [[Peter Jalbert (Work Log)|(log page)]] &lt;br /&gt;
&lt;br /&gt;
2017-12-15: Found the error! Ex: L-Spark the, Matter This., Fledge HERE, StartupBootCamp We..., LightBank Start Zero, Entrepreneurs Roundtable Accelerator SELECT, Y Combinator Her, Y Combinator Final, AngelCube class, Matter common&lt;br /&gt;
&lt;br /&gt;
2017-12-14: Uploading Morocco Parliament Written Questions. Creating script for next Morocco Parliament download. Begin writing Selenium documentation. Continuing to download TIGER data.&lt;br /&gt;
&lt;br /&gt;
2017-12-06: Running Morocco Parliament Written Questions script. Analyzing Demo Day Crawler results. Continued downloading for TIGER geocoder.&lt;br /&gt;
&lt;br /&gt;
2017-11-28: Debugging Morocco Parliament Crawler. Running Demo Day Crawler for all accelerators and 10 pages per accelerator. TIGER geocoder is back to Forbidden Error.&lt;br /&gt;
&lt;br /&gt;
2017-11-27: Rerunning Morocco Parliament Crawler. Fixed KeyTerms.py and running it again. Continued downloading for TIGER geocoder.&lt;br /&gt;
&lt;br /&gt;
2017-11-20: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-16: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-15: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote a script to extract counts that were greater than 2 from Keyword Matcher. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder]. Finished re-formatting work logs.&lt;br /&gt;
&lt;br /&gt;
2017-11-14: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote an HTML to Text parser. See Parser Demo Day Page for file location. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder].&lt;br /&gt;
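A minimal sketch of such an HTML-to-text parser using BeautifulSoup (assuming bs4 is available; not necessarily the exact file referenced above):&lt;br /&gt;
 from bs4 import BeautifulSoup&lt;br /&gt;
 def html_to_text(path):&lt;br /&gt;
     with open(path, encoding='utf-8') as f:&lt;br /&gt;
         soup = BeautifulSoup(f.read(), 'html.parser')&lt;br /&gt;
     for tag in soup(['script', 'style']):&lt;br /&gt;
         tag.decompose()  # drop non-visible markup&lt;br /&gt;
     return soup.get_text(separator=' ', strip=True)&lt;br /&gt;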
&lt;br /&gt;
2017-11-13: Built [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser].&lt;br /&gt;
&lt;br /&gt;
2017-11-09: Running demo version of Demo Day crawler (Accelerator Google Crawler). Fixing work log format. &lt;br /&gt;
&lt;br /&gt;
2017-11-07: Created a file with 0s and 1s detailing whether Crunchbase has the founder information for an accelerator. Details posted as a TODO on the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List] page. Still waiting for feedback on the PostGIS installation from [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder]. Continued working on the Accelerator Google Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-06: Contacted Geography Center for the US Census Bureau, [https://www.census.gov/geo/about/contact.html here], and began email exchange on PostGIS installation problems. Began working on the [http://mcnair.bakerinstitute.org/wiki/Selenium_Documentation Selenium Documentation]. Also began working on an Accelerator Google Crawler that will be used with Yang and ML to find Demo Days for cohort companies.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/onlyinclude&amp;gt; &lt;br /&gt;
&lt;br /&gt;
===Spring 2017===&lt;br /&gt;
&lt;br /&gt;
2017-05-01: Continued work on HTML Parser. Uploaded all semester projects to git server.&lt;br /&gt;
&lt;br /&gt;
2017-04-20: Finished the HTML Parser for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Ran HTML parser on accelerator founders. Data is stored in projects/accelerators/LinkedIn Founder Data.&lt;br /&gt;
&lt;br /&gt;
2017-04-19: Made updates to the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler] Wikipage. Ran LinkedIn Crawler on accelerator data. Working on an html parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-18: Ran LinkedIn Crawler on matches between Crunchbase Snapshot and the accelerator data.&lt;br /&gt;
&lt;br /&gt;
2017-04-17: Worked on ways to get correct search results from the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Worked on an HTML Parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-13: Worked on debugging the logout procedure for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Began formulation of process to search for founders of startups using a combination of the LinkedIn Crawler with the data resources from the [http://mcnair.bakerinstitute.org/wiki/Crunchbase_2013_Snapshot CrunchBase Snapshot].&lt;br /&gt;
&lt;br /&gt;
2017-04-12: Work on bugs with the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-04-11: Completed functional [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) crawler of LinkedIn Recruiter Pro]. Basic search functions work and download profile information for a given person. &lt;br /&gt;
&lt;br /&gt;
2017-04-10: Began writing functioning crawler of LinkedIn. &lt;br /&gt;
&lt;br /&gt;
2017-04-06: Continued working on debugging and documenting the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Wrote a test program that logs in, searches for a query, navigates through search pages, and logs out. Recruiter program can now login and search.&lt;br /&gt;
&lt;br /&gt;
2017-04-05: Began work on the LinkedIn Crawler. Researched on launching Python Virtual Environment.&lt;br /&gt;
&lt;br /&gt;
2017-04-03: Finished debugging points for the Enclosing Circle Algorithm. Added Command Line functionality to the Industry Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-03-29: Worked on debugging points for the Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-28: Finished running the Enclosing Circle Algorithm. Worked on removing incorrect points from the data set(see above). &lt;br /&gt;
&lt;br /&gt;
2017-03-27: Worked on debugging the Enclosing Circle Algorithm. Implemented a way to remove interior circles, and determined that translation to latitude and longitude coordinates resulted in slightly off center circles.&lt;br /&gt;
&lt;br /&gt;
2017-03-23: Finished debugging the brute force algorithm for [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm]. Implemented a method to plot the points and circles on a graph. Analyzed runtime of the brute force algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-21: Coded a brute force algorithm for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-03-20: Worked on  debugging the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-03-09: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Finished script to draw Enclosing Circles on a Google Map.&lt;br /&gt;
&lt;br /&gt;
2017-03-08: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Created script to draw outcome of the Enclosing Circle Algorithm on Google Maps.&lt;br /&gt;
&lt;br /&gt;
2017-03-07: Redetermined the top 50 cities which Enclosing Circle should be run on. Data on the [http://mcnair.bakerinstitute.org/wiki/Top_Cities_for_VC_Backed_Companies Top 50 Cities for VC Backed Companies can be found here.] Ran [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities.&lt;br /&gt;
&lt;br /&gt;
2017-03-06: Ran script to determine the top 50 cities which Enclosing Circle should be run on. Fixed the VC Circles script to take in a new data format.&lt;br /&gt;
&lt;br /&gt;
2017-03-02: Cleaned up data for the VC Circles Project. Created histogram of data in Excel. See [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Began work on the [http://mcnair.bakerinstitute.org/wiki/LinkedInCrawlerPython LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-03-01: Created statistics for the VC Circles Project.&lt;br /&gt;
&lt;br /&gt;
2017-02-28: Finished downloading geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project.  Found bug in Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-02-27: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-23: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Installed C++ Compiler for Python. Ran tests on difference between Python and C wrapped Python.&lt;br /&gt;
&lt;br /&gt;
2017-02-22: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Helped out with [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-21: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Researched into C++ Compilers for Python so that the Enclosing Circle Algorithm could be wrapped in C. Found a recommended one [https://www.microsoft.com/en-us/download/details.aspx?id=44266 here].&lt;br /&gt;
&lt;br /&gt;
2017-02-20: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-16: Reworked [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] to create a file of geocoded data. Began work on wrapping the algorithm in C to improve speed.&lt;br /&gt;
&lt;br /&gt;
2017-02-15: Finished [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] applied to the VC study. Enclosing Circle algorithm still needs adjustment, but the program runs with the temporary fixes.&lt;br /&gt;
&lt;br /&gt;
2017-02-14: Worked on the application of the Enclosing Circle algorithm to the VC study. Working on bug fixes in the Enclosing Circle algorithm. Created wiki page for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-02-13: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-08: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-07: Fixed bugs in parse_cohort_data.py, the script for parsing the cohort data from the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Added descriptive statistics to cohort data excel file.&lt;br /&gt;
&lt;br /&gt;
2017-02-02: Out sick, independent research and work from RDP. Brief research into the [http://jorgeg.scripts.mit.edu/homepage/wp-content/uploads/2016/03/Guzman-Stern-State-of-American-Entrepreneurship-FINAL.pdf Stern-Guzman algorithm]. Research into [http://mcnair.bakerinstitute.org/wiki/interactive_maps Interactive Maps]. No helpful additions to map embedding problem.&lt;br /&gt;
&lt;br /&gt;
2017-02-01: Notes from Session with Ed: Project on US university patenting and entrepreneurship programs (writing code to identify universities in assignees), search Wikipedia (XML then bulk download), student pop, faculty pop, etc.&lt;br /&gt;
Circle project for VC data will end up being a joint project to join accelerator data. &lt;br /&gt;
Pull descriptions for VC. Founders of accelerators in linkedin. LinkedIn cannot be caught(pretend to not be a bot). Can eventually get academic backgrounds through linkedin. &lt;br /&gt;
Pull business registration data, Stern/Guzman Algorithm.  &lt;br /&gt;
GIS ontop of geocoded data.&lt;br /&gt;
Maps that works on wiki or blog (CartoDB), Maps API and R.&lt;br /&gt;
NLP Projects, Description Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-01-31: Built WayBack Machine Crawler. Updated documentation for coordinates script. Updated profile page to include locations of code.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-30: Optimized enclosing circle algorithm through memoization. Developed script to read addresses from accelerator data and return latitude and longitude coordinates.&lt;br /&gt;
&lt;br /&gt;
2017-01-26: Continued working on Google sitesearch project. Discovered crunchbase, changed project priority. Priority 1, split accelerator data up by flag, priority 2, use crunchbase to get web urls for cohorts, priority 3,  make internet archive wayback machine driver. Located [http://mcnair.bakerinstitute.org/wiki/Whois_Parser Whois Parser].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-25: Finished parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Some data files still need proofreading as they are not in an acceptable format. Began working on Google sitesearch project.&lt;br /&gt;
&lt;br /&gt;
2017-01-24: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Cohort data file created, debugging is almost complete. Will begin work on the google accelerator search soon.&lt;br /&gt;
&lt;br /&gt;
2017-01-23: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Preliminary code is written, working on debugging.&lt;br /&gt;
&lt;br /&gt;
2017-01-19: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project]. Created parser for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project], completed creation of final data set(yay!). Began working on cohort parser.&lt;br /&gt;
&lt;br /&gt;
2017-01-18: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-13: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-12: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-11: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-10: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Fall 2016===&lt;br /&gt;
&lt;br /&gt;
2016-12-08: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-07: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-06: Learned how to use git. Committed software projects from the semester to the McNair git repository. Projects can be found at; [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Executive Order Crawler], [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Foreign Government Web Crawlers], [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) F6S Crawler and Parser].&lt;br /&gt;
&lt;br /&gt;
2016-12-02: Built and ran web crawler for Center for Middle East Studies on Kuwait. Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-01: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Built tool for the [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report E&amp;amp;I Governance Report Project] with Christy. Adds a column of data that shows whether or not the bill has been passed.&lt;br /&gt;
&lt;br /&gt;
2016-11-29: Began pulling data from the accelerators listed [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here]. Made text files for about 18 accelerators.&lt;br /&gt;
&lt;br /&gt;
2016-11-22: Transferred downloaded Morocco Written Bills to provided SeaGate Drive. Made a &amp;quot;gentle&amp;quot; F6S crawler to retrieve HTMLs of possible accelerator pages documented [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here].&lt;br /&gt;
&lt;br /&gt;
2016-11-18: Converted Executive Order PDFs to text files using adobe acrobat DC. See [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Wikipage] for details.&lt;br /&gt;
&lt;br /&gt;
2016-11-17: Wrote a crawler to retrieve information about executive orders, and their corresponding pdfs. They can be found [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report here.] Next step is to run code to convert the pdfs to text files, then use the parser fixed by Christy. &lt;br /&gt;
&lt;br /&gt;
2016-11-15: Finished download of Moroccan Written Question pdfs. Wrote a parser with Christy to be used for parsing bills from Congress and eventually executive orders. Found bug in the system Python that was worked out and rebooted.&lt;br /&gt;
&lt;br /&gt;
2016-11-11: Continued to download Moroccan data in the background. Attempted to find bug fixes for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] crawler.&lt;br /&gt;
&lt;br /&gt;
2016-11-10: Continued to download Moroccan data and Kuwait data in the background. Began work on  [http://mcnair.bakerinstitute.org/wiki/Google_Scholar_Crawler Google Scholar Crawler]. Wrote a crawler for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] to get the HTML files of hundreds of accelerators. The crawler ended up failing; it appears to have been due to HTTPS. &lt;br /&gt;
&lt;br /&gt;
2016-11-08: Continued to download Moroccan data in the background. Finished writing code for the embedded files on the Kuwait Site. Spent time debugging the frame errors due to the dynamically generated content. Never found an answer to the bug, and instead found a workaround that sacrificed run time for the ability to work. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-11-04: Continued to download Moroccan data in the background. Finished writing initial Kuwait Web Crawler/Driver for the Middle East Studies Department. Middle East Studies Department asked for additional embedded files in the Kuwait website. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-11-03: Continued to download Moroccan data in the background. Dr. Egan fixed systems requirements to run the GovTrack Web Crawler. Made significant progress on the Kuwait Web Crawler/Driver for the Middle East Studies Department. &lt;br /&gt;
&lt;br /&gt;
2016-11-01: Continued to download Moroccan data in the background. Went over code for GovTracker Web Crawler, continued learning Perl. [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] Began Kuwait Web Crawler/Driver.&lt;br /&gt;
&lt;br /&gt;
2016-10-21: Continued to download data for the Moroccan Parliament Written and Oral Questions. Looked over [http://mcnair.bakerinstitute.org/wiki/Christy_Warden_(Twitter_Crawler_Application_1) Christy's Twitter Crawler] to see how I can be helpful. Dr. Egan asked me to think about how to potentially make multiple tools to get cohorts and other sorts of data from accelerator sites. See [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator List] He also asked me to look at the [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] for potential ideas on how to bring this project to fruition.&lt;br /&gt;
&lt;br /&gt;
2016-10-20: Continued to download data for the Moroccan Parliament Written and Oral Questions. Updated Wiki page. Started working on Twitter project with Christy. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-18: Finished code for Oral Questions web driver and Written Questions web driver using selenium. Now, the data for the dates of questions can be found using the crawler, and the pdfs of the questions will be downloaded using selenium. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-14: Finished Oral Questions crawler. Finished Written Questions crawler. Waiting for further details on whether that data needs to be tweaked in any way. Updated the Moroccan Web Driver/Web Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-13: Completed download of Moroccan Bills. Working on either a web driver screenshot approach or a webcrawler approach to download the  Moroccan oral and written questions data. Began building Web Crawler for Oral and Written Questions site. Edited Moroccan Web Driver/Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-11: Fixed arabic bug, files can now be saved with arabic titles. Monarchy bills downloaded and ready for shipment. House of Representatives Bill mostly downloaded, ratified bills prepared for download. Started learning scrapy library in python for web scraping. Discussed idea of screenshot-ing questions instead of scraping. &lt;br /&gt;
&lt;br /&gt;
2016-10-07: Learned unicode and utf8 encoding and decoding in arabic. Still working on transforming an ascii url into printable unicode.&lt;br /&gt;
&lt;br /&gt;
2016-10-06: Discussed with Dr. Elbadawy about the desired file names for Moroccan data download. The consensus was that the bill programs are ready to launch once the files can be named properly, and the questions data must be retrieved using a web crawler which I need to learn how to implement. The naming of files is currently drawing errors in going from arabic, to url, to download, to filename. Debugging in process. Also built a demo selenium program for Dr. Egan that drives the McNair blog site on an infinite loop.&lt;br /&gt;
&lt;br /&gt;
2016-10-03: Moroccan Web Driver projects completed for driving of the Monarchy proposed bills, the House of Representatives proposed bills, and the Ratified bills sites. Begun process of devising a naming system for the files that does not require scraping. Tinkered with naming through regular expression parsing of the URL. Structure for the oral questions and written questions drivers is set up, but need fixes due to the differences in the sites. Fixed bug on McNair wiki for women's biz team where email was plain text instead of an email link. Took a glimpse at Kuwait Parliament website, and it appears to be very different from the Moroccan setup.&lt;br /&gt;
&lt;br /&gt;
2016-09-30: The Selenium program selects the view-pdf option from the website and goes to the pdf webpage. The program then switches its handle to the new page. CTRL+S is sent to the page to launch the save dialog window, but text cannot be sent to this window. Brainstormed ways around this issue. Explored Chrome Options for saving automatically without a dialog window. Looking into other libraries besides Selenium that may help.&lt;br /&gt;
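For reference, a minimal sketch of the Chrome Options idea (the download directory and the handle switching are illustrative, not the original program):&lt;br /&gt;
 from selenium import webdriver&lt;br /&gt;
 # Configure Chrome to save PDFs straight to disk instead of opening a dialog&lt;br /&gt;
 options = webdriver.ChromeOptions()&lt;br /&gt;
 options.add_experimental_option('prefs', {&lt;br /&gt;
     'download.default_directory': r'E:\McNair\Downloads',  # hypothetical target folder&lt;br /&gt;
     'download.prompt_for_download': False,&lt;br /&gt;
     'plugins.always_open_pdf_externally': True})&lt;br /&gt;
 driver = webdriver.Chrome(chrome_options=options)&lt;br /&gt;
 # After clicking the view-pdf link, switch to the newly opened window&lt;br /&gt;
 driver.switch_to.window(driver.window_handles[-1])&lt;br /&gt;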
&lt;br /&gt;
2016-09-29: Re-enrolled in Microsoft Remote Desktop with proper authentication, set up the Selenium environment and Komodo IDE on the Remote Desktop, and wrote a program using Selenium that goes to a link and opens up the print dialog box. Developed a computational recipe for a different approach to the problem.&lt;br /&gt;
&lt;br /&gt;
2016-09-26: Set up Staff wiki page, work log page; registered for Slack, Microsoft Remote Desktop; downloaded Selenium on personal computer, read Selenium docs. Created wiki page for Moroccan Web Driver Project.&lt;br /&gt;
&lt;br /&gt;
'''Notes'''&lt;br /&gt;
&lt;br /&gt;
*Ed moved the Morocco Data to E:\McNair\Projects from C:\Users\PeterJ\Documents&lt;br /&gt;
* C Drive files moved to E:\McNair\Users\PeterJ&lt;br /&gt;
&lt;br /&gt;
[[Category:Work Log]]&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=22360</id>
		<title>Peter Jalbert (Work Log)</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=22360"/>
		<updated>2017-12-15T19:57:45Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===Fall 2017===&lt;br /&gt;
&amp;lt;onlyinclude&amp;gt; &lt;br /&gt;
&lt;br /&gt;
[[Peter Jalbert]] [[Work Logs]] [[Peter Jalbert (Work Log)|(log page)]] &lt;br /&gt;
&lt;br /&gt;
2017-12-15: Found the error! Ex: L-Spark the, Matter This., Fledge HERE, StartupBootCamp We..., LightBank Start Zero, Entrepreneurs Roundtable Accelerator SELECT, Y Combinator Her, Y Combinator Final, AngelCube class&lt;br /&gt;
&lt;br /&gt;
2017-12-14: Uploading Morocco Parliament Written Questions. Creating script for next Morocco Parliament download. Begin writing Selenium documentation. Continuing to download TIGER data.&lt;br /&gt;
&lt;br /&gt;
2017-12-06: Running Morocco Parliament Written Questions script. Analyzing Demo Day Crawler results. Continued downloading for TIGER geocoder.&lt;br /&gt;
&lt;br /&gt;
2017-11-28: Debugging Morocco Parliament Crawler. Running Demo Day Crawler for all accelerators and 10 pages per accelerator. TIGER geocoder is back to Forbidden Error.&lt;br /&gt;
&lt;br /&gt;
2017-11-27: Rerunning Morocco Parliament Crawler. Fixed KeyTerms.py and running it again. Continued downloading for TIGER geocoder.&lt;br /&gt;
&lt;br /&gt;
2017-11-20:  Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-16: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-15: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote a script to extract counts that were greater than 2 from Keyword Matcher. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder]. Finished re-formatting work logs.&lt;br /&gt;
&lt;br /&gt;
2017-11-14: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote an HTML to Text parser. See Parser Demo Day Page for file location. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder].&lt;br /&gt;
&lt;br /&gt;
2017-11-13: Built [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser].&lt;br /&gt;
&lt;br /&gt;
2017-11-09: Running demo version of Demo Day crawler (Accelerator Google Crawler). Fixing work log format. &lt;br /&gt;
&lt;br /&gt;
2017-11-07: Created file with 0s and 1s detailing whether crunchbase has the founder information for an accelerator. Details posted as a TODO on [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List] page. Still waiting for feedback on the PostGIS installation from [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder]. Continued working on Accelerator Google Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-06: Contacted Geography Center for the US Census Bureau, [https://www.census.gov/geo/about/contact.html here], and began email exchange on PostGIS installation problems. Began working on the [http://mcnair.bakerinstitute.org/wiki/Selenium_Documentation Selenium Documentation]. Also began working on an Accelerator Google Crawler that will be used with Yang and ML to find Demo Days for cohort companies.&lt;br /&gt;
&lt;br /&gt;
2017-11-01: Attempted to continue downloading, however ran into HTTP Forbidden errors. Listed the errors on the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder Page].&lt;br /&gt;
&lt;br /&gt;
2017-10-31: Began downloading blocks of data for individual states for the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder] project. Wrote out the new wiki page for installation, and beginning to write documentation on usage.&lt;br /&gt;
&lt;br /&gt;
2017-10-30: With Ed's help, was able to get the national data from Tiger installed onto a database server. The process required much jumping around and changing users, and all the things we learned are outlined in [http://mcnair.bakerinstitute.org/wiki/Database_Server_Documentation#Editing_Users the database server documentation] under &amp;quot;Editing Users&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
2017-10-25: Continued working on the [http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation Tiger Geocoder installation].&lt;br /&gt;
&lt;br /&gt;
2017-10-24: New task: throw some addresses into a database, then run the address normalizer and geocoder over them. May need to install additional components. Details on the installation process can be found on the [http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation PostGIS Installation page].&lt;br /&gt;
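As a reference point, one way to exercise the normalizer and geocoder from Python; psycopg2, the database name, and the address are assumptions for illustration:&lt;br /&gt;
 import psycopg2  # assumes psycopg2 is installed&lt;br /&gt;
 conn = psycopg2.connect(dbname='tigertest')  # hypothetical database holding the TIGER data&lt;br /&gt;
 cur = conn.cursor()&lt;br /&gt;
 # normalize_address() and geocode() ship with the PostGIS TIGER geocoder extension&lt;br /&gt;
 cur.execute(&amp;quot;SELECT pprint_addy(normalize_address('6100 main street, houston, texas 77005'));&amp;quot;)&lt;br /&gt;
 print(cur.fetchone())  # the standardized form of the raw address&lt;br /&gt;
 cur.execute(&amp;quot;SELECT g.rating, pprint_addy(g.addy), ST_Y(g.geomout), ST_X(g.geomout) FROM geocode('6100 Main St, Houston, TX 77005', 1) AS g;&amp;quot;)&lt;br /&gt;
 print(cur.fetchone())  # (rating, normalized address, latitude, longitude)&lt;br /&gt;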
&lt;br /&gt;
2017-10-23: Finished Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-19: Continued work on Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-18: Continued work on Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-17: Constructed ArcGIS maps for the agglomeration project. Finished maps of points for every year in the state of California. Finished maps of Route 128. Began working on selenium Yelp crawler to get cafe locations within the 610-loop.&lt;br /&gt;
&lt;br /&gt;
2017-10-16: Assisted Harrison on the USITC project. Looked for natural language processing tools to extract complainants and defendants, along with their locations, from case files. Experimented with pulling based on part-of-speech tags, as well as using geotext or geograpy to pull locations from a case segment.&lt;br /&gt;
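For example, geotext exposes matched place names directly; a small sketch (the sentence below is a made-up case segment):&lt;br /&gt;
 from geotext import GeoText  # assumes the geotext package is installed&lt;br /&gt;
 segment = 'Complainant Acme Corp. of Houston, Texas, versus respondent Beta GmbH of Berlin, Germany.'&lt;br /&gt;
 places = GeoText(segment)&lt;br /&gt;
 print(places.cities)     # city mentions, expected to include 'Houston' and 'Berlin'&lt;br /&gt;
 print(places.countries)  # country mentions, expected to include 'Germany'&lt;br /&gt;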
&lt;br /&gt;
2017-10-13: Updated various project wiki pages.&lt;br /&gt;
&lt;br /&gt;
2017-10-12: Continued work on Patent Thicket project, awaiting further project specs.&lt;br /&gt;
&lt;br /&gt;
2017-10-05: Emergency ArcGIS creation for Agglomeration project.&lt;br /&gt;
&lt;br /&gt;
2017-10-04: Emergency ArcGIS creation for Agglomeration project.&lt;br /&gt;
&lt;br /&gt;
2017-10-02: Worked on ArcGIS data. See Harrison's Work Log for the details.&lt;br /&gt;
&lt;br /&gt;
2017-09-28: Added collaborative editing feature to PyCharm. &lt;br /&gt;
&lt;br /&gt;
2017-09-27: Worked on big database file.&lt;br /&gt;
&lt;br /&gt;
2017-09-25: New task -- create a text file with company, description, and company type (see the sketch after this list).&lt;br /&gt;
#[http://mcnair.bakerinstitute.org/wiki/VC_Database_Rebuild VC Database Rebuild]&lt;br /&gt;
#psql vcdb2&lt;br /&gt;
#table name, sdccompanybasecore2&lt;br /&gt;
#Combine with Crunchbasebulk&lt;br /&gt;
&lt;br /&gt;
#TODO: Write wiki on LinkedIn crawler; write wiki on creating accounts.&lt;br /&gt;
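A minimal sketch of the export task above, assuming psycopg2 and hypothetical column names (check the real ones with \d sdccompanybasecore2 in psql):&lt;br /&gt;
 import psycopg2&lt;br /&gt;
 conn = psycopg2.connect(dbname='vcdb2')&lt;br /&gt;
 cur = conn.cursor()&lt;br /&gt;
 # coname, description, and companytype are placeholder column names&lt;br /&gt;
 cur.execute('SELECT coname, description, companytype FROM sdccompanybasecore2;')&lt;br /&gt;
 with open('company_descriptions.txt', 'w') as out:&lt;br /&gt;
     for row in cur:&lt;br /&gt;
         out.write('\t'.join(str(field) for field in row) + '\n')&lt;br /&gt;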
&lt;br /&gt;
2017-09-21: Wrote wiki on LinkedIn crawler, met with Laura about patents project. &lt;br /&gt;
&lt;br /&gt;
2017-09-20: Finished running linkedin crawler. Transferred data to RDP. Will write wikis next.&lt;br /&gt;
&lt;br /&gt;
2017-09-19: Began running linkedin crawler. Helped Yang create RDP account, get permissions, and get wiki setup.&lt;br /&gt;
&lt;br /&gt;
2017-09-18: Finished implementation of Experience Crawler, continued working on Education Crawler for LinkedIn.&lt;br /&gt;
&lt;br /&gt;
2017-09-14: Continued implementing LinkedIn Crawler for profiles.&lt;br /&gt;
&lt;br /&gt;
2017-09-13: Implemented LinkedIn Crawler for main portion of profiles. Began working on crawling Experience section of profiles.&lt;br /&gt;
&lt;br /&gt;
2017-09-12: Continued working on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. Added to the wiki on this topic.&lt;br /&gt;
&lt;br /&gt;
2017-09-11: Continued working on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. &lt;br /&gt;
&lt;br /&gt;
2017-09-06: Combined the founders data retrieved with the Crunchbase API with the crunchbasebulk data to get LinkedIn URLs for different accelerator founders. For more information, see [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here].&lt;br /&gt;
&lt;br /&gt;
2017-09-05: Post Harvey. Finished retrieving founder names from the Crunchbase API. Next step is to query the crunchbasebulk database to get LinkedIn URLs. For more information, see [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here].&lt;br /&gt;
&lt;br /&gt;
2017-08-24: Began using the Crunchbase API to retrieve founder information for accelerators. Halfway through compiling a dictionary that translates accelerator names into proper Crunchbase API URLs.&lt;br /&gt;
&lt;br /&gt;
2017-08-23: Decided with Ed to abandon LinkedIn crawling to retrieve accelerator founder data, and instead use crunchbase. Spent the day navigating the crunchbasebulk database, and seeing what useful information was contained in it.&lt;br /&gt;
&lt;br /&gt;
2017-08-22: Discovered that LinkedIn Profiles cannot be viewed through LinkedIn if the target is 3rd degree or further. However, if entering LinkedIn through a Google search, the profile can still be viewed if the user has previously logged into LinkedIn. Devising a workaround crawler that utilizes Google search. Continued blog post [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) here] under Section 4.&lt;br /&gt;
&lt;br /&gt;
2017-08-21: Began work on extracting founders for accelerators through LinkedIn Crawler. Discovered that Python3 is not installed on RDP, so the virtual environment for the project cannot be fired up. Continued working on Ubuntu machine.&lt;br /&gt;
&amp;lt;/onlyinclude&amp;gt; &lt;br /&gt;
&lt;br /&gt;
===Spring 2017===&lt;br /&gt;
&lt;br /&gt;
2017-05-01: Continued work on HTML Parser. Uploaded all semester projects to git server.&lt;br /&gt;
&lt;br /&gt;
2017-04-20: Finished the HTML Parser for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Ran HTML parser on accelerator founders. Data is stored in projects/accelerators/LinkedIn Founder Data.&lt;br /&gt;
&lt;br /&gt;
2017-04-19: Made updates to the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler] Wikipage. Ran LinkedIn Crawler on accelerator data. Working on an html parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-18: Ran LinkedIn Crawler on matches between Crunchbase Snapshot and the accelerator data.&lt;br /&gt;
&lt;br /&gt;
2017-04-17: Worked on ways to get correct search results from the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Worked on an HTML Parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-13: Worked on debugging the logout procedure for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Began formulation of process to search for founders of startups using a combination of the LinkedIn Crawler with the data resources from the [http://mcnair.bakerinstitute.org/wiki/Crunchbase_2013_Snapshot CrunchBase Snapshot].&lt;br /&gt;
&lt;br /&gt;
2017-04-12: Work on bugs with the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-04-11: Completed functional [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) crawler of LinkedIn Recruiter Pro]. Basic search functions work and download profile information for a given person. &lt;br /&gt;
&lt;br /&gt;
2017-04-10: Began writing functioning crawler of LinkedIn. &lt;br /&gt;
&lt;br /&gt;
2017-04-06: Continued working on debugging and documenting the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Wrote a test program that logs in, searches for a query, navigates through search pages, and logs out. Recruiter program can now login and search.&lt;br /&gt;
&lt;br /&gt;
2017-04-05: Began work on the LinkedIn Crawler. Researched on launching Python Virtual Environment.&lt;br /&gt;
&lt;br /&gt;
2017-04-03: Finished debugging points for the Enclosing Circle Algorithm. Added Command Line functionality to the Industry Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-03-29: Worked on debugging points for the Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-28: Finished running the Enclosing Circle Algorithm. Worked on removing incorrect points from the data set (see above). &lt;br /&gt;
&lt;br /&gt;
2017-03-27: Worked on debugging the Enclosing Circle Algorithm. Implemented a way to remove interior circles, and determined that translation to latitude and longitude coordinates resulted in slightly off-center circles.&lt;br /&gt;
&lt;br /&gt;
2017-03-23: Finished debugging the brute force algorithm for [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm]. Implemented a method to plot the points and circles on a graph. Analyzed runtime of the brute force algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-21: Coded a brute force algorithm for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
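As a reference for the single-circle building block, a brute-force smallest-enclosing-circle sketch might look like the following (illustrative only; the project's own multi-circle formulation lives on the wiki page):&lt;br /&gt;
 import math&lt;br /&gt;
 from itertools import combinations&lt;br /&gt;
 &lt;br /&gt;
 def circle_from(p, q, s=None):&lt;br /&gt;
     # circle with pq as diameter, or the circumcircle of p, q, s&lt;br /&gt;
     if s is None:&lt;br /&gt;
         cx, cy = (p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0&lt;br /&gt;
         return (cx, cy), math.hypot(p[0] - cx, p[1] - cy)&lt;br /&gt;
     ax, ay = p&lt;br /&gt;
     bx, by = q&lt;br /&gt;
     sx, sy = s&lt;br /&gt;
     d = 2.0 * (ax * (by - sy) + bx * (sy - ay) + sx * (ay - by))&lt;br /&gt;
     if d == 0:&lt;br /&gt;
         return None  # collinear points have no circumcircle&lt;br /&gt;
     ux = ((ax**2 + ay**2) * (by - sy) + (bx**2 + by**2) * (sy - ay) + (sx**2 + sy**2) * (ay - by)) / d&lt;br /&gt;
     uy = ((ax**2 + ay**2) * (sx - bx) + (bx**2 + by**2) * (ax - sx) + (sx**2 + sy**2) * (bx - ax)) / d&lt;br /&gt;
     return (ux, uy), math.hypot(ax - ux, ay - uy)&lt;br /&gt;
 &lt;br /&gt;
 def covers(circle, points, eps=1e-7):&lt;br /&gt;
     (x0, y0), r = circle&lt;br /&gt;
     return all(math.hypot(x - x0, y - y0) &amp;lt;= r + eps for x, y in points)&lt;br /&gt;
 &lt;br /&gt;
 def smallest_enclosing_circle(points):&lt;br /&gt;
     # try every circle defined by a pair (as diameter) or a triple, keep the smallest valid one&lt;br /&gt;
     cands = [circle_from(p, q) for p, q in combinations(points, 2)]&lt;br /&gt;
     cands += [circle_from(p, q, s) for p, q, s in combinations(points, 3)]&lt;br /&gt;
     return min((c for c in cands if c and covers(c, points)), key=lambda c: c[1])&lt;br /&gt;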
&lt;br /&gt;
2017-03-20: Worked on  debugging the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-03-09: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Finished script to draw Enclosing Circles on a Google Map.&lt;br /&gt;
&lt;br /&gt;
2017-03-08: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Created script to draw outcome of the Enclosing Circle Algorithm on Google Maps.&lt;br /&gt;
&lt;br /&gt;
2017-03-07: Redetermined the top 50 cities which Enclosing Circle should be run on. Data on the [http://mcnair.bakerinstitute.org/wiki/Top_Cities_for_VC_Backed_Companies Top 50 Cities for VC Backed Companies can be found here.] Ran [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities.&lt;br /&gt;
&lt;br /&gt;
2017-03-06: Ran script to determine the top 50 cities which Enclosing Circle should be run on. Fixed the VC Circles script to take in a new data format.&lt;br /&gt;
&lt;br /&gt;
2017-03-02: Cleaned up data for the VC Circles Project. Created histogram of data in Excel. See [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Began work on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-03-01: Created statistics for the VC Circles Project.&lt;br /&gt;
&lt;br /&gt;
2017-02-28: Finished downloading geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project.  Found bug in Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-02-27: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-23: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Installed C++ Compiler for Python. Ran tests on difference between Python and C wrapped Python.&lt;br /&gt;
&lt;br /&gt;
2017-02-22: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Helped out with [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-21: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Researched into C++ Compilers for Python so that the Enclosing Circle Algorithm could be wrapped in C. Found a recommended one [https://www.microsoft.com/en-us/download/details.aspx?id=44266 here].&lt;br /&gt;
&lt;br /&gt;
2017-02-20: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-16: Reworked [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] to create a file of geocoded data. Began work on wrapping the algorithm in C to improve speed.&lt;br /&gt;
&lt;br /&gt;
2017-02-15: Finished [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] applied to the VC study. Enclosing Circle algorithm still needs adjustment, but the program runs with the temporary fixes.&lt;br /&gt;
&lt;br /&gt;
2017-02-14: Worked on the application of the Enclosing Circle algorithm to the VC study. Working on bug fixes in the Enclosing Circle algorithm. Created wiki page for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-02-13: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-08: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-07: Fixed bugs in parse_cohort_data.py, the script for parsing the cohort data from the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Added descriptive statistics to cohort data excel file.&lt;br /&gt;
&lt;br /&gt;
2017-02-02: Out sick, independent research and work from RDP. Brief research into the [http://jorgeg.scripts.mit.edu/homepage/wp-content/uploads/2016/03/Guzman-Stern-State-of-American-Entrepreneurship-FINAL.pdf Stern-Guzman algorithm]. Research into [http://mcnair.bakerinstitute.org/wiki/interactive_maps Interactive Maps]. No helpful additions to map embedding problem.&lt;br /&gt;
&lt;br /&gt;
2017-02-01: Notes from session with Ed:&lt;br /&gt;
*Project on US university patenting and entrepreneurship programs (writing code to identify universities in assignees); search Wikipedia (XML, then bulk download) for student population, faculty population, etc.&lt;br /&gt;
*Circle project for VC data will end up being a joint project joined with the accelerator data.&lt;br /&gt;
*Pull descriptions for VC. Find founders of accelerators on LinkedIn; the crawler cannot get caught by LinkedIn (pretend not to be a bot). Can eventually get academic backgrounds through LinkedIn.&lt;br /&gt;
*Pull business registration data; Stern/Guzman algorithm.&lt;br /&gt;
*GIS on top of geocoded data.&lt;br /&gt;
*Maps that work on the wiki or blog (CartoDB, Maps API, and R).&lt;br /&gt;
*NLP projects: Description Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-01-31: Built WayBack Machine Crawler. Updated documentation for coordinates script. Updated profile page to include locations of code.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-30: Optimized enclosing circle algorithm through memoization. Developed script to read addresses from accelerator data and return latitude and longitude coordinates.&lt;br /&gt;
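A minimal sketch of the address-to-coordinates step, assuming geopy's Nominatim wrapper (the geocoding service the original script used is not recorded here):&lt;br /&gt;
 from geopy.geocoders import Nominatim  # assumes the geopy package&lt;br /&gt;
 &lt;br /&gt;
 geolocator = Nominatim(user_agent='accelerator-geocoder')  # identifier string is arbitrary&lt;br /&gt;
 &lt;br /&gt;
 def address_to_coords(address):&lt;br /&gt;
     location = geolocator.geocode(address)&lt;br /&gt;
     if location is None:&lt;br /&gt;
         return None  # address could not be resolved&lt;br /&gt;
     return location.latitude, location.longitude&lt;br /&gt;
 &lt;br /&gt;
 print(address_to_coords('6100 Main St, Houston, TX'))&lt;br /&gt;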
&lt;br /&gt;
2017-01-26: Continued working on the Google sitesearch project. Discovered Crunchbase, which changed the project priorities: priority 1, split the accelerator data up by flag; priority 2, use Crunchbase to get web URLs for cohorts; priority 3, make an Internet Archive Wayback Machine driver. Located the [http://mcnair.bakerinstitute.org/wiki/Whois_Parser Whois Parser].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-25: Finished parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Some data files still need proofreading as they are not in an acceptable format. Began working on Google sitesearch project.&lt;br /&gt;
&lt;br /&gt;
2017-01-24: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Cohort data file created, debugging is almost complete. Will begin work on the google accelerator search soon.&lt;br /&gt;
&lt;br /&gt;
2017-01-23: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Preliminary code is written, working on debugging.&lt;br /&gt;
&lt;br /&gt;
2017-01-19: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project]. Created parser for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project], completed creation of the final data set (yay!). Began working on the cohort parser.&lt;br /&gt;
&lt;br /&gt;
2017-01-18: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-13: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-12: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-11: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-10: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Fall 2016===&lt;br /&gt;
&lt;br /&gt;
2016-12-08: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-07: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-06: Learned how to use git. Committed software projects from the semester to the McNair git repository. Projects can be found at: [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Executive Order Crawler], [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Foreign Government Web Crawlers], [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) F6S Crawler and Parser].&lt;br /&gt;
&lt;br /&gt;
2016-12-02: Built and ran web crawler for Center for Middle East Studies on Kuwait. Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-01: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Built tool for the [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report E&amp;amp;I Governance Report Project] with Christy. Adds a column of data that shows whether or not the bill has been passed.&lt;br /&gt;
&lt;br /&gt;
2016-11-29: Began pulling data from the accelerators listed [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here]. Made text files for about 18 accelerators.&lt;br /&gt;
&lt;br /&gt;
2016-11-22: Transferred downloaded Morocco Written Bills to the provided Seagate drive. Made a &amp;quot;gentle&amp;quot; F6S crawler to retrieve HTMLs of possible accelerator pages, documented [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here].&lt;br /&gt;
&lt;br /&gt;
2016-11-18: Converted Executive Order PDFs to text files using Adobe Acrobat DC. See [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Wikipage] for details.&lt;br /&gt;
&lt;br /&gt;
2016-11-17: Wrote a crawler to retrieve information about executive orders, and their corresponding pdfs. They can be found [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report here.] Next step is to run code to convert the pdfs to text files, then use the parser fixed by Christy. &lt;br /&gt;
&lt;br /&gt;
2016-11-15: Finished download of Moroccan Written Question pdfs. Wrote a parser with Christy to be used for parsing bills from Congress and eventually executive orders. Found a bug in the system Python, which was worked out after a reboot.&lt;br /&gt;
&lt;br /&gt;
2016-11-11: Continued to download Moroccan data in the background. Attempted to find bug fixes for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] crawler.&lt;br /&gt;
&lt;br /&gt;
2016-11-10: Continued to download Moroccan data and Kuwait data in the background. Began work on  [http://mcnair.bakerinstitute.org/wiki/Google_Scholar_Crawler Google Scholar Crawler]. Wrote a crawler for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] to get the HTML files of hundreds of accelerators. The crawler ended up failing; it appears to have been due to HTTPS. &lt;br /&gt;
&lt;br /&gt;
2016-11-08: Continued to download Moroccan data in the background. Finished writing code for the embedded files on the Kuwait site. Spent time debugging the frame errors caused by the dynamically generated content; never found the root cause of the bug, and instead settled on a workaround that sacrificed run time for reliability (see the sketch below). [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
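For reference, explicit waits are one way to handle frames that only exist after a page's JavaScript runs; a sketch (the URL and frame name are hypothetical):&lt;br /&gt;
 from selenium import webdriver&lt;br /&gt;
 from selenium.webdriver.support.ui import WebDriverWait&lt;br /&gt;
 from selenium.webdriver.support import expected_conditions as EC&lt;br /&gt;
 &lt;br /&gt;
 driver = webdriver.Chrome()&lt;br /&gt;
 driver.get('http://www.example-kuwait-parliament.gov.kw/')  # placeholder URL&lt;br /&gt;
 wait = WebDriverWait(driver, 30)&lt;br /&gt;
 # block until the dynamically generated frame exists, then switch into it&lt;br /&gt;
 wait.until(EC.frame_to_be_available_and_switch_to_it('content_frame'))&lt;br /&gt;
 # ... collect the embedded file links here ...&lt;br /&gt;
 driver.switch_to.default_content()  # return to the top-level document&lt;br /&gt;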
&lt;br /&gt;
2016-11-04: Continued to download Moroccan data in the background. Finished writing initial Kuwait Web Crawler/Driver for the Middle East Studies Department. Middle East Studies Department asked for additional embedded files in the Kuwait website. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-11-03: Continued to download Moroccan data in the background. Dr. Egan fixed systems requirements to run the GovTrack Web Crawler. Made significant progress on the Kuwait Web Crawler/Driver for the Middle East Studies Department. &lt;br /&gt;
&lt;br /&gt;
2016-11-01: Continued to download Moroccan data in the background. Went over code for GovTracker Web Crawler, continued learning Perl. [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] Began Kuwait Web Crawler/Driver.&lt;br /&gt;
&lt;br /&gt;
2016-10-21: Continued to download data for the Moroccan Parliament Written and Oral Questions. Looked over [http://mcnair.bakerinstitute.org/wiki/Christy_Warden_(Twitter_Crawler_Application_1) Christy's Twitter Crawler] to see how I can be helpful. Dr. Egan asked me to think about how to potentially make multiple tools to get cohorts and other sorts of data from accelerator sites; see the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator List]. He also asked me to look at the [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] for potential ideas on how to bring this project to fruition.&lt;br /&gt;
&lt;br /&gt;
2016-10-20: Continued to download data for the Moroccan Parliament Written and Oral Questions. Updated Wiki page. Started working on Twitter project with Christy. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-18: Finished code for Oral Questions web driver and Written Questions web driver using selenium. Now, the data for the dates of questions can be found using the crawler, and the pdfs of the questions will be downloaded using selenium. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-14: Finished Oral Questions crawler. Finished Written Questions crawler. Waiting for further details on whether that data needs to be tweaked in any way. Updated the Moroccan Web Driver/Web Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-13: Completed download of Moroccan Bills. Working on either a web driver screenshot approach or a webcrawler approach to download the  Moroccan oral and written questions data. Began building Web Crawler for Oral and Written Questions site. Edited Moroccan Web Driver/Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-11: Fixed Arabic bug; files can now be saved with Arabic titles. Monarchy bills downloaded and ready for shipment. House of Representatives bills mostly downloaded, ratified bills prepared for download. Started learning the Scrapy library in Python for web scraping. Discussed the idea of screenshotting questions instead of scraping. &lt;br /&gt;
&lt;br /&gt;
2016-10-07: Learned Unicode and UTF-8 encoding and decoding for Arabic. Still working on transforming an ASCII-encoded URL into printable Unicode.&lt;br /&gt;
&lt;br /&gt;
2016-10-06: Discussed with Dr. Elbadawy the desired file names for the Moroccan data download. The consensus was that the bill programs are ready to launch once the files can be named properly, and that the questions data must be retrieved using a web crawler, which I need to learn how to implement. File naming currently raises errors at each step from Arabic text, to URL, to download, to filename; debugging is in progress. Also built a demo Selenium program for Dr. Egan that drives the McNair blog site on an infinite loop.&lt;br /&gt;
&lt;br /&gt;
2016-10-03: Moroccan Web Driver projects completed for driving the Monarchy proposed bills, the House of Representatives proposed bills, and the Ratified bills sites. Began the process of devising a naming system for the files that does not require scraping. Tinkered with naming through regular expression parsing of the URL. Structure for the oral questions and written questions drivers is set up, but both need fixes due to differences between the sites. Fixed a bug on the McNair wiki for the women's biz team where an email was plain text instead of an email link. Took a glimpse at the Kuwait Parliament website, and it appears to be very different from the Moroccan setup.&lt;br /&gt;
&lt;br /&gt;
2016-09-30: The Selenium program selects the view-pdf option from the website and goes to the pdf webpage. The program then switches its handle to the new page. CTRL+S is sent to the page to launch the save dialog window, but text cannot be sent to this window. Brainstormed ways around this issue. Explored Chrome Options for saving automatically without a dialog window. Looking into other libraries besides Selenium that may help.&lt;br /&gt;
&lt;br /&gt;
2016-09-29: Re-enrolled in Microsoft Remote Desktop with proper authentication, set up the Selenium environment and Komodo IDE on the Remote Desktop, and wrote a program using Selenium that goes to a link and opens up the print dialog box. Developed a computational recipe for a different approach to the problem.&lt;br /&gt;
&lt;br /&gt;
2016-09-26: Set up Staff wiki page, work log page; registered for Slack, Microsoft Remote Desktop; downloaded Selenium on personal computer, read Selenium docs. Created wiki page for Moroccan Web Driver Project.&lt;br /&gt;
&lt;br /&gt;
'''Notes'''&lt;br /&gt;
&lt;br /&gt;
*Ed moved the Morocco Data to E:\McNair\Projects from C:\Users\PeterJ\Documents&lt;br /&gt;
* C Drive files moved to E:\McNair\Users\PeterJ&lt;br /&gt;
&lt;br /&gt;
[[Category:Work Log]]&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=22359</id>
		<title>Peter Jalbert (Work Log)</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=22359"/>
		<updated>2017-12-15T18:00:00Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===Fall 2017===&lt;br /&gt;
&amp;lt;onlyinclude&amp;gt; &lt;br /&gt;
&lt;br /&gt;
[[Peter Jalbert]] [[Work Logs]] [[Peter Jalbert (Work Log)|(log page)]] &lt;br /&gt;
&lt;br /&gt;
2017-12-15: Found the error! Ex: L-Spark the, Matter This.&lt;br /&gt;
&lt;br /&gt;
2017-12-14: Uploading Morocco Parliament Written Questions. Creating script for next Morocco Parliament download. Begin writing Selenium documentation. Continuing to download TIGER data.&lt;br /&gt;
&lt;br /&gt;
2017-12-06: Running Morocco Parliament Written Questions script. Analyzing Demo Day Crawler results. Continued downloading for TIGER geocoder.&lt;br /&gt;
&lt;br /&gt;
2017-11-28: Debugging Morocco Parliament Crawler. Running Demo Day Crawler for all accelerators and 10 pages per accelerator. TIGER geocoder is back to Forbidden Error.&lt;br /&gt;
&lt;br /&gt;
2017-11-27: Rerunning Morocco Parliament Crawler. Fixed KeyTerms.py and running it again. Continued downloading for TIGER geocoder.&lt;br /&gt;
&lt;br /&gt;
2017-11-20:  Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-16: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-15: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote a script to extract counts that were greater than 2 from Keyword Matcher. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder]. Finished re-formatting work logs.&lt;br /&gt;
&lt;br /&gt;
2017-11-14: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote an HTML to Text parser. See Parser Demo Day Page for file location. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder].&lt;br /&gt;
&lt;br /&gt;
2017-11-13: Built [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser].&lt;br /&gt;
&lt;br /&gt;
2017-11-09: Running demo version of Demo Day crawler (Accelerator Google Crawler). Fixing work log format. &lt;br /&gt;
&lt;br /&gt;
2017-11-07: Created file with 0s and 1s detailing whether crunchbase has the founder information for an accelerator. Details posted as a TODO on [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List] page. Still waiting for feedback on the PostGIS installation from [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder]. Continued working on Accelerator Google Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-06: Contacted Geography Center for the US Census Bureau, [https://www.census.gov/geo/about/contact.html here], and began email exchange on PostGIS installation problems. Began working on the [http://mcnair.bakerinstitute.org/wiki/Selenium_Documentation Selenium Documentation]. Also began working on an Accelerator Google Crawler that will be used with Yang and ML to find Demo Days for cohort companies.&lt;br /&gt;
&lt;br /&gt;
2017-11-01: Attempted to continue downloading, however ran into HTTP Forbidden errors. Listed the errors on the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder Page].&lt;br /&gt;
&lt;br /&gt;
2017-10-31: Began downloading blocks of data for individual states for the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder] project. Wrote out the new wiki page for installation, and beginning to write documentation on usage.&lt;br /&gt;
&lt;br /&gt;
2017-10-30: With Ed's help, was able to get the national data from Tiger installed onto a database server. The process required much jumping around and changing users, and all the things we learned are outlined in [http://mcnair.bakerinstitute.org/wiki/Database_Server_Documentation#Editing_Users the database server documentation] under &amp;quot;Editing Users&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
2017-10-25: Continued working on the [http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation Tiger Geocoder installation].&lt;br /&gt;
&lt;br /&gt;
2017-10-24: New task: throw some addresses into a database, then run the address normalizer and geocoder over them. May need to install additional components. Details on the installation process can be found on the [http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation PostGIS Installation page].&lt;br /&gt;
&lt;br /&gt;
2017-10-23: Finished Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-19: Continued work on Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-18: Continued work on Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-17: Constructed ArcGIS maps for the agglomeration project. Finished maps of points for every year in the state of California. Finished maps of Route 128. Began working on selenium Yelp crawler to get cafe locations within the 610-loop.&lt;br /&gt;
&lt;br /&gt;
2017-10-16: Assisted Harrison on the USITC project. Looked for natural language processing tools to extract complainants and defendants, along with their locations, from case files. Experimented with pulling based on part-of-speech tags, as well as using geotext or geograpy to pull locations from a case segment.&lt;br /&gt;
&lt;br /&gt;
2017-10-13: Updated various project wiki pages.&lt;br /&gt;
&lt;br /&gt;
2017-10-12: Continued work on Patent Thicket project, awaiting further project specs.&lt;br /&gt;
&lt;br /&gt;
2017-10-05: Emergency ArcGIS creation for Agglomeration project.&lt;br /&gt;
&lt;br /&gt;
2017-10-04: Emergency ArcGIS creation for Agglomeration project.&lt;br /&gt;
&lt;br /&gt;
2017-10-02: Worked on ArcGIS data. See Harrison's Work Log for the details.&lt;br /&gt;
&lt;br /&gt;
2017-09-28: Added collaborative editing feature to PyCharm. &lt;br /&gt;
&lt;br /&gt;
2017-09-27: Worked on big database file.&lt;br /&gt;
&lt;br /&gt;
2017-09-25: New task -- create a text file with company, description, and company type.&lt;br /&gt;
#[http://mcnair.bakerinstitute.org/wiki/VC_Database_Rebuild VC Database Rebuild]&lt;br /&gt;
#psql vcdb2&lt;br /&gt;
#table name, sdccompanybasecore2&lt;br /&gt;
#Combine with Crunchbasebulk&lt;br /&gt;
&lt;br /&gt;
#TODO: Write wiki on LinkedIn crawler; write wiki on creating accounts.&lt;br /&gt;
&lt;br /&gt;
2017-09-21: Wrote wiki on LinkedIn crawler, met with Laura about patents project. &lt;br /&gt;
&lt;br /&gt;
2017-09-20: Finished running linkedin crawler. Transferred data to RDP. Will write wikis next.&lt;br /&gt;
&lt;br /&gt;
2017-09-19: Began running linkedin crawler. Helped Yang create RDP account, get permissions, and get wiki setup.&lt;br /&gt;
&lt;br /&gt;
2017-09-18: Finished implementation of Experience Crawler, continued working on Education Crawler for LinkedIn.&lt;br /&gt;
&lt;br /&gt;
2017-09-14: Continued implementing LinkedIn Crawler for profiles.&lt;br /&gt;
&lt;br /&gt;
2017-09-13: Implemented LinkedIn Crawler for main portion of profiles. Began working on crawling Experience section of profiles.&lt;br /&gt;
&lt;br /&gt;
2017-09-12: Continued working on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. Added to the wiki on this topic.&lt;br /&gt;
&lt;br /&gt;
2017-09-11: Continued working on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. &lt;br /&gt;
&lt;br /&gt;
2017-09-06: Combined the founders data retrieved with the Crunchbase API with the crunchbasebulk data to get LinkedIn URLs for different accelerator founders. For more information, see [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here].&lt;br /&gt;
&lt;br /&gt;
2017-09-05: Post Harvey. Finished retrieving founder names from the Crunchbase API. Next step is to query the crunchbasebulk database to get LinkedIn URLs. For more information, see [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here].&lt;br /&gt;
&lt;br /&gt;
2017-08-24: Began using the Crunchbase API to retrieve founder information for accelerators. Halfway through compiling a dictionary that translates accelerator names into proper Crunchbase API URLs.&lt;br /&gt;
&lt;br /&gt;
2017-08-23: Decided with Ed to abandon LinkedIn crawling to retrieve accelerator founder data, and instead use crunchbase. Spent the day navigating the crunchbasebulk database, and seeing what useful information was contained in it.&lt;br /&gt;
&lt;br /&gt;
2017-08-22: Discovered that LinkedIn Profiles cannot be viewed through LinkedIn if the target is 3rd degree or further. However, if entering LinkedIn through a Google search, the profile can still be viewed if the user has previously logged into LinkedIn. Devising a workaround crawler that utilizes Google search. Continued blog post [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) here] under Section 4.&lt;br /&gt;
&lt;br /&gt;
2017-08-21: Began work on extracting founders for accelerators through LinkedIn Crawler. Discovered that Python3 is not installed on RDP, so the virtual environment for the project cannot be fired up. Continued working on Ubuntu machine.&lt;br /&gt;
&amp;lt;/onlyinclude&amp;gt; &lt;br /&gt;
&lt;br /&gt;
===Spring 2017===&lt;br /&gt;
&lt;br /&gt;
2017-05-01: Continued work on HTML Parser. Uploaded all semester projects to git server.&lt;br /&gt;
&lt;br /&gt;
2017-04-20: Finished the HTML Parser for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Ran HTML parser on accelerator founders. Data is stored in projects/accelerators/LinkedIn Founder Data.&lt;br /&gt;
&lt;br /&gt;
2017-04-19: Made updates to the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler] Wikipage. Ran LinkedIn Crawler on accelerator data. Working on an html parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-18: Ran LinkedIn Crawler on matches between Crunchbase Snapshot and the accelerator data.&lt;br /&gt;
&lt;br /&gt;
2017-04-17: Worked on ways to get correct search results from the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Worked on an HTML Parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-13: Worked on debugging the logout procedure for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Began formulation of process to search for founders of startups using a combination of the LinkedIn Crawler with the data resources from the [http://mcnair.bakerinstitute.org/wiki/Crunchbase_2013_Snapshot CrunchBase Snapshot].&lt;br /&gt;
&lt;br /&gt;
2017-04-12: Work on bugs with the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-04-11: Completed functional [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) crawler of LinkedIn Recruiter Pro]. Basic search functions work and download profile information for a given person. &lt;br /&gt;
&lt;br /&gt;
2017-04-10: Began writing functioning crawler of LinkedIn. &lt;br /&gt;
&lt;br /&gt;
2017-04-06: Continued working on debugging and documenting the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Wrote a test program that logs in, searches for a query, navigates through search pages, and logs out. Recruiter program can now login and search.&lt;br /&gt;
&lt;br /&gt;
2017-04-05: Began work on the LinkedIn Crawler. Researched on launching Python Virtual Environment.&lt;br /&gt;
&lt;br /&gt;
2017-04-03: Finished debugging points for the Enclosing Circle Algorithm. Added Command Line functionality to the Industry Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-03-29: Worked on debugging points for the Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-28: Finished running the Enclosing Circle Algorithm. Worked on removing incorrect points from the data set (see above). &lt;br /&gt;
&lt;br /&gt;
2017-03-27: Worked on debugging the Enclosing Circle Algorithm. Implemented a way to remove interior circles, and determined that translation to latitude and longitude coordinates resulted in slightly off-center circles.&lt;br /&gt;
&lt;br /&gt;
2017-03-23: Finished debugging the brute force algorithm for [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm]. Implemented a method to plot the points and circles on a graph. Analyzed runtime of the brute force algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-21: Coded a brute force algorithm for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-03-20: Worked on  debugging the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-03-09: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Finished script to draw Enclosing Circles on a Google Map.&lt;br /&gt;
&lt;br /&gt;
2017-03-08: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Created script to draw outcome of the Enclosing Circle Algorithm on Google Maps.&lt;br /&gt;
&lt;br /&gt;
2017-03-07: Redetermined the top 50 cities which Enclosing Circle should be run on. Data on the [http://mcnair.bakerinstitute.org/wiki/Top_Cities_for_VC_Backed_Companies Top 50 Cities for VC Backed Companies can be found here.] Ran [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities.&lt;br /&gt;
&lt;br /&gt;
2017-03-06: Ran script to determine the top 50 cities which Enclosing Circle should be run on. Fixed the VC Circles script to take in a new data format.&lt;br /&gt;
&lt;br /&gt;
2017-03-02: Cleaned up data for the VC Circles Project. Created histogram of data in Excel. See [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Began work on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-03-01: Created statistics for the VC Circles Project.&lt;br /&gt;
&lt;br /&gt;
2017-02-28: Finished downloading geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project.  Found bug in Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-02-27: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-23: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Installed C++ Compiler for Python. Ran tests on difference between Python and C wrapped Python.&lt;br /&gt;
&lt;br /&gt;
2017-02-22: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Helped out with [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-21: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Researched into C++ Compilers for Python so that the Enclosing Circle Algorithm could be wrapped in C. Found a recommended one [https://www.microsoft.com/en-us/download/details.aspx?id=44266 here].&lt;br /&gt;
&lt;br /&gt;
2017-02-20: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-16: Reworked [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] to create a file of geocoded data. Began work on wrapping the algorithm in C to improve speed.&lt;br /&gt;
&lt;br /&gt;
2017-02-15: Finished [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] applied to the VC study. Enclosing Circle algorithm still needs adjustment, but the program runs with the temporary fixes.&lt;br /&gt;
&lt;br /&gt;
2017-02-14: Worked on the application of the Enclosing Circle algorithm to the VC study. Working on bug fixes in the Enclosing Circle algorithm. Created wiki page for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-02-13: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-08: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-07: Fixed bugs in parse_cohort_data.py, the script for parsing the cohort data from the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Added descriptive statistics to cohort data excel file.&lt;br /&gt;
&lt;br /&gt;
2017-02-02: Out sick, independent research and work from RDP. Brief research into the [http://jorgeg.scripts.mit.edu/homepage/wp-content/uploads/2016/03/Guzman-Stern-State-of-American-Entrepreneurship-FINAL.pdf Stern-Guzman algorithm]. Research into [http://mcnair.bakerinstitute.org/wiki/interactive_maps Interactive Maps]. No helpful additions to map embedding problem.&lt;br /&gt;
&lt;br /&gt;
2017-02-01: Notes from session with Ed:&lt;br /&gt;
*Project on US university patenting and entrepreneurship programs (writing code to identify universities in assignees); search Wikipedia (XML, then bulk download) for student population, faculty population, etc.&lt;br /&gt;
*Circle project for VC data will end up being a joint project joined with the accelerator data.&lt;br /&gt;
*Pull descriptions for VC. Find founders of accelerators on LinkedIn; the crawler cannot get caught by LinkedIn (pretend not to be a bot). Can eventually get academic backgrounds through LinkedIn.&lt;br /&gt;
*Pull business registration data; Stern/Guzman algorithm.&lt;br /&gt;
*GIS on top of geocoded data.&lt;br /&gt;
*Maps that work on the wiki or blog (CartoDB, Maps API, and R).&lt;br /&gt;
*NLP projects: Description Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-01-31: Built WayBack Machine Crawler. Updated documentation for coordinates script. Updated profile page to include locations of code.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-30: Optimized enclosing circle algorithm through memoization. Developed script to read addresses from accelerator data and return latitude and longitude coordinates.&lt;br /&gt;
&lt;br /&gt;
2017-01-26: Continued working on the Google sitesearch project. Discovered Crunchbase, which changed the project priorities: priority 1, split the accelerator data up by flag; priority 2, use Crunchbase to get web URLs for cohorts; priority 3, make an Internet Archive Wayback Machine driver. Located the [http://mcnair.bakerinstitute.org/wiki/Whois_Parser Whois Parser].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-25: Finished parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Some data files still need proofreading as they are not in an acceptable format. Began working on Google sitesearch project.&lt;br /&gt;
&lt;br /&gt;
2017-01-24: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Cohort data file created, debugging is almost complete. Will begin work on the google accelerator search soon.&lt;br /&gt;
&lt;br /&gt;
2017-01-23: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Preliminary code is written, working on debugging.&lt;br /&gt;
&lt;br /&gt;
2017-01-19: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project]. Created parser for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project], completed creation of the final data set (yay!). Began working on the cohort parser.&lt;br /&gt;
&lt;br /&gt;
2017-01-18: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-13: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-12: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-11: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-10: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Fall 2016===&lt;br /&gt;
&lt;br /&gt;
2016-12-08: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-07: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-06: Learned how to use git. Committed software projects from the semester to the McNair git repository. Projects can be found at: [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Executive Order Crawler], [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Foreign Government Web Crawlers], [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) F6S Crawler and Parser].&lt;br /&gt;
&lt;br /&gt;
2016-12-02: Built and ran web crawler for Center for Middle East Studies on Kuwait. Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-01: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Built tool for the [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report E&amp;amp;I Governance Report Project] with Christy. The tool adds a column of data that shows whether or not the bill has been passed.&lt;br /&gt;
&lt;br /&gt;
2016-11-29: Began pulling data from the accelerators listed [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here]. Made text files for about 18 accelerators.&lt;br /&gt;
&lt;br /&gt;
2016-11-22: Transferred downloaded Morocco Written Bills to provided SeaGate Drive. Made a &amp;quot;gentle&amp;quot; F6S crawler to retrieve HTMLs of possible accelerator pages documented [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here].&lt;br /&gt;
&lt;br /&gt;
2016-11-18: Converted Executive Order PDFs to text files using Adobe Acrobat DC. See [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Wikipage] for details.&lt;br /&gt;
&lt;br /&gt;
2016-11-17: Wrote a crawler to retrieve information about executive orders, and their corresponding pdfs. They can be found [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report here.] Next step is to run code to convert the pdfs to text files, then use the parser fixed by Christy. &lt;br /&gt;
&lt;br /&gt;
2016-11-15: Finished download of Moroccan Written Question pdfs. Wrote a parser with Christy to be used for parsing bills from Congress and eventually executive orders. Found a bug in the system Python that was worked out, and the system was rebooted.&lt;br /&gt;
&lt;br /&gt;
2016-11-11: Continued to download Moroccan data in the background. Attempted to find bug fixes for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] crawler.&lt;br /&gt;
&lt;br /&gt;
2016-11-10: Continued to download Moroccan data and Kuwait data in the background. Began work on  [http://mcnair.bakerinstitute.org/wiki/Google_Scholar_Crawler Google Scholar Crawler]. Wrote a crawler for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] to get the HTML files of hundreds of accelerators. The crawler ended up failing; it appears to have been due to HTTPS. &lt;br /&gt;
&lt;br /&gt;
2016-11-08: Continued to download Moroccan data in the background. Finished writing code for the embedded files on the Kuwait Site. Spent time debugging the frame errors due to the dynamically generated content. Never found an answer to the bug, and instead found a workaround that sacrificed run time for the ability to work. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-11-04: Continued to download Moroccan data in the background. Finished writing initial Kuwait Web Crawler/Driver for the Middle East Studies Department. Middle East Studies Department asked for additional embedded files in the Kuwait website. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-11-03: Continued to download Moroccan data in the background. Dr. Egan fixed systems requirements to run the GovTrack Web Crawler. Made significant progress on the Kuwait Web Crawler/Driver for the Middle East Studies Department. &lt;br /&gt;
&lt;br /&gt;
2016-11-01: Continued to download Moroccan data in the background. Went over code for GovTracker Web Crawler, continued learning Perl. [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] Began Kuwait Web Crawler/Driver.&lt;br /&gt;
&lt;br /&gt;
2016-10-21: Continued to download data for the Moroccan Parliament Written and Oral Questions. Looked over [http://mcnair.bakerinstitute.org/wiki/Christy_Warden_(Twitter_Crawler_Application_1) Christy's Twitter Crawler] to see how I can be helpful. Dr. Egan asked me to think about how to potentially make multiple tools to get cohorts and other sorts of data from accelerator sites; see the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator List]. He also asked me to look at the [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] for potential ideas on how to bring this project to fruition.&lt;br /&gt;
&lt;br /&gt;
2016-10-20: Continued to download data for the Moroccan Parliament Written and Oral Questions. Updated Wiki page. Started working on Twitter project with Christy. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-18: Finished code for Oral Questions web driver and Written Questions web driver using selenium. Now, the data for the dates of questions can be found using the crawler, and the pdfs of the questions will be downloaded using selenium. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-14: Finished Oral Questions crawler. Finished Written Questions crawler. Waiting for further details on whether that data needs to be tweaked in any way. Updated the Moroccan Web Driver/Web Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-13: Completed download of Moroccan Bills. Working on either a web driver screenshot approach or a webcrawler approach to download the  Moroccan oral and written questions data. Began building Web Crawler for Oral and Written Questions site. Edited Moroccan Web Driver/Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-11: Fixed Arabic bug; files can now be saved with Arabic titles. Monarchy bills downloaded and ready for shipment. House of Representatives bills mostly downloaded, ratified bills prepared for download. Started learning the Scrapy library in Python for web scraping. Discussed idea of screenshotting questions instead of scraping. &lt;br /&gt;
&lt;br /&gt;
2016-10-07: Learned Unicode and UTF-8 encoding and decoding in Arabic. Still working on transforming an ASCII URL into printable Unicode.&lt;br /&gt;
&lt;br /&gt;
2016-10-06: Discussed with Dr. Elbadawy about the desired file names for Moroccan data download. The consensus was that the bill programs are ready to launch once the files can be named properly, and the questions data must be retrieved using a web crawler which I need to learn how to implement. The naming of files is currently drawing errors in going from arabic, to url, to download, to filename. Debugging in process. Also built a demo selenium program for Dr. Egan that drives the McNair blog site on an infinite loop.&lt;br /&gt;
&lt;br /&gt;
2016-10-03: Moroccan Web Driver projects completed for driving the Monarchy proposed bills, the House of Representatives proposed bills, and the Ratified bills sites. Began the process of devising a naming system for the files that does not require scraping. Tinkered with naming through regular expression parsing of the URL. Structure for the oral questions and written questions drivers is set up, but they need fixes due to the differences in the sites. Fixed bug on McNair wiki for the women's biz team where an email was plain text instead of an email link. Took a glimpse at the Kuwait Parliament website, and it appears to be very different from the Moroccan setup.&lt;br /&gt;
&lt;br /&gt;
2016-09-30: Selenium program selects the view pdf option from the website and goes to the pdf webpage. The program then switches handles to the new page. CTRL+S is sent to the page to launch the save dialog window. Text cannot be sent to this window; brainstormed ways around this issue. Explored Chrome options for saving automatically without a dialog window. Looking into other libraries besides Selenium that may help.&lt;br /&gt;
&lt;br /&gt;
2016-09-29: Re-enrolled in Microsoft Remote Desktop with proper authentication, set up the Selenium environment and Komodo IDE on the Remote Desktop, and wrote a program using Selenium that goes to a link and opens the print dialog box. Developed a computational recipe for a different approach to the problem.&lt;br /&gt;
&lt;br /&gt;
2016-09-26: Set up Staff wiki page, work log page; registered for Slack, Microsoft Remote Desktop; downloaded Selenium on personal computer, read Selenium docs. Created wiki page for Moroccan Web Driver Project.&lt;br /&gt;
&lt;br /&gt;
'''Notes'''&lt;br /&gt;
&lt;br /&gt;
*Ed moved the Morocco Data to E:\McNair\Projects from C:\Users\PeterJ\Documents&lt;br /&gt;
* C Drive files moved to E:\McNair\Users\PeterJ&lt;br /&gt;
&lt;br /&gt;
[[Category:Work Log]]&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=22358</id>
		<title>Peter Jalbert (Work Log)</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=22358"/>
		<updated>2017-12-14T15:58:55Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===Fall 2017===&lt;br /&gt;
&amp;lt;onlyinclude&amp;gt; &lt;br /&gt;
&lt;br /&gt;
[[Peter Jalbert]] [[Work Logs]] [[Peter Jalbert (Work Log)|(log page)]] &lt;br /&gt;
&lt;br /&gt;
2017-12-14: Uploading Morocco Parliament Written Questions. Creating script for next Morocco Parliament download. Begin writing Selenium documentation. Continuing to download TIGER data.&lt;br /&gt;
&lt;br /&gt;
2017-12-06: Running Morocco Parliament Written Questions script. Analyzing Demo Day Crawler results. Continued downloading for TIGER geocoder.&lt;br /&gt;
&lt;br /&gt;
2017-11-28: Debugging Morocco Parliament Crawler. Running Demo Day Crawler for all accelerators and 10 pages per accelerator. The TIGER geocoder is back to returning a Forbidden Error.&lt;br /&gt;
&lt;br /&gt;
2017-11-27: Rerunning Morocco Parliament Crawler. Fixed KeyTerms.py and running it again. Continued downloading for TIGER geocoder.&lt;br /&gt;
&lt;br /&gt;
2017-11-20:  Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-16: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-15: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote a script to extract counts that were greater than 2 from Keyword Matcher. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder]. Finished re-formatting work logs.&lt;br /&gt;
&lt;br /&gt;
2017-11-14: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote an HTML to Text parser. See Parser Demo Day Page for file location. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder].&lt;br /&gt;
&lt;br /&gt;
2017-11-13: Built [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser].&lt;br /&gt;
&lt;br /&gt;
2017-11-09: Running demo version of Demo Day crawler (Accelerator Google Crawler). Fixing work log format. &lt;br /&gt;
&lt;br /&gt;
2017-11-07: Created file with 0s and 1s detailing whether Crunchbase has the founder information for an accelerator. Details posted as a TODO on the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List] page. Still waiting for feedback on the PostGIS installation from [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder]. Continued working on Accelerator Google Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-06: Contacted Geography Center for the US Census Bureau, [https://www.census.gov/geo/about/contact.html here], and began email exchange on PostGIS installation problems. Began working on the [http://mcnair.bakerinstitute.org/wiki/Selenium_Documentation Selenium Documentation]. Also began working on an Accelerator Google Crawler that will be used with Yang and ML to find Demo Days for cohort companies.&lt;br /&gt;
&lt;br /&gt;
2017-11-01: Attempted to continue downloading, however ran into HTTP Forbidden errors. Listed the errors on the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder Page].&lt;br /&gt;
&lt;br /&gt;
2017-10-31: Began downloading blocks of data for individual states for the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder] project. Wrote out the new wiki page for installation, and beginning to write documentation on usage.&lt;br /&gt;
&lt;br /&gt;
2017-10-30: With Ed's help, was able to get the national data from Tiger installed onto a database server. The process required much jumping around and changing users, and all the things we learned are outlined in [http://mcnair.bakerinstitute.org/wiki/Database_Server_Documentation#Editing_Users the database server documentation] under &amp;quot;Editing Users&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
2017-10-25: Continued working on the [http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation Tiger Geocoder installation].&lt;br /&gt;
&lt;br /&gt;
2017-10-24: Throw some addresses into a database, use address normalizer and geocoder. May need to install things. Details on the installation process can be found on the [http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation PostGIS Installation page].&lt;br /&gt;
&lt;br /&gt;
2017-10-23: Finished Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-19: Continued work on Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-18: Continued work on Yelp crawler for the [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-17: Constructed ArcGIS maps for the agglomeration project. Finished maps of points for every year in the state of California. Finished maps of Route 128. Began working on selenium Yelp crawler to get cafe locations within the 610-loop.&lt;br /&gt;
&lt;br /&gt;
2017-10-16: Assisted Harrison on the USITC project. Looked for natural language processing tools to extract complainants and defendants along with their locations from case files. Experimented with pulling based on part-of-speech tags, as well as using geotext or geograpy to pull locations from a case segment.&lt;br /&gt;
&lt;br /&gt;
2017-10-13: Updated various project wiki pages.&lt;br /&gt;
&lt;br /&gt;
2017-10-12: Continued work on Patent Thicket project, awaiting further project specs.&lt;br /&gt;
&lt;br /&gt;
2017-10-05: Emergency ArcGIS creation for Agglomeration project.&lt;br /&gt;
&lt;br /&gt;
2017-10-04: Emergency ArcGIS creation for Agglomeration project.&lt;br /&gt;
&lt;br /&gt;
2017-10-02: Worked on ArcGIS data. See Harrison's Work Log for the details.&lt;br /&gt;
&lt;br /&gt;
2017-09-28: Added collaborative editing feature to PyCharm. &lt;br /&gt;
&lt;br /&gt;
2017-09-27: Worked on big database file.&lt;br /&gt;
&lt;br /&gt;
2017-09-25: New task -- Create text file with company, description, and company type.&lt;br /&gt;
#[http://mcnair.bakerinstitute.org/wiki/VC_Database_Rebuild VC Database Rebuild]&lt;br /&gt;
#psql vcdb2&lt;br /&gt;
#table name, sdccompanybasecore2&lt;br /&gt;
#Combine with Crunchbasebulk&lt;br /&gt;
&lt;br /&gt;
#TODO: Write wiki on LinkedIn crawler, write wiki on creating accounts.&lt;br /&gt;
&lt;br /&gt;
2017-09-21: Wrote wiki on LinkedIn crawler, met with Laura about patents project. &lt;br /&gt;
&lt;br /&gt;
2017-09-20: Finished running LinkedIn crawler. Transferred data to RDP. Will write wikis next.&lt;br /&gt;
&lt;br /&gt;
2017-09-19: Began running LinkedIn crawler. Helped Yang create an RDP account, get permissions, and get a wiki set up.&lt;br /&gt;
&lt;br /&gt;
2017-09-18: Finished implementation of Experience Crawler, continued working on Education Crawler for LinkedIn.&lt;br /&gt;
&lt;br /&gt;
2017-09-14: Continued implementing LinkedIn Crawler for profiles.&lt;br /&gt;
&lt;br /&gt;
2017-09-13: Implemented LinkedIn Crawler for main portion of profiles. Began working on crawling Experience section of profiles.&lt;br /&gt;
&lt;br /&gt;
2017-09-12: Continued working on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. Added to the wiki on this topic.&lt;br /&gt;
&lt;br /&gt;
2017-09-11: Continued working on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. &lt;br /&gt;
&lt;br /&gt;
2017-09-06: Combined founder data retrieved from the Crunchbase API with the crunchbasebulk data to get LinkedIn URLs for different accelerator founders. For more information, see [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here].&lt;br /&gt;
&lt;br /&gt;
2017-09-05: Post Harvey. Finished retrieving names from the Crunchbase API on founders. Next step is to query the crunchbasebulk database to get LinkedIn URLs. For more information, see [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here].&lt;br /&gt;
&lt;br /&gt;
2017-08-24: Began using the Crunchbase API to retrieve founder information for accelerators. Halfway through compiling a dictionary that translates accelerator names into proper Crunchbase API URLs.&lt;br /&gt;
&lt;br /&gt;
2017-08-23: Decided with Ed to abandon LinkedIn crawling to retrieve accelerator founder data, and instead use Crunchbase. Spent the day navigating the crunchbasebulk database and seeing what useful information it contains.&lt;br /&gt;
&lt;br /&gt;
2017-08-22: Discovered that LinkedIn Profiles cannot be viewed through LinkedIn if the target is 3rd degree or further. However, if entering LinkedIn through a Google search, the profile can still be viewed if the user has previously logged into LinkedIn. Devising a workaround crawler that utilizes Google search. Continued blog post [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) here] under Section 4.&lt;br /&gt;
&lt;br /&gt;
2017-08-21: Began work on extracting founders for accelerators through LinkedIn Crawler. Discovered that Python3 is not installed on RDP, so the virtual environment for the project cannot be fired up. Continued working on Ubuntu machine.&lt;br /&gt;
&amp;lt;/onlyinclude&amp;gt; &lt;br /&gt;
&lt;br /&gt;
===Spring 2017===&lt;br /&gt;
&lt;br /&gt;
2017-05-01: Continued work on HTML Parser. Uploaded all semester projects to git server.&lt;br /&gt;
&lt;br /&gt;
2017-04-20: Finished the HTML Parser for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Ran HTML parser on accelerator founders. Data is stored in projects/accelerators/LinkedIn Founder Data.&lt;br /&gt;
&lt;br /&gt;
2017-04-19: Made updates to the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler] Wikipage. Ran LinkedIn Crawler on accelerator data. Working on an html parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-18: Ran LinkedIn Crawler on matches between Crunchbase Snapshot and the accelerator data.&lt;br /&gt;
&lt;br /&gt;
2017-04-17: Worked on ways to get correct search results from the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Worked on an HTML Parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-13: Worked on debugging the logout procedure for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Began formulation of process to search for founders of startups using a combination of the LinkedIn Crawler with the data resources from the [http://mcnair.bakerinstitute.org/wiki/Crunchbase_2013_Snapshot CrunchBase Snapshot].&lt;br /&gt;
&lt;br /&gt;
2017-04-12: Work on bugs with the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-04-11: Completed functional [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) crawler of LinkedIn Recruiter Pro]. Basic search functions work and download profile information for a given person. &lt;br /&gt;
&lt;br /&gt;
2017-04-10: Began writing functioning crawler of LinkedIn. &lt;br /&gt;
&lt;br /&gt;
2017-04-06: Continued working on debugging and documenting the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Wrote a test program that logs in, searches for a query, navigates through search pages, and logs out. Recruiter program can now login and search.&lt;br /&gt;
&lt;br /&gt;
2017-04-05: Began work on the LinkedIn Crawler. Researched launching a Python virtual environment.&lt;br /&gt;
&lt;br /&gt;
2017-04-03: Finished debugging points for the Enclosing Circle Algorithm. Added Command Line functionality to the Industry Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-03-29: Worked on debugging points for the Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-28: Finished running the Enclosing Circle Algorithm. Worked on removing incorrect points from the data set (see above). &lt;br /&gt;
&lt;br /&gt;
2017-03-27: Worked on debugging the Enclosing Circle Algorithm. Implemented a way to remove interior circles, and determined that translation to latitude and longitude coordinates resulted in slightly off-center circles.&lt;br /&gt;
&lt;br /&gt;
2017-03-23: Finished debugging the brute force algorithm for [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm]. Implemented a method to plot the points and circles on a graph. Analyzed runtime of the brute force algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-21: Coded a brute force algorithm for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-03-20: Worked on  debugging the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-03-09: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Finished script to draw Enclosing Circles on a Google Map.&lt;br /&gt;
&lt;br /&gt;
2017-03-08: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Created script to draw outcome of the Enclosing Circle Algorithm on Google Maps.&lt;br /&gt;
&lt;br /&gt;
2017-03-07: Redetermined the top 50 cities which Enclosing Circle should be run on. Data on the [http://mcnair.bakerinstitute.org/wiki/Top_Cities_for_VC_Backed_Companies Top 50 Cities for VC Backed Companies can be found here.] Ran [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities.&lt;br /&gt;
&lt;br /&gt;
2017-03-06: Ran script to determine the top 50 cities which Enclosing Circle should be run on. Fixed the VC Circles script to take in a new data format.&lt;br /&gt;
&lt;br /&gt;
2017-03-02: Cleaned up data for the VC Circles Project. Created histogram of data in Excel. See [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Began work on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-03-01: Created statistics for the VC Circles Project.&lt;br /&gt;
&lt;br /&gt;
2017-02-28: Finished downloading geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project.  Found bug in Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-02-27: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-23: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Installed C++ Compiler for Python. Ran tests on difference between Python and C wrapped Python.&lt;br /&gt;
&lt;br /&gt;
2017-02-22: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Helped out with [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-21: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Researched into C++ Compilers for Python so that the Enclosing Circle Algorithm could be wrapped in C. Found a recommended one [https://www.microsoft.com/en-us/download/details.aspx?id=44266 here].&lt;br /&gt;
&lt;br /&gt;
2017-02-20: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-16: Reworked [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] to create a file of geocoded data. Began work on wrapping the algorithm in C to improve speed.&lt;br /&gt;
&lt;br /&gt;
2017-02-15: Finished [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] applied to the VC study. Enclosing Circle algorithm still needs adjustment, but the program runs with the temporary fixes.&lt;br /&gt;
&lt;br /&gt;
2017-02-14: Worked on the application of the Enclosing Circle algorithm to the VC study. Working on bug fixes in the Enclosing Circle algorithm. Created wiki page for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-02-13: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-08: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-07: Fixed bugs in parse_cohort_data.py, the script for parsing the cohort data from the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Added descriptive statistics to cohort data excel file.&lt;br /&gt;
&lt;br /&gt;
2017-02-02: Out sick, independent research and work from RDP. Brief research into the [http://jorgeg.scripts.mit.edu/homepage/wp-content/uploads/2016/03/Guzman-Stern-State-of-American-Entrepreneurship-FINAL.pdf Stern-Guzman algorithm]. Research into [http://mcnair.bakerinstitute.org/wiki/interactive_maps Interactive Maps]. No helpful additions to map embedding problem.&lt;br /&gt;
&lt;br /&gt;
2017-02-01: Notes from session with Ed: Project on US university patenting and entrepreneurship programs (writing code to identify universities in assignees), search Wikipedia (XML then bulk download), student population, faculty population, etc.&lt;br /&gt;
The Circle project for VC data will end up being a joint project that joins in accelerator data. &lt;br /&gt;
Pull descriptions for VC. Founders of accelerators are on LinkedIn. The LinkedIn crawler cannot be caught (pretend not to be a bot). Can eventually get academic backgrounds through LinkedIn. &lt;br /&gt;
Pull business registration data, Stern/Guzman Algorithm. &lt;br /&gt;
GIS on top of geocoded data.&lt;br /&gt;
Maps that work on the wiki or blog (CartoDB), Maps API and R.&lt;br /&gt;
NLP projects, Description Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-01-31: Built WayBack Machine Crawler. Updated documentation for coordinates script. Updated profile page to include locations of code.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-30: Optimized enclosing circle algorithm through memoization. Developed script to read addresses from accelerator data and return latitude and longitude coordinates.&lt;br /&gt;
&lt;br /&gt;
2017-01-26: Continued working on the Google sitesearch project. Discovered Crunchbase and changed project priorities: priority 1, split accelerator data up by flag; priority 2, use Crunchbase to get web URLs for cohorts; priority 3, make an Internet Archive Wayback Machine driver. Located the [http://mcnair.bakerinstitute.org/wiki/Whois_Parser Whois Parser].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-25: Finished parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Some data files still need proofreading as they are not in an acceptable format. Began working on Google sitesearch project.&lt;br /&gt;
&lt;br /&gt;
2017-01-24: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Cohort data file created, debugging is almost complete. Will begin work on the google accelerator search soon.&lt;br /&gt;
&lt;br /&gt;
2017-01-23: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Preliminary code is written, working on debugging.&lt;br /&gt;
&lt;br /&gt;
2017-01-19: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project]. Created parser for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project], completed creation of the final data set (yay!). Began working on cohort parser.&lt;br /&gt;
&lt;br /&gt;
2017-01-18: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-13: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-12: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-11: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-10: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Fall 2016===&lt;br /&gt;
&lt;br /&gt;
2016-12-08: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-07: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-06: Learned how to use git. Committed software projects from the semester to the McNair git repository. Projects can be found at: [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Executive Order Crawler], [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Foreign Government Web Crawlers], [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) F6S Crawler and Parser].&lt;br /&gt;
&lt;br /&gt;
2016-12-02: Built and ran web crawler for Center for Middle East Studies on Kuwait. Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-01: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Built tool for the [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report E&amp;amp;I Governance Report Project] with Christy. The tool adds a column of data that shows whether or not the bill has been passed.&lt;br /&gt;
&lt;br /&gt;
2016-11-29: Began pulling data from the accelerators listed [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here]. Made text files for about 18 accelerators.&lt;br /&gt;
&lt;br /&gt;
2016-11-22: Transferred downloaded Morocco Written Bills to provided SeaGate Drive. Made a &amp;quot;gentle&amp;quot; F6S crawler to retrieve HTMLs of possible accelerator pages documented [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here].&lt;br /&gt;
&lt;br /&gt;
2016-11-18: Converted Executive Order PDFs to text files using Adobe Acrobat DC. See [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Wikipage] for details.&lt;br /&gt;
&lt;br /&gt;
2016-11-17: Wrote a crawler to retrieve information about executive orders, and their corresponding pdfs. They can be found [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report here.] Next step is to run code to convert the pdfs to text files, then use the parser fixed by Christy. &lt;br /&gt;
&lt;br /&gt;
2016-11-15: Finished download of Moroccan Written Question pdfs. Wrote a parser with Christy to be used for parsing bills from Congress and eventually executive orders. Found a bug in the system Python that was worked out, and the system was rebooted.&lt;br /&gt;
&lt;br /&gt;
2016-11-11: Continued to download Moroccan data in the background. Attempted to find bug fixes for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] crawler.&lt;br /&gt;
&lt;br /&gt;
2016-11-10: Continued to download Moroccan data and Kuwait data in the background. Began work on  [http://mcnair.bakerinstitute.org/wiki/Google_Scholar_Crawler Google Scholar Crawler]. Wrote a crawler for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] to get the HTML files of hundreds of accelerators. The crawler ended up failing; it appears to have been due to HTTPS. &lt;br /&gt;
&lt;br /&gt;
2016-11-08: Continued to download Moroccan data in the background. Finished writing code for the embedded files on the Kuwait Site. Spent time debugging the frame errors due to the dynamically generated content. Never found an answer to the bug, and instead found a workaround that sacrificed run time for the ability to work. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-11-04: Continued to download Moroccan data in the background. Finished writing initial Kuwait Web Crawler/Driver for the Middle East Studies Department. Middle East Studies Department asked for additional embedded files in the Kuwait website. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-11-03: Continued to download Moroccan data in the background. Dr. Egan fixed systems requirements to run the GovTrack Web Crawler. Made significant progress on the Kuwait Web Crawler/Driver for the Middle East Studies Department. &lt;br /&gt;
&lt;br /&gt;
2016-11-01: Continued to download Moroccan data in the background. Went over code for GovTracker Web Crawler, continued learning Perl. [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] Began Kuwait Web Crawler/Driver.&lt;br /&gt;
&lt;br /&gt;
2016-10-21: Continued to download data for the Moroccan Parliament Written and Oral Questions. Looked over [http://mcnair.bakerinstitute.org/wiki/Christy_Warden_(Twitter_Crawler_Application_1) Christy's Twitter Crawler] to see how I can be helpful. Dr. Egan asked me to think about how to potentially make multiple tools to get cohorts and other sorts of data from accelerator sites; see the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator List]. He also asked me to look at the [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] for potential ideas on how to bring this project to fruition.&lt;br /&gt;
&lt;br /&gt;
2016-10-20: Continued to download data for the Moroccan Parliament Written and Oral Questions. Updated Wiki page. Started working on Twitter project with Christy. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-18: Finished code for Oral Questions web driver and Written Questions web driver using selenium. Now, the data for the dates of questions can be found using the crawler, and the pdfs of the questions will be downloaded using selenium. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-14: Finished Oral Questions crawler. Finished Written Questions crawler. Waiting for further details on whether that data needs to be tweaked in any way. Updated the Moroccan Web Driver/Web Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-13: Completed download of Moroccan Bills. Working on either a web driver screenshot approach or a webcrawler approach to download the  Moroccan oral and written questions data. Began building Web Crawler for Oral and Written Questions site. Edited Moroccan Web Driver/Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-11: Fixed Arabic bug; files can now be saved with Arabic titles. Monarchy bills downloaded and ready for shipment. House of Representatives bills mostly downloaded, ratified bills prepared for download. Started learning the Scrapy library in Python for web scraping. Discussed idea of screenshotting questions instead of scraping. &lt;br /&gt;
&lt;br /&gt;
2016-10-07: Learned Unicode and UTF-8 encoding and decoding in Arabic. Still working on transforming an ASCII URL into printable Unicode.&lt;br /&gt;
&lt;br /&gt;
2016-10-06: Discussed with Dr. Elbadawy about the desired file names for Moroccan data download. The consensus was that the bill programs are ready to launch once the files can be named properly, and the questions data must be retrieved using a web crawler which I need to learn how to implement. The naming of files is currently drawing errors in going from arabic, to url, to download, to filename. Debugging in process. Also built a demo selenium program for Dr. Egan that drives the McNair blog site on an infinite loop.&lt;br /&gt;
&lt;br /&gt;
2016-10-03: Moroccan Web Driver projects completed for driving the Monarchy proposed bills, the House of Representatives proposed bills, and the Ratified bills sites. Began the process of devising a naming system for the files that does not require scraping. Tinkered with naming through regular expression parsing of the URL. Structure for the oral questions and written questions drivers is set up, but they need fixes due to the differences in the sites. Fixed bug on McNair wiki for the women's biz team where an email was plain text instead of an email link. Took a glimpse at the Kuwait Parliament website, and it appears to be very different from the Moroccan setup.&lt;br /&gt;
&lt;br /&gt;
2016-09-30: Selenium program selects the view pdf option from the website and goes to the pdf webpage. The program then switches handles to the new page. CTRL+S is sent to the page to launch the save dialog window. Text cannot be sent to this window; brainstormed ways around this issue. Explored Chrome options for saving automatically without a dialog window. Looking into other libraries besides Selenium that may help.&lt;br /&gt;
&lt;br /&gt;
2016-09-29: Re-enrolled in Microsoft Remote Desktop with proper authentication, set up the Selenium environment and Komodo IDE on the Remote Desktop, and wrote a program using Selenium that goes to a link and opens the print dialog box. Developed a computational recipe for a different approach to the problem.&lt;br /&gt;
&lt;br /&gt;
2016-09-26: Set up Staff wiki page, work log page; registered for Slack, Microsoft Remote Desktop; downloaded Selenium on personal computer, read Selenium docs. Created wiki page for Moroccan Web Driver Project.&lt;br /&gt;
&lt;br /&gt;
'''Notes'''&lt;br /&gt;
&lt;br /&gt;
*Ed moved the Morocco Data to E:\McNair\Projects from C:\Users\PeterJ\Documents&lt;br /&gt;
* C Drive files moved to E:\McNair\Users\PeterJ&lt;br /&gt;
&lt;br /&gt;
[[Category:Work Log]]&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Tiger_Geocoder&amp;diff=22307</id>
		<title>Tiger Geocoder</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Tiger_Geocoder&amp;diff=22307"/>
		<updated>2017-12-06T21:59:49Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{McNair Projects&lt;br /&gt;
|Has title=Tiger Geocoder&lt;br /&gt;
|Has owner=Peter Jalbert,&lt;br /&gt;
|Has start date=Fall 2017&lt;br /&gt;
|Has keywords=Tiger, Geocoder, Database&lt;br /&gt;
|Has project status=Active&lt;br /&gt;
|Has Image=Tiger.jpg&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
Note: The Address Ranges National Geodatabase is available from https://www.census.gov/geo/maps-data/data/tiger-geodatabases.html&lt;br /&gt;
&lt;br /&gt;
This page serves as documentation for using the Tiger Geocoder on PostgreSQL, as part of the PostGIS extension. The following wiki pages may also be of use to you:&lt;br /&gt;
&lt;br /&gt;
[http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation PostGIS Installation]&lt;br /&gt;
[http://mcnair.bakerinstitute.org/wiki/Database_Server_Documentation Database Server Documentation]&lt;br /&gt;
&lt;br /&gt;
The official documentation for using and installing the Tiger Geocoder can be found at the following links:&lt;br /&gt;
&lt;br /&gt;
[https://postgis.net/docs/Extras.html#Tiger_Geocoder General Instructions]&lt;br /&gt;
[https://postgis.net/docs/postgis_installation.html#install_tiger_geocoder_extension Installation Instructions]&lt;br /&gt;
[http://postgis.net/docs/Geocode.html Geocoder Documentation]&lt;br /&gt;
&lt;br /&gt;
==Location==&lt;br /&gt;
The data is currently loaded into a Postgres database called geocoder. The tables contain the geocoding information, and there is a test table called &amp;quot;coffeeshops&amp;quot; that contains addresses of Houston coffeeshops according to Yelp. To access the database, first log in to the McNair DB Server. Then, &lt;br /&gt;
&lt;br /&gt;
 psql geocoder&lt;br /&gt;
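&lt;br /&gt;
Once connected, a quick sanity check is to count the test rows and look at the search path (a sketch; the exact contents of the coffeeshops table may differ):&lt;br /&gt;
 -- The coffeeshops test table should return a non-zero count.&lt;br /&gt;
 SELECT count(*) FROM coffeeshops;&lt;br /&gt;
 -- The tiger schema should appear here after the extension install.&lt;br /&gt;
 SHOW search_path;&lt;br /&gt;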
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Installation==&lt;br /&gt;
&lt;br /&gt;
===Installation and Nation Data===&lt;br /&gt;
I began by adding the extensions listed below. First, enter Postgres using the psql command. Then:&lt;br /&gt;
 --Add Extensions to database&lt;br /&gt;
 CREATE EXTENSION postgis;&lt;br /&gt;
 CREATE EXTENSION fuzzystrmatch;&lt;br /&gt;
 CREATE EXTENSION postgis_tiger_geocoder;&lt;br /&gt;
 CREATE EXTENSION address_standardizer;&lt;br /&gt;
&lt;br /&gt;
You can test that the installation worked by running the following query: &lt;br /&gt;
 SELECT na.address, na.streetname,na.streettypeabbrev, na.zip&lt;br /&gt;
 	FROM normalize_address('1 Devonshire Place, Boston, MA 02109') AS na;&lt;br /&gt;
&lt;br /&gt;
This should return the following:&lt;br /&gt;
  address | streetname | streettypeabbrev |  zip&lt;br /&gt;
 ---------+------------+------------------+-------&lt;br /&gt;
 	   1 | Devonshire | Pl               | 02109&lt;br /&gt;
&lt;br /&gt;
Next, a new profile needs to be created by using the following command.&lt;br /&gt;
 INSERT INTO tiger.loader_platform(os, declare_sect, pgbin, wget, unzip_command, psql, path_sep, &lt;br /&gt;
 		   loader, environ_set_command, county_process_command)&lt;br /&gt;
 SELECT 'test', declare_sect, pgbin, wget, unzip_command, psql, path_sep,&lt;br /&gt;
 	   loader, environ_set_command, county_process_command&lt;br /&gt;
   FROM tiger.loader_platform&lt;br /&gt;
   WHERE os = 'sh';&lt;br /&gt;
&lt;br /&gt;
The installation instructions also provide the following note:&lt;br /&gt;
&lt;br /&gt;
As of PostGIS 2.4.1 the Zip code-5 digit tabulation area zcta5 load step was revised to load current zcta5 data and is part of the Loader_Generate_Nation_Script when enabled. It is turned off by default because it takes quite a bit of time to load (20 to 60 minutes), takes up quite a bit of disk space, and is not used that often.&lt;br /&gt;
&lt;br /&gt;
If you would like this feature, you can enable it with the following command. This should be done before generating the nation script.&lt;br /&gt;
&lt;br /&gt;
 UPDATE tiger.loader_lookuptables SET load = true WHERE table_name = 'zcta510';&lt;br /&gt;
&lt;br /&gt;
The paths in declare_sect need to be edited so they match our server locations. One option is to edit the declare_sect column in the tiger.loader_platform table. If so, the declare_sect looks like the following:&lt;br /&gt;
 TMPDIR=&amp;quot;${staging_fold}/temp/&amp;quot;                +&lt;br /&gt;
 UNZIPTOOL=unzip                               +&lt;br /&gt;
 WGETTOOL=&amp;quot;/usr/bin/wget&amp;quot;                      +&lt;br /&gt;
 export PGBIN=/usr/lib/postgresql/9.6/bin      +&lt;br /&gt;
 export PGPORT=5432                            +&lt;br /&gt;
 export PGHOST=localhost                       +&lt;br /&gt;
 export PGUSER=postgres                        +&lt;br /&gt;
 export PGPASSWORD=yourpasswordhere            +&lt;br /&gt;
 export PGDATABASE=geocoder                    +&lt;br /&gt;
 PSQL=${PGBIN}/psql                            +&lt;br /&gt;
 SHP2PGSQL=shp2pgsql                           +&lt;br /&gt;
 cd ${staging_fold}&lt;br /&gt;
&lt;br /&gt;
Another option is to edit the sh file before running the script. We will use this option until further notice. Simply use your favorite command line editor to change the fields to their correct values. The generated script is located in the following directory:&lt;br /&gt;
 /gisdata&lt;br /&gt;
&lt;br /&gt;
There needs to be a directory called &amp;quot;temp&amp;quot; in the gisdata directory. To make the script, use the following from the command line:&lt;br /&gt;
 psql -c &amp;quot;SELECT Loader_Generate_Nation_Script('test')&amp;quot; -d databasename -tA &amp;gt; /gisdata/nation_script_load.sh&lt;br /&gt;
&lt;br /&gt;
This will create a script in the /gisdata directory. Change to that directory. If you did not edit the paths in the declare_sect column in psql, then you will need to edit this file to contain the correct paths. &lt;br /&gt;
&lt;br /&gt;
Change directories:&lt;br /&gt;
 cd /gisdata&lt;br /&gt;
&lt;br /&gt;
Edit the script using your favorite command line text editor. Specifically, edit the following fields.&lt;br /&gt;
 PGUSER=postgres&lt;br /&gt;
 PGPASSWORD=(Ask Anne for this password)!&lt;br /&gt;
Everything else remains the same.&lt;br /&gt;
&lt;br /&gt;
Run the script by using:&lt;br /&gt;
 sh nation_script_load.sh&lt;br /&gt;
&lt;br /&gt;
Now, the database contains the barebones national tables. The next step is to download data for each state.&lt;br /&gt;
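&lt;br /&gt;
Before moving on, it is worth sanity-checking the nation load. A minimal sketch, assuming the loader's default tiger_data schema and table names:&lt;br /&gt;
 -- Both counts should be non-zero after a successful nation load.&lt;br /&gt;
 SELECT count(*) FROM tiger_data.state_all;&lt;br /&gt;
 SELECT count(*) FROM tiger_data.county_all;&lt;br /&gt;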
&lt;br /&gt;
===State Data===&lt;br /&gt;
&lt;br /&gt;
The state scripts are generated in much the same way that the nation script was generated. Use the following command, substituting MA for your desired state abbreviation, and substituting a unique filename at the end.&lt;br /&gt;
 psql -c &amp;quot;SELECT Loader_Generate_Script(ARRAY['MA'], 'test')&amp;quot; -d geocoder -tA &amp;gt; /gisdata/ma_load.sh&lt;br /&gt;
&lt;br /&gt;
CURRENT PROGRESS:&lt;br /&gt;
The following states have been downloaded into the geocoder database.&lt;br /&gt;
 AL, AK, AZ, AR, CA, CO, CT, DE, FL, GA, IA, ID, IL, IN, KS, KY, LA, ME, MD, MA, MI, MN&lt;br /&gt;
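&lt;br /&gt;
To verify that a given state's data actually loaded, count rows in one of its tables. A minimal sketch, assuming the loader's default convention of state-prefixed tables in the tiger_data schema:&lt;br /&gt;
 -- ma_edges is the assumed name of the Massachusetts edges table.&lt;br /&gt;
 SELECT count(*) FROM tiger_data.ma_edges;&lt;br /&gt;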
&lt;br /&gt;
===Current Errors===&lt;br /&gt;
The state scripts stopped working on 11/1/2017 even though they were working on 10/31/2017. Now, when a retrieval script is run, it produces the error &lt;br /&gt;
 HTTP request sent, awaiting response... 403 Forbidden&lt;br /&gt;
Possible thoughts:&lt;br /&gt;
&lt;br /&gt;
Maybe our IP has been blacklisted for downloading data from a government website too quickly? If so, PostGIS should really choose a different installation method; the current one is fragile.&lt;br /&gt;
&lt;br /&gt;
Maybe the nation downloader script never worked properly. It is not clear how to check whether it is correct; the output seems right.&lt;br /&gt;
&lt;br /&gt;
[https://trac.osgeo.org/postgis/ticket/3699 This] is the only online forum I could find with others who have faced a similar issue.&lt;br /&gt;
&lt;br /&gt;
NOTE: Contacted the US Census Bureau and never got a response. However, the scripts magically started working again about a week later. Then the scripts stopped working again the next day.&lt;br /&gt;
&lt;br /&gt;
==Geocode Function==&lt;br /&gt;
&lt;br /&gt;
The official arguments for the function are the following:&lt;br /&gt;
 setof record geocode(varchar address, integer max_results=10, geometry restrict_region=NULL, norm_addy OUT addy, geometry OUT geomout, integer OUT rating);&lt;br /&gt;
The arguments of interest are address, where you simply submit a string, and max_results, which caps the number of candidate results returned per address. The geocoder makes multiple guesses at the location of an address and returns the best guesses in order. If you want multiple guesses for a specific address, specify max_results to be more than 1.&lt;br /&gt;
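&lt;br /&gt;
For example, to see up to three ranked candidates for one address (a sketch; the address here is made up, and pprint_addy formats each normalized match):&lt;br /&gt;
 -- Returns at most 3 rows, best guess first.&lt;br /&gt;
 SELECT rating, pprint_addy(addy) As address&lt;br /&gt;
     FROM geocode('100 Main St, Houston TX', 3);&lt;br /&gt;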
&lt;br /&gt;
===Single Address===&lt;br /&gt;
&lt;br /&gt;
An example query for a single address is:&lt;br /&gt;
 SELECT g.rating, ST_X(g.geomout) As lon, ST_Y(g.geomout) As lat,&lt;br /&gt;
     (addy).address As stno, (addy).streetname As street,&lt;br /&gt;
     (addy).streettypeabbrev As styp, (addy).location As city, (addy).stateabbrev As st,(addy).zip&lt;br /&gt;
     FROM geocode('75 State Street, Boston MA 02109', 1) As g;&lt;br /&gt;
&lt;br /&gt;
rating -- This is an integer that indicates the confidence in the resulting geocode. The closer to 0, the more confident the guess.&lt;br /&gt;
&lt;br /&gt;
ST_X(g.geomout) -- This retrieves the longitude coordinate of the point.&lt;br /&gt;
&lt;br /&gt;
ST_Y(g.geomout) -- This retrieves the latitude coordinate of the point.&lt;br /&gt;
&lt;br /&gt;
addy -- In general, addy is a normalized address resulting from the input address.&lt;br /&gt;
&lt;br /&gt;
(addy).address -- The number of the address (Ex: &amp;quot;75&amp;quot; Blabla rd.)&lt;br /&gt;
&lt;br /&gt;
(addy).streetname -- The name of the street (Ex: 75 &amp;quot;Blabla&amp;quot; rd.)&lt;br /&gt;
&lt;br /&gt;
(addy).streettypeabbrev -- The abbreviation of the street (Ex: 75 blabla &amp;quot;rd&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
(addy).location -- The city location of the address.&lt;br /&gt;
&lt;br /&gt;
(addy).stateabbrev -- The state abbreviation.&lt;br /&gt;
&lt;br /&gt;
(addy).zip -- The zipcode of the address.&lt;br /&gt;
&lt;br /&gt;
The output of the query above would be:&lt;br /&gt;
  rating |        lon        |      lat       | stno | street | styp |  city  | st |  zip&lt;br /&gt;
 --------+-------------------+----------------+------+--------+------+--------+----+-------&lt;br /&gt;
       0 | -71.0557505845646 | 42.35897920691 |   75 | State  | St   | Boston | MA | 02109&lt;br /&gt;
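&lt;br /&gt;
The geocode function can also be applied to a whole table in batches. The following is a minimal sketch adapted from the batch-geocoding pattern in the official Geocode documentation; the table addresses_to_geocode and its columns (addid, address, rating, lon, lat) are hypothetical:&lt;br /&gt;
 -- Geocode up to 10 not-yet-rated addresses per run; rerun until done.&lt;br /&gt;
 UPDATE addresses_to_geocode AS t&lt;br /&gt;
    SET rating = COALESCE((a.g).rating, -1),&lt;br /&gt;
        lon    = ST_X((a.g).geomout),&lt;br /&gt;
        lat    = ST_Y((a.g).geomout)&lt;br /&gt;
   FROM (SELECT addid, (geocode(address, 1)) AS g&lt;br /&gt;
           FROM addresses_to_geocode&lt;br /&gt;
          WHERE rating IS NULL&lt;br /&gt;
          LIMIT 10) AS a&lt;br /&gt;
  WHERE a.addid = t.addid;&lt;br /&gt;
Addresses that return no geocode result are left untouched by this sketch, so they are simply retried on the next run.&lt;br /&gt;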
&lt;br /&gt;
==Current Errors==&lt;br /&gt;
Currently, we are getting a 403 Forbidden Error when trying to download state data. We are in contact with the US Census Bureau. Their contact information can be found [https://www.census.gov/geo/about/contact.html here].&lt;br /&gt;
&lt;br /&gt;
The email exchanges are recorded below.&lt;br /&gt;
&lt;br /&gt;
--------------------------------------&lt;br /&gt;
--------------------------------------&lt;br /&gt;
&lt;br /&gt;
I am a student researcher at the McNair Center for Entrepreneurship and Innovation at Rice University, and I am in the process of installing a Postgres Extension that relies on the TIGER data. &lt;br /&gt;
&lt;br /&gt;
When I began installing the extension, things were working fine. Now however, I am getting a 403 Forbidden error when the script tries to download the TIGER files. Do you have any idea why this might be happening? &lt;br /&gt;
&lt;br /&gt;
The extension I'm trying to install is below:&lt;br /&gt;
&lt;br /&gt;
http://postgis.net/docs/postgis_installation.html#install_tiger_geocoder_extension&lt;br /&gt;
&lt;br /&gt;
When I run any of the scripts that require data from TIGER, I am receiving the following error:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--2017-11-06 14:15:29--  http://www2.census.gov/geo/tiger/TIGER2016/FEATNAMES/tl_2016_06065_featnames.zip&lt;br /&gt;
Resolving www2.census.gov (www2.census.gov)... 104.84.241.90, 2600:1404:a:382::208c, 2600:1404:a:39c::208c&lt;br /&gt;
Connecting to www2.census.gov (www2.census.gov)|104.84.241.90|:80... connected.&lt;br /&gt;
HTTP request sent, awaiting response... 403 Forbidden&lt;br /&gt;
2017-11-06 14:15:29 ERROR 403: Forbidden.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
&lt;br /&gt;
Thank you for reaching out to us.  The postgis docs is not from the Geography Division, so I cannot comment on that.  A couple things that come to mind.&lt;br /&gt;
&lt;br /&gt;
We just released our new 2017 Shapefiles, so it's possible the scripts may be written for a previous version of our Shapefiles.&lt;br /&gt;
You may need to clean out your cookies, restart your browser, and then attempt to reinstall.&lt;br /&gt;
Were you able to download our Shapefiles successfully?&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
&lt;br /&gt;
The Shapefiles are part of the files being blocked by the 403 Forbidden error. &lt;br /&gt;
&lt;br /&gt;
The script is using a wget protocol to bulk download the data, so there were no cookies involved. Also, the download had been working in previous days; it was only recently that the same scripts stopped working. I am worried that our IP address somehow ended up on a blacklist for the TIGER data. Is there a blacklist for addresses that access the TIGER data?&lt;br /&gt;
&lt;br /&gt;
Our IP address is 128.42.44.181. &lt;br /&gt;
&lt;br /&gt;
---------------------------------------&lt;br /&gt;
&lt;br /&gt;
I have forwarded your question on to our IT folks.  Since I work on the subject matter side, I am unable to answer your questions.  Once I hear back from them, I will forward their response to you.  Hopefully they will provide to you what you need in order to download our Shapefiles.  My apologies for your inconvenience.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert&amp;diff=22306</id>
		<title>Peter Jalbert</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert&amp;diff=22306"/>
		<updated>2017-12-06T21:42:06Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{McNair Staff&lt;br /&gt;
|position=Tech Team&lt;br /&gt;
|name=Peter Jalbert,&lt;br /&gt;
|user_image=peter_headshot.jpg&lt;br /&gt;
|degree=BA&lt;br /&gt;
|major=Computer Science; Music Performance&lt;br /&gt;
|class=2019,&lt;br /&gt;
|join_date=09/27/2016,&lt;br /&gt;
|skills=Python, Selenium, Javascript, Java, SQL,&lt;br /&gt;
|interests=Music, Movies, Travel,&lt;br /&gt;
|email=pwj1@rice.edu&lt;br /&gt;
|status=Active&lt;br /&gt;
}}&lt;br /&gt;
==Education==&lt;br /&gt;
Peter is currently a junior at Rice University, pursuing a double major in Computer Science and Music. Peter graduated Salutatorian from the High School for the Performing and Visual Arts in Houston, TX in 2014. &lt;br /&gt;
&lt;br /&gt;
==Contributing Projects==&lt;br /&gt;
*[http://mcnair.bakerinstitute.org/wiki/Selenium_Documentation Selenium Documentation]&lt;br /&gt;
*[http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Crawler]&lt;br /&gt;
*[http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder#State_Data Tiger Geocoder]&lt;br /&gt;
*[http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District]&lt;br /&gt;
* [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data Crunchbase Data]&lt;br /&gt;
* [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]&lt;br /&gt;
* [http://mcnair.bakerinstitute.org/wiki/Urban_Start-up_Agglomeration Urban Start-up Agglomeration]&lt;br /&gt;
* [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm]&lt;br /&gt;
* [http://mcnair.bakerinstitute.org/wiki/Top_Cities_for_VC_Backed_Companies Top Cities for VC Backed Companies]&lt;br /&gt;
* [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Parliament Web Crawler]&lt;br /&gt;
* [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report E&amp;amp;I Governance Policy Report]&lt;br /&gt;
* [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler]&lt;br /&gt;
* [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List]&lt;br /&gt;
* [http://mcnair.bakerinstitute.org/wiki/Google_Scholar_Crawler Google Scholar Crawler]&lt;br /&gt;
&lt;br /&gt;
==Looking for Code?==&lt;br /&gt;
&lt;br /&gt;
===Demo Day Crawler===&lt;br /&gt;
Term: Fall 2017&lt;br /&gt;
&lt;br /&gt;
Crawls Google search to find candidate web pages for accelerator companies' demo days.&lt;br /&gt;
 E:\McNair\Software\Accelerators\DemoDayCrawler.py&lt;br /&gt;
&lt;br /&gt;
===Demo Day Hits===&lt;br /&gt;
Term: Fall 2017&lt;br /&gt;
&lt;br /&gt;
Analyzes the results of a demo day crawl for hits of keywords.&lt;br /&gt;
 E:\McNair\Software\Accelerators\DemoDayHits.py&lt;br /&gt;
&lt;br /&gt;
===HTML to Text===&lt;br /&gt;
Term: Fall 2017&lt;br /&gt;
&lt;br /&gt;
Converts a folder of HTML files to a folder of TXT files.&lt;br /&gt;
 E:\McNair\Software\Accelerators\htmlToText.py&lt;br /&gt;
&lt;br /&gt;
===Tiger Geocoder===&lt;br /&gt;
Term: Fall 2017&lt;br /&gt;
&lt;br /&gt;
Installed a psql extension that allows for internal geocoding of addresses.&lt;br /&gt;
 psql geocoder&lt;br /&gt;
&lt;br /&gt;
===Yelp Crawler===&lt;br /&gt;
Term: Fall 2017&lt;br /&gt;
&lt;br /&gt;
Usage: Crawls for data on restaurants and coffeeshops within the 610 Loop. Part of the Houston Innovation District Project.&lt;br /&gt;
 E:\McNair\Software\YelpCrawler\yelp_crawl.py&lt;br /&gt;
&lt;br /&gt;
===Accelerator Founders===&lt;br /&gt;
Term: Fall 2017&lt;br /&gt;
&lt;br /&gt;
Usage: Uses the LinkedIn Crawler along with the Crunchbase founders data to retrieve information on accelerator founders.&lt;br /&gt;
 E:\McNair\Projects\LinkedIn Crawler\LinkedIn_Crawler\linkedin\linkedin_founders.py&lt;br /&gt;
&lt;br /&gt;
=== Crunchbase Founders ===&lt;br /&gt;
Term: Fall 2017&lt;br /&gt;
&lt;br /&gt;
Usage: Queries the Crunchbase API to get names of accelerator founders.&lt;br /&gt;
 E:\McNair\Projects\Accelerators\crunchbase_founders.py&lt;br /&gt;
&lt;br /&gt;
===LinkedIn Crawler===&lt;br /&gt;
Term: Spring 2017&lt;br /&gt;
&lt;br /&gt;
Usage: Crawls LinkedIn to obtain relevant information.&lt;br /&gt;
 E:\McNair\Projects\LinkedIn Crawler\web_crawler\linkedin\run_linkedin_recruiter.py&lt;br /&gt;
&lt;br /&gt;
===Draw Enclosing Circles===&lt;br /&gt;
Term: Spring 2017&lt;br /&gt;
&lt;br /&gt;
Usage: Draws the outcome of the Enclosing Circle Algorithm on a particular city to a Google Maps HTML output.&lt;br /&gt;
 E:\McNair\Projects\Accelerators\Enclosing_Circle\draw_vc_circles.py&lt;br /&gt;
&lt;br /&gt;
===Enclosing Circle for VCs ===&lt;br /&gt;
Term: Spring 2017&lt;br /&gt;
&lt;br /&gt;
Usage: Uses the Enclosing Circle algorithm to find concentrations of VCs.&lt;br /&gt;
 E:\McNair\Projects\Accelerators\Enclosing_Circle\vc_circles.py&lt;br /&gt;
&lt;br /&gt;
=== Industry Classifier ===&lt;br /&gt;
Term: Spring 2017&lt;br /&gt;
&lt;br /&gt;
Usage: Neural net that predicts a company's industry classification.&lt;br /&gt;
 E:\McNair\Projects\Accelerators\Code+Final_Data\ChristyCode\IndustryClassifier.py&lt;br /&gt;
&lt;br /&gt;
===WayBack Machine Parser===&lt;br /&gt;
Term: Spring 2017&lt;br /&gt;
&lt;br /&gt;
Usage: Uses the WayBack Machine API to retrieve timestamps for URLs.&lt;br /&gt;
 E:\McNair\Projects\Accelerators\Spring 2017\Code+Final_Data\wayback_machine.py&lt;br /&gt;
&lt;br /&gt;
===Accelerator Address Geolocation===&lt;br /&gt;
Term: Spring 2017&lt;br /&gt;
&lt;br /&gt;
Usage: Used to find latitude and longitude points for all accelerators in the accelerator data files.&lt;br /&gt;
 E:\McNair\Projects\Accelerators\Code+Final_Data\process_locations.py&lt;br /&gt;
&lt;br /&gt;
===Accelerator Data Parser===&lt;br /&gt;
Term: Spring 2017&lt;br /&gt;
&lt;br /&gt;
Usage: Used to parse the data for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List Project].&lt;br /&gt;
 E:\McNair\Projects\Accelerators\Code+Final_Data\parse_accelerator_data.py&lt;br /&gt;
&lt;br /&gt;
===Cohort Data Parser===&lt;br /&gt;
Term: Spring 2017&lt;br /&gt;
&lt;br /&gt;
Usage: Used to parse cohort data for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List Project].&lt;br /&gt;
 E:\McNair\Projects\Accelerators\Code+Final_Data\parse_cohort_data.py&lt;br /&gt;
&lt;br /&gt;
===Google SiteSearch===&lt;br /&gt;
Term: Spring 2017&lt;br /&gt;
&lt;br /&gt;
Usage: Preliminary-stage project intended to find an accurate website for a company with an unlisted web address by using Google Search.&lt;br /&gt;
&lt;br /&gt;
 E:\McNair\Projects\Accelerators\Google_SiteSearch\sitesearch.py&lt;br /&gt;
&lt;br /&gt;
===F6S Crawler===&lt;br /&gt;
Term: Fall 2016&lt;br /&gt;
&lt;br /&gt;
Usage: Used to download html files containing accelerator information from the F6S website.&lt;br /&gt;
&lt;br /&gt;
 E:\McNair\Projects\Accelerators\F6S Accelerator HTMLs\F6S_Crawler\f6s_crawler_gentle.py&lt;br /&gt;
&lt;br /&gt;
===F6S Parser===&lt;br /&gt;
Term: Fall 2016&lt;br /&gt;
&lt;br /&gt;
Usage: Used to parse the html files downloaded by the F6S crawler to create a list of accelerators.&lt;br /&gt;
&lt;br /&gt;
 E:\McNair\Projects\Accelerators\F6S Accelerator HTMLs\F6S_Crawler\f6s_parser.py&lt;br /&gt;
&lt;br /&gt;
===Executive Order Crawler===&lt;br /&gt;
Term: Fall 2016&lt;br /&gt;
&lt;br /&gt;
Usage: Used to download executive orders. NOTE: uses the scrapy framework, so it is run differently from regular Python programs.&lt;br /&gt;
&lt;br /&gt;
 E:\McNair\Projects\Executive_order_crawler\executive&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Kuwait Web Driver===&lt;br /&gt;
Term: Fall 2016&lt;br /&gt;
&lt;br /&gt;
Usage: Used to download CSVs of bills and questions from the Kuwait Government Website. Uses Selenium. All scripts in the folder do similar things.&lt;br /&gt;
&lt;br /&gt;
 E:\McNair\Projects\Middle East Studies Web Drivers\Kuwait&lt;br /&gt;
&lt;br /&gt;
===Moroccan Web Driver===&lt;br /&gt;
Term: Fall 2016&lt;br /&gt;
&lt;br /&gt;
Usage: Used to download pdfs of bills and questions from the Moroccan Government Website. Uses Selenium. All scripts in the folder do similar things.&lt;br /&gt;
&lt;br /&gt;
 E:\McNair\Projects\Middle East Studies Web Drivers\Morocco\Moroccan Bills&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Time at McNair==&lt;br /&gt;
[[Peter Jalbert (Work Log)]]&lt;br /&gt;
[[Category:McNair Staff]]&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=22305</id>
		<title>Peter Jalbert (Work Log)</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=22305"/>
		<updated>2017-12-06T21:37:16Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===Fall 2017===&lt;br /&gt;
&amp;lt;onlyinclude&amp;gt; &lt;br /&gt;
&lt;br /&gt;
[[Peter Jalbert]] [[Work Logs]] [[Peter Jalbert (Work Log)|(log page)]] &lt;br /&gt;
&lt;br /&gt;
2017-12-06: Running Morocco Parliament Written Questions script. Analyzing Demo Day Crawler results. Continued downloading for TIGER geocoder.&lt;br /&gt;
&lt;br /&gt;
2017-11-28: Debugging Morocco Parliament Crawler. Running Demo Day Crawler for all accelerators and 10 pages per accelerator. TIGER geocoder is back to Forbidden Error.&lt;br /&gt;
&lt;br /&gt;
2017-11-27: Rerunning Morocco Parliament Crawler. Fixed KeyTerms.py and running it again. Continued downloading for TIGER geocoder.&lt;br /&gt;
&lt;br /&gt;
2017-11-20:  Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-16: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-15: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote a script to extract counts that were greater than 2 from Keyword Matcher. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder]. Finished re-formatting work logs.&lt;br /&gt;
&lt;br /&gt;
2017-11-14: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote an HTML to Text parser. See Parser Demo Day Page for file location. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder].&lt;br /&gt;
&lt;br /&gt;
2017-11-13: Built [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser].&lt;br /&gt;
&lt;br /&gt;
2017-11-09: Running demo version of Demo Day crawler (Accelerator Google Crawler). Fixing work log format. &lt;br /&gt;
&lt;br /&gt;
2017-11-07: Created file with 0s and 1s detailing whether crunchbase has the founder information for an accelerator. Details posted as a TODO on [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List] page. Still waiting for feedback on the PostGIS installation from [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder]. Continued working on Accelerator Google Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-06: Contacted Geography Center for the US Census Bureau, [https://www.census.gov/geo/about/contact.html here], and began email exchange on PostGIS installation problems. Began working on the [http://mcnair.bakerinstitute.org/wiki/Selenium_Documentation Selenium Documentation]. Also began working on an Accelerator Google Crawler that will be used with Yang and ML to find Demo Days for cohort companies.&lt;br /&gt;
&lt;br /&gt;
2017-11-01: Attempted to continue downloading, however ran into HTTP Forbidden errors. Listed the errors on the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder Page].&lt;br /&gt;
&lt;br /&gt;
2017-10-31: Began downloading blocks of data for individual states for the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder] project. Wrote out the new wiki page for installation, and beginning to write documentation on usage.&lt;br /&gt;
&lt;br /&gt;
2017-10-30: With Ed's help, was able to get the national data from Tiger installed onto a database server. The process required much jumping around and changing users, and all the things we learned are outlined in [http://mcnair.bakerinstitute.org/wiki/Database_Server_Documentation#Editing_Users the database server documentation] under &amp;quot;Editing Users&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
2017-10-25: Continued working on the [http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation TigerCoder Installation].&lt;br /&gt;
&lt;br /&gt;
2017-10-24: Throw some addresses into a database, use address normalizer and geocoder. May need to install things. Details on the installation process can be found on the [http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation PostGIS Installation page].&lt;br /&gt;
&lt;br /&gt;
2017-10-23: Finished Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-19: Continued work on Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-18: Continued work on Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-17: Constructed ArcGIS maps for the agglomeration project. Finished maps of points for every year in the state of California. Finished maps of Route 128. Began working on selenium Yelp crawler to get cafe locations within the 610-loop.&lt;br /&gt;
&lt;br /&gt;
2017-10-16: Assisted Harrison on the USITC project. Looked for natural language processing tools to extract complainants and defendants along with their locations from case files. Experimented with pulling based on part-of-speech tags, as well as using geotext or geograpy to pull locations from a case segment.&lt;br /&gt;
&lt;br /&gt;
2017-10-13: Updated various project wiki pages.&lt;br /&gt;
&lt;br /&gt;
2017-10-12: Continued work on Patent Thicket project, awaiting further project specs.&lt;br /&gt;
&lt;br /&gt;
2017-10-05: Emergency ArcGIS creation for Agglomeration project.&lt;br /&gt;
&lt;br /&gt;
2017-10-04: Emergency ArcGIS creation for Agglomeration project.&lt;br /&gt;
&lt;br /&gt;
2017-10-02: Worked on ArcGIS data. See Harrison's Work Log for the details.&lt;br /&gt;
&lt;br /&gt;
2017-09-28: Added collaborative editing feature to PyCharm. &lt;br /&gt;
&lt;br /&gt;
2017-09-27: Worked on big database file.&lt;br /&gt;
&lt;br /&gt;
2017-09-25: New task -- Create text file with company, description, and company type.&lt;br /&gt;
#[http://mcnair.bakerinstitute.org/wiki/VC_Database_Rebuild VC Database Rebuild]&lt;br /&gt;
#psql vcdb2&lt;br /&gt;
#table name, sdccompanybasecore2&lt;br /&gt;
#Combine with Crunchbasebulk&lt;br /&gt;
&lt;br /&gt;
#TODO: Write wiki on linkedin crawler, write wiki on creating accounts.&lt;br /&gt;
&lt;br /&gt;
2017-09-21: Wrote wiki on Linkedin crawler, met with Laura about patents project. &lt;br /&gt;
&lt;br /&gt;
2017-09-20: Finished running linkedin crawler. Transferred data to RDP. Will write wikis next.&lt;br /&gt;
&lt;br /&gt;
2017-09-19: Began running linkedin crawler. Helped Yang create RDP account, get permissions, and get wiki setup.&lt;br /&gt;
&lt;br /&gt;
2017-09-18: Finished implementation of Experience Crawler, continued working on Education Crawler for LinkedIn.&lt;br /&gt;
&lt;br /&gt;
2017-09-14: Continued implementing LinkedIn Crawler for profiles.&lt;br /&gt;
&lt;br /&gt;
2017-09-13: Implemented LinkedIn Crawler for main portion of profiles. Began working on crawling Experience section of profiles.&lt;br /&gt;
&lt;br /&gt;
2017-09-12: Continued working on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. Added to the wiki on this topic.&lt;br /&gt;
&lt;br /&gt;
2017-09-11: Continued working on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. &lt;br /&gt;
&lt;br /&gt;
2017-09-06: Combined founders data retrieved with the Crunchbase API with the crunchbasebulk data to get linkedin urls for different accelerator founders. For more information, see [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here].&lt;br /&gt;
&lt;br /&gt;
2017-09-05: Post Harvey. Finished retrieving names from the Crunchbase API on founders. Next step is to query crunchbase bulk database to get linkedin urls. For more information, see [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here].&lt;br /&gt;
&lt;br /&gt;
2017-08-24: Began using the Crunchbase API to retrieve founder information for accelerators. Halfway through compiling a dictionary that translates accelerator names into proper Crunchbase API URLs.&lt;br /&gt;
&lt;br /&gt;
2017-08-23: Decided with Ed to abandon LinkedIn crawling to retrieve accelerator founder data, and instead use crunchbase. Spent the day navigating the crunchbasebulk database, and seeing what useful information was contained in it.&lt;br /&gt;
&lt;br /&gt;
2017-08-22: Discovered that LinkedIn Profiles cannot be viewed through LinkedIn if the target is 3rd degree or further. However, if entering LinkedIn through a Google search, the profile can still be viewed if the user has previously logged into LinkedIn. Devising a workaround crawler that utilizes Google search. Continued blog post [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) here] under Section 4.&lt;br /&gt;
&lt;br /&gt;
2017-08-21: Began work on extracting founders for accelerators through LinkedIn Crawler. Discovered that Python3 is not installed on RDP, so the virtual environment for the project cannot be fired up. Continued working on Ubuntu machine.&lt;br /&gt;
&amp;lt;/onlyinclude&amp;gt; &lt;br /&gt;
&lt;br /&gt;
===Spring 2017===&lt;br /&gt;
&lt;br /&gt;
2017-05-01: Continued work on HTML Parser. Uploaded all semester projects to git server.&lt;br /&gt;
&lt;br /&gt;
2017-04-20: Finished the HTML Parser for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Ran HTML parser on accelerator founders. Data is stored in projects/accelerators/LinkedIn Founder Data.&lt;br /&gt;
&lt;br /&gt;
2017-04-19: Made updates to the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler] Wikipage. Ran LinkedIn Crawler on accelerator data. Working on an html parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-18: Ran LinkedIn Crawler on matches between Crunchbase Snapshot and the accelerator data.&lt;br /&gt;
&lt;br /&gt;
2017-04-17: Worked on ways to get correct search results from the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Worked on an HTML Parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-13: Worked on debugging the logout procedure for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Began formulation of process to search for founders of startups using a combination of the LinkedIn Crawler with the data resources from the [http://mcnair.bakerinstitute.org/wiki/Crunchbase_2013_Snapshot CrunchBase Snapshot].&lt;br /&gt;
&lt;br /&gt;
2017-04-12: Work on bugs with the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-04-11: Completed functional [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) crawler of LinkedIn Recruiter Pro]. Basic search functions work and download profile information for a given person. &lt;br /&gt;
&lt;br /&gt;
2017-04-10: Began writing functioning crawler of LinkedIn. &lt;br /&gt;
&lt;br /&gt;
2017-04-06: Continued working on debugging and documenting the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Wrote a test program that logs in, searches for a query, navigates through search pages, and logs out. Recruiter program can now login and search.&lt;br /&gt;
&lt;br /&gt;
2017-04-05: Began work on the LinkedIn Crawler. Researched on launching Python Virtual Environment.&lt;br /&gt;
&lt;br /&gt;
2017-04-03: Finished debugging points for the Enclosing Circle Algorithm. Added Command Line functionality to the Industry Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-03-29: Worked on debugging points for the Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-28: Finished running the Enclosing Circle Algorithm. Worked on removing incorrect points from the data set (see above). &lt;br /&gt;
&lt;br /&gt;
2017-03-27: Worked on debugging the Enclosing Circle Algorithm. Implemented a way to remove interior circles, and determined that translation to latitude and longitude coordinates resulted in slightly off center circles.&lt;br /&gt;
&lt;br /&gt;
2017-03-23: Finished debugging the brute force algorithm for [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm]. Implemented a method to plot the points and circles on a graph. Analyzed runtime of the brute force algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-21: Coded a brute force algorithm for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-03-20: Worked on  debugging the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-03-09: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Finished script to draw Enclosing Circles on a Google Map.&lt;br /&gt;
&lt;br /&gt;
2017-03-08: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Created script to draw outcome of the Enclosing Circle Algorithm on Google Maps.&lt;br /&gt;
&lt;br /&gt;
2017-03-07: Redetermined the top 50 cities which Enclosing Circle should be run on. Data on the [http://mcnair.bakerinstitute.org/wiki/Top_Cities_for_VC_Backed_Companies Top 50 Cities for VC Backed Companies can be found here.] Ran [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities.&lt;br /&gt;
&lt;br /&gt;
2017-03-06: Ran script to determine the top 50 cities which Enclosing Circle should be run on. Fixed the VC Circles script to take in a new data format.&lt;br /&gt;
&lt;br /&gt;
2017-03-02: Cleaned up data for the VC Circles Project. Created histogram of data in Excel. See [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Began work on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-03-01: Created statistics for the VC Circles Project.&lt;br /&gt;
&lt;br /&gt;
2017-02-28: Finished downloading geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project.  Found bug in Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-02-27: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-23: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Installed C++ Compiler for Python. Ran tests on difference between Python and C wrapped Python.&lt;br /&gt;
&lt;br /&gt;
2017-02-22: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Helped out with [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-21: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Researched into C++ Compilers for Python so that the Enclosing Circle Algorithm could be wrapped in C. Found a recommended one [https://www.microsoft.com/en-us/download/details.aspx?id=44266 here].&lt;br /&gt;
&lt;br /&gt;
2017-02-20: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-16: Reworked [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] to create a file of geocoded data. Began work on wrapping the algorithm in C to improve speed.&lt;br /&gt;
&lt;br /&gt;
2017-02-15: Finished [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] applied to the VC study. Enclosing Circle algorithm still needs adjustment, but the program runs with the temporary fixes.&lt;br /&gt;
&lt;br /&gt;
2017-02-14: Worked on the application of the Enclosing Circle algorithm to the VC study. Working on bug fixes in the Enclosing Circle algorithm. Created wiki page for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-02-13: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-08: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-07: Fixed bugs in parse_cohort_data.py, the script for parsing the cohort data from the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Added descriptive statistics to cohort data excel file.&lt;br /&gt;
&lt;br /&gt;
2017-02-02: Out sick, independent research and work from RDP. Brief research into the [http://jorgeg.scripts.mit.edu/homepage/wp-content/uploads/2016/03/Guzman-Stern-State-of-American-Entrepreneurship-FINAL.pdf Stern-Guzman algorithm]. Research into [http://mcnair.bakerinstitute.org/wiki/interactive_maps Interactive Maps]. No helpful additions to map embedding problem.&lt;br /&gt;
&lt;br /&gt;
2017-02-01: Notes from Session with Ed: Project on US university patenting and entrepreneurship programs (writing code to identify universities in assignees), search Wikipedia (XML then bulk download), student pop, faculty pop, etc.&lt;br /&gt;
Circle project for VC data will end up being a joint project to join accelerator data. &lt;br /&gt;
Pull descriptions for VC. Founders of accelerators on LinkedIn. LinkedIn cannot be caught (pretend not to be a bot). Can eventually get academic backgrounds through LinkedIn. &lt;br /&gt;
Pull business registration data, Stern/Guzman Algorithm.  &lt;br /&gt;
GIS on top of geocoded data.&lt;br /&gt;
Maps that work on wiki or blog (CartoDB), Maps API and R.&lt;br /&gt;
NLP Projects, Description Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-01-31: Built WayBack Machine Crawler. Updated documentation for coordinates script. Updated profile page to include locations of code.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-30: Optimized enclosing circle algorithm through memoization. Developed script to read addresses from accelerator data and return latitude and longitude coordinates.&lt;br /&gt;
&lt;br /&gt;
2017-01-26: Continued working on Google sitesearch project. Discovered crunchbase, changed project priority. Priority 1: split accelerator data up by flag; priority 2: use crunchbase to get web urls for cohorts; priority 3: make Internet Archive Wayback Machine driver. Located [http://mcnair.bakerinstitute.org/wiki/Whois_Parser Whois Parser].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-25: Finished parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Some data files still need proofreading as they are not in an acceptable format. Began working on Google sitesearch project.&lt;br /&gt;
&lt;br /&gt;
2017-01-24: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Cohort data file created, debugging is almost complete. Will begin work on the google accelerator search soon.&lt;br /&gt;
&lt;br /&gt;
2017-01-23: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Preliminary code is written, working on debugging.&lt;br /&gt;
&lt;br /&gt;
2017-01-19: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project]. Created parser for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project], completed creation of final data set(yay!). Began working on cohort parser.&lt;br /&gt;
&lt;br /&gt;
2017-01-18: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-13: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-12: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-11: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-10: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Fall 2016===&lt;br /&gt;
&lt;br /&gt;
2016-12-08: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-07: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-06: Learned how to use git. Committed software projects from the semester to the McNair git repository. Projects can be found at: [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Executive Order Crawler], [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Foreign Government Web Crawlers], [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) F6S Crawler and Parser].&lt;br /&gt;
&lt;br /&gt;
2016-12-02: Built and ran web crawler for Center for Middle East Studies on Kuwait. Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-01: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Built tool for the [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report E&amp;amp;I Governance Report Project] with Christy. Adds a column of data that shows whether or not the bill has been passed.&lt;br /&gt;
&lt;br /&gt;
2016-11-29: Began pulling data from the accelerators listed [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here]. Made text files for about 18 accelerators.&lt;br /&gt;
&lt;br /&gt;
2016-11-22: Transferred downloaded Morocco Written Bills to provided SeaGate Drive. Made a &amp;quot;gentle&amp;quot; F6S crawler to retrieve HTMLs of possible accelerator pages documented [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here].&lt;br /&gt;
&lt;br /&gt;
2016-11-18: Converted Executive Order PDFs to text files using adobe acrobat DC. See [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Wikipage] for details.&lt;br /&gt;
&lt;br /&gt;
2016-11-17: Wrote a crawler to retrieve information about executive orders, and their corresponding pdfs. They can be found [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report here.] Next step is to run code to convert the pdfs to text files, then use the parser fixed by Christy. &lt;br /&gt;
&lt;br /&gt;
2016-11-15: Finished download of Moroccan Written Question pdfs. Wrote a parser with Christy to be used for parsing bills from Congress and eventually executive orders. Found bug in the system Python that was worked out and rebooted.&lt;br /&gt;
&lt;br /&gt;
2016-11-11: Continued to download Moroccan data in the background. Attempted to find bug fixes for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] crawler.&lt;br /&gt;
&lt;br /&gt;
2016-11-10: Continued to download Moroccan data and Kuwait data in the background. Began work on  [http://mcnair.bakerinstitute.org/wiki/Google_Scholar_Crawler Google Scholar Crawler]. Wrote a crawler for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] to get the HTML files of hundreds of accelerators. The crawler ended up failing; it appears to have been due to HTTPS. &lt;br /&gt;
&lt;br /&gt;
2016-11-08: Continued to download Moroccan data in the background. Finished writing code for the embedded files on the Kuwait Site. Spent time debugging the frame errors due to the dynamically generated content. Never found an answer to the bug, and instead found a workaround that sacrificed run time for the ability to work. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-11-04: Continued to download Moroccan data in the background. Finished writing initial Kuwait Web Crawler/Driver for the Middle East Studies Department. Middle East Studies Department asked for additional embedded files in the Kuwait website. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-11-03: Continued to download Moroccan data in the background. Dr. Egan fixed systems requirements to run the GovTrack Web Crawler. Made significant progress on the Kuwait Web Crawler/Driver for the Middle East Studies Department. &lt;br /&gt;
&lt;br /&gt;
2016-11-01: Continued to download Moroccan data in the background. Went over code for GovTracker Web Crawler, continued learning Perl. [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] Began Kuwait Web Crawler/Driver.&lt;br /&gt;
&lt;br /&gt;
2016-10-21: Continued to download data for the Moroccan Parliament Written and Oral Questions. Looked over [http://mcnair.bakerinstitute.org/wiki/Christy_Warden_(Twitter_Crawler_Application_1) Christy's Twitter Crawler] to see how I can be helpful. Dr. Egan asked me to think about how to potentially make multiple tools to get cohorts and other sorts of data from accelerator sites. See the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator List]. He also asked me to look at the [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] for potential ideas on how to bring this project to fruition.&lt;br /&gt;
&lt;br /&gt;
2016-10-20: Continued to download data for the Moroccan Parliament Written and Oral Questions. Updated Wiki page. Started working on Twitter project with Christy. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-18: Finished code for Oral Questions web driver and Written Questions web driver using selenium. Now, the data for the dates of questions can be found using the crawler, and the pdfs of the questions will be downloaded using selenium. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-14: Finished Oral Questions crawler. Finished Written Questions crawler. Waiting for further details on whether that data needs to be tweaked in any way. Updated the Moroccan Web Driver/Web Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-13: Completed download of Moroccan Bills. Working on either a web driver screenshot approach or a webcrawler approach to download the  Moroccan oral and written questions data. Began building Web Crawler for Oral and Written Questions site. Edited Moroccan Web Driver/Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-11: Fixed Arabic bug; files can now be saved with Arabic titles. Monarchy bills downloaded and ready for shipment. House of Representatives bills mostly downloaded, ratified bills prepared for download. Started learning the scrapy library in Python for web scraping. Discussed idea of screenshotting questions instead of scraping. &lt;br /&gt;
&lt;br /&gt;
2016-10-07: Learned Unicode and UTF-8 encoding and decoding for Arabic. Still working on transforming an ASCII URL into printable Unicode.&lt;br /&gt;
&lt;br /&gt;
2016-10-06: Discussed with Dr. Elbadawy the desired file names for the Moroccan data download. The consensus was that the bill programs are ready to launch once the files can be named properly, and that the questions data must be retrieved using a web crawler, which I need to learn how to implement. The file naming currently throws errors in going from Arabic, to URL, to download, to filename. Debugging in progress. Also built a demo Selenium program for Dr. Egan that drives the McNair blog site on an infinite loop.&lt;br /&gt;
&lt;br /&gt;
2016-10-03: Moroccan Web Driver projects completed for driving of the Monarchy proposed bills, the House of Representatives proposed bills, and the Ratified bills sites. Begun process of devising a naming system for the files that does not require scraping. Tinkered with naming through regular expression parsing of the URL. Structure for the oral questions and written questions drivers is set up, but need fixes due to the differences in the sites. Fixed bug on McNair wiki for women's biz team where email was plain text instead of an email link. Took a glimpse at Kuwait Parliament website, and it appears to be very different from the Moroccan setup.&lt;br /&gt;
&lt;br /&gt;
2016-09-30: Selenium program selects view pdf option from the website, and goes to the pdf webpage. Program then switches handle to the new page. CTRL S is sent to the page to launch save dialog window. Text cannot be sent to this window. Brainstorm ways around this issue. Explored Chrome Options for saving automatically without a dialog window. Looking into other libraries besides selenium that may help.&lt;br /&gt;
&lt;br /&gt;
2016-09-29: Re-enroll in Microsoft Remote Desktop with proper authentication, set up Selenium environment and Komodo IDE on Remote Desktop, wrote program using Selenium that goes to a link and opens up the print dialog box. Developed computational recipe for a different approach to the problem.&lt;br /&gt;
&lt;br /&gt;
2016-09-26: Set up Staff wiki page, work log page; registered for Slack, Microsoft Remote Desktop; downloaded Selenium on personal computer, read Selenium docs. Created wiki page for Moroccan Web Driver Project.&lt;br /&gt;
&lt;br /&gt;
'''Notes'''&lt;br /&gt;
&lt;br /&gt;
*Ed moved the Morocco Data to E:\McNair\Projects from C:\Users\PeterJ\Documents&lt;br /&gt;
* C Drive files moved to E:\McNair\Users\PeterJ&lt;br /&gt;
&lt;br /&gt;
[[Category:Work Log]]&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Tiger_Geocoder&amp;diff=22304</id>
		<title>Tiger Geocoder</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Tiger_Geocoder&amp;diff=22304"/>
		<updated>2017-12-06T21:36:07Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{McNair Projects&lt;br /&gt;
|Has title=Tiger Geocoder&lt;br /&gt;
|Has owner=Peter Jalbert,&lt;br /&gt;
|Has start date=Fall 2017&lt;br /&gt;
|Has keywords=Tiger, Geocoder, Database&lt;br /&gt;
|Has project status=Active&lt;br /&gt;
|Has Image=Tiger.jpg&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
Note: The Address Ranges National Geodatabase is available from https://www.census.gov/geo/maps-data/data/tiger-geodatabases.html&lt;br /&gt;
&lt;br /&gt;
This page serves as documentation for using the Tiger Geocoder on PostgreSQL, as part of the PostGIS extension. The following wiki pages may also be of use to you:&lt;br /&gt;
&lt;br /&gt;
[http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation PostGIS Installation]&lt;br /&gt;
[http://mcnair.bakerinstitute.org/wiki/Database_Server_Documentation Database Server Documentation]&lt;br /&gt;
&lt;br /&gt;
The official documentation for installing and using the Tiger Geocoder can be found at the following links:&lt;br /&gt;
&lt;br /&gt;
[https://postgis.net/docs/Extras.html#Tiger_Geocoder General Instructions]&lt;br /&gt;
[https://postgis.net/docs/postgis_installation.html#install_tiger_geocoder_extension Installation Instructions]&lt;br /&gt;
[http://postgis.net/docs/Geocode.html Geocoder Documentation]&lt;br /&gt;
&lt;br /&gt;
==Location==&lt;br /&gt;
The data is currently loaded into a psql database called geocoder. The tables contain the geocoding information, and there is a test table called &amp;quot;coffeeshops&amp;quot; that contains addresses of Houston coffeeshops according to Yelp. To access the database, first log in to the McNair DB Server. Then, &lt;br /&gt;
&lt;br /&gt;
 psql geocoder&lt;br /&gt;
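&lt;br /&gt;
As a quick sanity check, the test table can be fed through the geocoder. This is a minimal sketch, assuming the coffeeshops table stores its addresses in a text column named address (the actual column name may differ):&lt;br /&gt;
 SELECT c.address, g.rating, ST_X(g.geomout) As lon, ST_Y(g.geomout) As lat&lt;br /&gt;
 FROM (SELECT address FROM coffeeshops LIMIT 1) As c, geocode(c.address, 1) As g;&lt;br /&gt;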
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Installation==&lt;br /&gt;
&lt;br /&gt;
===Installation and Nation Data===&lt;br /&gt;
I began by adding the extension listed above. First, enter Postgres using the psql command. Then:&lt;br /&gt;
 --Add Extensions to database&lt;br /&gt;
 CREATE EXTENSION postgis;&lt;br /&gt;
 CREATE EXTENSION fuzzystrmatch;&lt;br /&gt;
 CREATE EXTENSION postgis_tiger_geocoder;&lt;br /&gt;
 CREATE EXTENSION address_standardizer;&lt;br /&gt;
&lt;br /&gt;
You can test that the installation worked by running the following query: &lt;br /&gt;
 SELECT na.address, na.streetname,na.streettypeabbrev, na.zip&lt;br /&gt;
 	FROM normalize_address('1 Devonshire Place, Boston, MA 02109') AS na;&lt;br /&gt;
&lt;br /&gt;
This should return the following:&lt;br /&gt;
  address | streetname | streettypeabbrev |  zip&lt;br /&gt;
 ---------+------------+------------------+-------&lt;br /&gt;
 	   1 | Devonshire | Pl               | 02109&lt;br /&gt;
&lt;br /&gt;
Next, a new profile needs to be created by using the following command.&lt;br /&gt;
 INSERT INTO tiger.loader_platform(os, declare_sect, pgbin, wget, unzip_command, psql, path_sep, &lt;br /&gt;
 		   loader, environ_set_command, county_process_command)&lt;br /&gt;
 SELECT 'test', declare_sect, pgbin, wget, unzip_command, psql, path_sep,&lt;br /&gt;
 	   loader, environ_set_command, county_process_command&lt;br /&gt;
   FROM tiger.loader_platform&lt;br /&gt;
   WHERE os = 'sh';&lt;br /&gt;
&lt;br /&gt;
The installation instructions also provide the following note:&lt;br /&gt;
&lt;br /&gt;
As of PostGIS 2.4.1 the Zip code-5 digit tabulation area zcta5 load step was revised to load current zcta5 data and is part of the Loader_Generate_Nation_Script when enabled. It is turned off by default because it takes quite a bit of time to load (20 to 60 minutes), takes up quite a bit of disk space, and is not used that often.&lt;br /&gt;
&lt;br /&gt;
If you would like this feature, you can enable it by using the following command. This should be done before loading the script.&lt;br /&gt;
&lt;br /&gt;
 UPDATE tiger.loader_lookuptables SET load = true WHERE table_name = 'zcta510';&lt;br /&gt;
&lt;br /&gt;
The paths in declare_sect need to be edited so they match our server locations. One option is to edit the declare_sect column in the tiger.loader_platform table. If so, the declare_sect looks like the following:&lt;br /&gt;
 TMPDIR=&amp;quot;${staging_fold}/temp/&amp;quot;                +&lt;br /&gt;
 UNZIPTOOL=unzip                               +&lt;br /&gt;
 WGETTOOL=&amp;quot;/usr/bin/wget&amp;quot;                      +&lt;br /&gt;
 export PGBIN=/usr/lib/postgresql/9.6/bin      +&lt;br /&gt;
 export PGPORT=5432                            +&lt;br /&gt;
 export PGHOST=localhost                       +&lt;br /&gt;
 export PGUSER=postgres                        +&lt;br /&gt;
 export PGPASSWORD=yourpasswordhere            +&lt;br /&gt;
 export PGDATABASE=geocoder                    +&lt;br /&gt;
 PSQL=${PGBIN}/psql                            +&lt;br /&gt;
 SHP2PGSQL=shp2pgsql                           +&lt;br /&gt;
 cd ${staging_fold}&lt;br /&gt;
&lt;br /&gt;
Another option is to edit the sh file before running the script. We will use this option until further notice. Simply use your favorite command line editor to change the fields to their correct values. The generated script is located in the following directory:&lt;br /&gt;
 /gisdata&lt;br /&gt;
&lt;br /&gt;
There needs to be a directory called &amp;quot;temp&amp;quot; in the gisdata directory. To make the script, use the following from the command line:&lt;br /&gt;
 psql -c &amp;quot;SELECT Loader_Generate_Nation_Script('test')&amp;quot; -d databasename -tA &amp;gt; /gisdata/nation_script_load.sh&lt;br /&gt;
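&lt;br /&gt;
If the &amp;quot;temp&amp;quot; directory does not exist yet, one way to create it is:&lt;br /&gt;
 mkdir -p /gisdata/temp&lt;br /&gt;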
&lt;br /&gt;
This will create a script in the /gisdata directory. Change to that directory. If you did not edit the declare_sect column in psql, then you will need to edit this file to contain the correct paths. &lt;br /&gt;
&lt;br /&gt;
Change directories:&lt;br /&gt;
 cd /gisdata&lt;br /&gt;
&lt;br /&gt;
Edit the script using your favorite command line text editor. Specifically, edit the following fields.&lt;br /&gt;
 PGUSER=postgres&lt;br /&gt;
 PGPASSWORD=(Ask Anne for this password)!&lt;br /&gt;
Everything else remains the same.&lt;br /&gt;
&lt;br /&gt;
Run the script by using:&lt;br /&gt;
 sh nation_script_load.sh&lt;br /&gt;
&lt;br /&gt;
Now, there is a barebones table in the database that will hold the information for the nation. The next step is to download data for each state.&lt;br /&gt;
&lt;br /&gt;
===State Data===&lt;br /&gt;
&lt;br /&gt;
The state scripts are generated in much the same way that the nation script was generated. Use the following command, substituting MA for your desired state abbreviation, and substituting a unique filename at the end.&lt;br /&gt;
 psql -c &amp;quot;SELECT Loader_Generate_Script(ARRAY['MA'], 'test')&amp;quot; -d geocoder -tA &amp;gt; /gisdata/ma_load.sh&lt;br /&gt;
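&lt;br /&gt;
Several states can be generated into a single script by passing a larger array. A sketch for Texas and Louisiana (the state list and output filename here are illustrative):&lt;br /&gt;
 psql -c &amp;quot;SELECT Loader_Generate_Script(ARRAY['TX','LA'], 'test')&amp;quot; -d geocoder -tA &amp;gt; /gisdata/tx_la_load.sh&lt;br /&gt;
 cd /gisdata&lt;br /&gt;
 sh tx_la_load.sh&lt;br /&gt;
&lt;br /&gt;
As with the nation script, edit the generated file to contain the correct user and password before running it.&lt;br /&gt;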
&lt;br /&gt;
CURRENT PROGRESS:&lt;br /&gt;
The following states have been downloaded into the geocoder database.&lt;br /&gt;
 AL, AK, AZ, AR, CA, CO, CT, DE, FL, GA, HI, ID, IL, IN, IA, KS, KY, LA, ME, MD, MA, MI&lt;br /&gt;
&lt;br /&gt;
===Current Errors===&lt;br /&gt;
The state scripts stopped working on 11/1/2017 despite working on 10/31/2017. Now, when a retrieval script is run, it returns the error&lt;br /&gt;
 HTTP request sent, awaiting response... 403 Forbidden&lt;br /&gt;
Possible thoughts:&lt;br /&gt;
&lt;br /&gt;
Maybe our IP has been blacklisted for downloading data from a government website too quickly? If so, PostGIS should really choose a different installation method. The current one is dumb.&lt;br /&gt;
&lt;br /&gt;
Maybe the nation downloader script never worked properly. Not sure how to check if it is correct or not; seems right.&lt;br /&gt;
&lt;br /&gt;
[https://trac.osgeo.org/postgis/ticket/3699 This] is the only online forum I could find with others who have faced a similar issue.&lt;br /&gt;
&lt;br /&gt;
NOTE: Contacted the US Census Bureau and never got a response. However, the scripts magically started working again about a week later, only to stop working again the next day.&lt;br /&gt;
&lt;br /&gt;
==Geocode Function==&lt;br /&gt;
&lt;br /&gt;
The official arguments for the function are the following:&lt;br /&gt;
 setof record geocode(varchar address, integer max_results=10, geometry restrict_region=NULL, norm_addy OUT addy, geometry OUT geomout, integer OUT rating);&lt;br /&gt;
The arguments of interest are address, where you simply submit a string, and max_results, which caps the number of candidate geocodes returned per address. The geocoder makes multiple guesses at the location of an address and returns the best guesses in order of rating. If you want multiple guesses for a specific address, set max_results to more than 1.&lt;br /&gt;
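&lt;br /&gt;
For example, a minimal sketch that asks for up to five candidate matches for one address (pprint_addy renders the normalized address as a single string):&lt;br /&gt;
 SELECT g.rating, ST_X(g.geomout) As lon, ST_Y(g.geomout) As lat, pprint_addy(addy) As address&lt;br /&gt;
 FROM geocode('75 State Street, Boston MA 02109', 5) As g;&lt;br /&gt;
&lt;br /&gt;
Each returned row is one guess, with the lowest (best) rating first.&lt;br /&gt;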
&lt;br /&gt;
===Single Address===&lt;br /&gt;
&lt;br /&gt;
An example query for a single address is:&lt;br /&gt;
 SELECT g.rating, ST_X(g.geomout) As lon, ST_Y(g.geomout) As lat,&lt;br /&gt;
     (addy).address As stno, (addy).streetname As street,&lt;br /&gt;
     (addy).streettypeabbrev As styp, (addy).location As city, (addy).stateabbrev As st,(addy).zip&lt;br /&gt;
     FROM geocode('75 State Street, Boston MA 02109', 1) As g;&lt;br /&gt;
&lt;br /&gt;
rating -- This is an integer that indicates the confidence in the resulting geocode. The closer to 0, the more confident the guess.&lt;br /&gt;
&lt;br /&gt;
ST_X(g.geomout) -- This retrieves the longitude coordinate of the point.&lt;br /&gt;
&lt;br /&gt;
ST_Y(g.geomout) -- This retrieves the latitude coordinate of the point.&lt;br /&gt;
&lt;br /&gt;
addy -- In general, addy is a normalized address resulting from the input address.&lt;br /&gt;
&lt;br /&gt;
(addy).address -- The number of the address (Ex: &amp;quot;75&amp;quot; Blabla rd.)&lt;br /&gt;
&lt;br /&gt;
(addy).streetname -- The name of the street (Ex: 75 &amp;quot;Blabla&amp;quot; rd.)&lt;br /&gt;
&lt;br /&gt;
(addy).streettypeabbrev -- The abbreviated street type (Ex: 75 blabla &amp;quot;rd&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
(addy).location -- The city location of the address.&lt;br /&gt;
&lt;br /&gt;
(addy).stateabbrev -- The state abbreviation.&lt;br /&gt;
&lt;br /&gt;
(addy).zip -- The zipcode of the address.&lt;br /&gt;
&lt;br /&gt;
The output of the query above would be:&lt;br /&gt;
  rating |        lon        |      lat       | stno | street | styp |  city  | st |  zip&lt;br /&gt;
 --------+-------------------+----------------+------+--------+------+--------+----+-------&lt;br /&gt;
       0 | -71.0557505845646 | 42.35897920691 |   75 | State  | St   | Boston | MA | 02109&lt;br /&gt;
&lt;br /&gt;
==Current Errors==&lt;br /&gt;
Currently, we are getting a 403 Forbidden Error when trying to download state data. We are in contact with the US Census Bureau. Their contact information can be found [https://www.census.gov/geo/about/contact.html here].&lt;br /&gt;
&lt;br /&gt;
The email exchanges are recorded below.&lt;br /&gt;
&lt;br /&gt;
--------------------------------------&lt;br /&gt;
--------------------------------------&lt;br /&gt;
&lt;br /&gt;
I am a student researcher at the McNair Center for Entrepreneurship and Innovation at Rice University, and I am in the process of installing a Postgres Extension that relies on the TIGER data. &lt;br /&gt;
&lt;br /&gt;
When I began installing the extension, things were working fine. Now however, I am getting a 403 Forbidden error when the script tries to download the TIGER files. Do you have any idea why this might be happening? &lt;br /&gt;
&lt;br /&gt;
The extension I'm trying to install is below:&lt;br /&gt;
&lt;br /&gt;
http://postgis.net/docs/postgis_installation.html#install_tiger_geocoder_extension&lt;br /&gt;
&lt;br /&gt;
When I run any of the scripts that require data from TIGER, I am receiving the following error:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--2017-11-06 14:15:29--  http://www2.census.gov/geo/tiger/TIGER2016/FEATNAMES/tl_2016_06065_featnames.zip&lt;br /&gt;
Resolving www2.census.gov (www2.census.gov)... 104.84.241.90, 2600:1404:a:382::208c, 2600:1404:a:39c::208c&lt;br /&gt;
Connecting to www2.census.gov (www2.census.gov)|104.84.241.90|:80... connected.&lt;br /&gt;
HTTP request sent, awaiting response... 403 Forbidden&lt;br /&gt;
2017-11-06 14:15:29 ERROR 403: Forbidden.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
&lt;br /&gt;
Thank you for reaching out to us.  The postgis docs is not from the Geography Division, so I cannot comment on that.  A couple things that come to mind.&lt;br /&gt;
&lt;br /&gt;
We just released our new 2017 Shapefiles, so it's possible the scripts may be written for a previous version of our Shapefiles.&lt;br /&gt;
You may need to clean out your cookies, restart your browser, and then attempt to reinstall.&lt;br /&gt;
Were you able to download our Shapefiles successfully?&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
&lt;br /&gt;
The Shapefiles are part of the files being blocked by the 403 Forbidden error. &lt;br /&gt;
&lt;br /&gt;
The script is using a wget protocol to bulk download the data, so there were no cookies involved. Also, the download had been working in previous days; it was only recently that the same scripts stopped working. I am worried that our IP address somehow ended up on a blacklist for the TIGER data. Is there a blacklist for addresses that access the TIGER data?&lt;br /&gt;
&lt;br /&gt;
Our IP address is 128.42.44.181. &lt;br /&gt;
&lt;br /&gt;
---------------------------------------&lt;br /&gt;
&lt;br /&gt;
I have forwarded your question on to our IT folks.  Since I work on the subject matter side, I am unable to answer your questions.  Once I hear back from them, I will forward their response to you.  Hopefully they will provide to you what you need in order to download our Shapefiles.  My apologies for your inconvenience.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Demo_Day_Page_Parser&amp;diff=22131</id>
		<title>Demo Day Page Parser</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Demo_Day_Page_Parser&amp;diff=22131"/>
		<updated>2017-11-28T21:49:52Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{McNair Projects&lt;br /&gt;
|Has title=Demo Day Page Parser&lt;br /&gt;
|Has owner=Peter Jalbert,&lt;br /&gt;
|Has project status=Active&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
==Project Specs==&lt;br /&gt;
The goal of this project is to leverage data mining with Selenium and Machine Learning to get good candidate web pages for Demo Days for accelerators. Relevant information on the project can be found on the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Data Accelerator Data] page.&lt;br /&gt;
&lt;br /&gt;
==Code Location==&lt;br /&gt;
The code directory for this project can be found:&lt;br /&gt;
 E:\McNair\Software\Accelerators&lt;br /&gt;
&lt;br /&gt;
The Selenium-based crawler can be found in the file below. This script runs a Google search on accelerator names and keywords, and saves the URLs and HTML pages for future use:&lt;br /&gt;
 DemoDayCrawler.py&lt;br /&gt;
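&lt;br /&gt;
For reference, here is a minimal sketch of the kind of search-and-save loop this script performs. This is an illustration under assumptions, not the actual contents of DemoDayCrawler.py; the sample accelerator names and the result-link selector are guesses (Google's markup changes over time):&lt;br /&gt;
 from selenium import webdriver&lt;br /&gt;
 import time&lt;br /&gt;
 &lt;br /&gt;
 # Illustrative sketch only -- the real logic lives in DemoDayCrawler.py.&lt;br /&gt;
 driver = webdriver.Chrome()&lt;br /&gt;
 accelerators = ['Y Combinator', 'Techstars']  # in practice, read from ListOfAccs.txt&lt;br /&gt;
 for acc in accelerators:&lt;br /&gt;
     driver.get('http://www.google.com/search?q=' + acc.replace(' ', '+') + '+Demo+Day')&lt;br /&gt;
     time.sleep(2)  # be gentle with the search engine&lt;br /&gt;
     # 'h3.r a' matched Google result links at the time; the selector may need updating&lt;br /&gt;
     for link in driver.find_elements_by_css_selector('h3.r a')[:10]:  # 10 links per accelerator&lt;br /&gt;
         print(link.get_attribute('href'))  # in practice, save the URL and its HTML&lt;br /&gt;
 driver.quit()&lt;br /&gt;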
&lt;br /&gt;
&lt;br /&gt;
A script to rip from HTML to TXT can be found below. This script reads HTML files from the DemoDayHTML directory, and writes them to the DemoDayTxt directory:&lt;br /&gt;
 htmlToText.py&lt;br /&gt;
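&lt;br /&gt;
A minimal sketch of this conversion step, assuming BeautifulSoup is installed; the directory names match those above, everything else is illustrative rather than the actual contents of htmlToText.py:&lt;br /&gt;
 import os&lt;br /&gt;
 from bs4 import BeautifulSoup&lt;br /&gt;
 &lt;br /&gt;
 # Convert every saved HTML page to a plain-text file.&lt;br /&gt;
 for name in os.listdir('DemoDayHTML'):&lt;br /&gt;
     with open(os.path.join('DemoDayHTML', name), encoding='utf-8') as f:&lt;br /&gt;
         soup = BeautifulSoup(f.read(), 'html.parser')&lt;br /&gt;
     text = soup.get_text(separator='\n')  # strip tags, keep visible text&lt;br /&gt;
     out_name = os.path.splitext(name)[0] + '.txt'&lt;br /&gt;
     with open(os.path.join('DemoDayTxt', out_name), 'w', encoding='utf-8') as f:&lt;br /&gt;
         f.write(text)&lt;br /&gt;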
&lt;br /&gt;
A script to match Keywords (Accelerator and Cohort names) against the resulting text pages can be found in KeyTerms.py. The script takes the Keywords located in CohortAndAcceleratorsFullList.txt, and the text files in DemoDayTxt, and creates a file with the number of matches of each keyword against each text file.&lt;br /&gt;
&lt;br /&gt;
The script can be found:&lt;br /&gt;
 KeyTerms.py&lt;br /&gt;
&lt;br /&gt;
The Keyword matches text file can be found:&lt;br /&gt;
 DemoDayTxt\KeyTermFile\KeyTerms.txt&lt;br /&gt;
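&lt;br /&gt;
As a rough illustration of the matching logic, a sketch along these lines would produce one count per (file, keyword) pair. The file names come from above; the simple substring counting is an assumption about how KeyTerms.py works:&lt;br /&gt;
 import os&lt;br /&gt;
 &lt;br /&gt;
 # Load the keyword list, one keyword per line.&lt;br /&gt;
 with open('CohortAndAcceleratorsFullList.txt', encoding='utf-8') as f:&lt;br /&gt;
     keywords = [line.strip() for line in f if line.strip()]&lt;br /&gt;
 &lt;br /&gt;
 # Count occurrences of each keyword in each text file.&lt;br /&gt;
 with open('KeyTerms.txt', 'w', encoding='utf-8') as out:&lt;br /&gt;
     for name in os.listdir('DemoDayTxt'):&lt;br /&gt;
         path = os.path.join('DemoDayTxt', name)&lt;br /&gt;
         if not os.path.isfile(path):&lt;br /&gt;
             continue  # skip subdirectories such as KeyTermFile&lt;br /&gt;
         with open(path, encoding='utf-8') as f:&lt;br /&gt;
             text = f.read().lower()&lt;br /&gt;
         for kw in keywords:&lt;br /&gt;
             out.write('\t'.join([name, kw, str(text.count(kw.lower()))]) + '\n')&lt;br /&gt;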
&lt;br /&gt;
A script to determine the text files of webpages that have at least one hit of these key words can be found:&lt;br /&gt;
 DemoDayHits.py&lt;br /&gt;
&lt;br /&gt;
==Downloading HTML Files with Selenium==&lt;br /&gt;
The code for utilizing Selenium to download HTML files can be found in the DemoDayCrawler.py file.&lt;br /&gt;
&lt;br /&gt;
The initial test run scraped 100 links for each of 20 sample accelerators drawn from the full accelerator list. These sample pages were converted to text and scored to remove web pages with no mention of relevant accelerators or companies.&lt;br /&gt;
&lt;br /&gt;
Once the process was tweaked in response to the initial sample testing, it was run again over all accelerators. The test determined that we needed to take no more than 10 links per accelerator, and that 'Demo Day' was a suitable search term.&lt;br /&gt;
&lt;br /&gt;
==Complete Files==&lt;br /&gt;
These files hold data for all the accelerators, not just the test set.&lt;br /&gt;
&lt;br /&gt;
The full list of accelerators:&lt;br /&gt;
 ListOfAccs.txt&lt;br /&gt;
&lt;br /&gt;
The full list of potential keywords (used for throwing out irrelevant results):&lt;br /&gt;
 Keywords.txt&lt;br /&gt;
&lt;br /&gt;
A list of accelerators, queries, and urls:&lt;br /&gt;
 demoday_crawl_full.txt&lt;br /&gt;
&lt;br /&gt;
A directory with HTML files for all accelerator demo day results:&lt;br /&gt;
 DemoDayHTMLFull&lt;br /&gt;
&lt;br /&gt;
A directory with TXT files for all accelerator demo day results:&lt;br /&gt;
 DemoDayTxtFull&lt;br /&gt;
&lt;br /&gt;
A file with the name of the results that passed keyword matching:&lt;br /&gt;
 DemoDayHitsFull.txt&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Demo_Day_Page_Parser&amp;diff=22130</id>
		<title>Demo Day Page Parser</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Demo_Day_Page_Parser&amp;diff=22130"/>
		<updated>2017-11-28T21:24:21Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{McNair Projects&lt;br /&gt;
|Has title=Demo Day Page Parser&lt;br /&gt;
|Has owner=Peter Jalbert,&lt;br /&gt;
|Has project status=Active&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
==Project Specs==&lt;br /&gt;
The goal of this project is to leverage data mining with Selenium and Machine Learning to get good candidate web pages for Demo Days for accelerators. Relevant information on the project can be found on the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Data Accelerator Data] page.&lt;br /&gt;
&lt;br /&gt;
==Code Location==&lt;br /&gt;
The code directory for this project can be found:&lt;br /&gt;
 E:\McNair\Software\Accelerators&lt;br /&gt;
&lt;br /&gt;
The Selenium-based crawler can be found in the file below. This script runs a Google search on accelerator names and keywords, and saves the URLs and HTML pages for future use:&lt;br /&gt;
 DemoDayCrawler.py&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A script to rip from HTML to TXT can be found below. This script reads HTML files from the DemoDayHTML directory, and writes them to the DemoDayTxt directory:&lt;br /&gt;
 htmlToText.py&lt;br /&gt;
&lt;br /&gt;
A script to match Keywords (Accelerator and Cohort names) against the resulting text pages can be found in KeyTerms.py. The script takes the Keywords located in CohortAndAcceleratorsFullList.txt, and the text files in DemoDayTxt, and creates a file with the number of matches of each keyword against each text file.&lt;br /&gt;
&lt;br /&gt;
The script can be found:&lt;br /&gt;
 KeyTerms.py&lt;br /&gt;
&lt;br /&gt;
The Keyword matches text file can be found:&lt;br /&gt;
 DemoDayTxt\KeyTermFile\KeyTerms.txt&lt;br /&gt;
&lt;br /&gt;
A script to determine the text files of webpages that have at least one hit of these key words can be found:&lt;br /&gt;
 DemoDayHits.py&lt;br /&gt;
&lt;br /&gt;
==Downloading HTML Files with Selenium==&lt;br /&gt;
The code for utilizing Selenium to download HTML files can be found in the DemoDayCrawler.py file.&lt;br /&gt;
&lt;br /&gt;
The initial test run scraped 100 links for each of 20 sample accelerators drawn from the full accelerator list. These sample pages were converted to text and scored to remove web pages with no mention of relevant accelerators or companies.&lt;br /&gt;
&lt;br /&gt;
Once the process was tweaked in response to the initial sample testing, it was run again over all accelerators. The test determined that we needed to take no more than 10 links per accelerator, and that 'Demo Day' was a suitable search term.&lt;br /&gt;
&lt;br /&gt;
==Complete Files==&lt;br /&gt;
These files hold data for all the accelerators, not just the test set.&lt;br /&gt;
&lt;br /&gt;
The full list of accelerators:&lt;br /&gt;
 ListOfAccs.txt&lt;br /&gt;
&lt;br /&gt;
The full list of potential keywords (used for throwing out irrelevant results):&lt;br /&gt;
 Keywords.txt&lt;br /&gt;
&lt;br /&gt;
A list of accelerators, queries, and urls:&lt;br /&gt;
 demoday_crawl_full.txt&lt;br /&gt;
&lt;br /&gt;
A directory with HTML files for all accelerator demo day results:&lt;br /&gt;
 DemoDayHTMLFull&lt;br /&gt;
&lt;br /&gt;
A directory with TXT files for all accelerator demo day results:&lt;br /&gt;
 DemoDayTxtFull&lt;br /&gt;
&lt;br /&gt;
A file with the name of the results that passed keyword matching:&lt;br /&gt;
 DemoDayHitsFull.txt&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Demo_Day_Page_Parser&amp;diff=22129</id>
		<title>Demo Day Page Parser</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Demo_Day_Page_Parser&amp;diff=22129"/>
		<updated>2017-11-28T21:23:42Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{McNair Projects&lt;br /&gt;
|Has title=Demo Day Page Parser&lt;br /&gt;
|Has owner=Peter Jalbert,&lt;br /&gt;
|Has project status=Active&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
==Project Specs==&lt;br /&gt;
The goal of this project is to leverage data mining with Selenium and Machine Learning to get good candidate web pages for Demo Days for accelerators. Relevant information on the project can be found on the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Data Accelerator Data] page.&lt;br /&gt;
&lt;br /&gt;
==Code Location==&lt;br /&gt;
The code directory for this project can be found:&lt;br /&gt;
 E:\McNair\Software\Accelerators&lt;br /&gt;
&lt;br /&gt;
The Selenium-based crawler can be found in the file below. This script runs a Google search on accelerator names and keywords, and saves the URLs and HTML pages for future use:&lt;br /&gt;
 DemoDayCrawler.py&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A script to rip from HTML to TXT can be found below. This script reads HTML files from the DemoDayHTML directory, and writes them to the DemoDayTxt directory:&lt;br /&gt;
 htmlToText.py&lt;br /&gt;
&lt;br /&gt;
A script to match Keywords (Accelerator and Cohort names) against the resulting text pages can be found in KeyTerms.py. The script takes the Keywords located in CohortAndAcceleratorsFullList.txt, and the text files in DemoDayTxt, and creates a file with the number of matches of each keyword against each text file.&lt;br /&gt;
&lt;br /&gt;
The script can be found:&lt;br /&gt;
 KeyTerms.py&lt;br /&gt;
&lt;br /&gt;
The Keyword matches text file can be found:&lt;br /&gt;
 DemoDayTxt\KeyTermFile\KeyTerms.txt&lt;br /&gt;
&lt;br /&gt;
A script to determine the text files of webpages that have at least one hit of these key words can be found:&lt;br /&gt;
 DemoDayHits.py&lt;br /&gt;
&lt;br /&gt;
==Downloading HTML Files with Selenium==&lt;br /&gt;
The code for utilizing Selenium to download HTML files can be found in the DemoDayCrawler.py file.&lt;br /&gt;
&lt;br /&gt;
The initial test run scraped 100 links for each of 20 sample accelerators drawn from the full accelerator list. These sample pages were converted to text and scored to remove web pages with no mention of relevant accelerators or companies.&lt;br /&gt;
&lt;br /&gt;
Once the process was tweaked in response to the initial sample testing, it was run again over all accelerators. The test determined that we needed to take no more than 10 links per accelerator, and that 'Demo Day' was a suitable search term.&lt;br /&gt;
&lt;br /&gt;
==Complete Files==&lt;br /&gt;
These files hold data for all the accelerators, not just the test set.&lt;br /&gt;
&lt;br /&gt;
The full list of accelerators:&lt;br /&gt;
 ListOfAccs.txt&lt;br /&gt;
&lt;br /&gt;
The full list of potential keywords (used for throwing out irrelevant results):&lt;br /&gt;
 Keywords.txt&lt;br /&gt;
&lt;br /&gt;
A list of accelerators, queries, and urls:&lt;br /&gt;
 demoday_crawl_full.txt&lt;br /&gt;
&lt;br /&gt;
A directory with HTML files for all accelerator demo day results:&lt;br /&gt;
 DemoDayHTMLFull&lt;br /&gt;
&lt;br /&gt;
A directory with TXT files for all accelerator demo day results:&lt;br /&gt;
 DemoDayTxtFull&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=22128</id>
		<title>Peter Jalbert (Work Log)</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=22128"/>
		<updated>2017-11-28T20:55:12Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===Fall 2017===&lt;br /&gt;
&amp;lt;onlyinclude&amp;gt; &lt;br /&gt;
&lt;br /&gt;
[[Peter Jalbert]] [[Work Logs]] [[Peter Jalbert (Work Log)|(log page)]] &lt;br /&gt;
&lt;br /&gt;
2017-11-28: Debugging Morocco Parliament Crawler. Running Demo Day Crawler for all accelerators and 10 pages per accelerator. TIGER geocoder is back to Forbidden Error.&lt;br /&gt;
&lt;br /&gt;
2017-11-27: Rerunning Morocco Parliament Crawler. Fixed KeyTerms.py and running it again. Continued downloading for TIGER geocoder.&lt;br /&gt;
&lt;br /&gt;
2017-11-20:  Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-16: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-15: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote a script to extract counts that were greater than 2 from Keyword Matcher. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder]. Finished re-formatting work logs.&lt;br /&gt;
&lt;br /&gt;
2017-11-14: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote an HTML to Text parser. See Parser Demo Day Page for file location. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder].&lt;br /&gt;
&lt;br /&gt;
2017-11-13: Built [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser].&lt;br /&gt;
&lt;br /&gt;
2017-11-09: Running demo version of Demo Day crawler (Accelerator Google Crawler). Fixing work log format. &lt;br /&gt;
&lt;br /&gt;
2017-11-07: Created file with 0s and 1s detailing whether crunchbase has the founder information for an accelerator. Details posted as a TODO on [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List] page. Still waiting for feedback on the PostGIS installation from [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder]. Continued working on Accelerator Google Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-06: Contacted Geography Center for the US Census Bureau, [https://www.census.gov/geo/about/contact.html here], and began email exchange on PostGIS installation problems. Began working on the [http://mcnair.bakerinstitute.org/wiki/Selenium_Documentation Selenium Documentation]. Also began working on an Accelerator Google Crawler that will be used with Yang and ML to find Demo Days for cohort companies.&lt;br /&gt;
&lt;br /&gt;
2017-11-01: Attempted to continue downloading, however ran into HTTP Forbidden errors. Listed the errors on the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder Page].&lt;br /&gt;
&lt;br /&gt;
2017-10-31: Began downloading blocks of data for individual states for the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder] project. Wrote out the new wiki page for installation, and beginning to write documentation on usage.&lt;br /&gt;
&lt;br /&gt;
2017-10-30: With Ed's help, was able to get the national data from Tiger installed onto a database server. The process required much jumping around and changing users, and all the things we learned are outlined in [http://mcnair.bakerinstitute.org/wiki/Database_Server_Documentation#Editing_Users the database server documentation] under &amp;quot;Editing Users&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
2017-10-25: Continued working on the [http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation TigerCoder Installation].&lt;br /&gt;
&lt;br /&gt;
2017-10-24: New task: throw some addresses into a database and run the address normalizer and geocoder; some installation may be needed. Details on the installation process can be found on the [http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation PostGIS Installation page].&lt;br /&gt;
&lt;br /&gt;
2017-10-23: Finished Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-19: Continued work on Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-18: Continued work on Yelp crawler for the [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-17: Constructed ArcGIS maps for the agglomeration project. Finished maps of points for every year in the state of California. Finished maps of Route 128. Began working on selenium Yelp crawler to get cafe locations within the 610-loop.&lt;br /&gt;
&lt;br /&gt;
2017-10-16: Assisted Harrison on the USITC project. Looked for natural language processing tools to extract complainants and defendants, along with their locations, from case files. Experimented with pulling based on part-of-speech tags, as well as using geotext or geograpy to pull locations from a case segment.&lt;br /&gt;
&lt;br /&gt;
2017-10-13: Updated various project wiki pages.&lt;br /&gt;
&lt;br /&gt;
2017-10-12: Continued work on Patent Thicket project, awaiting further project specs.&lt;br /&gt;
&lt;br /&gt;
2017-10-05: Emergency ArcGIS creation for Agglomeration project.&lt;br /&gt;
&lt;br /&gt;
2017-10-04: Emergency ArcGIS creation for Agglomeration project.&lt;br /&gt;
&lt;br /&gt;
2017-10-02: Worked on ArcGIS data. See Harrison's Work Log for the details.&lt;br /&gt;
&lt;br /&gt;
2017-09-28: Added collaborative editing feature to PyCharm. &lt;br /&gt;
&lt;br /&gt;
2017-09-27: Worked on big database file.&lt;br /&gt;
&lt;br /&gt;
2017-09-25: New task -- Create text file with company, description, and company type.&lt;br /&gt;
#[http://mcnair.bakerinstitute.org/wiki/VC_Database_Rebuild VC Database Rebuild]&lt;br /&gt;
#psql vcdb2&lt;br /&gt;
#table name, sdccompanybasecore2&lt;br /&gt;
#Combine with Crunchbasebulk&lt;br /&gt;
&lt;br /&gt;
#TODO: Write wiki on LinkedIn crawler, write wiki on creating accounts.&lt;br /&gt;
&lt;br /&gt;
2017-09-21: Wrote wiki on Linkedin crawler, met with Laura about patents project. &lt;br /&gt;
&lt;br /&gt;
2017-09-20: Finished running linkedin crawler. Transferred data to RDP. Will write wikis next.&lt;br /&gt;
&lt;br /&gt;
2017-09-19: Began running linkedin crawler. Helped Yang create RDP account, get permissions, and get wiki setup.&lt;br /&gt;
&lt;br /&gt;
2017-09-18: Finished implementation of Experience Crawler, continued working on Education Crawler for LinkedIn.&lt;br /&gt;
&lt;br /&gt;
2017-09-14: Continued implementing LinkedIn Crawler for profiles.&lt;br /&gt;
&lt;br /&gt;
2017-09-13: Implemented LinkedIn Crawler for main portion of profiles. Began working on crawling Experience section of profiles.&lt;br /&gt;
&lt;br /&gt;
2017-09-12: Continued working on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. Added to the wiki on this topic.&lt;br /&gt;
&lt;br /&gt;
2017-09-11: Continued working on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. &lt;br /&gt;
&lt;br /&gt;
2017-09-06: Combined founders data retrieved with the Crunchbase API with the crunchbasebulk data to get linkedin urls for different accelerator founders. For more information, see [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here].&lt;br /&gt;
&lt;br /&gt;
2017-09-05: Post Harvey. Finished retrieving names from the Crunchbase API on founders. Next step is to query crunchbase bulk database to get linkedin urls. For more information, see [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here].&lt;br /&gt;
&lt;br /&gt;
2017-08-24: Began using the Crunchbase API to retrieve founder information for accelerators. Halfway through compiling a dictionary that translates accelerator names into proper Crunchbase API URLs.&lt;br /&gt;
&lt;br /&gt;
2017-08-23: Decided with Ed to abandon LinkedIn crawling to retrieve accelerator founder data, and instead use crunchbase. Spent the day navigating the crunchbasebulk database, and seeing what useful information was contained in it.&lt;br /&gt;
&lt;br /&gt;
2017-08-22: Discovered that LinkedIn Profiles cannot be viewed through LinkedIn if the target is 3rd degree or further. However, if entering LinkedIn through a Google search, the profile can still be viewed if the user has previously logged into LinkedIn. Devising a workaround crawler that utilizes Google search. Continued blog post [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) here] under Section 4.&lt;br /&gt;
&lt;br /&gt;
2017-08-21: Began work on extracting founders for accelerators through LinkedIn Crawler. Discovered that Python3 is not installed on RDP, so the virtual environment for the project cannot be fired up. Continued working on Ubuntu machine.&lt;br /&gt;
&amp;lt;/onlyinclude&amp;gt; &lt;br /&gt;
&lt;br /&gt;
===Spring 2017===&lt;br /&gt;
&lt;br /&gt;
2017-05-01: Continued work on HTML Parser. Uploaded all semester projects to git server.&lt;br /&gt;
&lt;br /&gt;
2017-04-20: Finished the HTML Parser for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Ran HTML parser on accelerator founders. Data is stored in projects/accelerators/LinkedIn Founder Data.&lt;br /&gt;
&lt;br /&gt;
2017-04-19: Made updates to the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler] Wikipage. Ran LinkedIn Crawler on accelerator data. Working on an html parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-18: Ran LinkedIn Crawler on matches between Crunchbase Snapshot and the accelerator data.&lt;br /&gt;
&lt;br /&gt;
2017-04-17: Worked on ways to get correct search results from the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Worked on an HTML Parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-13: Worked on debugging the logout procedure for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Began formulation of process to search for founders of startups using a combination of the LinkedIn Crawler with the data resources from the [http://mcnair.bakerinstitute.org/wiki/Crunchbase_2013_Snapshot CrunchBase Snapshot].&lt;br /&gt;
&lt;br /&gt;
2017-04-12: Work on bugs with the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-04-11: Completed functional [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) crawler of LinkedIn Recruiter Pro]. Basic search functions work and download profile information for a given person. &lt;br /&gt;
&lt;br /&gt;
2017-04-10: Began writing functioning crawler of LinkedIn. &lt;br /&gt;
&lt;br /&gt;
2017-04-06: Continued working on debugging and documenting the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Wrote a test program that logs in, searches for a query, navigates through search pages, and logs out. Recruiter program can now login and search.&lt;br /&gt;
&lt;br /&gt;
2017-04-05: Began work on the LinkedIn Crawler. Researched on launching Python Virtual Environment.&lt;br /&gt;
&lt;br /&gt;
2017-04-03: Finished debugging points for the Enclosing Circle Algorithm. Added Command Line functionality to the Industry Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-03-29: Worked on debugging points for the Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-28: Finished running the Enclosing Circle Algorithm. Worked on removing incorrect points from the data set (see above).&lt;br /&gt;
&lt;br /&gt;
2017-03-27: Worked on debugging the Enclosing Circle Algorithm. Implemented a way to remove interior circles, and determined that translation to latitude and longitude coordinates resulted in slightly off center circles.&lt;br /&gt;
&lt;br /&gt;
2017-03-23: Finished debugging the brute force algorithm for [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm]. Implemented a method to plot the points and circles on a graph. Analyzed runtime of the brute force algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-21: Coded a brute force algorithm for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-03-20: Worked on  debugging the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-03-09: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Finished script to draw Enclosing Circles on a Google Map.&lt;br /&gt;
&lt;br /&gt;
2017-03-08: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Created script to draw outcome of the Enclosing Circle Algorithm on Google Maps.&lt;br /&gt;
&lt;br /&gt;
2017-03-07: Redetermined the top 50 cities which Enclosing Circle should be run on. Data on the [http://mcnair.bakerinstitute.org/wiki/Top_Cities_for_VC_Backed_Companies Top 50 Cities for VC Backed Companies can be found here.] Ran [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities.&lt;br /&gt;
&lt;br /&gt;
2017-03-06: Ran script to determine the top 50 cities which Enclosing Circle should be run on. Fixed the VC Circles script to take in a new data format.&lt;br /&gt;
&lt;br /&gt;
2017-03-02: Cleaned up data for the VC Circles Project. Created histogram of data in Excel. See [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Began work on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-03-01: Created statistics for the VC Circles Project.&lt;br /&gt;
&lt;br /&gt;
2017-02-28: Finished downloading geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project.  Found bug in Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-02-27: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-23: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Installed C++ Compiler for Python. Ran tests on difference between Python and C wrapped Python.&lt;br /&gt;
&lt;br /&gt;
2017-02-22: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Helped out with [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-21: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Researched into C++ Compilers for Python so that the Enclosing Circle Algorithm could be wrapped in C. Found a recommended one [https://www.microsoft.com/en-us/download/details.aspx?id=44266 here].&lt;br /&gt;
&lt;br /&gt;
2017-02-20: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-16: Reworked [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] to create a file of geocoded data. Began work on wrapping the algorithm in C to improve speed.&lt;br /&gt;
&lt;br /&gt;
2017-02-15: Finished [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] applied to the VC study. Enclosing Circle algorithm still needs adjustment, but the program runs with the temporary fixes.&lt;br /&gt;
&lt;br /&gt;
2017-02-14: Worked on the application of the Enclosing Circle algorithm to the VC study. Working on bug fixes in the Enclosing Circle algorithm. Created wiki page for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-02-13: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-08: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-07: Fixed bugs in parse_cohort_data.py, the script for parsing the cohort data from the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Added descriptive statistics to cohort data excel file.&lt;br /&gt;
&lt;br /&gt;
2017-02-02: Out sick, independent research and work from RDP. Brief research into the [http://jorgeg.scripts.mit.edu/homepage/wp-content/uploads/2016/03/Guzman-Stern-State-of-American-Entrepreneurship-FINAL.pdf Stern-Guzman algorithm]. Research into [http://mcnair.bakerinstitute.org/wiki/interactive_maps Interactive Maps]. No helpful additions to map embedding problem.&lt;br /&gt;
&lt;br /&gt;
2017-02-01: Notes from session with Ed:&lt;br /&gt;
*Project on US university patenting and entrepreneurship programs (writing code to identify universities in assignees); search Wikipedia (XML then bulk download), student population, faculty population, etc.&lt;br /&gt;
*The circle project for VC data will end up being a joint project with the accelerator data.&lt;br /&gt;
*Pull descriptions for VC. Founders of accelerators are on LinkedIn. The LinkedIn crawler cannot be caught (pretend not to be a bot). Can eventually get academic backgrounds through LinkedIn.&lt;br /&gt;
*Pull business registration data; Stern/Guzman algorithm.&lt;br /&gt;
*GIS on top of geocoded data.&lt;br /&gt;
*Maps that work on the wiki or blog (CartoDB), Maps API and R.&lt;br /&gt;
*NLP projects: Description Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-01-31: Built WayBack Machine Crawler. Updated documentation for coordinates script. Updated profile page to include locations of code.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-30: Optimized enclosing circle algorithm through memoization. Developed script to read addresses from accelerator data and return latitude and longitude coordinates.&lt;br /&gt;
&lt;br /&gt;
2017-01-26: Continued working on the Google sitesearch project. Discovered Crunchbase and changed project priorities: priority 1, split accelerator data up by flag; priority 2, use Crunchbase to get web URLs for cohorts; priority 3, make an Internet Archive Wayback Machine driver. Located the [http://mcnair.bakerinstitute.org/wiki/Whois_Parser Whois Parser].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-25: Finished parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Some data files still need proofreading as they are not in an acceptable format. Began working on Google sitesearch project.&lt;br /&gt;
&lt;br /&gt;
2017-01-24: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Cohort data file created, debugging is almost complete. Will begin work on the google accelerator search soon.&lt;br /&gt;
&lt;br /&gt;
2017-01-23: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Preliminary code is written, working on debugging.&lt;br /&gt;
&lt;br /&gt;
2017-01-19: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project]. Created parser for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project], completing creation of the final data set (yay!). Began working on cohort parser.&lt;br /&gt;
&lt;br /&gt;
2017-01-18: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-13: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-12: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-11: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-10: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Fall 2016===&lt;br /&gt;
&lt;br /&gt;
2016-12-08: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-07: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-06: Learned how to use git. Committed software projects from the semester to the McNair git repository. Projects can be found at: [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Executive Order Crawler], [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Foreign Government Web Crawlers], [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) F6S Crawler and Parser].&lt;br /&gt;
&lt;br /&gt;
2016-12-02: Built and ran web crawler for Center for Middle East Studies on Kuwait. Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-01: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Built tool for the [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report E&amp;amp;I Governance Report Project] with Christy. Adds a column of data that shows whether or not the bill has been passed.&lt;br /&gt;
&lt;br /&gt;
2016-11-29: Began pulling data from the accelerators listed [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here]. Made text files for about 18 accelerators.&lt;br /&gt;
&lt;br /&gt;
2016-11-22: Transferred downloaded Morocco Written Bills to provided SeaGate Drive. Made a &amp;quot;gentle&amp;quot; F6S crawler to retrieve HTMLs of possible accelerator pages documented [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here].&lt;br /&gt;
&lt;br /&gt;
2016-11-18: Converted Executive Order PDFs to text files using adobe acrobat DC. See [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Wikipage] for details.&lt;br /&gt;
&lt;br /&gt;
2016-11-17: Wrote a crawler to retrieve information about executive orders, and their corresponding pdfs. They can be found [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report here.] Next step is to run code to convert the pdfs to text files, then use the parser fixed by Christy. &lt;br /&gt;
&lt;br /&gt;
2016-11-15: Finished download of Moroccan Written Question pdfs. Wrote a parser with Christy to be used for parsing bills from Congress and eventually executive orders. Found bug in the system Python that was worked out and rebooted.&lt;br /&gt;
&lt;br /&gt;
2016-11-11: Continued to download Moroccan data in the background. Attempted to find bug fixes for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] crawler.&lt;br /&gt;
&lt;br /&gt;
2016-11-10: Continued to download Moroccan data and Kuwait data in the background. Began work on  [http://mcnair.bakerinstitute.org/wiki/Google_Scholar_Crawler Google Scholar Crawler]. Wrote a crawler for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] to get the HTML files of hundreds of accelerators. The crawler ended up failing; it appears to have been due to HTTPS. &lt;br /&gt;
&lt;br /&gt;
2016-11-08: Continued to download Moroccan data in the background. Finished writing code for the embedded files on the Kuwait site. Spent time debugging frame errors caused by the dynamically generated content; never found the root cause of the bug, but found a workaround that sacrifices run time in exchange for working reliably. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-11-04: Continued to download Moroccan data in the background. Finished writing initial Kuwait Web Crawler/Driver for the Middle East Studies Department. Middle East Studies Department asked for additional embedded files in the Kuwait website. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-11-03: Continued to download Moroccan data in the background. Dr. Egan fixed systems requirements to run the GovTrack Web Crawler. Made significant progress on the Kuwait Web Crawler/Driver for the Middle East Studies Department. &lt;br /&gt;
&lt;br /&gt;
2016-11-01: Continued to download Moroccan data in the background. Went over code for GovTracker Web Crawler, continued learning Perl. [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] Began Kuwait Web Crawler/Driver.&lt;br /&gt;
&lt;br /&gt;
2016-10-21: Continued to download data for the Moroccan Parliament Written and Oral Questions. Looked over [http://mcnair.bakerinstitute.org/wiki/Christy_Warden_(Twitter_Crawler_Application_1) Christy's Twitter Crawler] to see how I can be helpful. Dr. Egan asked me to think about how to potentially make multiple tools to get cohorts and other sorts of data from accelerator sites. See [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator List] He also asked me to look at the [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] for potential ideas on how to bring this project to fruition.&lt;br /&gt;
&lt;br /&gt;
2016-10-20: Continued to download data for the Moroccan Parliament Written and Oral Questions. Updated Wiki page. Started working on Twitter project with Christy. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-18: Finished code for Oral Questions web driver and Written Questions web driver using selenium. Now, the data for the dates of questions can be found using the crawler, and the pdfs of the questions will be downloaded using selenium. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-14: Finished Oral Questions crawler. Finished Written Questions crawler. Waiting for further details on whether that data needs to be tweaked in any way. Updated the Moroccan Web Driver/Web Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-13: Completed download of Moroccan Bills. Working on either a web driver screenshot approach or a webcrawler approach to download the  Moroccan oral and written questions data. Began building Web Crawler for Oral and Written Questions site. Edited Moroccan Web Driver/Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-11: Fixed Arabic bug; files can now be saved with Arabic titles. Monarchy bills downloaded and ready for shipment. House of Representatives bills mostly downloaded; ratified bills prepared for download. Started learning the scrapy library in Python for web scraping. Discussed the idea of screenshotting questions instead of scraping.&lt;br /&gt;
&lt;br /&gt;
2016-10-07: Learned Unicode and UTF-8 encoding and decoding for Arabic. Still working on transforming an ASCII URL into printable Unicode.&lt;br /&gt;
&lt;br /&gt;
2016-10-06: Discussed with Dr. Elbadawy the desired file names for the Moroccan data download. The consensus was that the bill programs are ready to launch once the files can be named properly, and that the questions data must be retrieved using a web crawler, which I need to learn how to implement. The naming of files currently produces errors in going from Arabic, to URL, to download, to filename; debugging is in process. Also built a demo selenium program for Dr. Egan that drives the McNair blog site on an infinite loop.&lt;br /&gt;
&lt;br /&gt;
2016-10-03: Moroccan Web Driver projects completed for driving the Monarchy proposed bills, the House of Representatives proposed bills, and the ratified bills sites. Began devising a naming system for the files that does not require scraping; tinkered with naming through regular expression parsing of the URL. Structure for the oral questions and written questions drivers is set up, but needs fixes due to differences between the sites. Fixed bug on the McNair wiki for the women's biz team where an email was plain text instead of an email link. Took a glimpse at the Kuwait Parliament website, and it appears to be very different from the Moroccan setup.&lt;br /&gt;
&lt;br /&gt;
2016-09-30: Selenium program selects the view-pdf option from the website and goes to the pdf webpage. The program then switches its handle to the new page. CTRL+S is sent to the page to launch the save dialog window, but text cannot be sent to this window. Brainstormed ways around this issue: explored Chrome options for saving automatically without a dialog window, and looked into other libraries besides selenium that may help.&lt;br /&gt;
&lt;br /&gt;
2016-09-29: Re-enroll in Microsoft Remote Desktop with proper authentication, set up Selenium environment and Komodo IDE on Remote Desktop, wrote program using Selenium that goes to a link and opens up the print dialog box. Developed computational recipe for a different approach to the problem.&lt;br /&gt;
&lt;br /&gt;
2016-09-26: Set up Staff wiki page, work log page; registered for Slack, Microsoft Remote Desktop; downloaded Selenium on personal computer, read Selenium docs. Created wiki page for Moroccan Web Driver Project.&lt;br /&gt;
&lt;br /&gt;
'''Notes'''&lt;br /&gt;
&lt;br /&gt;
*Ed moved the Morocco Data to E:\McNair\Projects from C:\Users\PeterJ\Documents&lt;br /&gt;
* C Drive files moved to E:\McNair\Users\PeterJ&lt;br /&gt;
&lt;br /&gt;
[[Category:Work Log]]&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Tiger_Geocoder&amp;diff=22079</id>
		<title>Tiger Geocoder</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Tiger_Geocoder&amp;diff=22079"/>
		<updated>2017-11-27T22:31:45Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{McNair Projects&lt;br /&gt;
|Has title=Tiger Geocoder&lt;br /&gt;
|Has owner=Peter Jalbert,&lt;br /&gt;
|Has start date=Fall 2017&lt;br /&gt;
|Has keywords=Tiger, Geocoder, Database&lt;br /&gt;
|Has project status=Active&lt;br /&gt;
|Has Image=Tiger.jpg&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
Note: The Address Ranges National Geodatabase is available from https://www.census.gov/geo/maps-data/data/tiger-geodatabases.html&lt;br /&gt;
&lt;br /&gt;
This page serves as documentation for using the Tiger Geocoder on Postgres SQL, as part of the PostGIS extension. The following wiki pages may also be of use to you:&lt;br /&gt;
&lt;br /&gt;
[http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation PostGIS Installation]&lt;br /&gt;
[http://mcnair.bakerinstitute.org/wiki/Database_Server_Documentation Database Server Documentation]&lt;br /&gt;
&lt;br /&gt;
The official documentation for using and installing the Tiger Geocoder can be found in the following.&lt;br /&gt;
&lt;br /&gt;
[https://postgis.net/docs/Extras.html#Tiger_Geocoder General Instructions]&lt;br /&gt;
[https://postgis.net/docs/postgis_installation.html#install_tiger_geocoder_extension Installation Instructions]&lt;br /&gt;
[http://postgis.net/docs/Geocode.html Geocoder Documentation]&lt;br /&gt;
&lt;br /&gt;
==Location==&lt;br /&gt;
The data is currently loaded into a psql database called geocoder. The tables contain the geocoding information, and there is a test table called &amp;quot;coffeeshops&amp;quot; that contains addresses of Houston coffeeshops according to Yelp. To access the database, first log in to the McNair DB Server. Then:&lt;br /&gt;
&lt;br /&gt;
 psql geocoder&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Installation==&lt;br /&gt;
&lt;br /&gt;
===Installation and Nation Data===&lt;br /&gt;
I began by adding the extensions listed below. First, enter Postgres by using the psql command. Then:&lt;br /&gt;
 --Add Extensions to database&lt;br /&gt;
 CREATE EXTENSION postgis;&lt;br /&gt;
 CREATE EXTENSION fuzzystrmatch;&lt;br /&gt;
 CREATE EXTENSION postgis_tiger_geocoder;&lt;br /&gt;
 CREATE EXTENSION address_standardizer;&lt;br /&gt;
&lt;br /&gt;
You can test that the installation worked by running the following query: &lt;br /&gt;
 SELECT na.address, na.streetname,na.streettypeabbrev, na.zip&lt;br /&gt;
 	FROM normalize_address('1 Devonshire Place, Boston, MA 02109') AS na;&lt;br /&gt;
&lt;br /&gt;
This should return the following:&lt;br /&gt;
  address | streetname | streettypeabbrev |  zip&lt;br /&gt;
 ---------+------------+------------------+-------&lt;br /&gt;
 	   1 | Devonshire | Pl               | 02109&lt;br /&gt;
&lt;br /&gt;
Next, a new profile needs to be created by using the following command.&lt;br /&gt;
 INSERT INTO tiger.loader_platform(os, declare_sect, pgbin, wget, unzip_command, psql, path_sep, &lt;br /&gt;
 		   loader, environ_set_command, county_process_command)&lt;br /&gt;
 SELECT 'test', declare_sect, pgbin, wget, unzip_command, psql, path_sep,&lt;br /&gt;
 	   loader, environ_set_command, county_process_command&lt;br /&gt;
   FROM tiger.loader_platform&lt;br /&gt;
   WHERE os = 'sh';&lt;br /&gt;
&lt;br /&gt;
The installation instructions also provide the following note:&lt;br /&gt;
&lt;br /&gt;
As of PostGIS 2.4.1 the Zip code-5 digit tabulation area zcta5 load step was revised to load current zcta5 data and is part of the Loader_Generate_Nation_Script when enabled. It is turned off by default because it takes quite a bit of time to load (20 to 60 minutes), takes up quite a bit of disk space, and is not used that often.&lt;br /&gt;
&lt;br /&gt;
If you would like this feature, you can enable it by using the following command. This should be done before loading the script.&lt;br /&gt;
&lt;br /&gt;
 UPDATE tiger.loader_lookuptables SET load = true WHERE table_name = 'zcta510';&lt;br /&gt;
&lt;br /&gt;
The paths in declare_sect need to be edited so they match our server locations. One option is to edit the declare_sect column in the tiger.loader_platform table. If so, the declare_sect looks like the following:&lt;br /&gt;
 TMPDIR=&amp;quot;${staging_fold}/temp/&amp;quot;                +&lt;br /&gt;
 UNZIPTOOL=unzip                               +&lt;br /&gt;
 WGETTOOL=&amp;quot;/usr/bin/wget&amp;quot;                      +&lt;br /&gt;
 export PGBIN=/usr/lib/postgresql/9.6/bin      +&lt;br /&gt;
 export PGPORT=5432                            +&lt;br /&gt;
 export PGHOST=localhost                       +&lt;br /&gt;
 export PGUSER=postgres                        +&lt;br /&gt;
 export PGPASSWORD=yourpasswordhere            +&lt;br /&gt;
 export PGDATABASE=geocoder                    +&lt;br /&gt;
 PSQL=${PGBIN}/psql                            +&lt;br /&gt;
 SHP2PGSQL=shp2pgsql                           +&lt;br /&gt;
 cd ${staging_fold}&lt;br /&gt;
&lt;br /&gt;
Another option is to edit the sh file before running the script. We will use this option until further notice. Simply use your favorite command line editor to change the fields to their correct values. The downloaded script is located in the following directory:&lt;br /&gt;
 /gisdata&lt;br /&gt;
&lt;br /&gt;
There needs to be a directory called &amp;quot;temp&amp;quot; in the gisdata directory. To make the script, use the following from the command line:&lt;br /&gt;
 psql -c &amp;quot;SELECT Loader_Generate_Nation_Script('test')&amp;quot; -d databasename -tA &amp;gt; /gisdata/nation_script_load.sh&lt;br /&gt;
&lt;br /&gt;
This will create a script in the /gisdata directory. Change to that directory. If you did not edit the paths in the declare_sect column in psql, then you will need to edit this file to contain the correct paths.&lt;br /&gt;
&lt;br /&gt;
Change directories:&lt;br /&gt;
 cd /gisdata&lt;br /&gt;
&lt;br /&gt;
Edit the script using your favorite command line text editor. Specifically, edit the following fields.&lt;br /&gt;
 PGUSER=postgres&lt;br /&gt;
 PGPASSWORD=(Ask Anne for this password)!&lt;br /&gt;
Everything else remains the same.&lt;br /&gt;
&lt;br /&gt;
Run the script by using:&lt;br /&gt;
 sh nation_script_load.sh&lt;br /&gt;
&lt;br /&gt;
Now, there is a barebones table in the database that will hold the information for the nation. Next is to download data on each state.&lt;br /&gt;
&lt;br /&gt;
===State Data===&lt;br /&gt;
&lt;br /&gt;
The state scripts are generated in much the same way that the nation script was generated. Use the following command, substituting MA for your desired state abbreviation, and substituting a unique filename at the end.&lt;br /&gt;
 psql -c &amp;quot;SELECT Loader_Generate_Script(ARRAY['MA'], 'test')&amp;quot; -d geocoder -tA &amp;gt; /gisdata/ma_load.sh&lt;br /&gt;
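&lt;br /&gt;
Generating one script per state by hand gets tedious. A small loop like the following sketch could generate the remaining scripts; the 'test' profile, database name, and /gisdata path come from above, while the state list and everything else is illustrative:&lt;br /&gt;
 import subprocess&lt;br /&gt;
 &lt;br /&gt;
 # Generate a load script for each remaining state.&lt;br /&gt;
 for st in ['MN', 'MO', 'MS']:  # substitute whichever states remain&lt;br /&gt;
     sql = &amp;quot;SELECT Loader_Generate_Script(ARRAY['&amp;quot; + st + &amp;quot;'], 'test')&amp;quot;&lt;br /&gt;
     with open('/gisdata/' + st.lower() + '_load.sh', 'w') as out:&lt;br /&gt;
         subprocess.run(['psql', '-d', 'geocoder', '-tA', '-c', sql], stdout=out)&lt;br /&gt;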
&lt;br /&gt;
CURRENT PROGRESS:&lt;br /&gt;
The following states have been downloaded into the geocoder database.&lt;br /&gt;
 AL, AK, AZ, AR, CA, CO, CT, DE, FL, GA, IA, ID, IL, MA&lt;br /&gt;
&lt;br /&gt;
===Current Errors===&lt;br /&gt;
The state scripts stopped working on 11/1/2017, despite working on 10/31/2017. Now, when a retrieval script is run, it fails with the error&lt;br /&gt;
 HTTP request sent, awaiting response... 403 Forbidden&lt;br /&gt;
Possible thoughts:&lt;br /&gt;
&lt;br /&gt;
Maybe our IP has been blacklisted for downloading data from a government website too quickly? If so, PostGIS should really offer a more robust installation method.&lt;br /&gt;
&lt;br /&gt;
Maybe the nation downloader script never worked properly. We have no clear way to check whether its output is correct, though it seems right.&lt;br /&gt;
&lt;br /&gt;
[https://trac.osgeo.org/postgis/ticket/3699 This] is the only online forum I could find with others who have faced a similar issue.&lt;br /&gt;
&lt;br /&gt;
NOTE: Contacted the US Census Bureau and never got a response. However, the scripts magically started working again about a week later, only to stop working again the next day.&lt;br /&gt;
&lt;br /&gt;
==Geocode Function==&lt;br /&gt;
&lt;br /&gt;
The official arguments for the function are the following:&lt;br /&gt;
 setof record geocode(varchar address, integer max_results=10, geometry restrict_region=NULL, norm_addy OUT addy, geometry OUT geomout, integer OUT rating);&lt;br /&gt;
The arguments of interest are address, where you simply submit a string, and max_results, which restricts the number of geocoding attempts per address. The geocoder makes multiple guesses at the location of an address and returns the best guesses in order of rating. If you want multiple guesses for a specific address, set max_results to more than 1.&lt;br /&gt;
&lt;br /&gt;
===Single Address===&lt;br /&gt;
&lt;br /&gt;
An example query for a single address is:&lt;br /&gt;
 SELECT g.rating, ST_X(g.geomout) As lon, ST_Y(g.geomout) As lat,&lt;br /&gt;
     (addy).address As stno, (addy).streetname As street,&lt;br /&gt;
     (addy).streettypeabbrev As styp, (addy).location As city, (addy).stateabbrev As st,(addy).zip&lt;br /&gt;
     FROM geocode('75 State Street, Boston MA 02109', 1) As g;&lt;br /&gt;
&lt;br /&gt;
rating -- This is an integer that determines the confidence in the resulting geocode. The closer to 0, the more confident the guess.&lt;br /&gt;
&lt;br /&gt;
ST_X(g.geomout) -- This retrieves the longitude coordinate of the point.&lt;br /&gt;
&lt;br /&gt;
ST_Y(g.geomout) -- This retrieves the latitude coordinate of the point.&lt;br /&gt;
&lt;br /&gt;
addy -- In general, addy is a normalized address resulting from the input address.&lt;br /&gt;
&lt;br /&gt;
(addy).address -- The number of the address (Ex: &amp;quot;75&amp;quot; Blabla rd.)&lt;br /&gt;
&lt;br /&gt;
(addy).streetname -- The name of the street (Ex: 75 &amp;quot;Blabla&amp;quot; rd.)&lt;br /&gt;
&lt;br /&gt;
(addy).streettypeabbrev -- The abbreviation of the street (Ex: 75 blabla &amp;quot;rd&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
(addy).location -- The city location of the address.&lt;br /&gt;
&lt;br /&gt;
(addy).stateabbrev -- The state abbreviation.&lt;br /&gt;
&lt;br /&gt;
(addy).zip -- The zipcode of the address.&lt;br /&gt;
&lt;br /&gt;
The output of the query above would be:&lt;br /&gt;
  rating |        lon        |      lat       | stno | street | styp |  city  | st |  zip&lt;br /&gt;
 --------+-------------------+----------------+------+--------+------+--------+----+-------&lt;br /&gt;
       0 | -71.0557505845646 | 42.35897920691 |   75 | State  | St   | Boston | MA | 02109&lt;br /&gt;
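&lt;br /&gt;
The same query can be issued from Python to geocode a whole table of addresses, for example the &amp;quot;coffeeshops&amp;quot; test table mentioned above. The following is a minimal sketch assuming psycopg2 is installed; the address column name and the connection details are assumptions, not the actual schema:&lt;br /&gt;
 import psycopg2&lt;br /&gt;
 &lt;br /&gt;
 conn = psycopg2.connect(dbname='geocoder')  # connection details are an assumption&lt;br /&gt;
 cur = conn.cursor()&lt;br /&gt;
 cur.execute('SELECT address FROM coffeeshops')  # assumes an address column&lt;br /&gt;
 for (address,) in cur.fetchall():&lt;br /&gt;
     geo = conn.cursor()&lt;br /&gt;
     geo.execute('SELECT g.rating, ST_X(g.geomout), ST_Y(g.geomout) '&lt;br /&gt;
                 'FROM geocode(%s, 1) AS g', (address,))&lt;br /&gt;
     row = geo.fetchone()  # None if the geocoder has no guess&lt;br /&gt;
     if row:&lt;br /&gt;
         print(address, row)  # (rating, lon, lat)&lt;br /&gt;
&lt;br /&gt;
Note that each geocode() call can take a noticeable fraction of a second, so geocoding a large table this way takes a while.&lt;br /&gt;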
&lt;br /&gt;
==Current Errors==&lt;br /&gt;
Currently, we are getting a 403 Forbidden Error when trying to download state data. We are in contact with the US Census Bureau. Their contact information can be found [https://www.census.gov/geo/about/contact.html here].&lt;br /&gt;
&lt;br /&gt;
The email exchanges are recorded below.&lt;br /&gt;
&lt;br /&gt;
--------------------------------------&lt;br /&gt;
--------------------------------------&lt;br /&gt;
&lt;br /&gt;
I am a student researcher at the McNair Center for Entrepreneurship and Innovation at Rice University, and I am in the process of installing a Postgres Extension that relies on the TIGER data. &lt;br /&gt;
&lt;br /&gt;
When I began installing the extension, things were working fine. Now however, I am getting a 403 Forbidden error when the script tries to download the TIGER files. Do you have any idea why this might be happening? &lt;br /&gt;
&lt;br /&gt;
The extension I'm trying to install is below:&lt;br /&gt;
&lt;br /&gt;
http://postgis.net/docs/postgis_installation.html#install_tiger_geocoder_extension&lt;br /&gt;
&lt;br /&gt;
When I run any of the scripts that require data from TIGER, I am receiving the following error:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--2017-11-06 14:15:29--  http://www2.census.gov/geo/tiger/TIGER2016/FEATNAMES/tl_2016_06065_featnames.zip&lt;br /&gt;
Resolving www2.census.gov (www2.census.gov)... 104.84.241.90, 2600:1404:a:382::208c, 2600:1404:a:39c::208c&lt;br /&gt;
Connecting to www2.census.gov (www2.census.gov)|104.84.241.90|:80... connected.&lt;br /&gt;
HTTP request sent, awaiting response... 403 Forbidden&lt;br /&gt;
2017-11-06 14:15:29 ERROR 403: Forbidden.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
&lt;br /&gt;
Thank you for reaching out to us.  The postgis docs are not from the Geography Division, so I cannot comment on those.  A couple of things come to mind.&lt;br /&gt;
&lt;br /&gt;
We just released our new 2017 Shapefiles, so it's possible the scripts may be written for a previous version of our Shapefiles.&lt;br /&gt;
You may need to clean out your cookies, restart your browser, and then attempt to reinstall.&lt;br /&gt;
Were you able to download our Shapefiles successfully?&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
&lt;br /&gt;
The Shapefiles are part of the files being blocked by the 403 Forbidden error. &lt;br /&gt;
&lt;br /&gt;
The script uses wget to bulk download the data, so there were no cookies involved. Also, the download had been working in previous days; it was only recently that the same scripts stopped working. I am worried that our IP address somehow ended up on a blacklist for the TIGER data. Is there a blacklist for addresses that access the TIGER data?&lt;br /&gt;
&lt;br /&gt;
Our IP address is 128.42.44.181. &lt;br /&gt;
&lt;br /&gt;
---------------------------------------&lt;br /&gt;
&lt;br /&gt;
I have forwarded your question on to our IT folks.  Since I work on the subject matter side, I am unable to answer your questions.  Once I hear back from them, I will forward their response to you.  Hopefully they will provide to you what you need in order to download our Shapefiles.  My apologies for your inconvenience.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=22078</id>
		<title>Peter Jalbert (Work Log)</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=22078"/>
		<updated>2017-11-27T21:26:32Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===Fall 2017===&lt;br /&gt;
&amp;lt;onlyinclude&amp;gt; &lt;br /&gt;
&lt;br /&gt;
[[Peter Jalbert]] [[Work Logs]] [[Peter Jalbert (Work Log)|(log page)]] &lt;br /&gt;
&lt;br /&gt;
2017-11-27: Rerunning Morocco Parliament Crawler. Fixed KeyTerms.py and running it again. Continued downloading for TIGER geocoder.&lt;br /&gt;
&lt;br /&gt;
2017-11-20:  Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-16: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-15: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote a script to extract counts that were greater than 2 from Keyword Matcher. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder]. Finished re-formatting work logs.&lt;br /&gt;
&lt;br /&gt;
2017-11-14: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote an HTML to Text parser. See Parser Demo Day Page for file location. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder].&lt;br /&gt;
&lt;br /&gt;
2017-11-13: Built [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser].&lt;br /&gt;
&lt;br /&gt;
2017-11-09: Running demo version of Demo Day crawler (Accelerator Google Crawler). Fixing work log format. &lt;br /&gt;
&lt;br /&gt;
2017-11-07: Created file with 0s and 1s detailing whether crunchbase has the founder information for an accelerator. Details posted as a TODO on [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List] page. Still waiting for feedback on the PostGIS installation from [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder]. Continued working on Accelerator Google Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-06: Contacted Geography Center for the US Census Bureau, [https://www.census.gov/geo/about/contact.html here], and began email exchange on PostGIS installation problems. Began working on the [http://mcnair.bakerinstitute.org/wiki/Selenium_Documentation Selenium Documentation]. Also began working on an Accelerator Google Crawler that will be used with Yang and ML to find Demo Days for cohort companies.&lt;br /&gt;
&lt;br /&gt;
2017-11-01: Attempted to continue downloading, however ran into HTTP Forbidden errors. Listed the errors on the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder Page].&lt;br /&gt;
&lt;br /&gt;
2017-10-31: Began downloading blocks of data for individual states for the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder] project. Wrote out the new wiki page for installation, and beginning to write documentation on usage.&lt;br /&gt;
&lt;br /&gt;
2017-10-30: With Ed's help, was able to get the national data from Tiger installed onto a database server. The process required much jumping around and changing users, and all the things we learned are outlined in [http://mcnair.bakerinstitute.org/wiki/Database_Server_Documentation#Editing_Users the database server documentation] under &amp;quot;Editing Users&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
2017-10-25: Continued working on the [http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation TigerCoder Installation].&lt;br /&gt;
&lt;br /&gt;
2017-10-24: Throw some addresses into a database, use address normalizer and geocoder. May need to install things. Details on the installation process can be found on the [http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation PostGIS Installation page].&lt;br /&gt;
&lt;br /&gt;
2017-10-23: Finished Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-19: Continued work on Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-18: Continued work on Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-17: Constructed ArcGIS maps for the agglomeration project. Finished maps of points for every year in the state of California. Finished maps of Route 128. Began working on selenium Yelp crawler to get cafe locations within the 610-loop.&lt;br /&gt;
&lt;br /&gt;
2017-10-16: Assisted Harrison on the USITC project. Looked for natural language processing tools to extract complainants and defendants along with their location from case files. Experimented with pulling based on part-of-speech tags, as well as using geotext or geograpy to pull locations from a case segment.&lt;br /&gt;
&lt;br /&gt;
2017-10-13: Updated various project wiki pages.&lt;br /&gt;
&lt;br /&gt;
2017-10-12: Continued work on Patent Thicket project, awaiting further project specs.&lt;br /&gt;
&lt;br /&gt;
2017-10-05: Emergency ArcGIS creation for Agglomeration project.&lt;br /&gt;
&lt;br /&gt;
2017-10-04: Emergency ArcGIS creation for Agglomeration project.&lt;br /&gt;
&lt;br /&gt;
2017-10-02: Worked on ArcGIS data. See Harrison's Work Log for the details.&lt;br /&gt;
&lt;br /&gt;
2017-09-28: Added collaborative editing feature to PyCharm. &lt;br /&gt;
&lt;br /&gt;
2017-09-27: Worked on big database file.&lt;br /&gt;
&lt;br /&gt;
2017-09-25: New task -- Create text file with company, description, and company type.&lt;br /&gt;
#[http://mcnair.bakerinstitute.org/wiki/VC_Database_Rebuild VC Database Rebuild]&lt;br /&gt;
#psql vcdb2&lt;br /&gt;
#table name, sdccompanybasecore2&lt;br /&gt;
#Combine with Crunchbasebulk&lt;br /&gt;
&lt;br /&gt;
#TODO: Write wiki on linkedin crawler, write wiki on creating accounts.&lt;br /&gt;
&lt;br /&gt;
2017-09-21: Wrote wiki on Linkedin crawler, met with Laura about patents project. &lt;br /&gt;
&lt;br /&gt;
2017-09-20: Finished running linkedin crawler. Transferred data to RDP. Will write wikis next.&lt;br /&gt;
&lt;br /&gt;
2017-09-19: Began running linkedin crawler. Helped Yang create RDP account, get permissions, and get wiki setup.&lt;br /&gt;
&lt;br /&gt;
2017-09-18: Finished implementation of Experience Crawler, continued working on Education Crawler for LinkedIn.&lt;br /&gt;
&lt;br /&gt;
2017-09-14: Continued implementing LinkedIn Crawler for profiles.&lt;br /&gt;
&lt;br /&gt;
2017-09-13: Implemented LinkedIn Crawler for main portion of profiles. Began working on crawling Experience section of profiles.&lt;br /&gt;
&lt;br /&gt;
2017-09-12: Continued working on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. Added to the wiki on this topic.&lt;br /&gt;
&lt;br /&gt;
2017-09-11: Continued working on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. &lt;br /&gt;
&lt;br /&gt;
2017-09-06: Combined founders data retrieved with the Crunchbase API with the crunchbasebulk data to get linkedin urls for different accelerator founders. For more information, see [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here].&lt;br /&gt;
&lt;br /&gt;
2017-09-05: Post Harvey. Finished retrieving names from the Crunchbase API on founders. Next step is to query crunchbase bulk database to get linkedin urls. For more information, see [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here].&lt;br /&gt;
&lt;br /&gt;
2017-08-24: Began using the Crunchbase API to retrieve founder information for accelerators. Halfway through compiling a dictionary that translates accelerator names into proper Crunchbase API URLs.&lt;br /&gt;
&lt;br /&gt;
2017-08-23: Decided with Ed to abandon LinkedIn crawling to retrieve accelerator founder data, and instead use crunchbase. Spent the day navigating the crunchbasebulk database, and seeing what useful information was contained in it.&lt;br /&gt;
&lt;br /&gt;
2017-08-22: Discovered that LinkedIn Profiles cannot be viewed through LinkedIn if the target is 3rd degree or further. However, if entering LinkedIn through a Google search, the profile can still be viewed if the user has previously logged into LinkedIn. Devising a workaround crawler that utilizes Google search. Continued blog post [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) here] under Section 4.&lt;br /&gt;
&lt;br /&gt;
2017-08-21: Began work on extracting founders for accelerators through LinkedIn Crawler. Discovered that Python3 is not installed on RDP, so the virtual environment for the project cannot be fired up. Continued working on Ubuntu machine.&lt;br /&gt;
&amp;lt;/onlyinclude&amp;gt; &lt;br /&gt;
&lt;br /&gt;
===Spring 2017===&lt;br /&gt;
&lt;br /&gt;
2017-05-01: Continued work on HTML Parser. Uploaded all semester projects to git server.&lt;br /&gt;
&lt;br /&gt;
2017-04-20: Finished the HTML Parser for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Ran HTML parser on accelerator founders. Data is stored in projects/accelerators/LinkedIn Founder Data.&lt;br /&gt;
&lt;br /&gt;
2017-04-19: Made updates to the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler] Wikipage. Ran LinkedIn Crawler on accelerator data. Working on an html parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-18: Ran LinkedIn Crawler on matches between Crunchbase Snapshot and the accelerator data.&lt;br /&gt;
&lt;br /&gt;
2017-04-17: Worked on ways to get correct search results from the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Worked on an HTML Parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-13: Worked on debugging the logout procedure for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Began formulation of process to search for founders of startups using a combination of the LinkedIn Crawler with the data resources from the [http://mcnair.bakerinstitute.org/wiki/Crunchbase_2013_Snapshot CrunchBase Snapshot].&lt;br /&gt;
&lt;br /&gt;
2017-04-12: Work on bugs with the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-04-11: Completed functional [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) crawler of LinkedIn Recruiter Pro]. Basic search functions work and download profile information for a given person. &lt;br /&gt;
&lt;br /&gt;
2017-04-10: Began writing functioning crawler of LinkedIn. &lt;br /&gt;
&lt;br /&gt;
2017-04-06: Continued working on debugging and documenting the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Wrote a test program that logs in, searches for a query, navigates through search pages, and logs out. Recruiter program can now log in and search.&lt;br /&gt;
&lt;br /&gt;
2017-04-05: Began work on the LinkedIn Crawler. Researched on launching Python Virtual Environment.&lt;br /&gt;
&lt;br /&gt;
2017-04-03: Finished debugging points for the Enclosing Circle Algorithm. Added Command Line functionality to the Industry Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-03-29: Worked on debugging points for the Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-28: Finished running the Enclosing Circle Algorithm. Worked on removing incorrect points from the data set (see above). &lt;br /&gt;
&lt;br /&gt;
2017-03-27: Worked on debugging the Enclosing Circle Algorithm. Implemented a way to remove interior circles, and determined that translation to latitude and longitude coordinates resulted in slightly off center circles.&lt;br /&gt;
&lt;br /&gt;
2017-03-23: Finished debugging the brute force algorithm for [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm]. Implemented a method to plot the points and circles on a graph. Analyzed runtime of the brute force algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-21: Coded a brute force algorithm for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-03-20: Worked on  debugging the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-03-09: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Finished script to draw Enclosing Circles on a Google Map.&lt;br /&gt;
&lt;br /&gt;
2017-03-08: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Created script to draw outcome of the Enclosing Circle Algorithm on Google Maps.&lt;br /&gt;
&lt;br /&gt;
2017-03-07: Redetermined the top 50 cities which Enclosing Circle should be run on. Data on the [http://mcnair.bakerinstitute.org/wiki/Top_Cities_for_VC_Backed_Companies Top 50 Cities for VC Backed Companies can be found here.] Ran [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities.&lt;br /&gt;
&lt;br /&gt;
2017-03-06: Ran script to determine the top 50 cities which Enclosing Circle should be run on. Fixed the VC Circles script to take in a new data format.&lt;br /&gt;
&lt;br /&gt;
2017-03-02: Cleaned up data for the VC Circles Project. Created histogram of data in Excel. See [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Began work on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-03-01: Created statistics for the VC Circles Project.&lt;br /&gt;
&lt;br /&gt;
2017-02-28: Finished downloading geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project.  Found bug in Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-02-27: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-23: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Installed C++ Compiler for Python. Ran tests on difference between Python and C wrapped Python.&lt;br /&gt;
&lt;br /&gt;
2017-02-22: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Helped out with [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-21: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Researched into C++ Compilers for Python so that the Enclosing Circle Algorithm could be wrapped in C. Found a recommended one [https://www.microsoft.com/en-us/download/details.aspx?id=44266 here].&lt;br /&gt;
&lt;br /&gt;
2017-02-20: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-16: Reworked [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] to create a file of geocoded data. Began work on wrapping the algorithm in C to improve speed.&lt;br /&gt;
&lt;br /&gt;
2017-02-15: Finished [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] applied to the VC study. Enclosing Circle algorithm still needs adjustment, but the program runs with the temporary fixes.&lt;br /&gt;
&lt;br /&gt;
2017-02-14: Worked on the application of the Enclosing Circle algorithm to the VC study. Working on bug fixes in the Enclosing Circle algorithm. Created wiki page for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-02-13: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-08: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-07: Fixed bugs in parse_cohort_data.py, the script for parsing the cohort data from the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Added descriptive statistics to cohort data excel file.&lt;br /&gt;
&lt;br /&gt;
2017-02-02: Out sick, independent research and work from RDP. Brief research into the [http://jorgeg.scripts.mit.edu/homepage/wp-content/uploads/2016/03/Guzman-Stern-State-of-American-Entrepreneurship-FINAL.pdf Stern-Guzman algorithm]. Research into [http://mcnair.bakerinstitute.org/wiki/interactive_maps Interactive Maps]. No helpful additions to map embedding problem.&lt;br /&gt;
&lt;br /&gt;
2017-02-01: Notes from Session with Ed: Project on US university patenting and entrepreneurship programs (writing code to identify universities in assignees), search Wikipedia (XML then bulk download), student pop, faculty pop, etc.&lt;br /&gt;
Circle project for VC data will end up being a joint project to join accelerator data. &lt;br /&gt;
Pull descriptions for VC. Founders of accelerators in linkedin. LinkedIn cannot be caught (pretend not to be a bot). Can eventually get academic backgrounds through linkedin. &lt;br /&gt;
Pull business registration data, Stern/Guzman Algorithm.  &lt;br /&gt;
GIS on top of geocoded data.&lt;br /&gt;
Maps that work on wiki or blog (CartoDB), Maps API and R.&lt;br /&gt;
NLP Projects, Description Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-01-31: Built WayBack Machine Crawler. Updated documentation for coordinates script. Updated profile page to include locations of code.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-30: Optimized enclosing circle algorithm through memoization. Developed script to read addresses from accelerator data and return latitude and longitude coordinates.&lt;br /&gt;
&lt;br /&gt;
2017-01-26: Continued working on Google sitesearch project. Discovered crunchbase, changed project priority. Priority 1: split accelerator data up by flag; priority 2: use crunchbase to get web urls for cohorts; priority 3: make Internet Archive Wayback Machine driver. Located [http://mcnair.bakerinstitute.org/wiki/Whois_Parser Whois Parser].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-25: Finished parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Some data files still need proofreading as they are not in an acceptable format. Began working on Google sitesearch project.&lt;br /&gt;
&lt;br /&gt;
2017-01-24: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Cohort data file created, debugging is almost complete. Will begin work on the google accelerator search soon.&lt;br /&gt;
&lt;br /&gt;
2017-01-23: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Preliminary code is written, working on debugging.&lt;br /&gt;
&lt;br /&gt;
2017-01-19: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project]. Created parser for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project], completed creation of the final data set (yay!). Began working on cohort parser.&lt;br /&gt;
&lt;br /&gt;
2017-01-18: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-13: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-12: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-11: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-10: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Fall 2016===&lt;br /&gt;
&lt;br /&gt;
2016-12-08: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-07: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-06: Learned how to use git. Committed software projects from the semester to the McNair git repository. Projects can be found at: [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Executive Order Crawler], [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Foreign Government Web Crawlers], [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) F6S Crawler and Parser].&lt;br /&gt;
&lt;br /&gt;
2016-12-02: Built and ran web crawler for Center for Middle East Studies on Kuwait. Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-01: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Built tool for the [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report E&amp;amp;I Governance Report Project] with Christy. Adds a column of data that shows whether or not the bill has been passed.&lt;br /&gt;
&lt;br /&gt;
2016-11-29: Began pulling data from the accelerators listed [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here]. Made text files for about 18 accelerators.&lt;br /&gt;
&lt;br /&gt;
2016-11-22: Transferred downloaded Morocco Written Bills to provided SeaGate Drive. Made a &amp;quot;gentle&amp;quot; F6S crawler to retrieve HTMLs of possible accelerator pages documented [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here].&lt;br /&gt;
&lt;br /&gt;
2016-11-18: Converted Executive Order PDFs to text files using Adobe Acrobat DC. See [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Wikipage] for details.&lt;br /&gt;
&lt;br /&gt;
2016-11-17: Wrote a crawler to retrieve information about executive orders, and their corresponding pdfs. They can be found [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report here.] Next step is to run code to convert the pdfs to text files, then use the parser fixed by Christy. &lt;br /&gt;
&lt;br /&gt;
2016-11-15: Finished download of Moroccan Written Question pdfs. Wrote a parser with Christy to be used for parsing bills from Congress and eventually executive orders. Found a bug in the system Python; it was worked out and the system rebooted.&lt;br /&gt;
&lt;br /&gt;
2016-11-11: Continued to download Moroccan data in the background. Attempted to find bug fixes for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] crawler.&lt;br /&gt;
&lt;br /&gt;
2016-11-10: Continued to download Moroccan data and Kuwait data in the background. Began work on  [http://mcnair.bakerinstitute.org/wiki/Google_Scholar_Crawler Google Scholar Crawler]. Wrote a crawler for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] to get the HTML files of hundreds of accelerators. The crawler ended up failing; it appears to have been due to HTTPS. &lt;br /&gt;
&lt;br /&gt;
2016-11-08: Continued to download Moroccan data in the background. Finished writing code for the embedded files on the Kuwait Site. Spent time debugging the frame errors due to the dynamically generated content. Never found an answer to the bug, and instead found a workaround that sacrificed run time for the ability to work. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-11-04: Continued to download Moroccan data in the background. Finished writing initial Kuwait Web Crawler/Driver for the Middle East Studies Department. Middle East Studies Department asked for additional embedded files in the Kuwait website. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-11-03: Continued to download Moroccan data in the background. Dr. Egan fixed systems requirements to run the GovTrack Web Crawler. Made significant progress on the Kuwait Web Crawler/Driver for the Middle East Studies Department. &lt;br /&gt;
&lt;br /&gt;
2016-11-01: Continued to download Moroccan data in the background. Went over code for GovTracker Web Crawler, continued learning Perl. [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] Began Kuwait Web Crawler/Driver.&lt;br /&gt;
&lt;br /&gt;
2016-10-21: Continued to download data for the Moroccan Parliament Written and Oral Questions. Looked over [http://mcnair.bakerinstitute.org/wiki/Christy_Warden_(Twitter_Crawler_Application_1) Christy's Twitter Crawler] to see how I can be helpful. Dr. Egan asked me to think about how to potentially make multiple tools to get cohorts and other sorts of data from accelerator sites. See [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator List] He also asked me to look at the [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] for potential ideas on how to bring this project to fruition.&lt;br /&gt;
&lt;br /&gt;
2016-10-20: Continued to download data for the Moroccan Parliament Written and Oral Questions. Updated Wiki page. Started working on Twitter project with Christy. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-18: Finished code for Oral Questions web driver and Written Questions web driver using selenium. Now, the data for the dates of questions can be found using the crawler, and the pdfs of the questions will be downloaded using selenium. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-14: Finished Oral Questions crawler. Finished Written Questions crawler. Waiting for further details on whether that data needs to be tweaked in any way. Updated the Moroccan Web Driver/Web Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-13: Completed download of Moroccan Bills. Working on either a web driver screenshot approach or a webcrawler approach to download the  Moroccan oral and written questions data. Began building Web Crawler for Oral and Written Questions site. Edited Moroccan Web Driver/Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-11: Fixed Arabic bug; files can now be saved with Arabic titles. Monarchy bills downloaded and ready for shipment. House of Representatives bills mostly downloaded, ratified bills prepared for download. Started learning the scrapy library in Python for web scraping. Discussed idea of screenshot-ing questions instead of scraping. &lt;br /&gt;
&lt;br /&gt;
2016-10-07: Learned Unicode and UTF-8 encoding and decoding in Arabic. Still working on transforming an ASCII URL into printable Unicode.&lt;br /&gt;
&lt;br /&gt;
2016-10-06: Discussed with Dr. Elbadawy the desired file names for the Moroccan data download. The consensus was that the bill programs are ready to launch once the files can be named properly, and that the questions data must be retrieved using a web crawler, which I need to learn how to implement. The naming of files is currently drawing errors in going from Arabic, to URL, to download, to filename. Debugging is in process. Also built a demo selenium program for Dr. Egan that drives the McNair blog site on an infinite loop.&lt;br /&gt;
&lt;br /&gt;
2016-10-03: Moroccan Web Driver projects completed for driving of the Monarchy proposed bills, the House of Representatives proposed bills, and the Ratified bills sites. Began the process of devising a naming system for the files that does not require scraping. Tinkered with naming through regular expression parsing of the URL. Structure for the oral questions and written questions drivers is set up, but needs fixes due to the differences in the sites. Fixed bug on McNair wiki for women's biz team where email was plain text instead of an email link. Took a glimpse at the Kuwait Parliament website, and it appears to be very different from the Moroccan setup.&lt;br /&gt;
&lt;br /&gt;
2016-09-30: Selenium program selects the view pdf option from the website and goes to the pdf webpage. Program then switches handle to the new page. CTRL+S is sent to the page to launch the save dialog window. Text cannot be sent to this window. Brainstormed ways around this issue. Explored Chrome options for saving automatically without a dialog window. Looking into other libraries besides selenium that may help.&lt;br /&gt;
&lt;br /&gt;
2016-09-29: Re-enroll in Microsoft Remote Desktop with proper authentication, set up Selenium environment and Komodo IDE on Remote Desktop, wrote program using Selenium that goes to a link and opens up the print dialog box. Developed computational recipe for a different approach to the problem.&lt;br /&gt;
&lt;br /&gt;
2016-09-26: Set up Staff wiki page, work log page; registered for Slack, Microsoft Remote Desktop; downloaded Selenium on personal computer, read Selenium docs. Created wiki page for Moroccan Web Driver Project.&lt;br /&gt;
&lt;br /&gt;
'''Notes'''&lt;br /&gt;
&lt;br /&gt;
*Ed moved the Morocco Data to E:\McNair\Projects from C:\Users\PeterJ\Documents&lt;br /&gt;
* C Drive files moved to E:\McNair\Users\PeterJ&lt;br /&gt;
&lt;br /&gt;
[[Category:Work Log]]&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=22077</id>
		<title>Peter Jalbert (Work Log)</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=22077"/>
		<updated>2017-11-27T21:16:14Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===Fall 2017===&lt;br /&gt;
&amp;lt;onlyinclude&amp;gt; &lt;br /&gt;
&lt;br /&gt;
[[Peter Jalbert]] [[Work Logs]] [[Peter Jalbert (Work Log)|(log page)]] &lt;br /&gt;
&lt;br /&gt;
2017-11-27: Rerunning Morocco Parliament Crawler. Fixed KeyTerms.py and running it again. &lt;br /&gt;
&lt;br /&gt;
2017-11-20:  Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-16: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-15: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote a script to extract counts that were greater than 2 from Keyword Matcher. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder]. Finished re-formatting work logs.&lt;br /&gt;
&lt;br /&gt;
2017-11-14: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote an HTML to Text parser. See Parser Demo Day Page for file location. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder].&lt;br /&gt;
&lt;br /&gt;
2017-11-13: Built [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser].&lt;br /&gt;
&lt;br /&gt;
2017-11-09: Running demo version of Demo Day crawler (Accelerator Google Crawler). Fixing work log format. &lt;br /&gt;
&lt;br /&gt;
2017-11-07: Created file with 0s and 1s detailing whether crunchbase has the founder information for an accelerator. Details posted as a TODO on [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List] page. Still waiting for feedback on the PostGIS installation from [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder]. Continued working on Accelerator Google Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-06: Contacted Geography Center for the US Census Bureau, [https://www.census.gov/geo/about/contact.html here], and began email exchange on PostGIS installation problems. Began working on the [http://mcnair.bakerinstitute.org/wiki/Selenium_Documentation Selenium Documentation]. Also began working on an Accelerator Google Crawler that will be used with Yang and ML to find Demo Days for cohort companies.&lt;br /&gt;
&lt;br /&gt;
2017-11-01: Attempted to continue downloading, however ran into HTTP Forbidden errors. Listed the errors on the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder Page].&lt;br /&gt;
&lt;br /&gt;
2017-10-31: Began downloading blocks of data for individual states for the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder] project. Wrote out the new wiki page for installation, and beginning to write documentation on usage.&lt;br /&gt;
&lt;br /&gt;
2017-10-30: With Ed's help, was able to get the national data from Tiger installed onto a database server. The process required much jumping around and changing users, and all the things we learned are outlined in [http://mcnair.bakerinstitute.org/wiki/Database_Server_Documentation#Editing_Users the database server documentation] under &amp;quot;Editing Users&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
2017-10-25: Continued working on the [http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation TigerCoder Installation].&lt;br /&gt;
&lt;br /&gt;
2017-10-24: Throw some addresses into a database, use address normalizer and geocoder. May need to install things. Details on the installation process can be found on the [http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation PostGIS Installation page].&lt;br /&gt;
&lt;br /&gt;
2017-10-23: Finished Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-19: Continued work on Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-18: Continued work on Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-17: Constructed ArcGIS maps for the agglomeration project. Finished maps of points for every year in the state of California. Finished maps of Route 128. Began working on selenium Yelp crawler to get cafe locations within the 610-loop.&lt;br /&gt;
&lt;br /&gt;
2017-10-16: Assisted Harrison on the USITC project. Looked for natural language processing tools to extract complainants and defendants along with their location from case files. Experimented with pulling based on part-of-speech tags, as well as using geotext or geograpy to pull locations from a case segment.&lt;br /&gt;
&lt;br /&gt;
2017-10-13: Updated various project wiki pages.&lt;br /&gt;
&lt;br /&gt;
2017-10-12: Continued work on Patent Thicket project, awaiting further project specs.&lt;br /&gt;
&lt;br /&gt;
2017-10-05: Emergency ArcGIS creation for Agglomeration project.&lt;br /&gt;
&lt;br /&gt;
2017-10-04: Emergency ArcGIS creation for Agglomeration project.&lt;br /&gt;
&lt;br /&gt;
2017-10-02: Worked on ArcGIS data. See Harrison's Work Log for the details.&lt;br /&gt;
&lt;br /&gt;
2017-09-28: Added collaborative editing feature to PyCharm. &lt;br /&gt;
&lt;br /&gt;
2017-09-27: Worked on big database file.&lt;br /&gt;
&lt;br /&gt;
2017-09-25: New task -- Create text file with company, description, and company type.&lt;br /&gt;
#[http://mcnair.bakerinstitute.org/wiki/VC_Database_Rebuild VC Database Rebuild]&lt;br /&gt;
#psql vcdb2&lt;br /&gt;
#table name, sdccompanybasecore2&lt;br /&gt;
#Combine with Crunchbasebulk&lt;br /&gt;
&lt;br /&gt;
#TODO: Write wiki on linkedin crawler, write wiki on creating accounts.&lt;br /&gt;
&lt;br /&gt;
2017-09-21: Wrote wiki on Linkedin crawler, met with Laura about patents project. &lt;br /&gt;
&lt;br /&gt;
2017-09-20: Finished running linkedin crawler. Transferred data to RDP. Will write wikis next.&lt;br /&gt;
&lt;br /&gt;
2017-09-19: Began running linkedin crawler. Helped Yang create RDP account, get permissions, and get wiki setup.&lt;br /&gt;
&lt;br /&gt;
2017-09-18: Finished implementation of Experience Crawler, continued working on Education Crawler for LinkedIn.&lt;br /&gt;
&lt;br /&gt;
2017-09-14: Continued implementing LinkedIn Crawler for profiles.&lt;br /&gt;
&lt;br /&gt;
2017-09-13: Implemented LinkedIn Crawler for main portion of profiles. Began working on crawling Experience section of profiles.&lt;br /&gt;
&lt;br /&gt;
2017-09-12: Continued working on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. Added to the wiki on this topic.&lt;br /&gt;
&lt;br /&gt;
2017-09-11: Continued working on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. &lt;br /&gt;
&lt;br /&gt;
2017-09-06: Combined founders data retrieved with the Crunchbase API with the crunchbasebulk data to get linkedin urls for different accelerator founders. For more information, see [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here].&lt;br /&gt;
&lt;br /&gt;
2017-09-05: Post Harvey. Finished retrieving names from the Crunchbase API on founders. Next step is to query crunchbase bulk database to get linkedin urls. For more information, see [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here].&lt;br /&gt;
&lt;br /&gt;
2017-08-24: Began using the Crunchbase API to retrieve founder information for accelerators. Halfway through compiling a dictionary that translates accelerator names into proper Crunchbase API URLs.&lt;br /&gt;
&lt;br /&gt;
2017-08-23: Decided with Ed to abandon LinkedIn crawling to retrieve accelerator founder data, and instead use crunchbase. Spent the day navigating the crunchbasebulk database, and seeing what useful information was contained in it.&lt;br /&gt;
&lt;br /&gt;
2017-08-22: Discovered that LinkedIn Profiles cannot be viewed through LinkedIn if the target is 3rd degree or further. However, if entering LinkedIn through a Google search, the profile can still be viewed if the user has previously logged into LinkedIn. Devising a workaround crawler that utilizes Google search. Continued blog post [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) here] under Section 4.&lt;br /&gt;
&lt;br /&gt;
2017-08-21: Began work on extracting founders for accelerators through LinkedIn Crawler. Discovered that Python3 is not installed on RDP, so the virtual environment for the project cannot be fired up. Continued working on Ubuntu machine.&lt;br /&gt;
&amp;lt;/onlyinclude&amp;gt; &lt;br /&gt;
&lt;br /&gt;
===Spring 2017===&lt;br /&gt;
&lt;br /&gt;
2017-05-01: Continued work on HTML Parser. Uploaded all semester projects to git server.&lt;br /&gt;
&lt;br /&gt;
2017-04-20: Finished the HTML Parser for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Ran HTML parser on accelerator founders. Data is stored in projects/accelerators/LinkedIn Founder Data.&lt;br /&gt;
&lt;br /&gt;
2017-04-19: Made updates to the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler] Wikipage. Ran LinkedIn Crawler on accelerator data. Working on an html parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-18: Ran LinkedIn Crawler on matches between Crunchbase Snapshot and the accelerator data.&lt;br /&gt;
&lt;br /&gt;
2017-04-17: Worked on ways to get correct search results from the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Worked on an HTML Parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-13: Worked on debugging the logout procedure for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Began formulation of process to search for founders of startups using a combination of the LinkedIn Crawler with the data resources from the [http://mcnair.bakerinstitute.org/wiki/Crunchbase_2013_Snapshot CrunchBase Snapshot].&lt;br /&gt;
&lt;br /&gt;
2017-04-12: Work on bugs with the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-04-11: Completed functional [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) crawler of LinkedIn Recruiter Pro]. Basic search functions work and download profile information for a given person. &lt;br /&gt;
&lt;br /&gt;
2017-04-10: Began writing functioning crawler of LinkedIn. &lt;br /&gt;
&lt;br /&gt;
2017-04-06: Continued working on debugging and documenting the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Wrote a test program that logs in, searches for a query, navigates through search pages, and logs out. Recruiter program can now log in and search.&lt;br /&gt;
&lt;br /&gt;
2017-04-05: Began work on the LinkedIn Crawler. Researched on launching Python Virtual Environment.&lt;br /&gt;
&lt;br /&gt;
2017-04-03: Finished debugging points for the Enclosing Circle Algorithm. Added Command Line functionality to the Industry Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-03-29: Worked on debugging points for the Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-28: Finished running the Enclosing Circle Algorithm. Worked on removing incorrect points from the data set (see above). &lt;br /&gt;
&lt;br /&gt;
2017-03-27: Worked on debugging the Enclosing Circle Algorithm. Implemented a way to remove interior circles, and determined that translation to latitude and longitude coordinates resulted in slightly off center circles.&lt;br /&gt;
&lt;br /&gt;
2017-03-23: Finished debugging the brute force algorithm for [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm]. Implemented a method to plot the points and circles on a graph. Analyzed runtime of the brute force algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-21: Coded a brute force algorithm for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-03-20: Worked on  debugging the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-03-09: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Finished script to draw Enclosing Circles on a Google Map.&lt;br /&gt;
&lt;br /&gt;
2017-03-08: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Created script to draw outcome of the Enclosing Circle Algorithm on Google Maps.&lt;br /&gt;
&lt;br /&gt;
2017-03-07: Redetermined the top 50 cities which Enclosing Circle should be run on. Data on the [http://mcnair.bakerinstitute.org/wiki/Top_Cities_for_VC_Backed_Companies Top 50 Cities for VC Backed Companies can be found here.] Ran [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities.&lt;br /&gt;
&lt;br /&gt;
2017-03-06: Ran script to determine the top 50 cities which Enclosing Circle should be run on. Fixed the VC Circles script to take in a new data format.&lt;br /&gt;
&lt;br /&gt;
2017-03-02: Cleaned up data for the VC Circles Project. Created histogram of data in Excel. See [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Began work on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-03-01: Created statistics for the VC Circles Project.&lt;br /&gt;
&lt;br /&gt;
2017-02-28: Finished downloading geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project.  Found bug in Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-02-27: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-23: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Installed C++ Compiler for Python. Ran tests on difference between Python and C wrapped Python.&lt;br /&gt;
&lt;br /&gt;
2017-02-22: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Helped out with [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-21: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Researched into C++ Compilers for Python so that the Enclosing Circle Algorithm could be wrapped in C. Found a recommended one [https://www.microsoft.com/en-us/download/details.aspx?id=44266 here].&lt;br /&gt;
&lt;br /&gt;
2017-02-20: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-16: Reworked [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] to create a file of geocoded data. Began work on wrapping the algorithm in C to improve speed.&lt;br /&gt;
&lt;br /&gt;
2017-02-15: Finished [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] applied to the VC study. Enclosing Circle algorithm still needs adjustment, but the program runs with the temporary fixes.&lt;br /&gt;
&lt;br /&gt;
2017-02-14: Worked on the application of the Enclosing Circle algorithm to the VC study. Working on bug fixes in the Enclosing Circle algorithm. Created wiki page for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-02-13: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-08: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-07: Fixed bugs in parse_cohort_data.py, the script for parsing the cohort data from the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Added descriptive statistics to cohort data excel file.&lt;br /&gt;
&lt;br /&gt;
2017-02-02: Out sick, independent research and work from RDP. Brief research into the [http://jorgeg.scripts.mit.edu/homepage/wp-content/uploads/2016/03/Guzman-Stern-State-of-American-Entrepreneurship-FINAL.pdf Stern-Guzman algorithm]. Research into [http://mcnair.bakerinstitute.org/wiki/interactive_maps Interactive Maps]. No helpful additions to map embedding problem.&lt;br /&gt;
&lt;br /&gt;
2017-02-01: Notes from Session with Ed: Project on US university patenting and entrepreneurship programs (writing code to identify universities in assignees), search Wikipedia (XML then bulk download), student pop, faculty pop, etc.&lt;br /&gt;
Circle project for VC data will end up being a joint project to join accelerator data. &lt;br /&gt;
Pull descriptions for VC. Founders of accelerators in linkedin. LinkedIn cannot be caught (pretend not to be a bot). Can eventually get academic backgrounds through linkedin. &lt;br /&gt;
Pull business registration data, Stern/Guzman Algorithm.  &lt;br /&gt;
GIS on top of geocoded data.&lt;br /&gt;
Maps that work on wiki or blog (CartoDB), Maps API and R.&lt;br /&gt;
NLP Projects, Description Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-01-31: Built WayBack Machine Crawler. Updated documentation for coordinates script. Updated profile page to include locations of code.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-30: Optimized enclosing circle algorithm through memoization. Developed script to read addresses from accelerator data and return latitude and longitude coordinates.&lt;br /&gt;
&lt;br /&gt;
2017-01-26: Continued working on Google sitesearch project. Discovered crunchbase, changed project priority. Priority 1: split accelerator data up by flag; priority 2: use crunchbase to get web urls for cohorts; priority 3: make Internet Archive Wayback Machine driver. Located [http://mcnair.bakerinstitute.org/wiki/Whois_Parser Whois Parser].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-25: Finished parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Some data files still need proofreading as they are not in an acceptable format. Began working on Google sitesearch project.&lt;br /&gt;
&lt;br /&gt;
2017-01-24: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Cohort data file created, debugging is almost complete. Will begin work on the google accelerator search soon.&lt;br /&gt;
&lt;br /&gt;
2017-01-23: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Preliminary code is written, working on debugging.&lt;br /&gt;
&lt;br /&gt;
2017-01-19: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project]. Created parser for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project], completed creation of the final data set (yay!). Began working on cohort parser.&lt;br /&gt;
&lt;br /&gt;
2017-01-18: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-13: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-12: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-11: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-10: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Fall 2016===&lt;br /&gt;
&lt;br /&gt;
2016-12-08: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-07: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-06: Learned how to use git. Committed software projects from the semester to the McNair git repository. Projects can be found at: [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Executive Order Crawler], [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Foreign Government Web Crawlers], [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) F6S Crawler and Parser].&lt;br /&gt;
&lt;br /&gt;
2016-12-02: Built and ran web crawler for Center for Middle East Studies on Kuwait. Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-01: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Built tool for the [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report E&amp;amp;I Governance Report Project] with Christy. Adds a column of data that shows whether or not the bill has been passed.&lt;br /&gt;
&lt;br /&gt;
2016-11-29: Began pulling data from the accelerators listed [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here]. Made text files for about 18 accelerators.&lt;br /&gt;
&lt;br /&gt;
2016-11-22: Transferred the downloaded Morocco Written Bills to the provided Seagate drive. Made a &amp;quot;gentle&amp;quot; F6S crawler to retrieve HTMLs of possible accelerator pages, documented [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here].&lt;br /&gt;
&lt;br /&gt;
2016-11-18: Converted Executive Order PDFs to text files using Adobe Acrobat DC. See the [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Wikipage] for details.&lt;br /&gt;
&lt;br /&gt;
2016-11-17: Wrote a crawler to retrieve information about executive orders and their corresponding PDFs. They can be found [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report here]. The next step is to run code to convert the PDFs to text files, then use the parser fixed by Christy. &lt;br /&gt;
&lt;br /&gt;
2016-11-15: Finished downloading the Moroccan Written Question PDFs. Wrote a parser with Christy to be used for parsing bills from Congress and eventually executive orders. Found a bug in the system Python that was worked out, and the system was rebooted.&lt;br /&gt;
&lt;br /&gt;
2016-11-11: Continued to download Moroccan data in the background. Attempted to find bug fixes for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] crawler.&lt;br /&gt;
&lt;br /&gt;
2016-11-10: Continued to download Moroccan data and Kuwait data in the background. Began work on the [http://mcnair.bakerinstitute.org/wiki/Google_Scholar_Crawler Google Scholar Crawler]. Wrote a crawler for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] to get the HTML files of hundreds of accelerators. The crawler ended up failing; it appears to have been due to HTTPS. &lt;br /&gt;
&lt;br /&gt;
2016-11-08: Continued to download Moroccan data in the background. Finished writing code for the embedded files on the Kuwait Site. Spent time debugging the frame errors due to the dynamically generated content. Never found an answer to the bug, and instead found a workaround that sacrificed run time for the ability to work. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-11-04: Continued to download Moroccan data in the background. Finished writing initial Kuwait Web Crawler/Driver for the Middle East Studies Department. Middle East Studies Department asked for additional embedded files in the Kuwait website. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-11-03: Continued to download Moroccan data in the background. Dr. Egan fixed systems requirements to run the GovTrack Web Crawler. Made significant progress on the Kuwait Web Crawler/Driver for the Middle East Studies Department. &lt;br /&gt;
&lt;br /&gt;
2016-11-01: Continued to download Moroccan data in the background. Went over code for the [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] and continued learning Perl. Began the Kuwait Web Crawler/Driver.&lt;br /&gt;
&lt;br /&gt;
2016-10-21: Continued to download data for the Moroccan Parliament Written and Oral Questions. Looked over [http://mcnair.bakerinstitute.org/wiki/Christy_Warden_(Twitter_Crawler_Application_1) Christy's Twitter Crawler] to see how I can be helpful. Dr. Egan asked me to think about how to potentially make multiple tools to get cohorts and other sorts of data from accelerator sites; see the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator List]. He also asked me to look at the [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] for potential ideas on how to bring this project to fruition.&lt;br /&gt;
&lt;br /&gt;
2016-10-20: Continued to download data for the Moroccan Parliament Written and Oral Questions. Updated Wiki page. Started working on Twitter project with Christy. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-18: Finished code for Oral Questions web driver and Written Questions web driver using selenium. Now, the data for the dates of questions can be found using the crawler, and the pdfs of the questions will be downloaded using selenium. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-14: Finished Oral Questions crawler. Finished Written Questions crawler. Waiting for further details on whether that data needs to be tweaked in any way. Updated the Moroccan Web Driver/Web Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-13: Completed download of Moroccan Bills. Working on either a web driver screenshot approach or a webcrawler approach to download the  Moroccan oral and written questions data. Began building Web Crawler for Oral and Written Questions site. Edited Moroccan Web Driver/Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-11: Fixed the Arabic bug; files can now be saved with Arabic titles. Monarchy bills downloaded and ready for shipment. House of Representatives bills mostly downloaded, ratified bills prepared for download. Started learning the Scrapy library in Python for web scraping. Discussed the idea of screenshotting questions instead of scraping. &lt;br /&gt;
&lt;br /&gt;
2016-10-07: Learned Unicode and UTF-8 encoding and decoding for Arabic. Still working on transforming an ASCII URL into printable Unicode.&lt;br /&gt;
&lt;br /&gt;
2016-10-06: Discussed the desired file names for the Moroccan data download with Dr. Elbadawy. The consensus was that the bill programs are ready to launch once the files can be named properly, and that the questions data must be retrieved using a web crawler, which I need to learn how to implement. The file naming is currently raising errors in going from Arabic, to URL, to download, to filename. Debugging is in progress. Also built a demo Selenium program for Dr. Egan that drives the McNair blog site on an infinite loop.&lt;br /&gt;
&lt;br /&gt;
2016-10-03: Completed the Moroccan Web Driver projects for driving the Monarchy proposed bills, the House of Representatives proposed bills, and the ratified bills sites. Began devising a naming system for the files that does not require scraping; tinkered with naming through regular-expression parsing of the URL. The structure for the oral questions and written questions drivers is set up, but needs fixes due to differences between the sites. Fixed a bug on the McNair wiki for the women's biz team where an email was plain text instead of an email link. Took a glimpse at the Kuwait Parliament website; it appears to be very different from the Moroccan setup.&lt;br /&gt;
&lt;br /&gt;
2016-09-30: The Selenium program selects the view-PDF option on the website and goes to the PDF webpage. The program then switches its handle to the new page. CTRL+S is sent to the page to launch the save dialog window. Text cannot be sent to this window; brainstormed ways around the issue. Explored Chrome options for saving automatically without a dialog window. Looking into other libraries besides Selenium that may help.&lt;br /&gt;
&lt;br /&gt;
2016-09-29: Re-enrolled in Microsoft Remote Desktop with proper authentication; set up the Selenium environment and Komodo IDE on the Remote Desktop; wrote a program using Selenium that goes to a link and opens the print dialog box. Developed a computational recipe for a different approach to the problem.&lt;br /&gt;
&lt;br /&gt;
2016-09-26: Set up Staff wiki page, work log page; registered for Slack, Microsoft Remote Desktop; downloaded Selenium on personal computer, read Selenium docs. Created wiki page for Moroccan Web Driver Project.&lt;br /&gt;
&lt;br /&gt;
'''Notes'''&lt;br /&gt;
&lt;br /&gt;
*Ed moved the Morocco Data to E:\McNair\Projects from C:\Users\PeterJ\Documents&lt;br /&gt;
* C Drive files moved to E:\McNair\Users\PeterJ&lt;br /&gt;
&lt;br /&gt;
[[Category:Work Log]]&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=21996</id>
		<title>Peter Jalbert (Work Log)</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=21996"/>
		<updated>2017-11-20T23:00:51Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===Fall 2017===&lt;br /&gt;
&amp;lt;onlyinclude&amp;gt; &lt;br /&gt;
&lt;br /&gt;
[[Peter Jalbert]] [[Work Logs]] [[Peter Jalbert (Work Log)|(log page)]] &lt;br /&gt;
&lt;br /&gt;
2017-11-17: &lt;br /&gt;
&lt;br /&gt;
2017-11-16: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and am trying to run it again. The Forbidden Error continues with the TIGER Geocoder. Began the image download for Image Classification on cohort pages. Clarifying specs for the Morocco Parliament crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-15: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote a script to extract counts that were greater than 2 from Keyword Matcher. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder]. Finished re-formatting work logs.&lt;br /&gt;
&lt;br /&gt;
2017-11-14: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote an HTML-to-text parser; see the Demo Day Page Parser page for the file location. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder].&lt;br /&gt;
&lt;br /&gt;
2017-11-13: Built [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser].&lt;br /&gt;
&lt;br /&gt;
2017-11-09: Running demo version of Demo Day crawler (Accelerator Google Crawler). Fixing work log format. &lt;br /&gt;
&lt;br /&gt;
2017-11-07: Created a file of 0s and 1s detailing whether Crunchbase has the founder information for each accelerator. Details posted as a TODO on the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List] page. Still waiting for feedback on the PostGIS installation issues listed on the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder] page. Continued working on the Accelerator Google Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-06: Contacted the Geography Center of the US Census Bureau ([https://www.census.gov/geo/about/contact.html here]) and began an email exchange on the PostGIS installation problems. Began working on the [http://mcnair.bakerinstitute.org/wiki/Selenium_Documentation Selenium Documentation]. Also began working on an Accelerator Google Crawler that will be used with Yang and ML to find Demo Days for cohort companies.&lt;br /&gt;
&lt;br /&gt;
2017-11-01: Attempted to continue downloading, but ran into HTTP Forbidden errors. Listed the errors on the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder Page].&lt;br /&gt;
&lt;br /&gt;
2017-10-31: Began downloading blocks of data for individual states for the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder] project. Wrote out the new wiki page for installation, and began writing documentation on usage.&lt;br /&gt;
&lt;br /&gt;
2017-10-30: With Ed's help, was able to get the national data from Tiger installed onto a database server. The process required much jumping around and changing users, and all the things we learned are outlined in [http://mcnair.bakerinstitute.org/wiki/Database_Server_Documentation#Editing_Users the database server documentation] under &amp;quot;Editing Users&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
2017-10-25: Continued working on the [http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation TIGER Geocoder Installation].&lt;br /&gt;
&lt;br /&gt;
2017-10-24: Throw some addresses into a database, use address normalizer and geocoder. May need to install things. Details on the installation process can be found on the [http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation PostGIS Installation page].&lt;br /&gt;
&lt;br /&gt;
2017-10-23: Finished Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-19: Continued work on Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-18: Continued work on Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-17: Constructed ArcGIS maps for the agglomeration project. Finished maps of points for every year in the state of California. Finished maps of Route 128. Began working on selenium Yelp crawler to get cafe locations within the 610-loop.&lt;br /&gt;
&lt;br /&gt;
2017-10-16: Assisted Harrison on the USITC project. Looked for natural language processing tools to extract complainants and defendants, along with their locations, from case files. Experimented with pulling based on part-of-speech tags, as well as using geotext or geograpy to pull locations from a case segment.&lt;br /&gt;
&lt;br /&gt;
2017-10-13: Updated various project wiki pages.&lt;br /&gt;
&lt;br /&gt;
2017-10-12: Continued work on Patent Thicket project, awaiting further project specs.&lt;br /&gt;
&lt;br /&gt;
2017-10-05: Emergency ArcGIS creation for Agglomeration project.&lt;br /&gt;
&lt;br /&gt;
2017-10-04: Emergency ArcGIS creation for Agglomeration project.&lt;br /&gt;
&lt;br /&gt;
2017-10-02: Worked on ArcGIS data. See Harrison's Work Log for the details.&lt;br /&gt;
&lt;br /&gt;
2017-09-28: Added collaborative editing feature to PyCharm. &lt;br /&gt;
&lt;br /&gt;
2017-09-27: Worked on big database file.&lt;br /&gt;
&lt;br /&gt;
2017-09-25: New task -- Create text file with company, description, and company type.&lt;br /&gt;
#[http://mcnair.bakerinstitute.org/wiki/VC_Database_Rebuild VC Database Rebuild]&lt;br /&gt;
#psql vcdb2&lt;br /&gt;
#Table name: sdccompanybasecore2&lt;br /&gt;
#Combine with Crunchbasebulk&lt;br /&gt;
&lt;br /&gt;
#TODO: Write wiki on the LinkedIn crawler; write wiki on creating accounts.&lt;br /&gt;
&lt;br /&gt;
2017-09-21: Wrote wiki on the LinkedIn crawler, met with Laura about the patents project. &lt;br /&gt;
&lt;br /&gt;
2017-09-20: Finished running the LinkedIn crawler. Transferred data to the RDP. Will write wikis next.&lt;br /&gt;
&lt;br /&gt;
2017-09-19: Began running the LinkedIn crawler. Helped Yang create an RDP account, get permissions, and get the wiki set up.&lt;br /&gt;
&lt;br /&gt;
2017-09-18: Finished implementation of Experience Crawler, continued working on Education Crawler for LinkedIn.&lt;br /&gt;
&lt;br /&gt;
2017-09-14: Continued implementing LinkedIn Crawler for profiles.&lt;br /&gt;
&lt;br /&gt;
2017-09-13: Implemented LinkedIn Crawler for main portion of profiles. Began working on crawling Experience section of profiles.&lt;br /&gt;
&lt;br /&gt;
2017-09-12: Continued working on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. Added to the wiki on this topic.&lt;br /&gt;
&lt;br /&gt;
2017-09-11: Continued working on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. &lt;br /&gt;
&lt;br /&gt;
2017-09-06: Combined founders data retrieved with the Crunchbase API with the crunchbasebulk data to get linkedin urls for different accelerator founders. For more information, see [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here].&lt;br /&gt;
&lt;br /&gt;
2017-09-05: Post Harvey. Finished retrieving names from the Crunchbase API on founders. Next step is to query crunchbase bulk database to get linkedin urls. For more information, see [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here].&lt;br /&gt;
&lt;br /&gt;
2017-08-24: Began using the Crunchbase API to retrieve founder information for accelerators. Halfway through compiling a dictionary that translates accelerator names into proper Crunchbase API URLs.&lt;br /&gt;
&lt;br /&gt;
2017-08-23: Decided with Ed to abandon LinkedIn crawling to retrieve accelerator founder data and instead use Crunchbase. Spent the day navigating the crunchbasebulk database and seeing what useful information it contained.&lt;br /&gt;
&lt;br /&gt;
2017-08-22: Discovered that LinkedIn Profiles cannot be viewed through LinkedIn if the target is 3rd degree or further. However, if entering LinkedIn through a Google search, the profile can still be viewed if the user has previously logged into LinkedIn. Devising a workaround crawler that utilizes Google search. Continued blog post [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) here] under Section 4.&lt;br /&gt;
&lt;br /&gt;
2017-08-21: Began work on extracting founders for accelerators through LinkedIn Crawler. Discovered that Python3 is not installed on RDP, so the virtual environment for the project cannot be fired up. Continued working on Ubuntu machine.&lt;br /&gt;
&amp;lt;/onlyinclude&amp;gt; &lt;br /&gt;
&lt;br /&gt;
===Spring 2017===&lt;br /&gt;
&lt;br /&gt;
2017-05-01: Continued work on HTML Parser. Uploaded all semester projects to git server.&lt;br /&gt;
&lt;br /&gt;
2017-04-20: Finished the HTML Parser for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Ran HTML parser on accelerator founders. Data is stored in projects/accelerators/LinkedIn Founder Data.&lt;br /&gt;
&lt;br /&gt;
2017-04-19: Made updates to the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler] Wikipage. Ran LinkedIn Crawler on accelerator data. Working on an html parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-18: Ran LinkedIn Crawler on matches between Crunchbase Snapshot and the accelerator data.&lt;br /&gt;
&lt;br /&gt;
2017-04-17: Worked on ways to get correct search results from the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Worked on an HTML Parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-13: Worked on debugging the logout procedure for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Began formulation of process to search for founders of startups using a combination of the LinkedIn Crawler with the data resources from the [http://mcnair.bakerinstitute.org/wiki/Crunchbase_2013_Snapshot CrunchBase Snapshot].&lt;br /&gt;
&lt;br /&gt;
2017-04-12: Work on bugs with the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-04-11: Completed functional [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) crawler of LinkedIn Recruiter Pro]. Basic search functions work and download profile information for a given person. &lt;br /&gt;
&lt;br /&gt;
2017-04-10: Began writing functioning crawler of LinkedIn. &lt;br /&gt;
&lt;br /&gt;
2017-04-06: Continued working on debugging and documenting the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Wrote a test program that logs in, searches for a query, navigates through search pages, and logs out. Recruiter program can now login and search.&lt;br /&gt;
&lt;br /&gt;
2017-04-05: Began work on the LinkedIn Crawler. Researched on launching Python Virtual Environment.&lt;br /&gt;
&lt;br /&gt;
2017-04-03: Finished debugging points for the Enclosing Circle Algorithm. Added Command Line functionality to the Industry Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-03-29: Worked on debugging points for the Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-28: Finished running the Enclosing Circle Algorithm. Worked on removing incorrect points from the data set (see above). &lt;br /&gt;
&lt;br /&gt;
2017-03-27: Worked on debugging the Enclosing Circle Algorithm. Implemented a way to remove interior circles, and determined that translation to latitude and longitude coordinates resulted in slightly off center circles.&lt;br /&gt;
&lt;br /&gt;
2017-03-23: Finished debugging the brute force algorithm for [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm]. Implemented a method to plot the points and circles on a graph. Analyzed runtime of the brute force algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-21: Coded a brute force algorithm for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-03-20: Worked on  debugging the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-03-09: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Finished script to draw Enclosing Circles on a Google Map.&lt;br /&gt;
&lt;br /&gt;
2017-03-08: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Created script to draw outcome of the Enclosing Circle Algorithm on Google Maps.&lt;br /&gt;
&lt;br /&gt;
2017-03-07: Redetermined the top 50 cities which Enclosing Circle should be run on. Data on the [http://mcnair.bakerinstitute.org/wiki/Top_Cities_for_VC_Backed_Companies Top 50 Cities for VC Backed Companies can be found here.] Ran [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities.&lt;br /&gt;
&lt;br /&gt;
2017-03-06: Ran script to determine the top 50 cities which Enclosing Circle should be run on. Fixed the VC Circles script to take in a new data format.&lt;br /&gt;
&lt;br /&gt;
2017-03-02: Cleaned up data for the VC Circles Project. Created a histogram of the data in Excel; see the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Began work on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-03-01: Created statistics for the VC Circles Project.&lt;br /&gt;
&lt;br /&gt;
2017-02-28: Finished downloading geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project.  Found bug in Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-02-27: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-23: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Installed C++ Compiler for Python. Ran tests on difference between Python and C wrapped Python.&lt;br /&gt;
&lt;br /&gt;
2017-02-22: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Helped out with [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-21: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Researched into C++ Compilers for Python so that the Enclosing Circle Algorithm could be wrapped in C. Found a recommended one [https://www.microsoft.com/en-us/download/details.aspx?id=44266 here].&lt;br /&gt;
&lt;br /&gt;
2017-02-20: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-16: Reworked [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] to create a file of geocoded data. Began work on wrapping the algorithm in C to improve speed.&lt;br /&gt;
&lt;br /&gt;
2017-02-15: Finished [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] applied to the VC study. Enclosing Circle algorithm still needs adjustment, but the program runs with the temporary fixes.&lt;br /&gt;
&lt;br /&gt;
2017-02-14: Worked on the application of the Enclosing Circle algorithm to the VC study. Working on bug fixes in the Enclosing Circle algorithm. Created wiki page for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-02-13: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-08: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-07: Fixed bugs in parse_cohort_data.py, the script for parsing the cohort data from the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Added descriptive statistics to cohort data excel file.&lt;br /&gt;
&lt;br /&gt;
2017-02-02: Out sick, independent research and work from RDP. Brief research into the [http://jorgeg.scripts.mit.edu/homepage/wp-content/uploads/2016/03/Guzman-Stern-State-of-American-Entrepreneurship-FINAL.pdf Stern-Guzman algorithm]. Research into [http://mcnair.bakerinstitute.org/wiki/interactive_maps Interactive Maps]. No helpful additions to map embedding problem.&lt;br /&gt;
&lt;br /&gt;
2017-02-01: Notes from Session with Ed: Project on US university patenting and entrepreneurship programs (writing code to identify universities in assignees), search Wikipedia (XML then bulk download), student pop, faculty pop, etc.&lt;br /&gt;
Circle project for VC data will end up being a joint project that joins in the accelerator data. &lt;br /&gt;
Pull descriptions for VC. Find founders of accelerators on LinkedIn. The LinkedIn crawler cannot get caught (pretend not to be a bot). Can eventually get academic backgrounds through LinkedIn. &lt;br /&gt;
Pull business registration data (Stern/Guzman algorithm). &lt;br /&gt;
GIS on top of geocoded data.&lt;br /&gt;
Maps that work on the wiki or blog (CartoDB), Maps API and R.&lt;br /&gt;
NLP Projects, Description Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-01-31: Built WayBack Machine Crawler. Updated documentation for coordinates script. Updated profile page to include locations of code.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-30: Optimized enclosing circle algorithm through memoization. Developed script to read addresses from accelerator data and return latitude and longitude coordinates.&lt;br /&gt;
&lt;br /&gt;
2017-01-26: Continued working on Google sitesearch project. Discovered Crunchbase and changed project priorities: priority 1, split accelerator data up by flag; priority 2, use Crunchbase to get web URLs for cohorts; priority 3, make an Internet Archive Wayback Machine driver. Located [http://mcnair.bakerinstitute.org/wiki/Whois_Parser Whois Parser].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-25: Finished parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Some data files still need proofreading as they are not in an acceptable format. Began working on Google sitesearch project.&lt;br /&gt;
&lt;br /&gt;
2017-01-24: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Cohort data file created, debugging is almost complete. Will begin work on the google accelerator search soon.&lt;br /&gt;
&lt;br /&gt;
2017-01-23: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Preliminary code is written, working on debugging.&lt;br /&gt;
&lt;br /&gt;
2017-01-19: Downloaded PDFs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project]. Created parser for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project], completing creation of the final data set (yay!). Began working on the cohort parser.&lt;br /&gt;
&lt;br /&gt;
2017-01-18: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-13: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-12: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-11: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-10: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Fall 2016===&lt;br /&gt;
&lt;br /&gt;
2016-12-08: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-07: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-06: Learned how to use git. Committed software projects from the semester to the McNair git repository. Projects can be found at: [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Executive Order Crawler], [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Foreign Government Web Crawlers], [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) F6S Crawler and Parser].&lt;br /&gt;
&lt;br /&gt;
2016-12-02: Built and ran web crawler for Center for Middle East Studies on Kuwait. Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-01: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Built tool for the [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report E&amp;amp;I Governance Report Project] with Christy. Adds a column of data that shows whether or not the bill has been passed.&lt;br /&gt;
&lt;br /&gt;
2016-11-29: Began pulling data from the accelerators listed [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here]. Made text files for about 18 accelerators.&lt;br /&gt;
&lt;br /&gt;
2016-11-22: Transferred the downloaded Morocco Written Bills to the provided Seagate drive. Made a &amp;quot;gentle&amp;quot; F6S crawler to retrieve HTMLs of possible accelerator pages, documented [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here].&lt;br /&gt;
&lt;br /&gt;
2016-11-18: Converted Executive Order PDFs to text files using Adobe Acrobat DC. See the [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Wikipage] for details.&lt;br /&gt;
&lt;br /&gt;
2016-11-17: Wrote a crawler to retrieve information about executive orders and their corresponding PDFs. They can be found [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report here]. The next step is to run code to convert the PDFs to text files, then use the parser fixed by Christy. &lt;br /&gt;
&lt;br /&gt;
2016-11-15: Finished downloading the Moroccan Written Question PDFs. Wrote a parser with Christy to be used for parsing bills from Congress and eventually executive orders. Found a bug in the system Python that was worked out, and the system was rebooted.&lt;br /&gt;
&lt;br /&gt;
2016-11-11: Continued to download Moroccan data in the background. Attempted to find bug fixes for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] crawler.&lt;br /&gt;
&lt;br /&gt;
2016-11-10: Continued to download Moroccan data and Kuwait data in the background. Began work on the [http://mcnair.bakerinstitute.org/wiki/Google_Scholar_Crawler Google Scholar Crawler]. Wrote a crawler for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] to get the HTML files of hundreds of accelerators. The crawler ended up failing; it appears to have been due to HTTPS. &lt;br /&gt;
&lt;br /&gt;
2016-11-08: Continued to download Moroccan data in the background. Finished writing code for the embedded files on the Kuwait Site. Spent time debugging the frame errors due to the dynamically generated content. Never found an answer to the bug, and instead found a workaround that sacrificed run time for the ability to work. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-11-04: Continued to download Moroccan data in the background. Finished writing initial Kuwait Web Crawler/Driver for the Middle East Studies Department. Middle East Studies Department asked for additional embedded files in the Kuwait website. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-11-03: Continued to download Moroccan data in the background. Dr. Egan fixed systems requirements to run the GovTrack Web Crawler. Made significant progress on the Kuwait Web Crawler/Driver for the Middle East Studies Department. &lt;br /&gt;
&lt;br /&gt;
2016-11-01: Continued to download Moroccan data in the background. Went over code for the [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] and continued learning Perl. Began the Kuwait Web Crawler/Driver.&lt;br /&gt;
&lt;br /&gt;
2016-10-21: Continued to download data for the Moroccan Parliament Written and Oral Questions. Looked over [http://mcnair.bakerinstitute.org/wiki/Christy_Warden_(Twitter_Crawler_Application_1) Christy's Twitter Crawler] to see how I can be helpful. Dr. Egan asked me to think about how to potentially make multiple tools to get cohorts and other sorts of data from accelerator sites; see the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator List]. He also asked me to look at the [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] for potential ideas on how to bring this project to fruition.&lt;br /&gt;
&lt;br /&gt;
2016-10-20: Continued to download data for the Moroccan Parliament Written and Oral Questions. Updated Wiki page. Started working on Twitter project with Christy. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-18: Finished code for Oral Questions web driver and Written Questions web driver using selenium. Now, the data for the dates of questions can be found using the crawler, and the pdfs of the questions will be downloaded using selenium. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-14: Finished Oral Questions crawler. Finished Written Questions crawler. Waiting for further details on whether that data needs to be tweaked in any way. Updated the Moroccan Web Driver/Web Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-13: Completed download of Moroccan Bills. Working on either a web driver screenshot approach or a webcrawler approach to download the  Moroccan oral and written questions data. Began building Web Crawler for Oral and Written Questions site. Edited Moroccan Web Driver/Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-11: Fixed the Arabic bug; files can now be saved with Arabic titles. Monarchy bills downloaded and ready for shipment. House of Representatives bills mostly downloaded, ratified bills prepared for download. Started learning the Scrapy library in Python for web scraping. Discussed the idea of screenshotting questions instead of scraping. &lt;br /&gt;
&lt;br /&gt;
2016-10-07: Learned Unicode and UTF-8 encoding and decoding for Arabic. Still working on transforming an ASCII URL into printable Unicode.&lt;br /&gt;
&lt;br /&gt;
2016-10-06: Discussed the desired file names for the Moroccan data download with Dr. Elbadawy. The consensus was that the bill programs are ready to launch once the files can be named properly, and that the questions data must be retrieved using a web crawler, which I need to learn how to implement. The file naming is currently raising errors in going from Arabic, to URL, to download, to filename. Debugging is in progress. Also built a demo Selenium program for Dr. Egan that drives the McNair blog site on an infinite loop.&lt;br /&gt;
&lt;br /&gt;
2016-10-03: Completed the Moroccan Web Driver projects for driving the Monarchy proposed bills, the House of Representatives proposed bills, and the ratified bills sites. Began devising a naming system for the files that does not require scraping; tinkered with naming through regular-expression parsing of the URL. The structure for the oral questions and written questions drivers is set up, but needs fixes due to differences between the sites. Fixed a bug on the McNair wiki for the women's biz team where an email was plain text instead of an email link. Took a glimpse at the Kuwait Parliament website; it appears to be very different from the Moroccan setup.&lt;br /&gt;
&lt;br /&gt;
2016-09-30: The Selenium program selects the view-PDF option on the website and goes to the PDF webpage. The program then switches its handle to the new page. CTRL+S is sent to the page to launch the save dialog window. Text cannot be sent to this window; brainstormed ways around the issue. Explored Chrome options for saving automatically without a dialog window. Looking into other libraries besides Selenium that may help.&lt;br /&gt;
&lt;br /&gt;
2016-09-29: Re-enrolled in Microsoft Remote Desktop with proper authentication; set up the Selenium environment and Komodo IDE on the Remote Desktop; wrote a program using Selenium that goes to a link and opens the print dialog box. Developed a computational recipe for a different approach to the problem.&lt;br /&gt;
&lt;br /&gt;
2016-09-26: Set up Staff wiki page, work log page; registered for Slack, Microsoft Remote Desktop; downloaded Selenium on personal computer, read Selenium docs. Created wiki page for Moroccan Web Driver Project.&lt;br /&gt;
&lt;br /&gt;
'''Notes'''&lt;br /&gt;
&lt;br /&gt;
*Ed moved the Morocco Data to E:\McNair\Projects from C:\Users\PeterJ\Documents&lt;br /&gt;
* C Drive files moved to E:\McNair\Users\PeterJ&lt;br /&gt;
&lt;br /&gt;
[[Category:Work Log]]&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Tiger_Geocoder&amp;diff=21990</id>
		<title>Tiger Geocoder</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Tiger_Geocoder&amp;diff=21990"/>
		<updated>2017-11-20T21:52:51Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{McNair Projects&lt;br /&gt;
|Has title=Tiger Geocoder&lt;br /&gt;
|Has owner=Peter Jalbert,&lt;br /&gt;
|Has start date=Fall 2017&lt;br /&gt;
|Has keywords=Tiger, Geocoder, Database&lt;br /&gt;
|Has project status=Active&lt;br /&gt;
|Has Image=Tiger.jpg&lt;br /&gt;
}}&lt;br /&gt;
This page serves as documentation for using the Tiger Geocoder on PostgreSQL, as part of the PostGIS extension. The following wiki pages may also be of use to you:&lt;br /&gt;
&lt;br /&gt;
[http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation PostGIS Installation]&lt;br /&gt;
[http://mcnair.bakerinstitute.org/wiki/Database_Server_Documentation Database Server Documentation]&lt;br /&gt;
&lt;br /&gt;
The official documentation for using and installing the Tiger Geocoder can be found at the following links:&lt;br /&gt;
&lt;br /&gt;
[https://postgis.net/docs/Extras.html#Tiger_Geocoder General Instructions]&lt;br /&gt;
[https://postgis.net/docs/postgis_installation.html#install_tiger_geocoder_extension Installation Instructions]&lt;br /&gt;
[http://postgis.net/docs/Geocode.html Geocoder Documentation]&lt;br /&gt;
&lt;br /&gt;
==Location==&lt;br /&gt;
The data is currently loaded into a psql database called geocoder. The tables contain the geocoding information, and there is a test table called &amp;quot;coffeeshops&amp;quot; that contains addresses of Houston coffeeshops according to Yelp. To access the database, first log in to the McNair DB Server. Then: &lt;br /&gt;
&lt;br /&gt;
 psql geocoder&lt;br /&gt;
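&lt;br /&gt;
Once connected, you can sanity-check the test table. A minimal example (the exact column layout of coffeeshops is not documented here):&lt;br /&gt;
 SELECT * FROM coffeeshops LIMIT 5;&lt;br /&gt;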
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Installation==&lt;br /&gt;
&lt;br /&gt;
===Install and Nation Data===&lt;br /&gt;
I began by adding the necessary extensions. First, enter Postgres by using the psql command. Then:&lt;br /&gt;
 --Add Extensions to database&lt;br /&gt;
 CREATE EXTENSION postgis;&lt;br /&gt;
 CREATE EXTENSION fuzzystrmatch;&lt;br /&gt;
 CREATE EXTENSION postgis_tiger_geocoder;&lt;br /&gt;
 CREATE EXTENSION address_standardizer;&lt;br /&gt;
&lt;br /&gt;
You can test that the installation worked by running the following query: &lt;br /&gt;
 SELECT na.address, na.streetname,na.streettypeabbrev, na.zip&lt;br /&gt;
 	FROM normalize_address('1 Devonshire Place, Boston, MA 02109') AS na;&lt;br /&gt;
&lt;br /&gt;
This should return the following:&lt;br /&gt;
  address | streetname | streettypeabbrev |  zip&lt;br /&gt;
 ---------+------------+------------------+-------&lt;br /&gt;
 	   1 | Devonshire | Pl               | 02109&lt;br /&gt;
&lt;br /&gt;
Next, a new profile needs to be created by using the following command.&lt;br /&gt;
 INSERT INTO tiger.loader_platform(os, declare_sect, pgbin, wget, unzip_command, psql, path_sep, &lt;br /&gt;
 		   loader, environ_set_command, county_process_command)&lt;br /&gt;
 SELECT 'test', declare_sect, pgbin, wget, unzip_command, psql, path_sep,&lt;br /&gt;
 	   loader, environ_set_command, county_process_command&lt;br /&gt;
   FROM tiger.loader_platform&lt;br /&gt;
   WHERE os = 'sh';&lt;br /&gt;
&lt;br /&gt;
The installation instructions also provide the following note:&lt;br /&gt;
&lt;br /&gt;
As of PostGIS 2.4.1 the Zip code-5 digit tabulation area zcta5 load step was revised to load current zcta5 data and is part of the Loader_Generate_Nation_Script when enabled. It is turned off by default because it takes quite a bit of time to load (20 to 60 minutes), takes up quite a bit of disk space, and is not used that often.&lt;br /&gt;
&lt;br /&gt;
If you would like this feature, you can enable it by using the following command. This should be done before loading the script.&lt;br /&gt;
&lt;br /&gt;
 UPDATE tiger.loader_lookuptables SET load = true WHERE table_name = 'zcta510';&lt;br /&gt;
&lt;br /&gt;
The paths in declare_sect need to be edited so they match our server locations. One option is to edit the declare_sect column in the tiger.loader_platform table. If so, the declare_sect looks like the following:&lt;br /&gt;
 TMPDIR=&amp;quot;${staging_fold}/temp/&amp;quot;                +&lt;br /&gt;
 UNZIPTOOL=unzip                               +&lt;br /&gt;
 WGETTOOL=&amp;quot;/usr/bin/wget&amp;quot;                      +&lt;br /&gt;
 export PGBIN=/usr/lib/postgresql/9.6/bin      +&lt;br /&gt;
 export PGPORT=5432                            +&lt;br /&gt;
 export PGHOST=localhost                       +&lt;br /&gt;
 export PGUSER=postgres                        +&lt;br /&gt;
 export PGPASSWORD=yourpasswordhere            +&lt;br /&gt;
 export PGDATABASE=geocoder                    +&lt;br /&gt;
 PSQL=${PGBIN}/psql                            +&lt;br /&gt;
 SHP2PGSQL=shp2pgsql                           +&lt;br /&gt;
 cd ${staging_fold}&lt;br /&gt;
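&lt;br /&gt;
If you take this option, each field can be rewritten in place with a replace()-style UPDATE. A minimal sketch, using the placeholder password shown above (the same pattern works for PGBIN, PGDATABASE, and the other fields):&lt;br /&gt;
 --Rewrite one field of the 'test' profile's declare_sect in place&lt;br /&gt;
 UPDATE tiger.loader_platform&lt;br /&gt;
    SET declare_sect = replace(declare_sect, 'yourpasswordhere', 'the_real_password')&lt;br /&gt;
  WHERE os = 'test';&lt;br /&gt;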
&lt;br /&gt;
Another option is to edit the sh file before running the script. We will do this option until further notice. Simply use your favorite command line editor to change the fields to their correct values. The downloaded script is located in the following directory:&lt;br /&gt;
 /gisdata&lt;br /&gt;
&lt;br /&gt;
There needs to be a directory called &amp;quot;temp&amp;quot; in the gisdata directory. To make the script, use the following from the command line:&lt;br /&gt;
 psql -c &amp;quot;SELECT Loader_Generate_Nation_Script('test')&amp;quot; -d databasename -tA &amp;gt; /gisdata/nation_script_load.sh&lt;br /&gt;
&lt;br /&gt;
This will create a script in the /gisdata directory. Change to that directory. If you did not edit the paths in the declare_sect column in psql, then you will need to edit this file to contain the correct paths. &lt;br /&gt;
&lt;br /&gt;
Change directories:&lt;br /&gt;
 cd /gisdata&lt;br /&gt;
&lt;br /&gt;
Edit the script using your favorite command line text editor. Specifically, edit the following fields.&lt;br /&gt;
 PGUSER=postgres&lt;br /&gt;
 PGPASSWORD=(Ask Anne for this password)!&lt;br /&gt;
Everything else remains the same.&lt;br /&gt;
&lt;br /&gt;
Run the script by using: &lt;br /&gt;
 sh nation_script_load.sh&lt;br /&gt;
&lt;br /&gt;
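Before moving on, it is worth sanity-checking the load. A minimal sketch (the nation script loads the state and county lookup tables into the tiger_data schema):&lt;br /&gt;
 SELECT count(*) FROM tiger_data.state_all;   --expect roughly 50-odd rows&lt;br /&gt;
 SELECT count(*) FROM tiger_data.county_all;  --expect a few thousand rows&lt;br /&gt;
&lt;br /&gt;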
Now there is a barebones table in the database that will hold the information for the nation. The next step is to download data for each state.&lt;br /&gt;
&lt;br /&gt;
===State Data===&lt;br /&gt;
&lt;br /&gt;
The state scripts are generated in much the same way that the nation script was generated. Use the following command, substituting your desired state abbreviation for MA and a unique filename at the end.&lt;br /&gt;
 psql -c &amp;quot;SELECT Loader_Generate_Script(ARRAY['MA'], 'test')&amp;quot; -d geocoder -tA &amp;gt; /gisdata/ma_load.sh&lt;br /&gt;
&lt;br /&gt;
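Loader_Generate_Script accepts an array, so several states can be batched into one script. A minimal sketch (the state list and output filename here are illustrative):&lt;br /&gt;
 psql -c &amp;quot;SELECT Loader_Generate_Script(ARRAY['TX','LA','OK'], 'test')&amp;quot; -d geocoder -tA &amp;gt; /gisdata/tx_la_ok_load.sh&lt;br /&gt;
 sh /gisdata/tx_la_ok_load.sh&lt;br /&gt;
&lt;br /&gt;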
CURRENT PROGRESS:&lt;br /&gt;
The following states have been downloaded into the geocoder database.&lt;br /&gt;
 AL, AK, AZ, AR, CA, CO, CT, DE, FL, MA&lt;br /&gt;
&lt;br /&gt;
===Current Errors===&lt;br /&gt;
The state scripts stopped working on 11/1/2017, despite working on 10/31/2017. Now, when a retrieval script is run, it returns the error &lt;br /&gt;
 HTTP request sent, awaiting response... 403 Forbidden&lt;br /&gt;
Possible thoughts:&lt;br /&gt;
&lt;br /&gt;
Maybe our IP has been blacklisted for downloading data from a government website too quickly? If so, PostGIS should really choose a different installation method. The current one is dumb.&lt;br /&gt;
&lt;br /&gt;
Maybe the nation downloader script never worked properly. Not sure how to check whether it is correct; it seems right.&lt;br /&gt;
&lt;br /&gt;
[https://trac.osgeo.org/postgis/ticket/3699 This] is the only online forum I could find with others who have faced a similar issue.&lt;br /&gt;
&lt;br /&gt;
NOTE: Contacted the US Census Bureau and never got a response. However, the scripts magically started working again about a week later, and then stopped working again the next day.&lt;br /&gt;
&lt;br /&gt;
==Geocode Function==&lt;br /&gt;
&lt;br /&gt;
The official arguments for the function are the following:&lt;br /&gt;
 setof record geocode(varchar address, integer max_results=10, geometry restrict_region=NULL, norm_addy OUT addy, geometry OUT geomout, integer OUT rating);&lt;br /&gt;
The arguments of interest are address, where you simply submit a string, and max_results, which restricts the number of geocoding guesses returned per address. The geocoder makes multiple guesses at the location of an address and returns the best guesses in order. If you want multiple guesses for a specific address, specify max_results to be more than 1.&lt;br /&gt;
&lt;br /&gt;
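To see the geocoder's ranked guesses, raise max_results. A minimal sketch using the example address from the next section (pprint_addy formats the normalized address; rows come back best guess first):&lt;br /&gt;
 SELECT g.rating, pprint_addy(g.addy) As guess,&lt;br /&gt;
     ST_X(g.geomout) As lon, ST_Y(g.geomout) As lat&lt;br /&gt;
     FROM geocode('75 State Street, Boston MA 02109', 3) As g;&lt;br /&gt;
&lt;br /&gt;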
===Single Address===&lt;br /&gt;
&lt;br /&gt;
An example query for a single address is:&lt;br /&gt;
 SELECT g.rating, ST_X(g.geomout) As lon, ST_Y(g.geomout) As lat,&lt;br /&gt;
     (addy).address As stno, (addy).streetname As street,&lt;br /&gt;
     (addy).streettypeabbrev As styp, (addy).location As city, (addy).stateabbrev As st,(addy).zip&lt;br /&gt;
     FROM geocode('75 State Street, Boston MA 02109', 1) As g;&lt;br /&gt;
&lt;br /&gt;
rating -- This is an integer that determines the confidence in the resulting geocode. The closer to 0, the more confident the guess.&lt;br /&gt;
&lt;br /&gt;
ST_X(g.geomout) -- This retrieves the longitude coordinate of the point.&lt;br /&gt;
&lt;br /&gt;
ST_Y(g.geomout) -- This retrieves the latitude coordinate of the point.&lt;br /&gt;
&lt;br /&gt;
addy -- In general, addy is a normalized address resulting from the input address.&lt;br /&gt;
&lt;br /&gt;
(addy).address -- The number of the address (Ex: &amp;quot;75&amp;quot; Blabla rd.)&lt;br /&gt;
&lt;br /&gt;
(addy).streetname -- The name of the street (Ex: 75 &amp;quot;Blabla&amp;quot; rd.)&lt;br /&gt;
&lt;br /&gt;
(addy).streettypeabbrev -- The abbreviation of the street (Ex: 75 blabla &amp;quot;rd&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
(addy).location -- The city location of the address.&lt;br /&gt;
&lt;br /&gt;
(addy).stateabbrev -- The state abbreviation.&lt;br /&gt;
&lt;br /&gt;
(addy).zip -- The zipcode of the address.&lt;br /&gt;
&lt;br /&gt;
The output of the query above would be:&lt;br /&gt;
  rating |        lon        |      lat       | stno | street | styp |  city  | st |  zip&lt;br /&gt;
 --------+-------------------+----------------+------+--------+------+--------+----+-------&lt;br /&gt;
       0 | -71.0557505845646 | 42.35897920691 |   75 | State  | St   | Boston | MA | 02109&lt;br /&gt;
&lt;br /&gt;
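The same function can be applied row by row to geocode an entire table. A minimal sketch against the coffeeshops test table, assuming an integer key column id and a text column address (both column names are assumptions here):&lt;br /&gt;
 --Add columns to hold the best guess for each row&lt;br /&gt;
 ALTER TABLE coffeeshops ADD COLUMN rating integer;&lt;br /&gt;
 ALTER TABLE coffeeshops ADD COLUMN lon double precision;&lt;br /&gt;
 ALTER TABLE coffeeshops ADD COLUMN lat double precision;&lt;br /&gt;
 --Geocode each address, keeping only the best guess (max_results = 1)&lt;br /&gt;
 UPDATE coffeeshops&lt;br /&gt;
    SET rating = g.rating, lon = ST_X(g.geomout), lat = ST_Y(g.geomout)&lt;br /&gt;
   FROM (SELECT id, address FROM coffeeshops) As a&lt;br /&gt;
        LEFT JOIN LATERAL geocode(a.address, 1) As g ON true&lt;br /&gt;
  WHERE coffeeshops.id = a.id;&lt;br /&gt;
&lt;br /&gt;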
==Current Errors==&lt;br /&gt;
Currently, we are getting a 403 Forbidden Error when trying to download state data. We are in contact with the US Census Bureau. Their contact information can be found [https://www.census.gov/geo/about/contact.html here].&lt;br /&gt;
&lt;br /&gt;
The email exchanges are recorded below.&lt;br /&gt;
&lt;br /&gt;
--------------------------------------&lt;br /&gt;
--------------------------------------&lt;br /&gt;
&lt;br /&gt;
I am a student researcher at the McNair Center for Entrepreneurship and Innovation at Rice University, and I am in the process of installing a Postgres Extension that relies on the TIGER data. &lt;br /&gt;
&lt;br /&gt;
When I began installing the extension, things were working fine. Now however, I am getting a 403 Forbidden error when the script tries to download the TIGER files. Do you have any idea why this might be happening? &lt;br /&gt;
&lt;br /&gt;
The extension I'm trying to install is below:&lt;br /&gt;
&lt;br /&gt;
http://postgis.net/docs/postgis_installation.html#install_tiger_geocoder_extension&lt;br /&gt;
&lt;br /&gt;
When I run any of the scripts that require data from TIGER, I am receiving the following error:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--2017-11-06 14:15:29--  http://www2.census.gov/geo/tiger/TIGER2016/FEATNAMES/tl_2016_06065_featnames.zip&lt;br /&gt;
Resolving www2.census.gov (www2.census.gov)... 104.84.241.90, 2600:1404:a:382::208c, 2600:1404:a:39c::208c&lt;br /&gt;
Connecting to www2.census.gov (www2.census.gov)|104.84.241.90|:80... connected.&lt;br /&gt;
HTTP request sent, awaiting response... 403 Forbidden&lt;br /&gt;
2017-11-06 14:15:29 ERROR 403: Forbidden.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
&lt;br /&gt;
Thank you for reaching out to us.  The postgis docs is not from the Geography Division, so I cannot comment on that.  A couple things that come to mind.&lt;br /&gt;
&lt;br /&gt;
We just released our new 2017 Shapefiles, so it's possible the scripts may be written for a previous version of our Shapefiles.&lt;br /&gt;
You may need to clean out your cookies, restart your browser, and then attempt to reinstall.&lt;br /&gt;
Were you able to download our Shapefiles successfully?&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
&lt;br /&gt;
The Shapefiles are part of the files being blocked by the 403 Forbidden error. &lt;br /&gt;
&lt;br /&gt;
The script is using a wget protocol to bulk download the data, so there were no cookies involved. Also, the download had been working in previous days; it was only recently that the same scripts stopped working. I am worried that our IP address somehow ended up on a blacklist for the TIGER data. Is there a blacklist for addresses that access the TIGER data?&lt;br /&gt;
&lt;br /&gt;
Our IP address is 128.42.44.181. &lt;br /&gt;
&lt;br /&gt;
---------------------------------------&lt;br /&gt;
&lt;br /&gt;
I have forwarded your question on to our IT folks.  Since I work on the subject matter side, I am unable to answer your questions.  Once I hear back from them, I will forward their response to you.  Hopefully they will provide to you what you need in order to download our Shapefiles.  My apologies for your inconvenience.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=21957</id>
		<title>Peter Jalbert (Work Log)</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=21957"/>
		<updated>2017-11-16T21:36:37Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===Fall 2017===&lt;br /&gt;
&amp;lt;onlyinclude&amp;gt; &lt;br /&gt;
&lt;br /&gt;
[[Peter Jalbert]] [[Work Logs]] [[Peter Jalbert (Work Log)|(log page)]] &lt;br /&gt;
&lt;br /&gt;
2017-11-16: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-15: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote a script to extract counts that were greater than 2 from Keyword Matcher. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder]. Finished re-formatting work logs.&lt;br /&gt;
&lt;br /&gt;
2017-11-14: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote an HTML to Text parser. See Parser Demo Day Page for file location. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder].&lt;br /&gt;
&lt;br /&gt;
2017-11-13: Built [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser].&lt;br /&gt;
&lt;br /&gt;
2017-11-09: Running demo version of Demo Day crawler (Accelerator Google Crawler). Fixing work log format. &lt;br /&gt;
&lt;br /&gt;
2017-11-07: Created file with 0s and 1s detailing whether crunchbase has the founder information for an accelerator. Details posted as a TODO on [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List] page. Still waiting for feedback on the PostGIS installation from [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder]. Continued working on Accelerator Google Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-06: Contacted the Geography Center of the US Census Bureau (contact information [https://www.census.gov/geo/about/contact.html here]) and began an email exchange on PostGIS installation problems. Began working on the [http://mcnair.bakerinstitute.org/wiki/Selenium_Documentation Selenium Documentation]. Also began working on an Accelerator Google Crawler that will be used with Yang and ML to find Demo Days for cohort companies.&lt;br /&gt;
&lt;br /&gt;
2017-11-01: Attempted to continue downloading, however ran into HTTP Forbidden errors. Listed the errors on the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder Page].&lt;br /&gt;
&lt;br /&gt;
2017-10-31: Began downloading blocks of data for individual states for the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder] project. Wrote out the new wiki page for installation, and began writing documentation on usage.&lt;br /&gt;
&lt;br /&gt;
2017-10-30: With Ed's help, was able to get the national data from Tiger installed onto a database server. The process required much jumping around and changing users, and all the things we learned are outlined in [http://mcnair.bakerinstitute.org/wiki/Database_Server_Documentation#Editing_Users the database server documentation] under &amp;quot;Editing Users&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
2017-10-25: Continued working on the [http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation TigerCoder Installation].&lt;br /&gt;
&lt;br /&gt;
2017-10-24: New task -- throw some addresses into a database and use the address normalizer and geocoder; may need to install things. Details on the installation process can be found on the [http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation PostGIS Installation page].&lt;br /&gt;
&lt;br /&gt;
2017-10-23: Finished Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-19: Continued work on Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-18: Continued work on Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-17: Constructed ArcGIS maps for the agglomeration project. Finished maps of points for every year in the state of California. Finished maps of Route 128. Began working on selenium Yelp crawler to get cafe locations within the 610-loop.&lt;br /&gt;
&lt;br /&gt;
2017-10-16: Assisted Harrison on the USITC project. Looked for natural language processing tools to extract complainants and defendants, along with their locations, from case files. Experimented with pulling based on part-of-speech tags, as well as using geotext or geograpy to pull locations from a case segment.&lt;br /&gt;
&lt;br /&gt;
2017-10-13: Updated various project wiki pages.&lt;br /&gt;
&lt;br /&gt;
2017-10-12: Continued work on Patent Thicket project, awaiting further project specs.&lt;br /&gt;
&lt;br /&gt;
2017-10-05: Emergency ArcGIS creation for Agglomeration project.&lt;br /&gt;
&lt;br /&gt;
2017-10-04: Emergency ArcGIS creation for Agglomeration project.&lt;br /&gt;
&lt;br /&gt;
2017-10-02: Worked on ArcGIS data. See Harrison's Work Log for the details.&lt;br /&gt;
&lt;br /&gt;
2017-09-28: Added collaborative editing feature to PyCharm. &lt;br /&gt;
&lt;br /&gt;
2017-09-27: Worked on big database file.&lt;br /&gt;
&lt;br /&gt;
2017-09-25: New task -- Create text file with company, description, and company type.&lt;br /&gt;
#[http://mcnair.bakerinstitute.org/wiki/VC_Database_Rebuild VC Database Rebuild]&lt;br /&gt;
#psql vcdb2&lt;br /&gt;
#table name, sdccompanybasecore2&lt;br /&gt;
#Combine with Crunchbasebulk&lt;br /&gt;
&lt;br /&gt;
#TODO: Write wiki on linkedin crawler, write wiki on creating accounts.&lt;br /&gt;
&lt;br /&gt;
2017-09-21: Wrote wiki on Linkedin crawler, met with Laura about patents project. &lt;br /&gt;
&lt;br /&gt;
2017-09-20: Finished running linkedin crawler. Transferred data to RDP. Will write wikis next.&lt;br /&gt;
&lt;br /&gt;
2017-09-19: Began running linkedin crawler. Helped Yang create RDP account, get permissions, and get wiki setup.&lt;br /&gt;
&lt;br /&gt;
2017-09-18: Finished implementation of Experience Crawler, continued working on Education Crawler for LinkedIn.&lt;br /&gt;
&lt;br /&gt;
2017-09-14: Continued implementing LinkedIn Crawler for profiles.&lt;br /&gt;
&lt;br /&gt;
2017-09-13: Implemented LinkedIn Crawler for main portion of profiles. Began working on crawling Experience section of profiles.&lt;br /&gt;
&lt;br /&gt;
2017-09-12: Continued working on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. Added to the wiki on this topic.&lt;br /&gt;
&lt;br /&gt;
2017-09-11: Continued working on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. &lt;br /&gt;
&lt;br /&gt;
2017-09-06: Combined founders data retrieved with the Crunchbase API with the crunchbasebulk data to get linkedin urls for different accelerator founders. For more information, see [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here].&lt;br /&gt;
&lt;br /&gt;
2017-09-05: Post Harvey. Finished retrieving names from the Crunchbase API on founders. Next step is to query crunchbase bulk database to get linkedin urls. For more information, see [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here].&lt;br /&gt;
&lt;br /&gt;
2017-08-24: Began using the Crunchbase API to retrieve founder information for accelerators. Halfway through compiling a dictionary that translates accelerator names into proper Crunchbase API URLs.&lt;br /&gt;
&lt;br /&gt;
2017-08-23: Decided with Ed to abandon LinkedIn crawling to retrieve accelerator founder data, and instead use crunchbase. Spent the day navigating the crunchbasebulk database, and seeing what useful information was contained in it.&lt;br /&gt;
&lt;br /&gt;
2017-08-22: Discovered that LinkedIn Profiles cannot be viewed through LinkedIn if the target is 3rd degree or further. However, if entering LinkedIn through a Google search, the profile can still be viewed if the user has previously logged into LinkedIn. Devising a workaround crawler that utilizes Google search. Continued blog post [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) here] under Section 4.&lt;br /&gt;
&lt;br /&gt;
2017-08-21: Began work on extracting founders for accelerators through LinkedIn Crawler. Discovered that Python3 is not installed on RDP, so the virtual environment for the project cannot be fired up. Continued working on Ubuntu machine.&lt;br /&gt;
&amp;lt;/onlyinclude&amp;gt; &lt;br /&gt;
&lt;br /&gt;
===Spring 2017===&lt;br /&gt;
&lt;br /&gt;
2017-05-01: Continued work on HTML Parser. Uploaded all semester projects to git server.&lt;br /&gt;
&lt;br /&gt;
2017-04-20: Finished the HTML Parser for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Ran HTML parser on accelerator founders. Data is stored in projects/accelerators/LinkedIn Founder Data.&lt;br /&gt;
&lt;br /&gt;
2017-04-19: Made updates to the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler] Wikipage. Ran LinkedIn Crawler on accelerator data. Working on an html parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-18: Ran LinkedIn Crawler on matches between Crunchbase Snapshot and the accelerator data.&lt;br /&gt;
&lt;br /&gt;
2017-04-17: Worked on ways to get correct search results from the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Worked on an HTML Parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-13: Worked on debugging the logout procedure for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Began formulation of process to search for founders of startups using a combination of the LinkedIn Crawler with the data resources from the [http://mcnair.bakerinstitute.org/wiki/Crunchbase_2013_Snapshot CrunchBase Snapshot].&lt;br /&gt;
&lt;br /&gt;
2017-04-12: Work on bugs with the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-04-11: Completed functional [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) crawler of LinkedIn Recruiter Pro]. Basic search functions work and download profile information for a given person. &lt;br /&gt;
&lt;br /&gt;
2017-04-10: Began writing functioning crawler of LinkedIn. &lt;br /&gt;
&lt;br /&gt;
2017-04-06: Continued working on debugging and documenting the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Wrote a test program that logs in, searches for a query, navigates through search pages, and logs out. Recruiter program can now login and search.&lt;br /&gt;
&lt;br /&gt;
2017-04-05: Began work on the LinkedIn Crawler. Researched launching a Python Virtual Environment.&lt;br /&gt;
&lt;br /&gt;
2017-04-03: Finished debugging points for the Enclosing Circle Algorithm. Added Command Line functionality to the Industry Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-03-29: Worked on debugging points for the Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-28: Finished running the Enclosing Circle Algorithm. Worked on removing incorrect points from the data set (see above).&lt;br /&gt;
&lt;br /&gt;
2017-03-27: Worked on debugging the Enclosing Circle Algorithm. Implemented a way to remove interior circles, and determined that translation to latitude and longitude coordinates resulted in slightly off center circles.&lt;br /&gt;
&lt;br /&gt;
2017-03-23: Finished debugging the brute force algorithm for [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm]. Implemented a method to plot the points and circles on a graph. Analyzed runtime of the brute force algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-21: Coded a brute force algorithm for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-03-20: Worked on debugging the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-03-09: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Finished script to draw Enclosing Circles on a Google Map.&lt;br /&gt;
&lt;br /&gt;
2017-03-08: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Created script to draw outcome of the Enclosing Circle Algorithm on Google Maps.&lt;br /&gt;
&lt;br /&gt;
2017-03-07: Redetermined the top 50 cities which Enclosing Circle should be run on. Data on the Top 50 Cities for VC Backed Companies can be found [http://mcnair.bakerinstitute.org/wiki/Top_Cities_for_VC_Backed_Companies here]. Ran [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities.&lt;br /&gt;
&lt;br /&gt;
2017-03-06: Ran script to determine the top 50 cities which Enclosing Circle should be run on. Fixed the VC Circles script to take in a new data format.&lt;br /&gt;
&lt;br /&gt;
2017-03-02: Cleaned up data for the VC Circles Project. Created histogram of data in Excel. See [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Began work on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-03-01: Created statistics for the VC Circles Project.&lt;br /&gt;
&lt;br /&gt;
2017-02-28: Finished downloading geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project.  Found bug in Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-02-27: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-23: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Installed C++ Compiler for Python. Ran tests on difference between Python and C wrapped Python.&lt;br /&gt;
&lt;br /&gt;
2017-02-22: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Helped out with [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-21: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Researched into C++ Compilers for Python so that the Enclosing Circle Algorithm could be wrapped in C. Found a recommended one [https://www.microsoft.com/en-us/download/details.aspx?id=44266 here].&lt;br /&gt;
&lt;br /&gt;
2017-02-20: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-16: Reworked [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] to create a file of geocoded data. Began work on wrapping the algorithm in C to improve speed.&lt;br /&gt;
&lt;br /&gt;
2017-02-15: Finished [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] applied to the VC study. Enclosing Circle algorithm still needs adjustment, but the program runs with the temporary fixes.&lt;br /&gt;
&lt;br /&gt;
2017-02-14: Worked on the application of the Enclosing Circle algorithm to the VC study. Working on bug fixes in the Enclosing Circle algorithm. Created wiki page for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-02-13: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-08: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-07: Fixed bugs in parse_cohort_data.py, the script for parsing the cohort data from the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Added descriptive statistics to cohort data excel file.&lt;br /&gt;
&lt;br /&gt;
2017-02-02: Out sick, independent research and work from RDP. Brief research into the [http://jorgeg.scripts.mit.edu/homepage/wp-content/uploads/2016/03/Guzman-Stern-State-of-American-Entrepreneurship-FINAL.pdf Stern-Guzman algorithm]. Research into [http://mcnair.bakerinstitute.org/wiki/interactive_maps Interactive Maps]. No helpful additions to map embedding problem.&lt;br /&gt;
&lt;br /&gt;
2017-02-01: Notes from session with Ed: Project on US university patenting and entrepreneurship programs (writing code to identify universities in assignees), search Wikipedia (XML then bulk download), student pop, faculty pop, etc.&lt;br /&gt;
Circle project for VC data will end up being a joint project to join accelerator data.&lt;br /&gt;
Pull descriptions for VC. Founders of accelerators on LinkedIn. The LinkedIn crawler cannot be caught (pretend not to be a bot). Can eventually get academic backgrounds through LinkedIn.&lt;br /&gt;
Pull business registration data, Stern/Guzman Algorithm.&lt;br /&gt;
GIS on top of geocoded data.&lt;br /&gt;
Maps that work on the wiki or blog (CartoDB), Maps API and R.&lt;br /&gt;
NLP projects, Description Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-01-31: Built WayBack Machine Crawler. Updated documentation for coordinates script. Updated profile page to include locations of code.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-30: Optimized enclosing circle algorithm through memoization. Developed script to read addresses from accelerator data and return latitude and longitude coordinates.&lt;br /&gt;
&lt;br /&gt;
2017-01-26: Continued working on Google sitesearch project. Discovered crunchbase, changed project priority. Priority 1: split accelerator data up by flag; priority 2: use crunchbase to get web urls for cohorts; priority 3: make an Internet Archive Wayback Machine driver. Located [http://mcnair.bakerinstitute.org/wiki/Whois_Parser Whois Parser].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-25: Finished parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Some data files still need proofreading as they are not in an acceptable format. Began working on Google sitesearch project.&lt;br /&gt;
&lt;br /&gt;
2017-01-24: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Cohort data file created, debugging is almost complete. Will begin work on the Google accelerator search soon.&lt;br /&gt;
&lt;br /&gt;
2017-01-23: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Preliminary code is written, working on debugging.&lt;br /&gt;
&lt;br /&gt;
2017-01-19: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project]. Created parser for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project], completed creation of the final data set (yay!). Began working on cohort parser.&lt;br /&gt;
&lt;br /&gt;
2017-01-18: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-13: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-12: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-11: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-10: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Fall 2016===&lt;br /&gt;
&lt;br /&gt;
2016-12-08: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-07: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-06: Learned how to use git. Committed software projects from the semester to the McNair git repository. Projects can be found at: [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Executive Order Crawler], [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Foreign Government Web Crawlers], [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) F6S Crawler and Parser].&lt;br /&gt;
&lt;br /&gt;
2016-12-02: Built and ran web crawler for Center for Middle East Studies on Kuwait. Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-01: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Built tool for the [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report E&amp;amp;I Governance Report Project] with Christy. Adds a column of data that shows whether or not the bill has been passed.&lt;br /&gt;
&lt;br /&gt;
2016-11-29: Began pulling data from the accelerators listed [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here]. Made text files for about 18 accelerators.&lt;br /&gt;
&lt;br /&gt;
2016-11-22: Transferred downloaded Morocco Written Bills to provided SeaGate Drive. Made a &amp;quot;gentle&amp;quot; F6S crawler to retrieve HTMLs of possible accelerator pages documented [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here].&lt;br /&gt;
&lt;br /&gt;
2016-11-18: Converted Executive Order PDFs to text files using Adobe Acrobat DC. See [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Wikipage] for details.&lt;br /&gt;
&lt;br /&gt;
2016-11-17: Wrote a crawler to retrieve information about executive orders and their corresponding pdfs. They can be found [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report here]. Next step is to run code to convert the pdfs to text files, then use the parser fixed by Christy.&lt;br /&gt;
&lt;br /&gt;
2016-11-15: Finished download of Moroccan Written Question pdfs. Wrote a parser with Christy to be used for parsing bills from Congress and eventually executive orders. Found a bug in the system Python; it was worked out and the system rebooted.&lt;br /&gt;
&lt;br /&gt;
2016-11-11: Continued to download Moroccan data in the background. Attempted to find bug fixes for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] crawler.&lt;br /&gt;
&lt;br /&gt;
2016-11-10: Continued to download Moroccan data and Kuwait data in the background. Began work on the [http://mcnair.bakerinstitute.org/wiki/Google_Scholar_Crawler Google Scholar Crawler]. Wrote a crawler for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] to get the HTML files of hundreds of accelerators. The crawler ended up failing; it appears to have been due to HTTPS.&lt;br /&gt;
&lt;br /&gt;
2016-11-08: Continued to download Moroccan data in the background. Finished writing code for the embedded files on the Kuwait site. Spent time debugging the frame errors due to the dynamically generated content. Never found an answer to the bug; instead found a workaround that sacrificed run time for the ability to work. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-11-04: Continued to download Moroccan data in the background. Finished writing initial Kuwait Web Crawler/Driver for the Middle East Studies Department. Middle East Studies Department asked for additional embedded files in the Kuwait website. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-11-03: Continued to download Moroccan data in the background. Dr. Egan fixed systems requirements to run the GovTrack Web Crawler. Made significant progress on the Kuwait Web Crawler/Driver for the Middle East Studies Department. &lt;br /&gt;
&lt;br /&gt;
2016-11-01: Continued to download Moroccan data in the background. Went over code for GovTracker Web Crawler, continued learning Perl. [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] Began Kuwait Web Crawler/Driver.&lt;br /&gt;
&lt;br /&gt;
2016-10-21: Continued to download data for the Moroccan Parliament Written and Oral Questions. Looked over [http://mcnair.bakerinstitute.org/wiki/Christy_Warden_(Twitter_Crawler_Application_1) Christy's Twitter Crawler] to see how I can be helpful. Dr. Egan asked me to think about how to potentially make multiple tools to get cohorts and other sorts of data from accelerator sites; see the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator List]. He also asked me to look at the [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] for potential ideas on how to bring this project to fruition.&lt;br /&gt;
&lt;br /&gt;
2016-10-20: Continued to download data for the Moroccan Parliament Written and Oral Questions. Updated Wiki page. Started working on Twitter project with Christy. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-18: Finished code for Oral Questions web driver and Written Questions web driver using selenium. Now, the data for the dates of questions can be found using the crawler, and the pdfs of the questions will be downloaded using selenium. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-14: Finished Oral Questions crawler. Finished Written Questions crawler. Waiting for further details on whether that data needs to be tweaked in any way. Updated the Moroccan Web Driver/Web Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-13: Completed download of Moroccan Bills. Working on either a web driver screenshot approach or a webcrawler approach to download the  Moroccan oral and written questions data. Began building Web Crawler for Oral and Written Questions site. Edited Moroccan Web Driver/Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-11: Fixed Arabic bug; files can now be saved with Arabic titles. Monarchy bills downloaded and ready for shipment. House of Representatives bills mostly downloaded, ratified bills prepared for download. Started learning the scrapy library in Python for web scraping. Discussed the idea of screenshotting questions instead of scraping.&lt;br /&gt;
&lt;br /&gt;
2016-10-07: Learned Unicode and UTF-8 encoding and decoding for Arabic. Still working on transforming an ASCII URL into printable Unicode.&lt;br /&gt;
&lt;br /&gt;
2016-10-06: Discussed with Dr. Elbadawy the desired file names for the Moroccan data download. The consensus was that the bill programs are ready to launch once the files can be named properly, and that the questions data must be retrieved using a web crawler, which I need to learn how to implement. The naming of files currently raises errors in going from Arabic, to URL, to download, to filename; debugging is in process. Also built a demo selenium program for Dr. Egan that drives the McNair blog site on an infinite loop.&lt;br /&gt;
&lt;br /&gt;
2016-10-03: Moroccan Web Driver projects completed for driving of the Monarchy proposed bills, the House of Representatives proposed bills, and the Ratified bills sites. Began devising a naming system for the files that does not require scraping. Tinkered with naming through regular-expression parsing of the URL. Structure for the oral questions and written questions drivers is set up, but needs fixes due to the differences in the sites. Fixed a bug on the McNair wiki for the women's biz team where an email was plain text instead of an email link. Took a glimpse at the Kuwait Parliament website, and it appears to be very different from the Moroccan setup.&lt;br /&gt;
&lt;br /&gt;
2016-09-30: The Selenium program selects the view pdf option from the website and goes to the pdf webpage. The program then switches its handle to the new page. CTRL+S is sent to the page to launch the save dialog window; text cannot be sent to this window. Brainstormed ways around this issue. Explored Chrome options for saving automatically without a dialog window. Looking into other libraries besides selenium that may help.&lt;br /&gt;
&lt;br /&gt;
2016-09-29: Re-enrolled in Microsoft Remote Desktop with proper authentication, set up the Selenium environment and Komodo IDE on the Remote Desktop, and wrote a program using Selenium that goes to a link and opens the print dialog box. Developed a computational recipe for a different approach to the problem.&lt;br /&gt;
&lt;br /&gt;
2016-09-26: Set up Staff wiki page, work log page; registered for Slack, Microsoft Remote Desktop; downloaded Selenium on personal computer, read Selenium docs. Created wiki page for Moroccan Web Driver Project.&lt;br /&gt;
&lt;br /&gt;
'''Notes'''&lt;br /&gt;
&lt;br /&gt;
*Ed moved the Morocco Data to E:\McNair\Projects from C:\Users\PeterJ\Documents&lt;br /&gt;
* C Drive files moved to E:\McNair\Users\PeterJ&lt;br /&gt;
&lt;br /&gt;
[[Category:Work Log]]&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Demo_Day_Page_Parser&amp;diff=21916</id>
		<title>Demo Day Page Parser</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Demo_Day_Page_Parser&amp;diff=21916"/>
		<updated>2017-11-15T21:34:32Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{McNair Projects&lt;br /&gt;
|Has title=Demo Day Page Parser&lt;br /&gt;
|Has owner=Peter Jalbert,&lt;br /&gt;
|Has project status=Active&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
==Project Specs==&lt;br /&gt;
The goal of this project is to leverage data mining with Selenium and Machine Learning to get good candidate web pages for Demo Days for accelerators. Relevant information on the project can be found on the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Data Accelerator Data] page.&lt;br /&gt;
&lt;br /&gt;
==Code Location==&lt;br /&gt;
The code directory for this project can be found at:&lt;br /&gt;
 E:\McNair\Software\Accelerators&lt;br /&gt;
&lt;br /&gt;
The Selenium-based crawler can be found in the file below. This script runs a Google search on accelerator names and keywords, and saves the URLs and HTML pages for future use:&lt;br /&gt;
 DemoDayCrawler.py&lt;br /&gt;
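&lt;br /&gt;
For orientation, a minimal sketch of this kind of crawl loop is below. It is illustrative only, not the contents of DemoDayCrawler.py: the query string, the 'h3 a' result-link selector, and the output file naming are all assumptions, and Google's result markup changes over time.&lt;br /&gt;
 # Hypothetical sketch of a Google-search crawl loop (not the actual DemoDayCrawler.py).&lt;br /&gt;
 from selenium import webdriver&lt;br /&gt;
 &lt;br /&gt;
 driver = webdriver.Chrome()&lt;br /&gt;
 query = 'Techstars demo day'  # assumed query template&lt;br /&gt;
 driver.get('https://www.google.com/search?q=' + query.replace(' ', '+'))&lt;br /&gt;
 # 'h3 a' is a guess at the result-link markup and may need updating.&lt;br /&gt;
 urls = [a.get_attribute('href') for a in driver.find_elements_by_css_selector('h3 a')]&lt;br /&gt;
 for i, url in enumerate(urls):&lt;br /&gt;
     driver.get(url)&lt;br /&gt;
     # Save the raw page source for later parsing.&lt;br /&gt;
     with open('DemoDayHTML/result_%d.html' % i, 'w', encoding='utf-8') as f:&lt;br /&gt;
         f.write(driver.page_source)&lt;br /&gt;
 driver.quit()&lt;br /&gt;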
&lt;br /&gt;
&lt;br /&gt;
A script to rip from HTML to TXT can be found below. This script reads HTML files from the DemoDayHTML directory and writes text versions to the DemoDayTxt directory:&lt;br /&gt;
 htmlToText.py&lt;br /&gt;
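&lt;br /&gt;
A minimal sketch of this kind of rip is below. It is illustrative only, not the contents of htmlToText.py; it assumes BeautifulSoup (pip install beautifulsoup4) and the directory names above.&lt;br /&gt;
 # Hypothetical sketch of an HTML-to-text rip (not the actual htmlToText.py).&lt;br /&gt;
 import os&lt;br /&gt;
 from bs4 import BeautifulSoup&lt;br /&gt;
 &lt;br /&gt;
 for name in os.listdir('DemoDayHTML'):&lt;br /&gt;
     if not name.endswith('.html'):&lt;br /&gt;
         continue&lt;br /&gt;
     with open(os.path.join('DemoDayHTML', name), encoding='utf-8') as f:&lt;br /&gt;
         soup = BeautifulSoup(f.read(), 'html.parser')&lt;br /&gt;
     # Strip tags and write the visible text to the parallel TXT directory.&lt;br /&gt;
     out_name = name[:-5] + '.txt'&lt;br /&gt;
     with open(os.path.join('DemoDayTxt', out_name), 'w', encoding='utf-8') as f:&lt;br /&gt;
         f.write(soup.get_text(separator='\n'))&lt;br /&gt;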
&lt;br /&gt;
A script to match keywords (accelerator and cohort names) against the resulting text pages can be found in KeyTerms.py. The script takes the keywords located in CohortAndAcceleratorsFullList.txt and the text files in DemoDayTxt, and creates a file with the number of matches of each keyword against each text file.&lt;br /&gt;
&lt;br /&gt;
The script can be found:&lt;br /&gt;
 KeyTerms.py&lt;br /&gt;
&lt;br /&gt;
The Keyword matches text file can be found:&lt;br /&gt;
 DemoDayTxt\KeyTermFile\KeyTerms.txt&lt;br /&gt;
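&lt;br /&gt;
For reference, a minimal sketch of this kind of counting pass is below. It is illustrative only, not the contents of KeyTerms.py; the tab-separated output format is an assumption.&lt;br /&gt;
 # Hypothetical sketch of the keyword matcher (not the actual KeyTerms.py).&lt;br /&gt;
 import os&lt;br /&gt;
 &lt;br /&gt;
 with open('CohortAndAcceleratorsFullList.txt', encoding='utf-8') as f:&lt;br /&gt;
     keywords = [line.strip() for line in f if line.strip()]&lt;br /&gt;
 &lt;br /&gt;
 with open('DemoDayTxt/KeyTermFile/KeyTerms.txt', 'w', encoding='utf-8') as out:&lt;br /&gt;
     for name in os.listdir('DemoDayTxt'):&lt;br /&gt;
         if not name.endswith('.txt'):&lt;br /&gt;
             continue&lt;br /&gt;
         with open(os.path.join('DemoDayTxt', name), encoding='utf-8') as f:&lt;br /&gt;
             text = f.read().lower()&lt;br /&gt;
         # Record file name, keyword, and match count for each keyword that appears.&lt;br /&gt;
         for kw in keywords:&lt;br /&gt;
             count = text.count(kw.lower())&lt;br /&gt;
             if count:&lt;br /&gt;
                 out.write('%s\t%s\t%d\n' % (name, kw, count))&lt;br /&gt;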
&lt;br /&gt;
==Downloading HTML Files with Selenium==&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Demo_Day_Page_Parser&amp;diff=21915</id>
		<title>Demo Day Page Parser</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Demo_Day_Page_Parser&amp;diff=21915"/>
		<updated>2017-11-15T21:27:00Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{McNair Projects&lt;br /&gt;
|Has title=Demo Day Page Parser&lt;br /&gt;
|Has owner=Peter Jalbert,&lt;br /&gt;
|Has project status=Active&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
==Project Specs==&lt;br /&gt;
The goal of this project is to leverage data mining with Selenium and Machine Learning to get good candidate web pages for Demo Days for accelerators. Relevant information on the project can be found on the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Data Accelerator Data] page.&lt;br /&gt;
&lt;br /&gt;
==Code Location==&lt;br /&gt;
The code directory for this project can be found at:&lt;br /&gt;
 E:\McNair\Software\Accelerators&lt;br /&gt;
&lt;br /&gt;
The Selenium-based crawler can be found in the file below. This script runs a Google search on accelerator names and keywords, and saves the URLs and HTML pages for future use:&lt;br /&gt;
 DemoDayCrawler.py&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A script to rip from HTML to TXT can be found below. This script reads HTML files from the DemoDayHTML directory and writes text versions to the DemoDayTxt directory:&lt;br /&gt;
 htmlToText.py&lt;br /&gt;
&lt;br /&gt;
A script to match keywords (accelerator and cohort names) against the resulting text pages can be found in KeyTerms.py. The script takes the keywords located in CohortAndAcceleratorsFullList.txt and the text files in DemoDayTxt, and creates a file with the number of matches of each keyword against each text file.&lt;br /&gt;
&lt;br /&gt;
The script can be found:&lt;br /&gt;
 KeyTerms.py&lt;br /&gt;
&lt;br /&gt;
The Keyword matches text file can be found:&lt;br /&gt;
 DemoDayTxt\KeyTermFile\KeyTerms.txt&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Demo_Day_Page_Parser&amp;diff=21914</id>
		<title>Demo Day Page Parser</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Demo_Day_Page_Parser&amp;diff=21914"/>
		<updated>2017-11-15T21:20:43Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{McNair Projects&lt;br /&gt;
|Has title=Demo Day Page Parser&lt;br /&gt;
|Has owner=Peter Jalbert,&lt;br /&gt;
|Has project status=Active&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
==Project Specs==&lt;br /&gt;
The goal of this project is to leverage data mining with Selenium and Machine Learning to get good candidate web pages for Demo Days for accelerators. Relevant information on the project can be found on the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Data Accelerator Data] page.&lt;br /&gt;
&lt;br /&gt;
==Code Location==&lt;br /&gt;
The code directory for this project can be found at:&lt;br /&gt;
 E:\McNair\Software\Accelerators&lt;br /&gt;
&lt;br /&gt;
The Selenium-based crawler can be found in the file below. This script runs a Google search on accelerator names and keywords, and saves the URLs and HTML pages for future use:&lt;br /&gt;
 DemoDayCrawler.py&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
A script to rip from HTML to TXT can be found below. This script reads HTML files from a directory and writes them as TXT to another directory:&lt;br /&gt;
 htmlToText.py&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=21913</id>
		<title>Peter Jalbert (Work Log)</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=21913"/>
		<updated>2017-11-15T21:19:46Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===Fall 2017===&lt;br /&gt;
&amp;lt;onlyinclude&amp;gt; &lt;br /&gt;
&lt;br /&gt;
[[Peter Jalbert]] [[Work Logs]] [[Peter Jalbert (Work Log)|(log page)]] &lt;br /&gt;
&lt;br /&gt;
2017-11-15: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote a script to extract counts that were greater than 2 from Keyword Matcher. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder]. Finished re-formatting work logs.&lt;br /&gt;
&lt;br /&gt;
2017-11-14: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote an HTML to Text parser. See Parser Demo Day Page for file location. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder].&lt;br /&gt;
&lt;br /&gt;
2017-11-13: Built [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser].&lt;br /&gt;
&lt;br /&gt;
2017-11-09: Running demo version of Demo Day crawler (Accelerator Google Crawler). Fixing work log format. &lt;br /&gt;
&lt;br /&gt;
2017-11-07: Created file with 0s and 1s detailing whether crunchbase has the founder information for an accelerator. Details posted as a TODO on [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List] page. Still waiting for feedback on the PostGIS installation from [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder]. Continued working on Accelerator Google Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-06: Contacted the Geography Center of the US Census Bureau (contact information [https://www.census.gov/geo/about/contact.html here]) and began an email exchange on PostGIS installation problems. Began working on the [http://mcnair.bakerinstitute.org/wiki/Selenium_Documentation Selenium Documentation]. Also began working on an Accelerator Google Crawler that will be used with Yang and ML to find Demo Days for cohort companies.&lt;br /&gt;
&lt;br /&gt;
2017-11-01: Attempted to continue downloading, however ran into HTTP Forbidden errors. Listed the errors on the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder Page].&lt;br /&gt;
&lt;br /&gt;
2017-10-31: Began downloading blocks of data for individual states for the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder] project. Wrote out the new wiki page for installation, and began writing documentation on usage.&lt;br /&gt;
&lt;br /&gt;
2017-10-30: With Ed's help, was able to get the national data from Tiger installed onto a database server. The process required much jumping around and changing users, and all the things we learned are outlined in [http://mcnair.bakerinstitute.org/wiki/Database_Server_Documentation#Editing_Users the database server documentation] under &amp;quot;Editing Users&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
2017-10-25: Continued working on the [http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation TigerCoder Installation].&lt;br /&gt;
&lt;br /&gt;
2017-10-24: New task -- throw some addresses into a database and use the address normalizer and geocoder; may need to install things. Details on the installation process can be found on the [http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation PostGIS Installation page].&lt;br /&gt;
&lt;br /&gt;
2017-10-23: Finished Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-19: Continued work on Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-18: Continued work on Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-17: Constructed ArcGIS maps for the agglomeration project. Finished maps of points for every year in the state of California. Finished maps of Route 128. Began working on selenium Yelp crawler to get cafe locations within the 610-loop.&lt;br /&gt;
&lt;br /&gt;
2017-10-16: Assisted Harrison on the USITC project. Looked for natural language processing tools to extract complainants and defendants, along with their locations, from case files. Experimented with pulling based on part-of-speech tags, as well as using geotext or geograpy to pull locations from a case segment.&lt;br /&gt;
&lt;br /&gt;
2017-10-13: Updated various project wiki pages.&lt;br /&gt;
&lt;br /&gt;
2017-10-12: Continued work on Patent Thicket project, awaiting further project specs.&lt;br /&gt;
&lt;br /&gt;
2017-10-05: Emergency ArcGIS creation for Agglomeration project.&lt;br /&gt;
&lt;br /&gt;
2017-10-04: Emergency ArcGIS creation for Agglomeration project.&lt;br /&gt;
&lt;br /&gt;
2017-10-02: Worked on ArcGIS data. See Harrison's Work Log for the details.&lt;br /&gt;
&lt;br /&gt;
2017-09-28: Added collaborative editing feature to PyCharm. &lt;br /&gt;
&lt;br /&gt;
2017-09-27: Worked on big database file.&lt;br /&gt;
&lt;br /&gt;
2017-09-25: New task -- Create text file with company, description, and company type.&lt;br /&gt;
#[http://mcnair.bakerinstitute.org/wiki/VC_Database_Rebuild VC Database Rebuild]&lt;br /&gt;
#psql vcdb2&lt;br /&gt;
#table name, sdccompanybasecore2&lt;br /&gt;
#Combine with Crunchbasebulk&lt;br /&gt;
&lt;br /&gt;
#TODO: Write wiki on linkedin crawler, write wiki on creating accounts.&lt;br /&gt;
&lt;br /&gt;
2017-09-21: Wrote wiki on Linkedin crawler, met with Laura about patents project. &lt;br /&gt;
&lt;br /&gt;
2017-09-20: Finished running linkedin crawler. Transferred data to RDP. Will write wikis next.&lt;br /&gt;
&lt;br /&gt;
2017-09-19: Began running linkedin crawler. Helped Yang create RDP account, get permissions, and get wiki setup.&lt;br /&gt;
&lt;br /&gt;
2017-09-18: Finished implementation of Experience Crawler, continued working on Education Crawler for LinkedIn.&lt;br /&gt;
&lt;br /&gt;
2017-09-14: Continued implementing LinkedIn Crawler for profiles.&lt;br /&gt;
&lt;br /&gt;
2017-09-13: Implemented LinkedIn Crawler for main portion of profiles. Began working on crawling Experience section of profiles.&lt;br /&gt;
&lt;br /&gt;
2017-09-12: Continued working on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. Added to the wiki on this topic.&lt;br /&gt;
&lt;br /&gt;
2017-09-11: Continued working on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. &lt;br /&gt;
&lt;br /&gt;
2017-09-06: Combined founders data retrieved with the Crunchbase API with the crunchbasebulk data to get linkedin urls for different accelerator founders. For more information, see [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here].&lt;br /&gt;
&lt;br /&gt;
2017-09-05: Post Harvey. Finished retrieving names from the Crunchbase API on founders. Next step is to query crunchbase bulk database to get linkedin urls. For more information, see [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here].&lt;br /&gt;
&lt;br /&gt;
2017-08-24: Began using the Crunchbase API to retrieve founder information for accelerators. Halfway through compiling a dictionary that translates accelerator names into proper Crunchbase API URLs.&lt;br /&gt;
&lt;br /&gt;
2017-08-23: Decided with Ed to abandon LinkedIn crawling to retrieve accelerator founder data, and instead use crunchbase. Spent the day navigating the crunchbasebulk database, and seeing what useful information was contained in it.&lt;br /&gt;
&lt;br /&gt;
2017-08-22: Discovered that LinkedIn Profiles cannot be viewed through LinkedIn if the target is 3rd degree or further. However, if entering LinkedIn through a Google search, the profile can still be viewed if the user has previously logged into LinkedIn. Devising a workaround crawler that utilizes Google search. Continued blog post [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) here] under Section 4.&lt;br /&gt;
&lt;br /&gt;
2017-08-21: Began work on extracting founders for accelerators through LinkedIn Crawler. Discovered that Python3 is not installed on RDP, so the virtual environment for the project cannot be fired up. Continued working on Ubuntu machine.&lt;br /&gt;
&amp;lt;/onlyinclude&amp;gt; &lt;br /&gt;
&lt;br /&gt;
===Spring 2017===&lt;br /&gt;
&lt;br /&gt;
2017-05-01: Continued work on HTML Parser. Uploaded all semester projects to git server.&lt;br /&gt;
&lt;br /&gt;
2017-04-20: Finished the HTML Parser for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Ran HTML parser on accelerator founders. Data is stored in projects/accelerators/LinkedIn Founder Data.&lt;br /&gt;
&lt;br /&gt;
2017-04-19: Made updates to the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler] Wikipage. Ran LinkedIn Crawler on accelerator data. Working on an html parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-18: Ran LinkedIn Crawler on matches between Crunchbase Snapshot and the accelerator data.&lt;br /&gt;
&lt;br /&gt;
2017-04-17: Worked on ways to get correct search results from the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Worked on an HTML Parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-13: Worked on debugging the logout procedure for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Began formulation of process to search for founders of startups using a combination of the LinkedIn Crawler with the data resources from the [http://mcnair.bakerinstitute.org/wiki/Crunchbase_2013_Snapshot CrunchBase Snapshot].&lt;br /&gt;
&lt;br /&gt;
2017-04-12: Work on bugs with the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-04-11: Completed functional [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) crawler of LinkedIn Recruiter Pro]. Basic search functions work and download profile information for a given person. &lt;br /&gt;
&lt;br /&gt;
2017-04-10: Began writing functioning crawler of LinkedIn. &lt;br /&gt;
&lt;br /&gt;
2017-04-06: Continued working on debugging and documenting the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Wrote a test program that logs in, searches for a query, navigates through search pages, and logs out. Recruiter program can now login and search.&lt;br /&gt;
&lt;br /&gt;
2017-04-05: Began work on the LinkedIn Crawler. Researched launching a Python Virtual Environment.&lt;br /&gt;
&lt;br /&gt;
2017-04-03: Finished debugging points for the Enclosing Circle Algorithm. Added Command Line functionality to the Industry Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-03-29: Worked on debugging points for the Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-28: Finished running the Enclosing Circle Algorithm. Worked on removing incorrect points from the data set (see above).&lt;br /&gt;
&lt;br /&gt;
2017-03-27: Worked on debugging the Enclosing Circle Algorithm. Implemented a way to remove interior circles, and determined that translation to latitude and longitude coordinates resulted in slightly off center circles.&lt;br /&gt;
&lt;br /&gt;
2017-03-23: Finished debugging the brute force algorithm for [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm]. Implemented a method to plot the points and circles on a graph. Analyzed runtime of the brute force algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-21: Coded a brute force algorithm for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-03-20: Worked on debugging the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-03-09: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Finished script to draw Enclosing Circles on a Google Map.&lt;br /&gt;
&lt;br /&gt;
2017-03-08: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Created script to draw outcome of the Enclosing Circle Algorithm on Google Maps.&lt;br /&gt;
&lt;br /&gt;
2017-03-07: Redetermined the top 50 cities which Enclosing Circle should be run on. Data on the Top 50 Cities for VC Backed Companies can be found [http://mcnair.bakerinstitute.org/wiki/Top_Cities_for_VC_Backed_Companies here]. Ran [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities.&lt;br /&gt;
&lt;br /&gt;
2017-03-06: Ran script to determine the top 50 cities which Enclosing Circle should be run on. Fixed the VC Circles script to take in a new data format.&lt;br /&gt;
&lt;br /&gt;
2017-03-02: Cleaned up data for the VC Circles Project. Created histogram of data in Excel. See [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Began work on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-03-01: Created statistics for the VC Circles Project.&lt;br /&gt;
&lt;br /&gt;
2017-02-28: Finished downloading geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project.  Found bug in Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-02-27: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-23: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Installed C++ Compiler for Python. Ran tests on difference between Python and C wrapped Python.&lt;br /&gt;
&lt;br /&gt;
2017-02-22: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Helped out with [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-21: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Researched into C++ Compilers for Python so that the Enclosing Circle Algorithm could be wrapped in C. Found a recommended one [https://www.microsoft.com/en-us/download/details.aspx?id=44266 here].&lt;br /&gt;
&lt;br /&gt;
2017-02-20: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-16: Reworked [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] to create a file of geocoded data. Began work on wrapping the algorithm in C to improve speed.&lt;br /&gt;
&lt;br /&gt;
2017-02-15: Finished [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] applied to the VC study. Enclosing Circle algorithm still needs adjustment, but the program runs with the temporary fixes.&lt;br /&gt;
&lt;br /&gt;
2017-02-14: Worked on the application of the Enclosing Circle algorithm to the VC study. Working on bug fixes in the Enclosing Circle algorithm. Created wiki page for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-02-13: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-08: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-07: Fixed bugs in parse_cohort_data.py, the script for parsing the cohort data from the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Added descriptive statistics to cohort data excel file.&lt;br /&gt;
&lt;br /&gt;
2017-02-02: Out sick, independent research and work from RDP. Brief research into the [http://jorgeg.scripts.mit.edu/homepage/wp-content/uploads/2016/03/Guzman-Stern-State-of-American-Entrepreneurship-FINAL.pdf Stern-Guzman algorithm]. Research into [http://mcnair.bakerinstitute.org/wiki/interactive_maps Interactive Maps]. No helpful additions to map embedding problem.&lt;br /&gt;
&lt;br /&gt;
2017-02-01: Notes from session with Ed: Project on US university patenting and entrepreneurship programs (writing code to identify universities in assignees), search Wikipedia (XML then bulk download), student pop, faculty pop, etc.&lt;br /&gt;
Circle project for VC data will end up being a joint project to join accelerator data.&lt;br /&gt;
Pull descriptions for VC. Founders of accelerators on LinkedIn. The LinkedIn crawler cannot be caught (pretend not to be a bot). Can eventually get academic backgrounds through LinkedIn.&lt;br /&gt;
Pull business registration data, Stern/Guzman Algorithm.&lt;br /&gt;
GIS on top of geocoded data.&lt;br /&gt;
Maps that work on the wiki or blog (CartoDB), Maps API and R.&lt;br /&gt;
NLP projects, Description Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-01-31: Built WayBack Machine Crawler. Updated documentation for coordinates script. Updated profile page to include locations of code.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-30: Optimized enclosing circle algorithm through memoization. Developed script to read addresses from accelerator data and return latitude and longitude coordinates.&lt;br /&gt;
&lt;br /&gt;
2017-01-26: Continued working on Google sitesearch project. Discovered crunchbase, changed project priority. Priority 1: split accelerator data up by flag; priority 2: use crunchbase to get web urls for cohorts; priority 3: make an Internet Archive Wayback Machine driver. Located [http://mcnair.bakerinstitute.org/wiki/Whois_Parser Whois Parser].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-25: Finished parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Some data files still need proofreading as they are not in an acceptable format. Began working on Google sitesearch project.&lt;br /&gt;
&lt;br /&gt;
2017-01-24: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Cohort data file created, debugging is almost complete. Will begin work on the google accelerator search soon.&lt;br /&gt;
&lt;br /&gt;
2017-01-23: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Preliminary code is written, working on debugging.&lt;br /&gt;
&lt;br /&gt;
2017-01-19: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project]. Created parser for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project], completed creation of the final data set (yay!). Began working on cohort parser.&lt;br /&gt;
&lt;br /&gt;
2017-01-18: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-13: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-12: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-11: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-10: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Fall 2016===&lt;br /&gt;
&lt;br /&gt;
2016-12-08: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-07: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-06: Learned how to use git. Committed software projects from the semester to the McNair git repository. Projects can be found at: [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Executive Order Crawler], [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Foreign Government Web Crawlers], [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) F6S Crawler and Parser].&lt;br /&gt;
&lt;br /&gt;
2016-12-02: Built and ran web crawler for Center for Middle East Studies on Kuwait. Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-01: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Built tool for the [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report E&amp;amp;I Governance Report Project] with Christy. Adds a column of data that shows whether or not the bill has been passed.&lt;br /&gt;
&lt;br /&gt;
2016-11-29: Began pulling data from the accelerators listed [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here]. Made text files for about 18 accelerators.&lt;br /&gt;
&lt;br /&gt;
2016-11-22: Transferred downloaded Morocco Written Bills to provided SeaGate Drive. Made a &amp;quot;gentle&amp;quot; F6S crawler to retrieve HTMLs of possible accelerator pages documented [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here].&lt;br /&gt;
&lt;br /&gt;
2016-11-18: Converted Executive Order PDFs to text files using Adobe Acrobat DC. See [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Wikipage] for details.&lt;br /&gt;
&lt;br /&gt;
2016-11-17: Wrote a crawler to retrieve information about executive orders, and their corresponding pdfs. They can be found [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report here.] Next step is to run code to convert the pdfs to text files, then use the parser fixed by Christy. &lt;br /&gt;
&lt;br /&gt;
2016-11-15: Finished download of Moroccan Written Question pdfs. Wrote a parser with Christy to be used for parsing bills from Congress and eventually executive orders. Found bug in the system Python that was worked out and rebooted.&lt;br /&gt;
&lt;br /&gt;
2016-11-11: Continued to download Moroccan data in the background. Attempted to find bug fixes for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] crawler.&lt;br /&gt;
&lt;br /&gt;
2016-11-10: Continued to download Moroccan data and Kuwait data in the background. Began work on  [http://mcnair.bakerinstitute.org/wiki/Google_Scholar_Crawler Google Scholar Crawler]. Wrote a crawler for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] to get the HTML files of hundreds of accelerators. The crawler ended up failing; it appears to have been due to HTTPS. &lt;br /&gt;
&lt;br /&gt;
2016-11-08: Continued to download Moroccan data in the background. Finished writing code for the embedded files on the Kuwait Site. Spent time debugging the frame errors due to the dynamically generated content. Never found an answer to the bug, and instead found a workaround that sacrificed run time for the ability to work. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-11-04: Continued to download Moroccan data in the background. Finished writing initial Kuwait Web Crawler/Driver for the Middle East Studies Department. Middle East Studies Department asked for additional embedded files in the Kuwait website. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-11-03: Continued to download Moroccan data in the background. Dr. Egan fixed systems requirements to run the GovTrack Web Crawler. Made significant progress on the Kuwait Web Crawler/Driver for the Middle East Studies Department. &lt;br /&gt;
&lt;br /&gt;
2016-11-01: Continued to download Moroccan data in the background. Went over code for GovTracker Web Crawler, continued learning Perl. [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] Began Kuwait Web Crawler/Driver.&lt;br /&gt;
&lt;br /&gt;
2016-10-21: Continued to download data for the Moroccan Parliament Written and Oral Questions. Looked over [http://mcnair.bakerinstitute.org/wiki/Christy_Warden_(Twitter_Crawler_Application_1) Christy's Twitter Crawler] to see how I can be helpful. Dr. Egan asked me to think about how to potentially make multiple tools to get cohorts and other sorts of data from accelerator sites. See the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator List]. He also asked me to look at the [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] for potential ideas on how to bring this project to fruition.&lt;br /&gt;
&lt;br /&gt;
2016-10-20: Continued to download data for the Moroccan Parliament Written and Oral Questions. Updated Wiki page. Started working on Twitter project with Christy. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-18: Finished code for the Oral Questions web driver and Written Questions web driver using Selenium. Now, the data for the dates of questions can be found using the crawler, and the PDFs of the questions will be downloaded using Selenium. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-14: Finished Oral Questions crawler. Finished Written Questions crawler. Waiting for further details on whether that data needs to be tweaked in any way. Updated the Moroccan Web Driver/Web Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-13: Completed download of Moroccan Bills. Working on either a web driver screenshot approach or a webcrawler approach to download the  Moroccan oral and written questions data. Began building Web Crawler for Oral and Written Questions site. Edited Moroccan Web Driver/Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-11: Fixed Arabic bug; files can now be saved with Arabic titles. Monarchy bills downloaded and ready for shipment. House of Representatives bills mostly downloaded, ratified bills prepared for download. Started learning the Scrapy library in Python for web scraping. Discussed the idea of screenshotting questions instead of scraping. &lt;br /&gt;
&lt;br /&gt;
2016-10-07: Learned Unicode and UTF-8 encoding and decoding in Arabic. Still working on transforming an ASCII URL into printable Unicode.&lt;br /&gt;
&lt;br /&gt;
2016-10-06: Discussed the desired file names for the Moroccan data download with Dr. Elbadawy. The consensus was that the bill programs are ready to launch once the files can be named properly, and that the questions data must be retrieved using a web crawler, which I need to learn how to implement. The naming of files is currently drawing errors in going from Arabic, to URL, to download, to filename. Debugging in process. Also built a demo Selenium program for Dr. Egan that drives the McNair blog site on an infinite loop.&lt;br /&gt;
&lt;br /&gt;
2016-10-03: Moroccan Web Driver projects completed for driving of the Monarchy proposed bills, the House of Representatives proposed bills, and the Ratified bills sites. Began the process of devising a naming system for the files that does not require scraping. Tinkered with naming through regular expression parsing of the URL. Structure for the oral questions and written questions drivers is set up, but needs fixes due to the differences in the sites. Fixed bug on McNair wiki for women's biz team where email was plain text instead of an email link. Took a glimpse at the Kuwait Parliament website, and it appears to be very different from the Moroccan setup.&lt;br /&gt;
&lt;br /&gt;
2016-09-30: Selenium program selects the view pdf option from the website, and goes to the pdf webpage. The program then switches its handle to the new page. CTRL-S is sent to the page to launch the save dialog window. Text cannot be sent to this window. Brainstormed ways around this issue. Explored Chrome options for saving automatically without a dialog window. Looking into other libraries besides Selenium that may help.&lt;br /&gt;
&lt;br /&gt;
2016-09-29: Re-enrolled in Microsoft Remote Desktop with proper authentication, set up the Selenium environment and Komodo IDE on the Remote Desktop, and wrote a program using Selenium that goes to a link and opens the print dialog box. Developed a computational recipe for a different approach to the problem.&lt;br /&gt;
&lt;br /&gt;
2016-09-26: Set up Staff wiki page, work log page; registered for Slack, Microsoft Remote Desktop; downloaded Selenium on personal computer, read Selenium docs. Created wiki page for Moroccan Web Driver Project.&lt;br /&gt;
&lt;br /&gt;
'''Notes'''&lt;br /&gt;
&lt;br /&gt;
*Ed moved the Morocco Data to E:\McNair\Projects from C:\Users\PeterJ\Documents&lt;br /&gt;
* C Drive files moved to E:\McNair\Users\PeterJ&lt;br /&gt;
&lt;br /&gt;
[[Category:Work Log]]&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=21912</id>
		<title>Peter Jalbert (Work Log)</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=21912"/>
		<updated>2017-11-15T21:19:23Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===Fall 2017===&lt;br /&gt;
&amp;lt;onlyinclude&amp;gt; &lt;br /&gt;
&lt;br /&gt;
[[Peter Jalbert]] [[Work Logs]] [[Peter Jalbert (Work Log)|(log page)]] &lt;br /&gt;
&lt;br /&gt;
2017-11-15: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote a script to extract counts that were greater than 2 from Keyword Matcher. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder].&lt;br /&gt;
&lt;br /&gt;
2017-11-14: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote an HTML-to-text parser. See the Demo Day Page Parser page for the file location. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder].&lt;br /&gt;
&lt;br /&gt;
2017-11-13: Built [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser].&lt;br /&gt;
&lt;br /&gt;
2017-11-09: Running demo version of Demo Day crawler (Accelerator Google Crawler). Fixing work log format. &lt;br /&gt;
&lt;br /&gt;
2017-11-07: Created file with 0s and 1s detailing whether crunchbase has the founder information for an accelerator. Details posted as a TODO on [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List] page. Still waiting for feedback on the PostGIS installation from [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder]. Continued working on Accelerator Google Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-06: Contacted Geography Center for the US Census Bureau, [https://www.census.gov/geo/about/contact.html here], and began email exchange on PostGIS installation problems. Began working on the [http://mcnair.bakerinstitute.org/wiki/Selenium_Documentation Selenium Documentation]. Also began working on an Accelerator Google Crawler that will be used with Yang and ML to find Demo Days for cohort companies.&lt;br /&gt;
&lt;br /&gt;
2017-11-01: Attempted to continue downloading, but ran into HTTP Forbidden errors. Listed the errors on the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder Page].&lt;br /&gt;
&lt;br /&gt;
2017-10-31: Began downloading blocks of data for individual states for the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder] project. Wrote out the new wiki page for installation, and beginning to write documentation on usage.&lt;br /&gt;
&lt;br /&gt;
2017-10-30: With Ed's help, was able to get the national data from Tiger installed onto a database server. The process required much jumping around and changing users, and all the things we learned are outlined in [http://mcnair.bakerinstitute.org/wiki/Database_Server_Documentation#Editing_Users the database server documentation] under &amp;quot;Editing Users&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
2017-10-25: Continued working on the [http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation Tiger Geocoder installation].&lt;br /&gt;
&lt;br /&gt;
2017-10-24: Threw some addresses into a database and used the address normalizer and geocoder. May need to install things. Details on the installation process can be found on the [http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation PostGIS Installation page].&lt;br /&gt;
&lt;br /&gt;
2017-10-23: Finished Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-19: Continued work on Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-18: Continued work on Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-17: Constructed ArcGIS maps for the agglomeration project. Finished maps of points for every year in the state of California. Finished maps of Route 128. Began working on a Selenium Yelp crawler to get cafe locations within the 610 Loop.&lt;br /&gt;
&lt;br /&gt;
2017-10-16: Assisted Harrison on the USITC project. Looked for natural language processing tools to extract complainants and defendants along with their locations from case files. Experimented with pulling based on part-of-speech tags, as well as using geotext or geograpy to pull locations from a case segment.&lt;br /&gt;
&lt;br /&gt;
2017-10-13: Updated various project wiki pages.&lt;br /&gt;
&lt;br /&gt;
2017-10-12: Continued work on Patent Thicket project, awaiting further project specs.&lt;br /&gt;
&lt;br /&gt;
2017-10-05: Emergency ArcGIS creation for Agglomeration project.&lt;br /&gt;
&lt;br /&gt;
2017-10-04: Emergency ArcGIS creation for Agglomeration project.&lt;br /&gt;
&lt;br /&gt;
2017-10-02: Worked on ArcGIS data. See Harrison's Work Log for the details.&lt;br /&gt;
&lt;br /&gt;
2017-09-28: Added collaborative editing feature to PyCharm. &lt;br /&gt;
&lt;br /&gt;
2017-09-27: Worked on big database file.&lt;br /&gt;
&lt;br /&gt;
2017-09-25: New task -- Create text file with company, description, and company type.&lt;br /&gt;
#[http://mcnair.bakerinstitute.org/wiki/VC_Database_Rebuild VC Database Rebuild]&lt;br /&gt;
#psql vcdb2&lt;br /&gt;
#table name, sdccompanybasecore2&lt;br /&gt;
#Combine with Crunchbasebulk&lt;br /&gt;
&lt;br /&gt;
#TODO: Write wiki on LinkedIn crawler, write wiki on creating accounts.&lt;br /&gt;
&lt;br /&gt;
2017-09-21: Wrote wiki on LinkedIn crawler, met with Laura about patents project. &lt;br /&gt;
&lt;br /&gt;
2017-09-20: Finished running LinkedIn crawler. Transferred data to RDP. Will write wikis next.&lt;br /&gt;
&lt;br /&gt;
2017-09-19: Began running LinkedIn crawler. Helped Yang create RDP account, get permissions, and get wiki setup.&lt;br /&gt;
&lt;br /&gt;
2017-09-18: Finished implementation of Experience Crawler, continued working on Education Crawler for LinkedIn.&lt;br /&gt;
&lt;br /&gt;
2017-09-14: Continued implementing LinkedIn Crawler for profiles.&lt;br /&gt;
&lt;br /&gt;
2017-09-13: Implemented LinkedIn Crawler for main portion of profiles. Began working on crawling Experience section of profiles.&lt;br /&gt;
&lt;br /&gt;
2017-09-12: Continued working on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. Added to the wiki on this topic.&lt;br /&gt;
&lt;br /&gt;
2017-09-11: Continued working on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. &lt;br /&gt;
&lt;br /&gt;
2017-09-06: Combined founders data retrieved with the Crunchbase API with the crunchbasebulk data to get LinkedIn URLs for different accelerator founders. For more information, see [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here].&lt;br /&gt;
&lt;br /&gt;
2017-09-05: Post Harvey. Finished retrieving names from the Crunchbase API on founders. Next step is to query the crunchbase bulk database to get LinkedIn URLs. For more information, see [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here].&lt;br /&gt;
&lt;br /&gt;
2017-08-24: Began using the Crunchbase API to retrieve founder information for accelerators. Halfway through compiling a dictionary that translates accelerator names into proper Crunchbase API URLs.&lt;br /&gt;
&lt;br /&gt;
2017-08-23: Decided with Ed to abandon LinkedIn crawling to retrieve accelerator founder data, and instead use crunchbase. Spent the day navigating the crunchbasebulk database, and seeing what useful information was contained in it.&lt;br /&gt;
&lt;br /&gt;
2017-08-22: Discovered that LinkedIn Profiles cannot be viewed through LinkedIn if the target is 3rd degree or further. However, if entering LinkedIn through a Google search, the profile can still be viewed if the user has previously logged into LinkedIn. Devising a workaround crawler that utilizes Google search. Continued blog post [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) here] under Section 4.&lt;br /&gt;
&lt;br /&gt;
2017-08-21: Began work on extracting founders for accelerators through LinkedIn Crawler. Discovered that Python3 is not installed on RDP, so the virtual environment for the project cannot be fired up. Continued working on Ubuntu machine.&lt;br /&gt;
&amp;lt;/onlyinclude&amp;gt; &lt;br /&gt;
&lt;br /&gt;
===Spring 2017===&lt;br /&gt;
&lt;br /&gt;
2017-05-01: Continued work on HTML Parser. Uploaded all semester projects to git server.&lt;br /&gt;
&lt;br /&gt;
2017-04-20: Finished the HTML Parser for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Ran HTML parser on accelerator founders. Data is stored in projects/accelerators/LinkedIn Founder Data.&lt;br /&gt;
&lt;br /&gt;
2017-04-19: Made updates to the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler] Wikipage. Ran LinkedIn Crawler on accelerator data. Working on an html parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-18: Ran LinkedIn Crawler on matches between Crunchbase Snapshot and the accelerator data.&lt;br /&gt;
&lt;br /&gt;
2017-04-17: Worked on ways to get correct search results from the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Worked on an HTML Parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-13: Worked on debugging the logout procedure for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Began formulation of process to search for founders of startups using a combination of the LinkedIn Crawler with the data resources from the [http://mcnair.bakerinstitute.org/wiki/Crunchbase_2013_Snapshot CrunchBase Snapshot].&lt;br /&gt;
&lt;br /&gt;
2017-04-12: Work on bugs with the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-04-11: Completed functional [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) crawler of LinkedIn Recruiter Pro]. Basic search functions work and download profile information for a given person. &lt;br /&gt;
&lt;br /&gt;
2017-04-10: Began writing functioning crawler of LinkedIn. &lt;br /&gt;
&lt;br /&gt;
2017-04-06: Continued working on debugging and documenting the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Wrote a test program that logs in, searches for a query, navigates through search pages, and logs out. Recruiter program can now login and search.&lt;br /&gt;
&lt;br /&gt;
2017-04-05: Began work on the LinkedIn Crawler. Researched on launching Python Virtual Environment.&lt;br /&gt;
&lt;br /&gt;
2017-04-03: Finished debugging points for the Enclosing Circle Algorithm. Added Command Line functionality to the Industry Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-03-29: Worked on debugging points for the Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-28: Finished running the Enclosing Circle Algorithm. Worked on removing incorrect points from the data set (see above). &lt;br /&gt;
&lt;br /&gt;
2017-03-27: Worked on debugging the Enclosing Circle Algorithm. Implemented a way to remove interior circles, and determined that translation to latitude and longitude coordinates resulted in slightly off center circles.&lt;br /&gt;
&lt;br /&gt;
2017-03-23: Finished debugging the brute force algorithm for [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm]. Implemented a method to plot the points and circles on a graph. Analyzed runtime of the brute force algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-21: Coded a brute force algorithm for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-03-20: Worked on  debugging the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-03-09: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Finished script to draw Enclosing Circles on a Google Map.&lt;br /&gt;
&lt;br /&gt;
2017-03-08: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Created script to draw outcome of the Enclosing Circle Algorithm on Google Maps.&lt;br /&gt;
&lt;br /&gt;
2017-03-07: Redetermined the top 50 cities which Enclosing Circle should be run on. Data on the [http://mcnair.bakerinstitute.org/wiki/Top_Cities_for_VC_Backed_Companies Top 50 Cities for VC Backed Companies can be found here.] Ran [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities.&lt;br /&gt;
&lt;br /&gt;
2017-03-06: Ran script to determine the top 50 cities which Enclosing Circle should be run on. Fixed the VC Circles script to take in a new data format.&lt;br /&gt;
&lt;br /&gt;
2017-03-02: Cleaned up data for the VC Circles Project. Created histogram of data in Excel. See [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Began work on the [http://mcnair.bakerinstitute.org/wiki/LinkedInCrawlerPython LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-03-01: Created statistics for the VC Circles Project.&lt;br /&gt;
&lt;br /&gt;
2017-02-28: Finished downloading geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project.  Found bug in Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-02-27: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-23: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Installed a C++ compiler for Python. Ran tests on the difference between Python and C-wrapped Python.&lt;br /&gt;
&lt;br /&gt;
2017-02-22: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Helped out with [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-21: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Researched C++ compilers for Python so that the Enclosing Circle Algorithm could be wrapped in C. Found a recommended one [https://www.microsoft.com/en-us/download/details.aspx?id=44266 here].&lt;br /&gt;
&lt;br /&gt;
2017-02-20: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-16: Reworked [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] to create a file of geocoded data. Began work on wrapping the algorithm in C to improve speed.&lt;br /&gt;
&lt;br /&gt;
2017-02-15: Finished [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] applied to the VC study. Enclosing Circle algorithm still needs adjustment, but the program runs with the temporary fixes.&lt;br /&gt;
&lt;br /&gt;
2017-02-14: Worked on the application of the Enclosing Circle algorithm to the VC study. Working on bug fixes in the Enclosing Circle algorithm. Created wiki page for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-02-13: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-08: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-07: Fixed bugs in parse_cohort_data.py, the script for parsing the cohort data from the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Added descriptive statistics to the cohort data Excel file.&lt;br /&gt;
&lt;br /&gt;
2017-02-02: Out sick, independent research and work from RDP. Brief research into the [http://jorgeg.scripts.mit.edu/homepage/wp-content/uploads/2016/03/Guzman-Stern-State-of-American-Entrepreneurship-FINAL.pdf Stern-Guzman algorithm]. Research into [http://mcnair.bakerinstitute.org/wiki/interactive_maps Interactive Maps]. No helpful additions to map embedding problem.&lt;br /&gt;
&lt;br /&gt;
2017-02-01: Notes from Session with Ed: Project on US university patenting and entrepreneurship programs (writing code to identify universities in assignees), search Wikipedia (XML then bulk download), student pop, faculty pop, etc.&lt;br /&gt;
Circle project for VC data will end up being a joint project to join accelerator data. &lt;br /&gt;
Pull descriptions for VC. Founders of accelerators on LinkedIn. LinkedIn cannot be caught (pretend not to be a bot). Can eventually get academic backgrounds through LinkedIn. &lt;br /&gt;
Pull business registration data, Stern/Guzman Algorithm. &lt;br /&gt;
GIS on top of geocoded data.&lt;br /&gt;
Maps that work on wiki or blog (CartoDB), Maps API and R.&lt;br /&gt;
NLP Projects, Description Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-01-31: Built WayBack Machine Crawler. Updated documentation for coordinates script. Updated profile page to include locations of code.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-30: Optimized enclosing circle algorithm through memoization. Developed script to read addresses from accelerator data and return latitude and longitude coordinates.&lt;br /&gt;
&lt;br /&gt;
2017-01-26: Continued working on Google sitesearch project. Discovered crunchbase, which changed project priorities. Priority 1: split accelerator data up by flag; priority 2: use crunchbase to get web urls for cohorts; priority 3: make Internet Archive Wayback Machine driver. Located [http://mcnair.bakerinstitute.org/wiki/Whois_Parser Whois Parser].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-25: Finished parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Some data files still need proofreading as they are not in an acceptable format. Began working on Google sitesearch project.&lt;br /&gt;
&lt;br /&gt;
2017-01-24: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Cohort data file created, debugging is almost complete. Will begin work on the google accelerator search soon.&lt;br /&gt;
&lt;br /&gt;
2017-01-23: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Preliminary code is written, working on debugging.&lt;br /&gt;
&lt;br /&gt;
2017-01-19: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project]. Created parser for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project], completed creation of the final data set (yay!). Began working on cohort parser.&lt;br /&gt;
&lt;br /&gt;
2017-01-18: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-13: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-12: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-11: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-10: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Fall 2016===&lt;br /&gt;
&lt;br /&gt;
2016-12-08: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-07: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-06: Learned how to use git. Committed software projects from the semester to the McNair git repository. Projects can be found at: [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Executive Order Crawler], [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Foreign Government Web Crawlers], [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) F6S Crawler and Parser].&lt;br /&gt;
&lt;br /&gt;
2016-12-02: Built and ran web crawler for Center for Middle East Studies on Kuwait. Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-01: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Built tool for the [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report E&amp;amp;I Governance Report Project] with Christy. Adds a column of data that shows whether or not the bill has been passed.&lt;br /&gt;
&lt;br /&gt;
2016-11-29: Began pulling data from the accelerators listed [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here]. Made text files for about 18 accelerators.&lt;br /&gt;
&lt;br /&gt;
2016-11-22: Transferred downloaded Morocco Written Bills to provided SeaGate Drive. Made a &amp;quot;gentle&amp;quot; F6S crawler to retrieve HTMLs of possible accelerator pages documented [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here].&lt;br /&gt;
&lt;br /&gt;
2016-11-18: Converted Executive Order PDFs to text files using Adobe Acrobat DC. See [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Wikipage] for details.&lt;br /&gt;
&lt;br /&gt;
2016-11-17: Wrote a crawler to retrieve information about executive orders, and their corresponding pdfs. They can be found [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report here.] Next step is to run code to convert the pdfs to text files, then use the parser fixed by Christy. &lt;br /&gt;
&lt;br /&gt;
2016-11-15: Finished download of Moroccan Written Question pdfs. Wrote a parser with Christy to be used for parsing bills from Congress and eventually executive orders. Found bug in the system Python that was worked out and rebooted.&lt;br /&gt;
&lt;br /&gt;
2016-11-11: Continued to download Moroccan data in the background. Attempted to find bug fixes for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] crawler.&lt;br /&gt;
&lt;br /&gt;
2016-11-10: Continued to download Moroccan data and Kuwait data in the background. Began work on  [http://mcnair.bakerinstitute.org/wiki/Google_Scholar_Crawler Google Scholar Crawler]. Wrote a crawler for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] to get the HTML files of hundreds of accelerators. The crawler ended up failing; it appears to have been due to HTTPS. &lt;br /&gt;
&lt;br /&gt;
2016-11-08: Continued to download Moroccan data in the background. Finished writing code for the embedded files on the Kuwait Site. Spent time debugging the frame errors due to the dynamically generated content. Never found an answer to the bug, and instead found a workaround that sacrificed run time for the ability to work. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-11-04: Continued to download Moroccan data in the background. Finished writing initial Kuwait Web Crawler/Driver for the Middle East Studies Department. Middle East Studies Department asked for additional embedded files in the Kuwait website. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-11-03: Continued to download Moroccan data in the background. Dr. Egan fixed systems requirements to run the GovTrack Web Crawler. Made significant progress on the Kuwait Web Crawler/Driver for the Middle East Studies Department. &lt;br /&gt;
&lt;br /&gt;
2016-11-01: Continued to download Moroccan data in the background. Went over code for GovTracker Web Crawler, continued learning Perl. [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] Began Kuwait Web Crawler/Driver.&lt;br /&gt;
&lt;br /&gt;
2016-10-21: Continued to download data for the Moroccan Parliament Written and Oral Questions. Looked over [http://mcnair.bakerinstitute.org/wiki/Christy_Warden_(Twitter_Crawler_Application_1) Christy's Twitter Crawler] to see how I can be helpful. Dr. Egan asked me to think about how to potentially make multiple tools to get cohorts and other sorts of data from accelerator sites. See the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator List]. He also asked me to look at the [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] for potential ideas on how to bring this project to fruition.&lt;br /&gt;
&lt;br /&gt;
2016-10-20: Continued to download data for the Moroccan Parliament Written and Oral Questions. Updated Wiki page. Started working on Twitter project with Christy. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-18: Finished code for the Oral Questions web driver and Written Questions web driver using Selenium. Now, the data for the dates of questions can be found using the crawler, and the PDFs of the questions will be downloaded using Selenium. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-14: Finished Oral Questions crawler. Finished Written Questions crawler. Waiting for further details on whether that data needs to be tweaked in any way. Updated the Moroccan Web Driver/Web Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-13: Completed download of Moroccan Bills. Working on either a web driver screenshot approach or a webcrawler approach to download the  Moroccan oral and written questions data. Began building Web Crawler for Oral and Written Questions site. Edited Moroccan Web Driver/Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-11: Fixed Arabic bug; files can now be saved with Arabic titles. Monarchy bills downloaded and ready for shipment. House of Representatives bills mostly downloaded, ratified bills prepared for download. Started learning the Scrapy library in Python for web scraping. Discussed the idea of screenshotting questions instead of scraping. &lt;br /&gt;
&lt;br /&gt;
2016-10-07: Learned Unicode and UTF-8 encoding and decoding in Arabic. Still working on transforming an ASCII URL into printable Unicode.&lt;br /&gt;
&lt;br /&gt;
2016-10-06: Discussed the desired file names for the Moroccan data download with Dr. Elbadawy. The consensus was that the bill programs are ready to launch once the files can be named properly, and that the questions data must be retrieved using a web crawler, which I need to learn how to implement. The naming of files is currently drawing errors in going from Arabic, to URL, to download, to filename. Debugging in process. Also built a demo Selenium program for Dr. Egan that drives the McNair blog site on an infinite loop.&lt;br /&gt;
&lt;br /&gt;
2016-10-03: Moroccan Web Driver projects completed for driving of the Monarchy proposed bills, the House of Representatives proposed bills, and the Ratified bills sites. Began the process of devising a naming system for the files that does not require scraping. Tinkered with naming through regular expression parsing of the URL. Structure for the oral questions and written questions drivers is set up, but needs fixes due to the differences in the sites. Fixed bug on McNair wiki for women's biz team where email was plain text instead of an email link. Took a glimpse at the Kuwait Parliament website, and it appears to be very different from the Moroccan setup.&lt;br /&gt;
&lt;br /&gt;
2016-09-30: Selenium program selects the view pdf option from the website, and goes to the pdf webpage. The program then switches its handle to the new page. CTRL-S is sent to the page to launch the save dialog window. Text cannot be sent to this window. Brainstormed ways around this issue. Explored Chrome options for saving automatically without a dialog window. Looking into other libraries besides Selenium that may help.&lt;br /&gt;
&lt;br /&gt;
2016-09-29: Re-enrolled in Microsoft Remote Desktop with proper authentication, set up the Selenium environment and Komodo IDE on the Remote Desktop, and wrote a program using Selenium that goes to a link and opens the print dialog box. Developed a computational recipe for a different approach to the problem.&lt;br /&gt;
&lt;br /&gt;
2016-09-26: Set up Staff wiki page, work log page; registered for Slack, Microsoft Remote Desktop; downloaded Selenium on personal computer, read Selenium docs. Created wiki page for Moroccan Web Driver Project.&lt;br /&gt;
&lt;br /&gt;
'''Notes'''&lt;br /&gt;
&lt;br /&gt;
*Ed moved the Morocco Data to E:\McNair\Projects from C:\Users\PeterJ\Documents&lt;br /&gt;
* C Drive files moved to E:\McNair\Users\PeterJ&lt;br /&gt;
&lt;br /&gt;
[[Category:Work Log]]&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Tiger_Geocoder&amp;diff=21911</id>
		<title>Tiger Geocoder</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Tiger_Geocoder&amp;diff=21911"/>
		<updated>2017-11-15T21:14:12Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{McNair Projects&lt;br /&gt;
|Has title=Tiger Geocoder&lt;br /&gt;
|Has owner=Peter Jalbert,&lt;br /&gt;
|Has start date=Fall 2017&lt;br /&gt;
|Has keywords=Tiger, Geocoder, Database&lt;br /&gt;
|Has project status=Active&lt;br /&gt;
|Has Image=Tiger.jpg&lt;br /&gt;
}}&lt;br /&gt;
This page serves as documentation for using the Tiger Geocoder on PostgreSQL, as part of the PostGIS extension. The following wiki pages may also be of use to you:&lt;br /&gt;
&lt;br /&gt;
[http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation PostGIS Installation]&lt;br /&gt;
[http://mcnair.bakerinstitute.org/wiki/Database_Server_Documentation Database Server Documentation]&lt;br /&gt;
&lt;br /&gt;
The official documentation for using and installing the Tiger Geocoder can be found at the following links.&lt;br /&gt;
&lt;br /&gt;
[https://postgis.net/docs/Extras.html#Tiger_Geocoder General Instructions]&lt;br /&gt;
[https://postgis.net/docs/postgis_installation.html#install_tiger_geocoder_extension Installation Instructions]&lt;br /&gt;
[http://postgis.net/docs/Geocode.html Geocoder Documentation]&lt;br /&gt;
&lt;br /&gt;
==Location==&lt;br /&gt;
The data is currently loaded into a psql database called geocoder. The tables contain the geocoding information, and there is a test table called &amp;quot;coffeeshops&amp;quot; that contains addresses of Houston coffeeshops according to Yelp. To access the database, first log in to the McNair DB Server. Then, &lt;br /&gt;
&lt;br /&gt;
 psql geocoder&lt;br /&gt;
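&lt;br /&gt;
To check the connection, you can query the test table. This is a minimal sketch; the columns of the coffeeshops table are not documented here, so SELECT * is used:&lt;br /&gt;
 SELECT * FROM coffeeshops LIMIT 5;&lt;br /&gt;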
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Installation==&lt;br /&gt;
&lt;br /&gt;
===Install and Nation Data===&lt;br /&gt;
I began by adding the necessary extensions. First, enter Postgres using the psql command. Then:&lt;br /&gt;
 --Add Extensions to database&lt;br /&gt;
 CREATE EXTENSION postgis;&lt;br /&gt;
 CREATE EXTENSION fuzzystrmatch;&lt;br /&gt;
 CREATE EXTENSION postgis_tiger_geocoder;&lt;br /&gt;
 CREATE EXTENSION address_standardizer;&lt;br /&gt;
&lt;br /&gt;
You can test that the installation worked by running the following query: &lt;br /&gt;
 SELECT na.address, na.streetname,na.streettypeabbrev, na.zip&lt;br /&gt;
 	FROM normalize_address('1 Devonshire Place, Boston, MA 02109') AS na;&lt;br /&gt;
&lt;br /&gt;
This should return the following:&lt;br /&gt;
  address | streetname | streettypeabbrev |  zip&lt;br /&gt;
 ---------+------------+------------------+-------&lt;br /&gt;
        1 | Devonshire | Pl               | 02109&lt;br /&gt;
&lt;br /&gt;
Next, a new profile needs to be created by using the following command.&lt;br /&gt;
 INSERT INTO tiger.loader_platform(os, declare_sect, pgbin, wget, unzip_command, psql, path_sep, &lt;br /&gt;
 		   loader, environ_set_command, county_process_command)&lt;br /&gt;
 SELECT 'test', declare_sect, pgbin, wget, unzip_command, psql, path_sep,&lt;br /&gt;
 	   loader, environ_set_command, county_process_command&lt;br /&gt;
   FROM tiger.loader_platform&lt;br /&gt;
   WHERE os = 'sh';&lt;br /&gt;
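&lt;br /&gt;
To confirm that the new profile exists alongside the stock ones, a quick check (not part of the official instructions) is:&lt;br /&gt;
 SELECT os FROM tiger.loader_platform;&lt;br /&gt;
&lt;br /&gt;
This should list 'test' along with the default platforms such as 'sh' and 'windows'.&lt;br /&gt;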
&lt;br /&gt;
The installation instructions also provide the following note:&lt;br /&gt;
&lt;br /&gt;
As of PostGIS 2.4.1 the Zip code-5 digit tabulation area zcta5 load step was revised to load current zcta5 data and is part of the Loader_Generate_Nation_Script when enabled. It is turned off by default because it takes quite a bit of time to load (20 to 60 minutes), takes up quite a bit of disk space, and is not used that often.&lt;br /&gt;
&lt;br /&gt;
If you would like this feature, you can enable it by using the following command. This should be done before loading the script.&lt;br /&gt;
&lt;br /&gt;
 UPDATE tiger.loader_lookuptables SET load = true WHERE table_name = 'zcta510';&lt;br /&gt;
&lt;br /&gt;
The paths in declare_sect need to be edited so they match our server locations. One option is to edit the declare_sect column in the tiger.loader_platform table. If so, the declare_sect looks like the following:&lt;br /&gt;
 TMPDIR=&amp;quot;${staging_fold}/temp/&amp;quot;&lt;br /&gt;
 UNZIPTOOL=unzip&lt;br /&gt;
 WGETTOOL=&amp;quot;/usr/bin/wget&amp;quot;&lt;br /&gt;
 export PGBIN=/usr/lib/postgresql/9.6/bin&lt;br /&gt;
 export PGPORT=5432&lt;br /&gt;
 export PGHOST=localhost&lt;br /&gt;
 export PGUSER=postgres&lt;br /&gt;
 export PGPASSWORD=yourpasswordhere&lt;br /&gt;
 export PGDATABASE=geocoder&lt;br /&gt;
 PSQL=${PGBIN}/psql&lt;br /&gt;
 SHP2PGSQL=shp2pgsql&lt;br /&gt;
 cd ${staging_fold}&lt;br /&gt;
&lt;br /&gt;
Another option is to edit the sh file before running the script. We will do this option until further notice. Simply use your favorite command line editor to change the fields to their correct values. The downloaded script is located in the following directory:&lt;br /&gt;
 /gisdata&lt;br /&gt;
&lt;br /&gt;
There needs to be a directory called &amp;quot;temp&amp;quot; in the gisdata directory. To make the script, use the following from the command line:&lt;br /&gt;
 psql -c &amp;quot;SELECT Loader_Generate_Nation_Script('test')&amp;quot; -d databasename -tA &amp;gt; /gisdata/nation_script_load.sh&lt;br /&gt;
&lt;br /&gt;
This will create a script in the /gisdata directory. Change to that directory. If you did not edit the paths in the declare_sect table in psql, then you will need to edit this file to contain the correct paths. &lt;br /&gt;
&lt;br /&gt;
Change directories:&lt;br /&gt;
 cd /gisdata&lt;br /&gt;
&lt;br /&gt;
Edit the script using your favorite command line text editor. Specifically, edit the following fields.&lt;br /&gt;
 PGUSER=postgres&lt;br /&gt;
 PGPASSWORD=(ask Anne for this password)&lt;br /&gt;
Everything else remains the same.&lt;br /&gt;
&lt;br /&gt;
Run the script by using:&lt;br /&gt;
 sh nation_script_load.sh&lt;br /&gt;
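&lt;br /&gt;
To sanity-check the load, count the rows in the nation tables; per the PostGIS docs, the nation script creates county_all and state_all in the tiger_data schema:&lt;br /&gt;
 SELECT count(*) FROM tiger_data.county_all;&lt;br /&gt;
 SELECT count(*) FROM tiger_data.state_all;&lt;br /&gt;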
&lt;br /&gt;
Now there are bare-bones nation tables in the database that hold the national information. The next step is to download data for each state.&lt;br /&gt;
&lt;br /&gt;
===State Data===&lt;br /&gt;
&lt;br /&gt;
The state scripts are generated in much the same way that the nation script was generated. Use the following command, substituting your desired state abbreviation for MA and giving the output file a unique name.&lt;br /&gt;
 psql -c &amp;quot;SELECT Loader_Generate_Script(ARRAY['MA'], 'test')&amp;quot; -d geocoder -tA &amp;gt; /gisdata/ma_load.sh&lt;br /&gt;
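&lt;br /&gt;
Since Loader_Generate_Script takes an array of states, several can be batched into one script. A hedged example (the state list and output filename here are arbitrary):&lt;br /&gt;
 psql -c &amp;quot;SELECT Loader_Generate_Script(ARRAY['TX', 'LA', 'OK'], 'test')&amp;quot; -d geocoder -tA &amp;gt; /gisdata/tx_la_ok_load.sh&lt;br /&gt;
 sh tx_la_ok_load.sh&lt;br /&gt;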
&lt;br /&gt;
CURRENT PROGRESS:&lt;br /&gt;
The following states have been downloaded into the geocoder database.&lt;br /&gt;
 AL, AK, AZ, AR, CA, CO, CT, DE, FL, MA&lt;br /&gt;
&lt;br /&gt;
===Current Errors===&lt;br /&gt;
The state scripts stopped working on 11/1/2017, although they had been working on 10/31/2017. Now, when a retrieval script is run, it fails with the error &lt;br /&gt;
 HTTP request sent, awaiting response... 403 Forbidden&lt;br /&gt;
Possible thoughts:&lt;br /&gt;
&lt;br /&gt;
Maybe our IP has been blacklisted for downloading data from a government website too quickly? If so, PostGIS should really choose a different installation method. The current one is dumb.&lt;br /&gt;
&lt;br /&gt;
Maybe the nation downloader script never worked properly. Not sure how to check if it is correct or not; seems right.&lt;br /&gt;
&lt;br /&gt;
[https://trac.osgeo.org/postgis/ticket/3699 This] is the only online forum I could find with others who have faced a similar issue.&lt;br /&gt;
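&lt;br /&gt;
One way to isolate the problem is to request a single TIGER file manually with wget, using a URL taken from the loader output (this example URL comes from the email exchange recorded below):&lt;br /&gt;
 wget http://www2.census.gov/geo/tiger/TIGER2016/FEATNAMES/tl_2016_06065_featnames.zip&lt;br /&gt;
&lt;br /&gt;
If this also returns 403 Forbidden, the block is on the server side rather than in the loader scripts.&lt;br /&gt;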
&lt;br /&gt;
==Geocode Function==&lt;br /&gt;
&lt;br /&gt;
The official arguments for the function are the following:&lt;br /&gt;
 setof record geocode(varchar address, integer max_results=10, geometry restrict_region=NULL, norm_addy OUT addy, geometry OUT geomout, integer OUT rating);&lt;br /&gt;
The arguments of interest are address, where you simply submit a string, and max_results, which limits the number of results returned per address. The geocoder makes multiple guesses at the location of an address and returns the best guesses in order. If you want multiple guesses for a specific address, specify max_results to be more than 1.&lt;br /&gt;
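For example, to see the geocoder's three best guesses for an address, you could run the following (a sketch; pprint_addy is the PostGIS helper that renders a normalized address as text):&lt;br /&gt;
 SELECT g.rating, pprint_addy(g.addy) As guess&lt;br /&gt;
     FROM geocode('100 Main Street, Houston TX', 3) As g;&lt;br /&gt;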
&lt;br /&gt;
===Single Address===&lt;br /&gt;
&lt;br /&gt;
An example query for a single address is:&lt;br /&gt;
 SELECT g.rating, ST_X(g.geomout) As lon, ST_Y(g.geomout) As lat,&lt;br /&gt;
     (addy).address As stno, (addy).streetname As street,&lt;br /&gt;
     (addy).streettypeabbrev As styp, (addy).location As city, (addy).stateabbrev As st,(addy).zip&lt;br /&gt;
     FROM geocode('75 State Street, Boston MA 02109', 1) As g;&lt;br /&gt;
&lt;br /&gt;
*rating -- An integer expressing confidence in the resulting geocode. The closer to 0, the more confident the guess.&lt;br /&gt;
*ST_X(g.geomout) -- The longitude coordinate of the point.&lt;br /&gt;
*ST_Y(g.geomout) -- The latitude coordinate of the point.&lt;br /&gt;
*addy -- The normalized address resulting from the input address.&lt;br /&gt;
*(addy).address -- The street number of the address (Ex: &amp;quot;75&amp;quot; Blabla Rd.)&lt;br /&gt;
*(addy).streetname -- The name of the street (Ex: 75 &amp;quot;Blabla&amp;quot; Rd.)&lt;br /&gt;
*(addy).streettypeabbrev -- The abbreviated street type (Ex: 75 Blabla &amp;quot;Rd&amp;quot;)&lt;br /&gt;
*(addy).location -- The city of the address.&lt;br /&gt;
*(addy).stateabbrev -- The state abbreviation.&lt;br /&gt;
*(addy).zip -- The zip code of the address.&lt;br /&gt;
&lt;br /&gt;
The output of the query above would be:&lt;br /&gt;
  rating |        lon        |      lat       | stno | street | styp |  city  | st |  zip&lt;br /&gt;
 --------+-------------------+----------------+------+--------+------+--------+----+-------&lt;br /&gt;
       0 | -71.0557505845646 | 42.35897920691 |   75 | State  | St   | Boston | MA | 02109&lt;br /&gt;
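&lt;br /&gt;
The same function can be applied to a whole table of addresses, such as the coffeeshops test table above. A minimal sketch, assuming (hypothetically) that the table has columns named name and address:&lt;br /&gt;
 SELECT c.name, g.rating, ST_X(g.geomout) As lon, ST_Y(g.geomout) As lat&lt;br /&gt;
     FROM coffeeshops As c, LATERAL geocode(c.address, 1) As g;&lt;br /&gt;
&lt;br /&gt;
Geocoding takes noticeable time per address, so expect a query like this to be slow on a large table.&lt;br /&gt;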
&lt;br /&gt;
==Current Errors==&lt;br /&gt;
Currently, we are getting a 403 Forbidden Error when trying to download state data. We are in contact with the US Census Bureau. Their contact information can be found [https://www.census.gov/geo/about/contact.html here].&lt;br /&gt;
&lt;br /&gt;
The email exchanges are recorded below.&lt;br /&gt;
&lt;br /&gt;
--------------------------------------&lt;br /&gt;
--------------------------------------&lt;br /&gt;
&lt;br /&gt;
I am a student researcher at the McNair Center for Entrepreneurship and Innovation at Rice University, and I am in the process of installing a Postgres Extension that relies on the TIGER data. &lt;br /&gt;
&lt;br /&gt;
When I began installing the extension, things were working fine. Now however, I am getting a 403 Forbidden error when the script tries to download the TIGER files. Do you have any idea why this might be happening? &lt;br /&gt;
&lt;br /&gt;
The extension I'm trying to install is below:&lt;br /&gt;
&lt;br /&gt;
http://postgis.net/docs/postgis_installation.html#install_tiger_geocoder_extension&lt;br /&gt;
&lt;br /&gt;
When I run any of the scripts that require data from TIGER, I am receiving the following error:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--2017-11-06 14:15:29--  http://www2.census.gov/geo/tiger/TIGER2016/FEATNAMES/tl_2016_06065_featnames.zip&lt;br /&gt;
Resolving www2.census.gov (www2.census.gov)... 104.84.241.90, 2600:1404:a:382::208c, 2600:1404:a:39c::208c&lt;br /&gt;
Connecting to www2.census.gov (www2.census.gov)|104.84.241.90|:80... connected.&lt;br /&gt;
HTTP request sent, awaiting response... 403 Forbidden&lt;br /&gt;
2017-11-06 14:15:29 ERROR 403: Forbidden.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
&lt;br /&gt;
Thank you for reaching out to us.  The postgis docs is not from the Geography Division, so I cannot comment on that.  A couple things that come to mind.&lt;br /&gt;
&lt;br /&gt;
We just released our new 2017 Shapefiles, so it's possible the scripts may be written for a previous version of our Shapefiles.&lt;br /&gt;
You may need to clean out your cookies, restart your browser, and then attempt to reinstall.&lt;br /&gt;
Were you able to download our Shapefiles successfully?&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
&lt;br /&gt;
The Shapefiles are part of the files being blocked by the 403 Forbidden error. &lt;br /&gt;
&lt;br /&gt;
The script is using a wget protocol to bulk download the data, so there were no cookies involved. Also, the download had been working in previous days; it was only recently that the same scripts stopped working. I am worried that our IP address somehow ended up on a blacklist for the TIGER data. Is there a blacklist for addresses that access the TIGER data?&lt;br /&gt;
&lt;br /&gt;
Our IP address is 128.42.44.181. &lt;br /&gt;
&lt;br /&gt;
---------------------------------------&lt;br /&gt;
&lt;br /&gt;
I have forwarded your question on to our IT folks.  Since I work on the subject matter side, I am unable to answer your questions.  Once I hear back from them, I will forward their response to you.  Hopefully they will provide to you what you need in order to download our Shapefiles.  My apologies for your inconvenience.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Tiger_Geocoder&amp;diff=21910</id>
		<title>Tiger Geocoder</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Tiger_Geocoder&amp;diff=21910"/>
		<updated>2017-11-15T21:13:42Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{McNair Projects&lt;br /&gt;
|Has title=Tiger Geocoder&lt;br /&gt;
|Has owner=Peter Jalbert,&lt;br /&gt;
|Has start date=Fall 2017&lt;br /&gt;
|Has keywords=Tiger, Geocoder, Database&lt;br /&gt;
|Has project status=Active&lt;br /&gt;
|Has Image=Tiger.jpg&lt;br /&gt;
}}&lt;br /&gt;
This page serves as documentation for using the Tiger Geocoder on PostgreSQL, as part of the PostGIS extension. The following wiki pages may also be of use to you:&lt;br /&gt;
&lt;br /&gt;
[http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation PostGIS Installation]&lt;br /&gt;
[http://mcnair.bakerinstitute.org/wiki/Database_Server_Documentation Database Server Documentation]&lt;br /&gt;
&lt;br /&gt;
The official documentation for using and installing the Tiger Geocoder can be found in the following.&lt;br /&gt;
&lt;br /&gt;
[https://postgis.net/docs/Extras.html#Tiger_Geocoder General Instructions]&lt;br /&gt;
[https://postgis.net/docs/postgis_installation.html#install_tiger_geocoder_extension Installation Instructions]&lt;br /&gt;
[http://postgis.net/docs/Geocode.html Geocoder Documentation]&lt;br /&gt;
&lt;br /&gt;
==Location==&lt;br /&gt;
The data is currently loaded into a psql database called geocoder. The tables contain the geocoding information, and there is a test table called &amp;quot;coffeeshops&amp;quot; that contains addresses of Houston coffeeshops according to Yelp. To access the database, first log in to the McNair DB Server. Then, &lt;br /&gt;
&lt;br /&gt;
 psql geocoder&lt;br /&gt;
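&lt;br /&gt;
Once connected, a quick query against the test table confirms the database is wired up. This is a minimal sketch: it assumes coffeeshops stores its street addresses in a text column named address, which may differ on the server.&lt;br /&gt;
 -- Sanity check; the column name address is an assumption&lt;br /&gt;
 SELECT address FROM coffeeshops LIMIT 5;&lt;br /&gt;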
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Installation==&lt;br /&gt;
&lt;br /&gt;
===Installation and Nation Data===&lt;br /&gt;
I began by adding the required extensions. First, enter Postgres using the psql command. Then:&lt;br /&gt;
 --Add Extensions to database&lt;br /&gt;
 CREATE EXTENSION postgis;&lt;br /&gt;
 CREATE EXTENSION fuzzystrmatch;&lt;br /&gt;
 CREATE EXTENSION postgis_tiger_geocoder;&lt;br /&gt;
 CREATE EXTENSION address_standardizer;&lt;br /&gt;
&lt;br /&gt;
You can test that the installation worked by running the following query: &lt;br /&gt;
 SELECT na.address, na.streetname,na.streettypeabbrev, na.zip&lt;br /&gt;
 	FROM normalize_address('1 Devonshire Place, Boston, MA 02109') AS na;&lt;br /&gt;
&lt;br /&gt;
This should return the following:&lt;br /&gt;
  address | streetname | streettypeabbrev |  zip&lt;br /&gt;
 ---------+------------+------------------+-------&lt;br /&gt;
 	   1 | Devonshire | Pl               | 02109&lt;br /&gt;
&lt;br /&gt;
Next, a new profile needs to be created by using the following command.&lt;br /&gt;
 INSERT INTO tiger.loader_platform(os, declare_sect, pgbin, wget, unzip_command, psql, path_sep, &lt;br /&gt;
 		   loader, environ_set_command, county_process_command)&lt;br /&gt;
 SELECT 'test', declare_sect, pgbin, wget, unzip_command, psql, path_sep,&lt;br /&gt;
 	   loader, environ_set_command, county_process_command&lt;br /&gt;
   FROM tiger.loader_platform&lt;br /&gt;
   WHERE os = 'sh';&lt;br /&gt;
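&lt;br /&gt;
To confirm that the new profile was created, the table can be checked directly. A small sketch using only the columns referenced in the INSERT above:&lt;br /&gt;
 -- The 'test' row should now appear alongside the stock profiles such as 'sh'&lt;br /&gt;
 SELECT os FROM tiger.loader_platform ORDER BY os;&lt;br /&gt;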
&lt;br /&gt;
The installation instructions also provide the following note:&lt;br /&gt;
&lt;br /&gt;
As of PostGIS 2.4.1 the Zip code-5 digit tabulation area zcta5 load step was revised to load current zcta5 data and is part of the Loader_Generate_Nation_Script when enabled. It is turned off by default because it takes quite a bit of time to load (20 to 60 minutes), takes up quite a bit of disk space, and is not used that often.&lt;br /&gt;
&lt;br /&gt;
If you would like this feature, you can enable it by using the following command. This should be done before generating the load script.&lt;br /&gt;
&lt;br /&gt;
 UPDATE tiger.loader_lookuptables SET load = true WHERE table_name = 'zcta510';&lt;br /&gt;
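&lt;br /&gt;
To verify which optional tables are flagged for loading before generating any scripts, the lookup table can be inspected. A sketch using only the columns referenced in the UPDATE above:&lt;br /&gt;
 SELECT table_name, load FROM tiger.loader_lookuptables;&lt;br /&gt;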
&lt;br /&gt;
The paths in declare_sect need to be edited so they match our server locations. One option is to edit the declare_sect column in the tiger.loader_platform table. If so, the declare_sect looks like the following:&lt;br /&gt;
 export PGHOST=localhost                       +&lt;br /&gt;
 export PGUSER=postgres                        +&lt;br /&gt;
 export PGPASSWORD=yourpasswordhere            +&lt;br /&gt;
 export PGDATABASE=geocoder                    +&lt;br /&gt;
 PSQL=${PGBIN}/psql                            +&lt;br /&gt;
 SHP2PGSQL=shp2pgsql                           +&lt;br /&gt;
 cd ${staging_fold}                            +&lt;br /&gt;
 &lt;br /&gt;
 TMPDIR=&amp;quot;${staging_fold}/temp/&amp;quot;                +&lt;br /&gt;
 UNZIPTOOL=unzip                               +&lt;br /&gt;
 WGETTOOL=&amp;quot;/usr/bin/wget&amp;quot;                      +&lt;br /&gt;
 export PGBIN=/usr/lib/postgresql/9.6/bin      +&lt;br /&gt;
 export PGPORT=5432                            +&lt;br /&gt;
 export PGHOST=localhost                       +&lt;br /&gt;
 export PGUSER=postgres                        +&lt;br /&gt;
 export PGPASSWORD=yourpasswordhere            +&lt;br /&gt;
 export PGDATABASE=geocoder                    +&lt;br /&gt;
 PSQL=${PGBIN}/psql                            +&lt;br /&gt;
 SHP2PGSQL=shp2pgsql                           +&lt;br /&gt;
 cd ${staging_fold}&lt;br /&gt;
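&lt;br /&gt;
If you take this option, the edited block can be written back with an UPDATE against the 'test' profile created earlier. This is only a sketch; the string literal stands in for the full edited block shown above:&lt;br /&gt;
 UPDATE tiger.loader_platform&lt;br /&gt;
    SET declare_sect = 'export PGHOST=localhost ...'  -- paste the full edited block here&lt;br /&gt;
  WHERE os = 'test';&lt;br /&gt;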
&lt;br /&gt;
Another option is to edit the sh file before running the script. We will use this option until further notice. Simply use your favorite command-line editor to change the fields to their correct values. The downloaded script is located in the following directory:&lt;br /&gt;
 /gisdata&lt;br /&gt;
&lt;br /&gt;
There needs to be a directory called &amp;quot;temp&amp;quot; in the gisdata directory. To generate the script, run the following from the command line:&lt;br /&gt;
 psql -c &amp;quot;SELECT Loader_Generate_Nation_Script('test')&amp;quot; -d databasename -tA &amp;gt; /gisdata/nation_script_load.sh&lt;br /&gt;
&lt;br /&gt;
This will create a script in the /gisdata directory. Change to that directory. If you did not edit the paths in the declare_sect column in psql, then you will need to edit this file to contain the correct paths.&lt;br /&gt;
&lt;br /&gt;
Change directories:&lt;br /&gt;
 cd /gisdata&lt;br /&gt;
&lt;br /&gt;
Edit the script using your favorite command line text editor. Specifically, edit the following fields.&lt;br /&gt;
 PGUSER=postgres&lt;br /&gt;
 PGPASSWORD=(Ask Anne for this password)!&lt;br /&gt;
Everything else remains the same.&lt;br /&gt;
&lt;br /&gt;
Run the script by using:&lt;br /&gt;
 sh nation_script_load.sh&lt;br /&gt;
&lt;br /&gt;
Now there are barebones tables in the database that will hold the nation-level information. The next step is to download data for each state.&lt;br /&gt;
&lt;br /&gt;
===State Data===&lt;br /&gt;
&lt;br /&gt;
The state scripts are generated in much the same way that the nation script was generated. Use the following command, substituting MA for your desired state abbreviation, and substituting a unique filename at the end.&lt;br /&gt;
 psql -c &amp;quot;SELECT Loader_Generate_Script(ARRAY['MA'], 'test')&amp;quot; -d geocoder -tA &amp;gt; /gisdata/ma_load.sh&lt;br /&gt;
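&lt;br /&gt;
Loader_Generate_Script accepts an array, so several states can be batched into a single script if desired. A sketch with arbitrary state choices:&lt;br /&gt;
 psql -c &amp;quot;SELECT Loader_Generate_Script(ARRAY['TX','LA','OK'], 'test')&amp;quot; -d geocoder -tA &amp;gt; /gisdata/tx_la_ok_load.sh&lt;br /&gt;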
&lt;br /&gt;
CURRENT PROGRESS:&lt;br /&gt;
The following states have been downloaded into the geocoder database.&lt;br /&gt;
 AL, AK, AZ, AR, CA, CO, CT, DE, FL, MA&lt;br /&gt;
&lt;br /&gt;
===Current Errors===&lt;br /&gt;
The state scripts stopped working on 11/1/2017, despite working on 10/31/2017. Now, when a retrieval script is run, it produces the error&lt;br /&gt;
 HTTP request sent, awaiting response... 403 Forbidden&lt;br /&gt;
Possible thoughts:&lt;br /&gt;
&lt;br /&gt;
Maybe our IP has been blacklisted for downloading data from a government website too quickly? If so, PostGIS should really offer a different installation method; the current one is brittle.&lt;br /&gt;
&lt;br /&gt;
Maybe the nation downloader script never worked properly. I am not sure how to verify its output, but it seems right.&lt;br /&gt;
&lt;br /&gt;
[https://trac.osgeo.org/postgis/ticket/3699 This] is the only online discussion I could find of others facing a similar issue.&lt;br /&gt;
&lt;br /&gt;
==Geocode Function==&lt;br /&gt;
&lt;br /&gt;
The official arguments for the function are the following:&lt;br /&gt;
 setof record geocode(varchar address, integer max_results=10, geometry restrict_region=NULL, norm_addy OUT addy, geometry OUT geomout, integer OUT rating);&lt;br /&gt;
The arguments of interest are address, where you simply submit a string, and max_results, which caps the number of candidate matches returned per address. The geocoder makes multiple guesses at the location of an address and returns the best guesses in order of rating. If you want multiple guesses for a specific address, set max_results to more than 1.&lt;br /&gt;
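&lt;br /&gt;
For example, the following sketch asks for up to three candidate matches and prints each normalized guess next to its rating; pprint_addy() is a helper shipped with the geocoder that renders a norm_addy as a single string.&lt;br /&gt;
 SELECT g.rating, pprint_addy(g.addy) AS guess&lt;br /&gt;
   FROM geocode('75 State Street, Boston MA 02109', 3) AS g&lt;br /&gt;
   ORDER BY g.rating;&lt;br /&gt;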
&lt;br /&gt;
===Single Address===&lt;br /&gt;
&lt;br /&gt;
An example query for a single address is:&lt;br /&gt;
 SELECT g.rating, ST_X(g.geomout) As lon, ST_Y(g.geomout) As lat,&lt;br /&gt;
     (addy).address As stno, (addy).streetname As street,&lt;br /&gt;
     (addy).streettypeabbrev As styp, (addy).location As city, (addy).stateabbrev As st,(addy).zip&lt;br /&gt;
     FROM geocode('75 State Street, Boston MA 02109', 1) As g;&lt;br /&gt;
&lt;br /&gt;
rating -- An integer indicating the confidence of the resulting geocode; the closer to 0, the more confident the match.&lt;br /&gt;
ST_X(g.geomout) -- This retrieves the longitude coordinate of the point.&lt;br /&gt;
ST_Y(g.geomout) -- This retrieves the latitude coordinate of the point.&lt;br /&gt;
addy -- In general, addy is a normalized address resulting from the input address.&lt;br /&gt;
(addy).address -- The number of the address (Ex: &amp;quot;75&amp;quot; Blabla rd.)&lt;br /&gt;
(addy).streetname -- The name of the street (Ex: 75 &amp;quot;Blabla&amp;quot; rd.)&lt;br /&gt;
(addy).streettypeabbrev -- The abbreviation of the street type (Ex: 75 blabla &amp;quot;rd&amp;quot;)&lt;br /&gt;
(addy).location -- The city location of the address.&lt;br /&gt;
(addy).stateabbrev -- The state abbreviation.&lt;br /&gt;
(addy).zip -- The ZIP code of the address.&lt;br /&gt;
&lt;br /&gt;
The output of the query above would be:&lt;br /&gt;
  rating |        lon        |      lat       | stno | street | styp |  city  | st |  zip&lt;br /&gt;
 --------+-------------------+----------------+------+--------+------+--------+----+-------&lt;br /&gt;
       0 | -71.0557505845646 | 42.35897920691 |   75 | State  | St   | Boston | MA | 02109&lt;br /&gt;
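&lt;br /&gt;
The same pattern extends to a whole table of addresses via a lateral join, which is how a test table like coffeeshops could be geocoded in one pass. A sketch; the column names name and address are assumptions:&lt;br /&gt;
 SELECT c.name, g.rating, ST_X(g.geomout) AS lon, ST_Y(g.geomout) AS lat&lt;br /&gt;
   FROM coffeeshops AS c&lt;br /&gt;
   LEFT JOIN LATERAL geocode(c.address, 1) AS g ON true;&lt;br /&gt;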
&lt;br /&gt;
==Current Errors==&lt;br /&gt;
Currently, we are getting a 403 Forbidden Error when trying to download state data. We are in contact with the US Census Bureau. Their contact information can be found [https://www.census.gov/geo/about/contact.html here].&lt;br /&gt;
&lt;br /&gt;
The email exchanges are recorded below.&lt;br /&gt;
&lt;br /&gt;
--------------------------------------&lt;br /&gt;
--------------------------------------&lt;br /&gt;
&lt;br /&gt;
I am a student researcher at the McNair Center for Entrepreneurship and Innovation at Rice University, and I am in the process of installing a Postgres Extension that relies on the TIGER data. &lt;br /&gt;
&lt;br /&gt;
When I began installing the extension, things were working fine. Now however, I am getting a 403 Forbidden error when the script tries to download the TIGER files. Do you have any idea why this might be happening? &lt;br /&gt;
&lt;br /&gt;
The extension I'm trying to install is below:&lt;br /&gt;
&lt;br /&gt;
http://postgis.net/docs/postgis_installation.html#install_tiger_geocoder_extension&lt;br /&gt;
&lt;br /&gt;
When I run any of the scripts that require data from TIGER, I am receiving the following error:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--2017-11-06 14:15:29--  http://www2.census.gov/geo/tiger/TIGER2016/FEATNAMES/tl_2016_06065_featnames.zip&lt;br /&gt;
Resolving www2.census.gov (www2.census.gov)... 104.84.241.90, 2600:1404:a:382::208c, 2600:1404:a:39c::208c&lt;br /&gt;
Connecting to www2.census.gov (www2.census.gov)|104.84.241.90|:80... connected.&lt;br /&gt;
HTTP request sent, awaiting response... 403 Forbidden&lt;br /&gt;
2017-11-06 14:15:29 ERROR 403: Forbidden.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
&lt;br /&gt;
Thank you for reaching out to us.  The postgis docs is not from the Geography Division, so I cannot comment on that.  A couple things that come to mind.&lt;br /&gt;
&lt;br /&gt;
We just released our new 2017 Shapefiles, so it's possible the scripts may be written for a previous version of our Shapefiles.&lt;br /&gt;
You may need to clean out your cookies, restart your browser, and then attempt to reinstall.&lt;br /&gt;
Were you able to download our Shapefiles successfully?&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
&lt;br /&gt;
The Shapefiles are part of the files being blocked by the 403 Forbidden error. &lt;br /&gt;
&lt;br /&gt;
The script is using a wget protocol to bulk download the data, so there were no cookies involved. Also, the download had been working in previous days; it was only recently that the same scripts stopped working. I am worried that our IP address somehow ended up on a blacklist for the TIGER data. Is there a blacklist for addresses that access the TIGER data?&lt;br /&gt;
&lt;br /&gt;
Our IP address is 128.42.44.181. &lt;br /&gt;
&lt;br /&gt;
---------------------------------------&lt;br /&gt;
&lt;br /&gt;
I have forwarded your question on to our IT folks.  Since I work on the subject matter side, I am unable to answer your questions.  Once I hear back from them, I will forward their response to you.  Hopefully they will provide to you what you need in order to download our Shapefiles.  My apologies for your inconvenience.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Tiger_Geocoder&amp;diff=21908</id>
		<title>Tiger Geocoder</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Tiger_Geocoder&amp;diff=21908"/>
		<updated>2017-11-15T21:05:42Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;{{McNair Projects&lt;br /&gt;
|Has title=Tiger Geocoder&lt;br /&gt;
|Has Image=Tiger.jpg&lt;br /&gt;
|Has owner=Peter Jalbert,&lt;br /&gt;
|Has start date=Fall 2017&lt;br /&gt;
|Has keywords=Tiger, Geocoder, Database&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
This page serves as documentation for using the Tiger Geocoder on PostgreSQL, as part of the PostGIS extension. The following wiki pages may also be of use to you:&lt;br /&gt;
&lt;br /&gt;
[http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation PostGIS Installation]&lt;br /&gt;
[http://mcnair.bakerinstitute.org/wiki/Database_Server_Documentation Database Server Documentation]&lt;br /&gt;
&lt;br /&gt;
The official documentation for using and installing the Tiger Geocoder can be found in the following.&lt;br /&gt;
&lt;br /&gt;
[https://postgis.net/docs/Extras.html#Tiger_Geocoder General Instructions]&lt;br /&gt;
[https://postgis.net/docs/postgis_installation.html#install_tiger_geocoder_extension Installation Instructions]&lt;br /&gt;
[http://postgis.net/docs/Geocode.html Geocoder Documentation]&lt;br /&gt;
&lt;br /&gt;
==Location==&lt;br /&gt;
The data is currently loaded into a psql database called geocoder. The tables contain the geocoding information, and there is a test table called &amp;quot;coffeeshops&amp;quot; that contains addresses of Houston coffeeshops according to Yelp. To access the database, first log in to the McNair DB Server. Then, &lt;br /&gt;
&lt;br /&gt;
 psql geocoder&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Installation==&lt;br /&gt;
&lt;br /&gt;
===Installation and Nation Data===&lt;br /&gt;
I began by adding the required extensions. First, enter Postgres using the psql command. Then:&lt;br /&gt;
 --Add Extensions to database&lt;br /&gt;
 CREATE EXTENSION postgis;&lt;br /&gt;
 CREATE EXTENSION fuzzystrmatch;&lt;br /&gt;
 CREATE EXTENSION postgis_tiger_geocoder;&lt;br /&gt;
 CREATE EXTENSION address_standardizer;&lt;br /&gt;
&lt;br /&gt;
You can test that the installation worked by running the following query: &lt;br /&gt;
 SELECT na.address, na.streetname,na.streettypeabbrev, na.zip&lt;br /&gt;
 	FROM normalize_address('1 Devonshire Place, Boston, MA 02109') AS na;&lt;br /&gt;
&lt;br /&gt;
This should return the following:&lt;br /&gt;
  address | streetname | streettypeabbrev |  zip&lt;br /&gt;
 ---------+------------+------------------+-------&lt;br /&gt;
 	   1 | Devonshire | Pl               | 02109&lt;br /&gt;
&lt;br /&gt;
Next, a new profile needs to be created by using the following command.&lt;br /&gt;
 INSERT INTO tiger.loader_platform(os, declare_sect, pgbin, wget, unzip_command, psql, path_sep, &lt;br /&gt;
 		   loader, environ_set_command, county_process_command)&lt;br /&gt;
 SELECT 'test', declare_sect, pgbin, wget, unzip_command, psql, path_sep,&lt;br /&gt;
 	   loader, environ_set_command, county_process_command&lt;br /&gt;
   FROM tiger.loader_platform&lt;br /&gt;
   WHERE os = 'sh';&lt;br /&gt;
&lt;br /&gt;
The installation instructions also provide the following note:&lt;br /&gt;
&lt;br /&gt;
As of PostGIS 2.4.1 the Zip code-5 digit tabulation area zcta5 load step was revised to load current zcta5 data and is part of the Loader_Generate_Nation_Script when enabled. It is turned off by default because it takes quite a bit of time to load (20 to 60 minutes), takes up quite a bit of disk space, and is not used that often.&lt;br /&gt;
&lt;br /&gt;
If you would like this feature, you can enable it by using the following command. This should be done before generating the load script.&lt;br /&gt;
&lt;br /&gt;
 UPDATE tiger.loader_lookuptables SET load = true WHERE table_name = 'zcta510';&lt;br /&gt;
&lt;br /&gt;
The paths in declare_sect need to be edited so they match our server locations. One option is to edit the declare_sect column in the tiger.loader_platform table. If so, the declare_sect looks like the following:&lt;br /&gt;
 export PGHOST=localhost                       +&lt;br /&gt;
 export PGUSER=postgres                        +&lt;br /&gt;
 export PGPASSWORD=yourpasswordhere            +&lt;br /&gt;
 export PGDATABASE=geocoder                    +&lt;br /&gt;
 PSQL=${PGBIN}/psql                            +&lt;br /&gt;
 SHP2PGSQL=shp2pgsql                           +&lt;br /&gt;
 cd ${staging_fold}                            +&lt;br /&gt;
 &lt;br /&gt;
 TMPDIR=&amp;quot;${staging_fold}/temp/&amp;quot;                +&lt;br /&gt;
 UNZIPTOOL=unzip                               +&lt;br /&gt;
 WGETTOOL=&amp;quot;/usr/bin/wget&amp;quot;                      +&lt;br /&gt;
 export PGBIN=/usr/lib/postgresql/9.6/bin      +&lt;br /&gt;
 export PGPORT=5432                            +&lt;br /&gt;
 export PGHOST=localhost                       +&lt;br /&gt;
 export PGUSER=postgres                        +&lt;br /&gt;
 export PGPASSWORD=yourpasswordhere            +&lt;br /&gt;
 export PGDATABASE=geocoder                    +&lt;br /&gt;
 PSQL=${PGBIN}/psql                            +&lt;br /&gt;
 SHP2PGSQL=shp2pgsql                           +&lt;br /&gt;
 cd ${staging_fold}&lt;br /&gt;
&lt;br /&gt;
Another option is to edit the sh file before running the script. We will use this option until further notice. Simply use your favorite command-line editor to change the fields to their correct values. The downloaded script is located in the following directory:&lt;br /&gt;
 /gisdata&lt;br /&gt;
&lt;br /&gt;
There needs to be a directory called &amp;quot;temp&amp;quot; in the gisdata directory. To generate the script, run the following from the command line:&lt;br /&gt;
 psql -c &amp;quot;SELECT Loader_Generate_Nation_Script('test')&amp;quot; -d databasename -tA &amp;gt; /gisdata/nation_script_load.sh&lt;br /&gt;
&lt;br /&gt;
This will create a script in the /gisdata directory. Change to that directory. If you did not edit the paths in the declare_sect column in psql, then you will need to edit this file to contain the correct paths.&lt;br /&gt;
&lt;br /&gt;
Change directories:&lt;br /&gt;
 cd /gisdata&lt;br /&gt;
&lt;br /&gt;
Edit the script using your favorite command line text editor. Specifically, edit the following fields.&lt;br /&gt;
 PGUSER=postgres&lt;br /&gt;
 PGPASSWORD=(Ask Anne for this password)!&lt;br /&gt;
Everything else remains the same.&lt;br /&gt;
&lt;br /&gt;
Run the script by using:&lt;br /&gt;
 sh nation_script_load.sh&lt;br /&gt;
&lt;br /&gt;
Now there are barebones tables in the database that will hold the nation-level information. The next step is to download data for each state.&lt;br /&gt;
&lt;br /&gt;
===State Data===&lt;br /&gt;
&lt;br /&gt;
The state scripts are generated in much the same way that the nation script was generated. Use the following command, substituting MA for your desired state abbreviation, and substituting a unique filename at the end.&lt;br /&gt;
 psql -c &amp;quot;SELECT Loader_Generate_Script(ARRAY['MA'], 'test')&amp;quot; -d geocoder -tA &amp;gt; /gisdata/ma_load.sh&lt;br /&gt;
&lt;br /&gt;
CURRENT PROGRESS:&lt;br /&gt;
The following states have been downloaded into the geocoder database.&lt;br /&gt;
 AL, AK, AZ, AR, CA, CO, CT, DE, FL, MA&lt;br /&gt;
&lt;br /&gt;
===Current Errors===&lt;br /&gt;
The state scripts stopped working on 11/1/2017, despite working on 10/31/2017. Now, when a retrieval script is run, it produces the error&lt;br /&gt;
 HTTP request sent, awaiting response... 403 Forbidden&lt;br /&gt;
Possible thoughts:&lt;br /&gt;
&lt;br /&gt;
Maybe our IP has been blacklisted for downloading data from a government website too quickly? If so, PostGIS should really offer a different installation method; the current one is brittle.&lt;br /&gt;
&lt;br /&gt;
Maybe the nation downloader script never worked properly. I am not sure how to verify its output, but it seems right.&lt;br /&gt;
&lt;br /&gt;
[https://trac.osgeo.org/postgis/ticket/3699 This] is the only online discussion I could find of others facing a similar issue.&lt;br /&gt;
&lt;br /&gt;
==Geocode Function==&lt;br /&gt;
&lt;br /&gt;
The official arguments for the function are the following:&lt;br /&gt;
 setof record geocode(varchar address, integer max_results=10, geometry restrict_region=NULL, norm_addy OUT addy, geometry OUT geomout, integer OUT rating);&lt;br /&gt;
The arguments of interest are address, where you simply submit a string, and max_results, which caps the number of candidate matches returned per address. The geocoder makes multiple guesses at the location of an address and returns the best guesses in order of rating. If you want multiple guesses for a specific address, set max_results to more than 1.&lt;br /&gt;
&lt;br /&gt;
===Single Address===&lt;br /&gt;
&lt;br /&gt;
An example query for a single address is:&lt;br /&gt;
 SELECT g.rating, ST_X(g.geomout) As lon, ST_Y(g.geomout) As lat,&lt;br /&gt;
     (addy).address As stno, (addy).streetname As street,&lt;br /&gt;
     (addy).streettypeabbrev As styp, (addy).location As city, (addy).stateabbrev As st,(addy).zip&lt;br /&gt;
     FROM geocode('75 State Street, Boston MA 02109', 1) As g;&lt;br /&gt;
&lt;br /&gt;
rating -- An integer indicating the confidence of the resulting geocode; the closer to 0, the more confident the match.&lt;br /&gt;
ST_X(g.geomout) -- This retrieves the longitude coordinate of the point.&lt;br /&gt;
ST_Y(g.geomout) -- This retrieves the latitude coordinate of the point.&lt;br /&gt;
addy -- In general, addy is a normalized address resulting from the input address.&lt;br /&gt;
(addy).address -- The number of the address (Ex: &amp;quot;75&amp;quot; Blabla rd.)&lt;br /&gt;
(addy).streetname -- The name of the street (Ex: 75 &amp;quot;Blabla&amp;quot; rd.)&lt;br /&gt;
(addy).streettypeabbrev -- The abbreviation of the street type (Ex: 75 blabla &amp;quot;rd&amp;quot;)&lt;br /&gt;
(addy).location -- The city location of the address.&lt;br /&gt;
(addy).stateabbrev -- The state abbreviation.&lt;br /&gt;
(addy).zip -- The ZIP code of the address.&lt;br /&gt;
&lt;br /&gt;
The output of the query above would be:&lt;br /&gt;
  rating |        lon        |      lat       | stno | street | styp |  city  | st |  zip&lt;br /&gt;
 --------+-------------------+----------------+------+--------+------+--------+----+-------&lt;br /&gt;
       0 | -71.0557505845646 | 42.35897920691 |   75 | State  | St   | Boston | MA | 02109&lt;br /&gt;
&lt;br /&gt;
==Current Errors==&lt;br /&gt;
Currently, we are getting a 403 Forbidden Error when trying to download state data. We are in contact with the US Census Bureau. Their contact information can be found [https://www.census.gov/geo/about/contact.html here].&lt;br /&gt;
&lt;br /&gt;
The email exchanges are recorded below.&lt;br /&gt;
&lt;br /&gt;
--------------------------------------&lt;br /&gt;
--------------------------------------&lt;br /&gt;
&lt;br /&gt;
I am a student researcher at the McNair Center for Entrepreneurship and Innovation at Rice University, and I am in the process of installing a Postgres Extension that relies on the TIGER data. &lt;br /&gt;
&lt;br /&gt;
When I began installing the extension, things were working fine. Now however, I am getting a 403 Forbidden error when the script tries to download the TIGER files. Do you have any idea why this might be happening? &lt;br /&gt;
&lt;br /&gt;
The extension I'm trying to install is below:&lt;br /&gt;
&lt;br /&gt;
http://postgis.net/docs/postgis_installation.html#install_tiger_geocoder_extension&lt;br /&gt;
&lt;br /&gt;
When I run any of the scripts that require data from TIGER, I am receiving the following error:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--2017-11-06 14:15:29--  http://www2.census.gov/geo/tiger/TIGER2016/FEATNAMES/tl_2016_06065_featnames.zip&lt;br /&gt;
Resolving www2.census.gov (www2.census.gov)... 104.84.241.90, 2600:1404:a:382::208c, 2600:1404:a:39c::208c&lt;br /&gt;
Connecting to www2.census.gov (www2.census.gov)|104.84.241.90|:80... connected.&lt;br /&gt;
HTTP request sent, awaiting response... 403 Forbidden&lt;br /&gt;
2017-11-06 14:15:29 ERROR 403: Forbidden.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
&lt;br /&gt;
Thank you for reaching out to us.  The postgis docs is not from the Geography Division, so I cannot comment on that.  A couple things that come to mind.&lt;br /&gt;
&lt;br /&gt;
We just released our new 2017 Shapefiles, so it's possible the scripts may be written for a previous version of our Shapefiles.&lt;br /&gt;
You may need to clean out your cookies, restart your browser, and then attempt to reinstall.&lt;br /&gt;
Were you able to download our Shapefiles successfully?&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;br /&gt;
&lt;br /&gt;
The Shapefiles are part of the files being blocked by the 403 Forbidden error. &lt;br /&gt;
&lt;br /&gt;
The script is using a wget protocol to bulk download the data, so there were no cookies involved. Also, the download had been working in previous days; it was only recently that the same scripts stopped working. I am worried that our IP address somehow ended up on a blacklist for the TIGER data. Is there a blacklist for addresses that access the TIGER data?&lt;br /&gt;
&lt;br /&gt;
Our IP address is 128.42.44.181. &lt;br /&gt;
&lt;br /&gt;
---------------------------------------&lt;br /&gt;
&lt;br /&gt;
I have forwarded your question on to our IT folks.  Since I work on the subject matter side, I am unable to answer your questions.  Once I hear back from them, I will forward their response to you.  Hopefully they will provide to you what you need in order to download our Shapefiles.  My apologies for your inconvenience.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
----------------------------------------&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
	<entry>
		<id>http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=21907</id>
		<title>Peter Jalbert (Work Log)</title>
		<link rel="alternate" type="text/html" href="http://www.edegan.com/mediawiki/index.php?title=Peter_Jalbert_(Work_Log)&amp;diff=21907"/>
		<updated>2017-11-15T20:57:54Z</updated>

		<summary type="html">&lt;p&gt;Peterjalbert: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;===Fall 2017===&lt;br /&gt;
&amp;lt;onlyinclude&amp;gt; &lt;br /&gt;
&lt;br /&gt;
[[Peter Jalbert]] [[Work Logs]] [[Peter Jalbert (Work Log)|(log page)]] &lt;br /&gt;
&lt;br /&gt;
2017-11-15: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote a script to extract counts that were greater than 2 from Keyword Matcher. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder].&lt;br /&gt;
&lt;br /&gt;
2017-11-14: Continued running [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser]. Wrote an HTML to Text parser. See Parser Demo Day Page for file location. Continued downloading for [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder TIGER Geocoder].&lt;br /&gt;
&lt;br /&gt;
2017-11-13: Built [http://mcnair.bakerinstitute.org/wiki/Demo_Day_Page_Parser Demo Day Page Parser].&lt;br /&gt;
&lt;br /&gt;
2017-11-09: Running demo version of Demo Day crawler (Accelerator Google Crawler). Fixing work log format. &lt;br /&gt;
&lt;br /&gt;
2017-11-07: Created file with 0s and 1s detailing whether crunchbase has the founder information for an accelerator. Details posted as a TODO on [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List] page. Still waiting for feedback on the PostGIS installation from [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder]. Continued working on Accelerator Google Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-11-06: Contacted Geography Center for the US Census Bureau, [https://www.census.gov/geo/about/contact.html here], and began email exchange on PostGIS installation problems. Began working on the [http://mcnair.bakerinstitute.org/wiki/Selenium_Documentation Selenium Documentation]. Also began working on an Accelerator Google Crawler that will be used with Yang and ML to find Demo Days for cohort companies.&lt;br /&gt;
&lt;br /&gt;
2017-11-01: Attempted to continue downloading; however, I ran into HTTP 403 Forbidden errors. Listed the errors on the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder Page].&lt;br /&gt;
&lt;br /&gt;
2017-10-31: Began downloading blocks of data for individual states for the [http://mcnair.bakerinstitute.org/wiki/Tiger_Geocoder Tiger Geocoder] project. Wrote out the new wiki page for installation, and beginning to write documentation on usage.&lt;br /&gt;
&lt;br /&gt;
2017-10-30: With Ed's help, was able to get the national data from Tiger installed onto a database server. The process required much jumping around and changing users, and all the things we learned are outlined in [http://mcnair.bakerinstitute.org/wiki/Database_Server_Documentation#Editing_Users the database server documentation] under &amp;quot;Editing Users&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
2017-10-25: Continued working on the [http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation TigerCoder Installation].&lt;br /&gt;
&lt;br /&gt;
2017-10-24: Throw some addresses into a database, use address normalizer and geocoder. May need to install things. Details on the installation process can be found on the [http://mcnair.bakerinstitute.org/wiki/PostGIS_Installation PostGIS Installation page].&lt;br /&gt;
&lt;br /&gt;
2017-10-23: Finished Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-19: Continued work on Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-18: Continued work on Yelp crawler for [http://mcnair.bakerinstitute.org/wiki/Houston_Innovation_District Houston Innovation District Project].&lt;br /&gt;
&lt;br /&gt;
2017-10-17: Constructed ArcGIS maps for the agglomeration project. Finished maps of points for every year in the state of California. Finished maps of Route 128. Began working on selenium Yelp crawler to get cafe locations within the 610-loop.&lt;br /&gt;
&lt;br /&gt;
2017-10-16: Assisted Harrison on the USITC project. Looked for natural language processing tools to extract complainants and defendants along with their locations from case files. Experimented with pulling based on parts-of-speech tags, as well as using geotext or geograpy to pull locations from a case segment.&lt;br /&gt;
&lt;br /&gt;
2017-10-13: Updated various project wiki pages.&lt;br /&gt;
&lt;br /&gt;
2017-10-12: Continued work on Patent Thicket project, awaiting further project specs.&lt;br /&gt;
&lt;br /&gt;
2017-10-05: Emergency ArcGIS creation for Agglomeration project.&lt;br /&gt;
&lt;br /&gt;
2017-10-04: Emergency ArcGIS creation for Agglomeration project.&lt;br /&gt;
&lt;br /&gt;
2017-10-02: Worked on ArcGIS data. See Harrison's Work Log for the details.&lt;br /&gt;
&lt;br /&gt;
2017-09-28: Added collaborative editing feature to PyCharm. &lt;br /&gt;
&lt;br /&gt;
2017-09-27: Worked on big database file.&lt;br /&gt;
&lt;br /&gt;
2017-09-25: New task -- Create text file with company, description, and company type.&lt;br /&gt;
#[http://mcnair.bakerinstitute.org/wiki/VC_Database_Rebuild VC Database Rebuild]&lt;br /&gt;
#psql vcdb2&lt;br /&gt;
#table name, sdccompanybasecore2&lt;br /&gt;
#Combine with Crunchbasebulk&lt;br /&gt;
&lt;br /&gt;
#TODO: Write wiki on linkedin crawler, write wiki on creating accounts.&lt;br /&gt;
&lt;br /&gt;
2017-09-21: Wrote wiki on Linkedin crawler, met with Laura about patents project. &lt;br /&gt;
&lt;br /&gt;
2017-09-20: Finished running linkedin crawler. Transferred data to RDP. Will write wikis next.&lt;br /&gt;
&lt;br /&gt;
2017-09-19: Began running linkedin crawler. Helped Yang create RDP account, get permissions, and get wiki setup.&lt;br /&gt;
&lt;br /&gt;
2017-09-18: Finished implementation of Experience Crawler, continued working on Education Crawler for LinkedIn.&lt;br /&gt;
&lt;br /&gt;
2017-09-14: Continued implementing LinkedIn Crawler for profiles.&lt;br /&gt;
&lt;br /&gt;
2017-09-13: Implemented LinkedIn Crawler for main portion of profiles. Began working on crawling Experience section of profiles.&lt;br /&gt;
&lt;br /&gt;
2017-09-12: Continued working on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. Added to the wiki on this topic.&lt;br /&gt;
&lt;br /&gt;
2017-09-11: Continued working on the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler for Accelerator Founders Data]. &lt;br /&gt;
&lt;br /&gt;
2017-09-06: Combined founders data retrieved with the Crunchbase API with the crunchbasebulk data to get linkedin urls for different accelerator founders. For more information, see [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here].&lt;br /&gt;
&lt;br /&gt;
2017-09-05: Post Harvey. Finished retrieving names from the Crunchbase API on founders. Next step is to query crunchbase bulk database to get linkedin urls. For more information, see [http://mcnair.bakerinstitute.org/wiki/Crunchbase_Data here].&lt;br /&gt;
&lt;br /&gt;
2017-08-24: Began using the Crunchbase API to retrieve founder information for accelerators. Halfway through compiling a dictionary that translates accelerator names into proper Crunchbase API URLs.&lt;br /&gt;
&lt;br /&gt;
2017-08-23: Decided with Ed to abandon LinkedIn crawling to retrieve accelerator founder data, and instead use crunchbase. Spent the day navigating the crunchbasebulk database, and seeing what useful information was contained in it.&lt;br /&gt;
&lt;br /&gt;
2017-08-22: Discovered that LinkedIn Profiles cannot be viewed through LinkedIn if the target is 3rd degree or further. However, if entering LinkedIn through a Google search, the profile can still be viewed if the user has previously logged into LinkedIn. Devising a workaround crawler that utilizes Google search. Continued blog post [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) here] under Section 4.&lt;br /&gt;
&lt;br /&gt;
2017-08-21: Began work on extracting founders for accelerators through LinkedIn Crawler. Discovered that Python3 is not installed on RDP, so the virtual environment for the project cannot be fired up. Continued working on Ubuntu machine.&lt;br /&gt;
&amp;lt;/onlyinclude&amp;gt; &lt;br /&gt;
&lt;br /&gt;
===Spring 2017===&lt;br /&gt;
&lt;br /&gt;
2017-05-01: Continued work on HTML Parser. Uploaded all semester projects to git server.&lt;br /&gt;
&lt;br /&gt;
2017-04-20: Finished the HTML Parser for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Ran HTML parser on accelerator founders. Data is stored in projects/accelerators/LinkedIn Founder Data.&lt;br /&gt;
&lt;br /&gt;
2017-04-19: Made updates to the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler] Wikipage. Ran LinkedIn Crawler on accelerator data. Working on an html parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-18: Ran LinkedIn Crawler on matches between Crunchbase Snapshot and the accelerator data.&lt;br /&gt;
&lt;br /&gt;
2017-04-17: Worked on ways to get correct search results from the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Worked on an HTML Parser for the results from the LinkedIn Crawler.&lt;br /&gt;
&lt;br /&gt;
2017-04-13: Worked on debugging the logout procedure for the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Began formulation of process to search for founders of startups using a combination of the LinkedIn Crawler with the data resources from the [http://mcnair.bakerinstitute.org/wiki/Crunchbase_2013_Snapshot CrunchBase Snapshot].&lt;br /&gt;
&lt;br /&gt;
2017-04-12: Work on bugs with the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-04-11: Completed functional [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) crawler of LinkedIn Recruiter Pro]. Basic search functions work and download profile information for a given person. &lt;br /&gt;
&lt;br /&gt;
2017-04-10: Began writing functioning crawler of LinkedIn. &lt;br /&gt;
&lt;br /&gt;
2017-04-06: Continued working on debugging and documenting the [http://mcnair.bakerinstitute.org/wiki/LinkedIn_Crawler_(Python) LinkedIn Crawler]. Wrote a test program that logs in, searches for a query, navigates through search pages, and logs out. Recruiter program can now login and search.&lt;br /&gt;
&lt;br /&gt;
2017-04-05: Began work on the LinkedIn Crawler. Researched on launching Python Virtual Environment.&lt;br /&gt;
&lt;br /&gt;
2017-04-03: Finished debugging points for the Enclosing Circle Algorithm. Added Command Line functionality to the Industry Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-03-29: Worked on debugging points for the Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-28: Finished running the Enclosing Circle Algorithm. Worked on removing incorrect points from the data set (see above). &lt;br /&gt;
&lt;br /&gt;
2017-03-27: Worked on debugging the Enclosing Circle Algorithm. Implemented a way to remove interior circles, and determined that translation to latitude and longitude coordinates resulted in slightly off center circles.&lt;br /&gt;
&lt;br /&gt;
2017-03-23: Finished debugging the brute force algorithm for [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm]. Implemented a method to plot the points and circles on a graph. Analyzed runtime of the brute force algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-03-21: Coded a brute force algorithm for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-03-20: Worked on  debugging the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-03-09: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Finished script to draw Enclosing Circles on a Google Map.&lt;br /&gt;
&lt;br /&gt;
2017-03-08: Continued running [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities. Created script to draw outcome of the Enclosing Circle Algorithm on Google Maps.&lt;br /&gt;
&lt;br /&gt;
2017-03-07: Redetermined the top 50 cities which Enclosing Circle should be run on. Data on the [http://mcnair.bakerinstitute.org/wiki/Top_Cities_for_VC_Backed_Companies Top 50 Cities for VC Backed Companies can be found here.] Ran [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] on the Top 50 Cities.&lt;br /&gt;
&lt;br /&gt;
2017-03-06: Ran script to determine the top 50 cities which Enclosing Circle should be run on. Fixed the VC Circles script to take in a new data format.&lt;br /&gt;
&lt;br /&gt;
2017-03-02: Cleaned up data for the VC Circles Project. Created histogram of data in Excel. See [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Began work on the [http://mcnair.bakerinstitute.org/wiki/LinkedInCrawlerPython LinkedIn Crawler].&lt;br /&gt;
&lt;br /&gt;
2017-03-01: Created statistics for the VC Circles Project.&lt;br /&gt;
&lt;br /&gt;
2017-02-28: Finished downloading geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project.  Found bug in Enclosing Circle Algorithm.&lt;br /&gt;
&lt;br /&gt;
2017-02-27: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-23: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Installed C++ Compiler for Python. Ran tests on difference between Python and C wrapped Python.&lt;br /&gt;
&lt;br /&gt;
2017-02-22: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Helped out with [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-21: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Researched into C++ Compilers for Python so that the Enclosing Circle Algorithm could be wrapped in C. Found a recommended one [https://www.microsoft.com/en-us/download/details.aspx?id=44266 here].&lt;br /&gt;
&lt;br /&gt;
2017-02-20: Continued to download geocoded data for VC Data as part of the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] Project. Assisted work on the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier].&lt;br /&gt;
&lt;br /&gt;
2017-02-16: Reworked [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] to create a file of geocoded data. Began work on wrapping the algorithm in C to improve speed.&lt;br /&gt;
&lt;br /&gt;
2017-02-15: Finished [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm] applied to the VC study. Enclosing Circle algorithm still needs adjustment, but the program runs with the temporary fixes.&lt;br /&gt;
&lt;br /&gt;
2017-02-14: Worked on the application of the Enclosing Circle algorithm to the VC study. Working on bug fixes in the Enclosing Circle algorithm. Created wiki page for the [http://mcnair.bakerinstitute.org/wiki/Enclosing_Circle_Algorithm Enclosing Circle Algorithm].&lt;br /&gt;
&lt;br /&gt;
2017-02-13: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-08: Worked on Neural Net for the [http://mcnair.bakerinstitute.org/wiki/Industry_Classifier Industry Classifier Project].&lt;br /&gt;
&lt;br /&gt;
2017-02-07: Fixed bugs in parse_cohort_data.py, the script for parsing the cohort data from the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Added descriptive statistics to cohort data excel file.&lt;br /&gt;
&lt;br /&gt;
2017-02-02: Out sick, independent research and work from RDP. Brief research into the [http://jorgeg.scripts.mit.edu/homepage/wp-content/uploads/2016/03/Guzman-Stern-State-of-American-Entrepreneurship-FINAL.pdf Stern-Guzman algorithm]. Research into [http://mcnair.bakerinstitute.org/wiki/interactive_maps Interactive Maps]. No helpful additions to map embedding problem.&lt;br /&gt;
&lt;br /&gt;
2017-02-01: Notes from session with Ed: Project on US university patenting and entrepreneurship programs (writing code to identify universities in assignees), search Wikipedia (XML then bulk download), student pop, faculty pop, etc.&lt;br /&gt;
Circle project for VC data will end up being a joint project to join accelerator data.&lt;br /&gt;
Pull descriptions for VC. Founders of accelerators on LinkedIn. Crawling LinkedIn cannot get caught (pretend to not be a bot). Can eventually get academic backgrounds through LinkedIn.&lt;br /&gt;
Pull business registration data, Stern/Guzman Algorithm.&lt;br /&gt;
GIS on top of geocoded data.&lt;br /&gt;
Maps that work on wiki or blog (CartoDB), Maps API and R.&lt;br /&gt;
NLP projects, Description Classifier.&lt;br /&gt;
&lt;br /&gt;
2017-01-31: Built WayBack Machine Crawler. Updated documentation for coordinates script. Updated profile page to include locations of code.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-30: Optimized enclosing circle algorithm through memoization. Developed script to read addresses from accelerator data and return latitude and longitude coordinates.&lt;br /&gt;
&lt;br /&gt;
2017-01-26: Continued working on Google sitesearch project. Discovered Crunchbase, changed project priority. Priority 1: split accelerator data up by flag; priority 2: use Crunchbase to get web URLs for cohorts; priority 3: make an Internet Archive Wayback Machine driver. Located [http://mcnair.bakerinstitute.org/wiki/Whois_Parser Whois Parser].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2017-01-25: Finished parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Some data files still need proofreading as they are not in an acceptable format. Began working on Google sitesearch project.&lt;br /&gt;
&lt;br /&gt;
2017-01-24: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Cohort data file created, debugging is almost complete. Will begin work on the google accelerator search soon.&lt;br /&gt;
&lt;br /&gt;
2017-01-23: Worked on parser for cohort data of the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Preliminary code is written, working on debugging.&lt;br /&gt;
&lt;br /&gt;
2017-01-19: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project]. Created parser for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project], completed creation of the final data set (yay!). Began working on cohort parser.&lt;br /&gt;
&lt;br /&gt;
2017-01-18: Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-13: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-12: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-11: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
2017-01-10: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Downloaded pdfs in the background for the [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Government Crawler Project].&lt;br /&gt;
&lt;br /&gt;
===Fall 2016===&lt;br /&gt;
&lt;br /&gt;
2016-09-26: Set up Staff wiki page, work log page; registered for Slack, Microsoft Remote Desktop; downloaded Selenium on personal computer, read Selenium docs. Created wiki page for Moroccan Web Driver Project.&lt;br /&gt;
&lt;br /&gt;
2016-09-29: Re-enrolled in Microsoft Remote Desktop with proper authentication, set up the Selenium environment and Komodo IDE on Remote Desktop, and wrote a program using Selenium that goes to a link and opens the print dialog box. Developed a computational recipe for a different approach to the problem.&lt;br /&gt;
&lt;br /&gt;
2016-09-30: The Selenium program selects the view PDF option from the website and goes to the PDF webpage. The program then switches its handle to the new page. CTRL+S is sent to the page to launch the save dialog window. Text cannot be sent to this window; brainstormed ways around this issue. Explored Chrome options for saving automatically without a dialog window. Looking into other libraries besides Selenium that may help.&lt;br /&gt;
&lt;br /&gt;
2016-10-03: Moroccan Web Driver projects completed for driving the Monarchy proposed bills, the House of Representatives proposed bills, and the Ratified bills sites. Began devising a naming system for the files that does not require scraping. Tinkered with naming through regular-expression parsing of the URL. The structure for the oral questions and written questions drivers is set up, but needs fixes due to differences between the sites. Fixed a bug on the McNair wiki for the women's biz team where an email was plain text instead of an email link. Took a glimpse at the Kuwait Parliament website, and it appears to be very different from the Moroccan setup.&lt;br /&gt;
&lt;br /&gt;
2016-10-06: Discussed with Dr. Elbadawy the desired file names for the Moroccan data download. The consensus was that the bill programs are ready to launch once the files can be named properly, and that the questions data must be retrieved using a web crawler, which I need to learn how to implement. The naming of files is currently drawing errors in going from Arabic, to URL, to download, to filename. Debugging is in process. Also built a demo Selenium program for Dr. Egan that drives the McNair blog site on an infinite loop.&lt;br /&gt;
&lt;br /&gt;
2016-10-07: Learned Unicode and UTF-8 encoding and decoding for Arabic. Still working on transforming an ASCII URL into printable Unicode.&lt;br /&gt;
&lt;br /&gt;
2016-10-11: Fixed the Arabic bug; files can now be saved with Arabic titles. Monarchy bills downloaded and ready for shipment. House of Representatives bills mostly downloaded, ratified bills prepared for download. Started learning the Scrapy library in Python for web scraping. Discussed the idea of screenshot-ing questions instead of scraping. &lt;br /&gt;
&lt;br /&gt;
2016-10-13: Completed download of Moroccan Bills. Working on either a web driver screenshot approach or a webcrawler approach to download the  Moroccan oral and written questions data. Began building Web Crawler for Oral and Written Questions site. Edited Moroccan Web Driver/Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-14: Finished Oral Questions crawler. Finished Written Questions crawler. Waiting for further details on whether that data needs to be tweaked in any way. Updated the Moroccan Web Driver/Web Crawler wiki page. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-18: Finished code for Oral Questions web driver and Written Questions web driver using selenium. Now, the data for the dates of questions can be found using the crawler, and the pdfs of the questions will be downloaded using selenium. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-20: Continued to download data for the Moroccan Parliament Written and Oral Questions. Updated Wiki page. Started working on Twitter project with Christy. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-10-21: Continued to download data for the Moroccan Parliament Written and Oral Questions. Looked over [http://mcnair.bakerinstitute.org/wiki/Christy_Warden_(Twitter_Crawler_Application_1) Christy's Twitter Crawler] to see how I can be helpful. Dr. Egan asked me to think about how to potentially make multiple tools to get cohorts and other sorts of data from accelerator sites. See [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator List] He also asked me to look at the [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler] for potential ideas on how to bring this project to fruition.&lt;br /&gt;
&lt;br /&gt;
2016-11-01: Continued to download Moroccan data in the background. Went over code for the [http://mcnair.bakerinstitute.org/wiki/Govtrack_Webcrawler_(Wiki_Page) GovTrack Web Crawler], continued learning Perl. Began Kuwait Web Crawler/Driver.&lt;br /&gt;
&lt;br /&gt;
2016-11-03: Continued to download Moroccan data in the background. Dr. Egan fixed systems requirements to run the GovTrack Web Crawler. Made significant progress on the Kuwait Web Crawler/Driver for the Middle East Studies Department. &lt;br /&gt;
&lt;br /&gt;
2016-11-04: Continued to download Moroccan data in the background. Finished writing initial Kuwait Web Crawler/Driver for the Middle East Studies Department. Middle East Studies Department asked for additional embedded files in the Kuwait website. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
&lt;br /&gt;
2016-11-08: Continued to download Moroccan data in the background. Finished writing code for the embedded files on the Kuwait site. Spent time debugging frame errors caused by the dynamically generated content; never found the root cause, but settled on a workaround that trades run time for reliability. [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Moroccan Web Driver]&lt;br /&gt;
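&lt;br /&gt;
The workaround, in spirit: since the frame is injected by JavaScript at an unpredictable time, keep retrying the switch until it succeeds (a sketch; the site URL and frame name are assumptions):&lt;br /&gt;
 import time&lt;br /&gt;
 from selenium import webdriver&lt;br /&gt;
 from selenium.common.exceptions import NoSuchFrameException&lt;br /&gt;
 &lt;br /&gt;
 driver = webdriver.Chrome()&lt;br /&gt;
 driver.get('http://www.kna.kw')  # placeholder Kuwait site URL&lt;br /&gt;
 &lt;br /&gt;
 # The frame appears only after the page's scripts run, so poll for it.&lt;br /&gt;
 # Slow, but it sidesteps the race condition entirely.&lt;br /&gt;
 while True:&lt;br /&gt;
     try:&lt;br /&gt;
         driver.switch_to.frame('content_frame')  # assumed frame name&lt;br /&gt;
         break&lt;br /&gt;
     except NoSuchFrameException:&lt;br /&gt;
         time.sleep(1)  # wait a beat and try again&lt;br /&gt;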
&lt;br /&gt;
2016-11-10: Continued to download Moroccan and Kuwaiti data in the background. Began work on the [http://mcnair.bakerinstitute.org/wiki/Google_Scholar_Crawler Google Scholar Crawler]. Wrote a crawler for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] to fetch the HTML files of hundreds of accelerators; it failed, apparently because of HTTPS.&lt;br /&gt;
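&lt;br /&gt;
The exact error was not recorded; one plausible culprit (an assumption, not a diagnosis from the log) is certificate verification in Python's urllib, which can be checked like so:&lt;br /&gt;
 import ssl&lt;br /&gt;
 import urllib.error&lt;br /&gt;
 import urllib.request&lt;br /&gt;
 &lt;br /&gt;
 url = 'https://www.f6s.com'  # an HTTPS accelerator page&lt;br /&gt;
 try:&lt;br /&gt;
     urllib.request.urlopen(url)&lt;br /&gt;
 except urllib.error.URLError as err:&lt;br /&gt;
     print('HTTPS failure:', err.reason)&lt;br /&gt;
     # Insecure workaround: skip certificate verification.&lt;br /&gt;
     ctx = ssl._create_unverified_context()&lt;br /&gt;
     urllib.request.urlopen(url, context=ctx)&lt;br /&gt;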
&lt;br /&gt;
2016-11-11: Continued to download Moroccan data in the background. Attempted to find bug fixes for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Project] crawler.&lt;br /&gt;
&lt;br /&gt;
2016-11-15: Finished downloading the Moroccan Written Question PDFs. Wrote a parser with Christy for bills from Congress and, eventually, executive orders. Found a bug in the system Python installation; it was worked out and the system rebooted.&lt;br /&gt;
&lt;br /&gt;
2016-11-17: Wrote a crawler to retrieve information about executive orders and their corresponding PDFs; the results can be found [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report here]. The next step is to convert the PDFs to text files, then run the parser Christy fixed.&lt;br /&gt;
&lt;br /&gt;
2016-11-18: Converted the executive order PDFs to text files using Adobe Acrobat DC. See the [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report wiki page] for details.&lt;br /&gt;
&lt;br /&gt;
2016-11-22: Transferred the downloaded Morocco Written Bills to the provided Seagate drive. Made a &amp;quot;gentle&amp;quot; F6S crawler to retrieve the HTML of possible accelerator pages, documented [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here].&lt;br /&gt;
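&lt;br /&gt;
&amp;quot;Gentle&amp;quot; here means rate-limited: one identified request at a time with a pause in between, roughly as below (a sketch; the URL list, user agent, and delay are illustrative):&lt;br /&gt;
 import time&lt;br /&gt;
 import urllib.request&lt;br /&gt;
 &lt;br /&gt;
 urls = ['https://www.f6s.com/some-accelerator']  # placeholder page list&lt;br /&gt;
 &lt;br /&gt;
 for url in urls:&lt;br /&gt;
     # Identify the crawler and fetch one page at a time.&lt;br /&gt;
     req = urllib.request.Request(url, headers={'User-Agent': 'McNairBot'})&lt;br /&gt;
     html = urllib.request.urlopen(req).read()&lt;br /&gt;
     name = url.rstrip('/').split('/')[-1]&lt;br /&gt;
     with open(name + '.html', 'wb') as f:&lt;br /&gt;
         f.write(html)&lt;br /&gt;
     time.sleep(5)  # pause so the site is never hammered&lt;br /&gt;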
&lt;br /&gt;
2016-11-29: Began pulling data from the accelerators listed [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) here]. Made text files for about 18 accelerators.&lt;br /&gt;
&lt;br /&gt;
2016-12-01: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project]. Built a tool for the [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report E&amp;amp;I Governance Report Project] with Christy that adds a column indicating whether or not each bill has been passed.&lt;br /&gt;
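&lt;br /&gt;
In spirit, the tool does something like the following (a sketch; the tab-separated layout and the source of the passed/not-passed lookup are assumptions):&lt;br /&gt;
 import csv&lt;br /&gt;
 &lt;br /&gt;
 # Hypothetical lookup of bill status, built elsewhere from the crawled data.&lt;br /&gt;
 passed = {'H.R.1': True, 'S.2': False}&lt;br /&gt;
 &lt;br /&gt;
 with open('bills.tsv') as src, open('bills_with_status.tsv', 'w', newline='') as dst:&lt;br /&gt;
     reader = csv.reader(src, delimiter='\t')&lt;br /&gt;
     writer = csv.writer(dst, delimiter='\t')&lt;br /&gt;
     for row in reader:&lt;br /&gt;
         # Append a column: was this bill passed?&lt;br /&gt;
         writer.writerow(row + [str(passed.get(row[0], False))])&lt;br /&gt;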
&lt;br /&gt;
2016-12-02: Built and ran a web crawler on the Kuwait site for the Center for Middle East Studies. Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-06: Learned how to use Git. Committed the semester's software projects to the McNair Git repository: [http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report Executive Order Crawler], [http://mcnair.bakerinstitute.org/wiki/Moroccan_Parliament_Web_Crawler Foreign Government Web Crawlers], [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) F6S Crawler and Parser].&lt;br /&gt;
&lt;br /&gt;
2016-12-07: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
2016-12-08: Continued making text files for the [http://mcnair.bakerinstitute.org/wiki/Accelerator_Seed_List_(Data) Accelerator Seed List project].&lt;br /&gt;
&lt;br /&gt;
'''Notes'''&lt;br /&gt;
&lt;br /&gt;
* Ed moved the Morocco data to E:\McNair\Projects from C:\Users\PeterJ\Documents&lt;br /&gt;
* C drive files were moved to E:\McNair\Users\PeterJ&lt;br /&gt;
&lt;br /&gt;
[[Category:Work Log]]&lt;/div&gt;</summary>
		<author><name>Peterjalbert</name></author>
		
	</entry>
</feed>