Changes

Jump to navigation Jump to search
375 bytes added ,  17:33, 18 July 2016
no edit summary
*Created conditionals for keys in JSON dictionaries. Successfully ran the tool on my 50 companies and then again on 1500 companies. Changed ratio to .75 and higher to elicit URLs that were close but not exact and got more results.<!-- flush -->
 
'''7/14'''
 
*Created a function, "about_us_url", that takes the url of a company obtained using the above function and identifies if the company has an "about" page.
 
*The function tests if the company url exists with either "about" or "about-us" as the sub-url. If it does, the new url is matched next to a old url in a new column, "about_us_url".
 
'''7/18'''
*Created a function , "company_description" that took takes a URL and gave back all of the substantial text blocks on the page (used to find company descriptions)
**Uses BeautifulSoup to access and explore HTML files.
**The function explores the HTML source code of the URL and finds all parts of the source code with the <p> tag to indicate a text paragraph.
**Then, the function goes though each paragraph, and if it is above a certain number of characters (eliminate for short, unnecessary information), the function adds the description in a new column of the csv file under "description".
383

edits

Navigation menu