Crunchbase Accelerator Founders

Revision as of 17:27, 30 July 2018

The crawler's main function opens a browser window, navigates to the known LinkedIn URL of each founder, and writes the profile information to .txt files. It then uses the LinkedIn search box to search for the founder's name plus company, selects the profile link's href from the HTML, and opens the profile in a new page. If a founder cannot be found (there are no search results), the founder's name and company are written to unavailable_profiles.txt (the file may be named something else, but its name should contain "unavailable").
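The control flow described above can be sketched in Python as follows. This is a minimal sketch, not the crawler's actual code. The browser-specific steps are hidden behind two hypothetical callables: `open_profile(url)`, which returns the profile text, and `search_profile_url(name, company)`, which returns the first result's href or `None` when there are no results. The sketch also assumes the search is only needed for founders without a known URL. The one-.txt-file-per-founder layout and the tab-separated format of unavailable_profiles.txt are further assumptions.

```python
import os
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical input format: (founder name, company, known LinkedIn URL or None).
Founder = Tuple[str, str, Optional[str]]


def crawl_founders(
    founders: Iterable[Founder],
    open_profile: Callable[[str], str],
    search_profile_url: Callable[[str, str], Optional[str]],
    out_dir: str,
    unavailable_name: str = "unavailable_profiles.txt",
) -> List[Tuple[str, str]]:
    """Save each founder's profile text and log founders that cannot be found.

    open_profile(url) -> profile text (hypothetical browser wrapper)
    search_profile_url(name, company) -> href of first result, or None
    Returns the (name, company) pairs written to the unavailable file.
    """
    os.makedirs(out_dir, exist_ok=True)
    unavailable: List[Tuple[str, str]] = []
    for name, company, url in founders:
        if url is None:
            # No known URL: search for "name company" and use the result's href.
            url = search_profile_url(name, company)
        if url is None:
            # No search results: record the founder instead of a profile.
            unavailable.append((name, company))
            continue
        text = open_profile(url)
        # Turn the founder's name into a safe file name, e.g. "Ann Lee" -> "Ann_Lee".
        safe_name = "".join(c if c.isalnum() else "_" for c in name)
        path = os.path.join(out_dir, safe_name + ".txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
    # One tab-separated "name<TAB>company" line per founder that was not found.
    with open(os.path.join(out_dir, unavailable_name), "w", encoding="utf-8") as f:
        for name, company in unavailable:
            f.write(f"{name}\t{company}\n")
    return unavailable
```

In a real Selenium run, `open_profile` would typically call `driver.get(url)` and then read the page, for example from `driver.page_source`. Keeping those calls behind parameters lets the file bookkeeping be tested without opening a browser.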

===linked_in_crawler.py===

The crawler relies heavily on locating the XPath of the elements we want on the web page. I ran into a lot of difficulty with this because the XPaths in the code no longer exist in LinkedIn's new HTML, which uses dynamic ids. To get around this, I identified other attributes of each element that could be used to build an XPath to it, and I located some elements by CSS selector instead.
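One way to write locators that do not depend on dynamic ids is to match on a stable class token or on an element's visible text. The helpers below are a hypothetical sketch that only builds XPath 1.0 strings. They are not taken from linked_in_crawler.py, and any class names or text you pass in have to be read from LinkedIn's current HTML.

```python
def xpath_literal(s: str) -> str:
    """Quote s as an XPath 1.0 string literal (XPath has no escape character)."""
    if "'" not in s:
        return f"'{s}'"
    if '"' not in s:
        return f'"{s}"'
    # Contains both quote types: split on ' and rebuild the string with concat().
    parts = s.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in parts) + ")"


def xpath_by_class_token(tag: str, token: str) -> str:
    """Match <tag> elements whose class list contains the whole token (ignores ids).

    The token is assumed to be a plain CSS class name without quotes or spaces.
    """
    return (
        f"//{tag}[contains(concat(' ', normalize-space(@class), ' '), "
        f"' {token} ')]"
    )


def xpath_by_text(tag: str, text: str) -> str:
    """Match <tag> elements whose whitespace-normalized text equals text."""
    return f"//{tag}[normalize-space()={xpath_literal(text)}]"
```

Padding the class attribute with spaces makes `contains()` match a whole class name rather than a substring of a longer one. With Selenium 4 these strings would be passed to `driver.find_element(By.XPATH, ...)`; the CSS alternative is `driver.find_element(By.CSS_SELECTOR, "button.some-class")`, where `some-class` is a placeholder. Selenium releases from around 2018 used `find_element_by_xpath` and `find_element_by_css_selector`, which newer releases have removed.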

===New Test Account===

Password: McNair2018

===Obstacles and Notes===

Use the Selenium computer on the Rice Visitor Wi-Fi.

After you log in a few times, LinkedIn becomes suspicious and uses reCAPTCHA to ask you to confirm that you are not a robot. I worked around this by pausing the program for 3 minutes so that I had time to complete the reCAPTCHA test by hand. However, reCAPTCHA sometimes loses its connection and forces you to keep completing tests, which can be frustrating. When this happens, I disconnect from and reconnect to the Wi-Fi, and I switch between the test accounts.
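The 3-minute pause could be a plain `time.sleep(180)`. The sketch below is a slightly different variant, not the original code: it waits up to 3 minutes but returns as soon as a hypothetical `challenge_visible()` check reports that the reCAPTCHA is gone, and it returns `False` on timeout so the caller can, for example, switch to another test account. `sleep` and `clock` are parameters only so the logic can be tested without actually waiting.

```python
import time
from typing import Callable

CAPTCHA_WAIT_SECONDS = 3 * 60  # the 3-minute window used for solving reCAPTCHA by hand


def wait_for_manual_captcha(
    challenge_visible: Callable[[], bool],
    timeout: float = CAPTCHA_WAIT_SECONDS,
    poll: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Pause so a person can solve the reCAPTCHA.

    challenge_visible() is a hypothetical check supplied by the caller.
    Returns True once it reports False, or False if the timeout runs out first.
    """
    deadline = clock() + timeout
    while challenge_visible():
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(poll, remaining))
    return True
```

With Selenium, `challenge_visible` might be something like `lambda: "checkpoint" in driver.current_url`, assuming LinkedIn's challenge pages use a URL containing "checkpoint"; check that against the live site before relying on it.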

I never figured out how to stop reCAPTCHA from losing its connection, so I spent a lot of time completing reCAPTCHA tests.

GraceTan
