[http://www.chambredesrepresentants.ma/ar/%D9%85%D8%B1%D8%A7%D9%82%D8%A8%D8%A9-%D8%A7%D9%84%D8%B9%D9%85%D9%84-%D8%A7%D9%84%D8%AD%D9%83%D9%88%D9%85%D9%8A/%D8%A7%D9%84%D8%A3%D8%B3%D9%80%D8%A6%D9%84%D8%A9-%D8%A7%D9%84%D8%B4%D9%81%D9%88%D9%8A%D8%A9?field_ministeres_tid=All&field_groupe_concerne_target_id=All&field_parlementaires_associes_target_id=All&body_value=&field_transfere_ou_non_value=All Oral Questions]
The embedded PDFs on this site are not useful. The useful data elements are the dates and questions listed on the main site. To get this data, I built a web crawler that scraped the date, question, and all relevant information about the question from the site.
===Moroccan Legislature Written Questions===
From this window, Scrapy provides a helpful interactive shell for testing any web-scraping code you would like to try out.
Enter:
scrapy shell 'webaddress'
From here, you can type selector statements and print them to see if your statements are getting the data you desire.
 
To actually build your web crawler, open a new Python script and save it in the spiders folder that was created automatically for you under your projectname folder.
Some example code for a spider is shown below; this was my spider for the oral questions portion of the Moroccan site.
 
 
