Reran the FDA Trial data ripping because the text output files had been wiped. Plan to talk with Julia and Meghana again about pulling universities and other relevant institutions from the Assignee List USA. Talked to Sonia about pulling city, state, and zip code information, which is why Python was installed in a database. Will work with Sonia on Wednesday afternoon to see how best to implement a regex function.
===='''3/8/2017 WEDNESDAY 9AM-12PM'''====
* Output SQL tables from the finished run of Jeemin_FDATrial_as_key_data_ripping.py
* Ran through assigneelist_USA.txt to see how many different ways UNIVERSITY could be misspelled. There were many.
* Tried to reason through a pattern that could catch all the different versions of UNIVERSITY. Still to discuss: whether UNIVERSITIES, and names that include UNIVERSITIES but end in INC, should be pulled as relevant information.
===='''3/8/2017 WEDNESDAY 2PM-5PM'''====
* Wrote a regex pattern that identifies all "university" matches. The output file is E:\McNair\Projects\University Patents\university_pulled_from_assignee_list_USA
* Talked to Sonia, but didn't reach a solid conclusion on how a Python function could identify whether a key word refers to a city or a country.
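A minimal sketch of one way to catch misspelled versions of UNIVERSITY. This is not the pattern that produced the output file above. It assumes a two-step check: a permissive regex picks out tokens starting with UNIV, then <code>difflib</code> keeps only the tokens that are close to UNIVERSITY or UNIVERSITIES. The function name <code>looks_like_university</code>, the 0.8 threshold, and the handling of the UNIV abbreviation are all hypothetical choices.

```python
import re
from difflib import SequenceMatcher

# Permissive candidate pattern: any token that starts with UNIV (assumption).
CANDIDATE = re.compile(r"\bUNIV[A-Z]*\b")
# Correct spellings that candidate tokens are compared against.
TARGETS = ("UNIVERSITY", "UNIVERSITIES")


def looks_like_university(name, threshold=0.8):
    """Return True if any token in name is close to UNIVERSITY/UNIVERSITIES.

    The threshold is a guess and would need tuning against the real list.
    """
    for token in CANDIDATE.findall(name.upper()):
        if token == "UNIV":
            # Treat the bare abbreviation as a hit (assumption).
            return True
        for target in TARGETS:
            if SequenceMatcher(None, token, target).ratio() >= threshold:
                return True
    return False


if __name__ == "__main__":
    for assignee in ("RICE UNIVERSITY", "UNIVERISTY OF TEXAS", "UNIVERSAL STUDIOS INC"):
        print(assignee, looks_like_university(assignee))
```

The similarity step is there so that words such as UNIVERSAL, which share the UNIV prefix, are not pulled in along with the misspellings. Names ending in INC are still matched; deciding whether to keep them is a separate filter.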
===='''3/13/2017 MONDAY 12PM-2PM'''====
* For University Patent Data Matching: matched SCHOOL (output: E:\McNair\Projects\University Patents\school_pulled_from_assignee_list_USA) and matched INSTITUTE (output: E:\McNair\Projects\University Patents\institute_pulled_from_assignee_list_USA).
* See [[University Patent Matching]].
* To be worked on later: Grant XML parsing & general name matcher
===='''3/14/2017 TUESDAY 12PM-2PM '''====
* Started pulling academy cases, but there are too many to judge which are institutions of interest. The cases to verify are in E:\McNair\Projects\University Patents\academies_verify_cases.txt
* Need Julia/Meghana to look through the hits and see which are relevant & extract pattern from there.
* Having trouble outputting a txt file without double quotes around every line.
* Thinking that one text file should be output for all keywords instead of one per keyword, to avoid overlap. For example, COLLEGE and UNIVERSITY are both keywords, so ALBERT EINSTEIN COLLEGE OF YESHIVA UNIVERSITY would be hit twice if each keyword were handled separately. This could take the form of if-elseif statements or one big regex check.
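A rough sketch of the "one big regex check" idea, under stated assumptions. The keyword list is taken from the entries above (UNIVERSITY, COLLEGE, SCHOOL, INSTITUTE, ACADEMY); the function names <code>pull_institutions</code>, <code>format_hit</code>, and <code>write_hits</code>, and the tab-separated output layout, are hypothetical. Each assignee name is recorded once, together with every keyword it contains, so a name with two keywords is not counted twice.

```python
import re

# Keywords mentioned in the log entries; extend as needed.
KEYWORDS = ("UNIVERSITY", "COLLEGE", "SCHOOL", "INSTITUTE", "ACADEMY")
# One alternation covering every keyword, matched on word boundaries.
KEYWORD_RE = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b")


def pull_institutions(names):
    """Return one (name, sorted keywords) pair per name with at least one hit."""
    hits = []
    for name in names:
        found = KEYWORD_RE.findall(name.upper())
        if found:
            hits.append((name, sorted(set(found))))
    return hits


def format_hit(name, keywords):
    """Build one plain output line: the name, a tab, then the keywords."""
    return name + "\t" + ",".join(keywords)


def write_hits(hits, path):
    """Write one line per hit using plain string writes, with no quoting layer."""
    with open(path, "w", encoding="utf-8") as out:
        for name, keywords in hits:
            out.write(format_hit(name, keywords) + "\n")


if __name__ == "__main__":
    sample = [
        "ALBERT EINSTEIN COLLEGE OF YESHIVA UNIVERSITY",
        "ACME WIDGETS INC",
    ]
    for name, keywords in pull_institutions(sample):
        print(format_hit(name, keywords))
```

Writing each line directly with <code>write</code> may also sidestep the double-quote problem, if the quotes are being added by a CSV writer or a database export step; that cause is a guess and has not been checked against the actual script.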