Changes

*implement generate_dataset.py and sitemap tool
**regenerate dataset using updated data and tool
 
'''5/16/2019'''
*implementation of the CNN
*Some problems to consider:
**some websites have more than 1 cohort page: a list of cohorts for each year
**class labels are highly imbalanced; see https://towardsdatascience.com/deep-learning-unbalanced-training-data-solve-it-like-this-6c528e9efea6
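One common way to reduce the effect of imbalanced class labels is to weight each class inversely to its frequency in the loss. The sketch below is only an illustration, not the approach actually used in this project: the function name <code>inverse_frequency_weights</code> and the 90/10 label split are hypothetical, and the formula (n_samples / (n_classes * count)) is the usual "balanced" heuristic.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {label: weight}, where weight = n_samples / (n_classes * count).

    Rare classes get larger weights, so their errors count more in the loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical labels (assumption): 1 = cohort page, 0 = not a cohort page.
labels = [0] * 90 + [1] * 10
weights = inverse_frequency_weights(labels)
print(weights)
```

The resulting dictionary could then be passed to a training routine that accepts per-class weights; whether that helps here would need to be checked against the real dataset.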
 
'''5/17/2019'''
*have to go back to the old plan of separating image data :(
*documentation on wiki
*possibly run python on GPU server