---------------------------------------------------------------------------------------------
7/9 - Worked through the matcher's output to make sense of last week's results. After talking to Dylan and Connor, we decided to go through every match in our data that was flagged as a multiple match. In the first sheet of a file called 'company name self matches', orange highlights mark a minor normalization difference, red highlights mark a likely duplicate, and yellow highlights mark rows that seem to be duplicates but that I wasn't sure about. I also entered XXXX wherever a data field was blank, to keep the matcher from shifting data into the wrong columns.
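The XXXX step above could be scripted instead of done by hand. The sketch below assumes the matcher input is a CSV file with one company per row; the matcher's actual input format is not described here, and the names `fill_blanks` and `fill_blanks_csv` are hypothetical. It replaces every empty or whitespace-only cell with the same XXXX placeholder used in the log.

```python
import csv
import io

# Placeholder string used in the log for blank fields.
PLACEHOLDER = "XXXX"


def fill_blanks(rows, placeholder=PLACEHOLDER):
    """Replace empty or whitespace-only cells with a placeholder so every
    row keeps a visible value in each column (hypothetical helper)."""
    return [[cell if cell.strip() else placeholder for cell in row] for row in rows]


def fill_blanks_csv(text, placeholder=PLACEHOLDER):
    """Parse CSV text, fill blank cells, and return the rewritten CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(fill_blanks(rows, placeholder))
    return out.getvalue()


if __name__ == "__main__":
    sample = "name,city,uuid\nAcme Corp,,123\nBeta LLC,Boston,\n"
    print(fill_blanks_csv(sample), end="")
```

When reading the matcher's results back in, XXXX would need to be treated as "missing" again so it is not mistaken for a real value.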
 
7/10 - Talked with Connor and Dylan about building a master spreadsheet of all our company data. I created a table listing each company, its UUIDs, and the number of times it appears in Crunchbase. I then joined this table with a rough list of our companies that Connor gave me.
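A minimal sketch of the table-and-join step above, assuming the Crunchbase export can be reduced to (company name, UUID) pairs and that the join key is a lightly normalized company name. The log does not say how the join was actually done, so the normalization and the helper names (`normalize`, `build_company_table`, `left_join`) are hypothetical. A left join is used so that every company on Connor's list stays in the result, with a count of 0 when it has no Crunchbase match.

```python
from collections import defaultdict


def normalize(name):
    """Hypothetical normalization: lowercase and collapse whitespace."""
    return " ".join(name.lower().split())


def build_company_table(records):
    """records: iterable of (company_name, uuid) pairs from a Crunchbase export.
    Returns {normalized_name: {"uuids": sorted list, "count": int}}, where
    count is the number of records in which the company appears."""
    uuids = defaultdict(set)
    counts = defaultdict(int)
    for name, uuid in records:
        key = normalize(name)
        uuids[key].add(uuid)
        counts[key] += 1
    return {k: {"uuids": sorted(uuids[k]), "count": counts[k]} for k in counts}


def left_join(company_list, table):
    """Keep every company from our list; attach its Crunchbase UUIDs and count
    when the normalized name matches, otherwise an empty list and 0."""
    rows = []
    for name in company_list:
        entry = table.get(normalize(name), {"uuids": [], "count": 0})
        rows.append((name, entry["uuids"], entry["count"]))
    return rows


if __name__ == "__main__":
    crunchbase = [("Acme Corp", "u1"), ("ACME  corp", "u2"), ("Beta LLC", "u3")]
    table = build_company_table(crunchbase)
    for row in left_join(["Acme Corp", "Gamma Inc"], table):
        print(row)
```

A company with a count above 1 or more than one UUID is a natural candidate for the same duplicate review done on 7/9.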