Return to [[Patent Data (Wiki Page)]].

<section begin=dataverse />The Harvard Dataverse provides clean versions of the U.S. utility patent datasets spanning 1975-2010. The data has already been through author disambiguation. <section end=dataverse />

For details, see the paper at [https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl:1902.1/15705 Harvard Dataverse]. This page records how to load and use the Harvard Dataverse patent data. The patents from 1975-2010, provided as .sqlite3 and CSV files, can be found at [https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl:1902.1/15705 Harvard Dataverse]. All of the files have been downloaded to the database server and can be found in /bulk/patent.
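Before loading anything, it can help to check which tables a downloaded .sqlite3 file contains and how many rows each holds. The following is a minimal sketch using Python's standard sqlite3 module. The function name <code>table_row_counts</code> and the file name in the usage comment are placeholders, not names taken from the Dataverse.

```python
import sqlite3


def table_row_counts(conn):
    """Return {table_name: row_count} for every table in an open SQLite connection."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    counts = {}
    for name in names:
        # Names come straight from sqlite_master; double quotes guard odd characters.
        counts[name] = conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
    return counts


# Usage (the file name is a placeholder for whichever .sqlite3 file was downloaded):
#   conn = sqlite3.connect("/bulk/patent/some_file.sqlite3")
#   print(table_row_counts(conn))
```

Counting rows requires a full scan of each table, so this may take a while on the larger files.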
==Getting the data==
For more information about the patent data, see [[Patent Data (Wiki Page)]].

To recreate the tables, run these scripts in order:
# createtables.sql
# copytables.sql
# cleaning db.sql

These scripts are available under /bulk/Software/Database\ Scripts.
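The three steps can be scripted so that they always run in the same order and stop at the first failure. This is a sketch only: it assumes the database is PostgreSQL and that the <code>psql</code> client is on the path, neither of which this page states outright. The database name passed in is hypothetical. The script names and directory are copied from this page as written.

```python
import subprocess
from pathlib import Path

# Directory and script names as listed on this page (no shell escaping needed here).
SCRIPT_DIR = Path("/bulk/Software/Database Scripts")
SCRIPTS = ["createtables.sql", "copytables.sql", "cleaning db.sql"]


def build_commands(dbname, script_dir=SCRIPT_DIR):
    """Return one psql command per script, in the order the scripts must run."""
    return [
        # ON_ERROR_STOP makes psql exit non-zero as soon as a statement fails.
        ["psql", "-v", "ON_ERROR_STOP=1", "-d", dbname, "-f", str(Path(script_dir) / name)]
        for name in SCRIPTS
    ]


def recreate_tables(dbname):
    """Run the scripts in order; check=True raises on the first failing script."""
    for cmd in build_commands(dbname):
        subprocess.run(cmd, check=True)


# Usage (the database name is an assumption, substitute the real one):
#   recreate_tables("patent")
```

Passing the arguments as a list avoids quoting problems with the space in the directory name and in the third script name.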
==Loading the tables into the database==
 B2 | 1119954
 (4 rows)

 SELECT p.patent, COUNT(c.patent) AS numcited
 FROM patents AS p
 JOIN citations AS c ON p.patent = c.citation
 GROUP BY p.patent
 ORDER BY numcited DESC;

  patent  | numcited
 ---------+----------
  4683202 |     1992
  4723129 |     1935
  4683195 |     1814
  4463359 |     1676
  4740796 |     1638
  4558333 |     1558
  4345262 |     1537
  4313124 |     1504
  4459600 |     1461
  4733665 |     1286
  5103459 |     1118
  5572643 |     1018
  4901307 |      959
  5143854 |      924
  5523520 |      918
  5643826 |      883
  4655771 |      861
  4340563 |      808
  5742905 |      792
  5892900 |      780
  4799156 |      771
  4816567 |      751
  5172338 |      749
  4886062 |      745
  4776337 |      744
  4965188 |      742
  4800882 |      726
  4580568 |      706
  5710887 |      697
  4665906 |      697
  3953566 |      681
  5056109 |      678
  4405829 |      666
  4739762 |      665
  4503569 |      662
  4608577 |      659
  4179337 |      642
  5794207 |      636
  5064435 |      635
  5530852 |      630
  5272236 |      629
  4100324 |      625
  5708780 |      623
  5608786 |      620
  5109390 |      618
  5715314 |      616
  5923962 |      610
  5101501 |      608
  5774660 |      607
  4908112 |      607
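To see how the citation-count query behaves, the same join can be run against a small in-memory SQLite database. This is an illustrative sketch with made-up patent numbers. The two-table schema is an assumption inferred from the column names in the query, and the <code>LIMIT</code> clause and the tie-breaking sort on patent number are additions that the original query does not have.

```python
import sqlite3

# Same join as the query on this page, plus a LIMIT and a deterministic tie-break.
QUERY = """
SELECT p.patent, COUNT(c.patent) AS numcited
FROM patents AS p
JOIN citations AS c ON p.patent = c.citation
GROUP BY p.patent
ORDER BY numcited DESC, p.patent
LIMIT ?
"""


def top_cited(conn, limit=50):
    """Return (patent, numcited) rows for the most-cited patents."""
    return conn.execute(QUERY, (limit,)).fetchall()


def make_toy_db():
    """Build a tiny in-memory database with an assumed two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patents (patent INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE citations (patent INTEGER, citation INTEGER)")
    conn.executemany("INSERT INTO patents VALUES (?)", [(1,), (2,), (3,), (4,)])
    # Each row is (citing patent, cited patent): patent 1 is cited three times, patent 2 once.
    conn.executemany(
        "INSERT INTO citations VALUES (?, ?)", [(2, 1), (3, 1), (4, 1), (3, 2)]
    )
    return conn


if __name__ == "__main__":
    print(top_cited(make_toy_db(), limit=2))  # [(1, 3), (2, 1)]
```

Because the join is an inner join, patents that are never cited (3 and 4 in the toy data) do not appear in the result at all.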
[[Category:Internal]]
[[Internal Classification::Legacy| ]]