Download
Zenodo
The dataset is available on Zenodo!
patCit Versioning
Versions of the dataset are archived on Zenodo as of v0.15.
Dataset structure
The patCit dataset has the following structure:
Each `.tar` file contains:
- compressed data file(s) in newline-delimited JSON (`.jsonl.gz`) corresponding to the data table itself. When the table is large, it is chunked into multiple files.
- the schema of the data table in JSON (`.json`).
Build a table locally
Database services are too diverse to detail every possible procedure here. Instead, the general guidelines below apply to any database service.
- Download the tar file(s) corresponding to the table(s) you are interested in.
- Untar the file(s) (e.g. `tar -xvf <your-file.tar>` on macOS/Linux).
- Unzip the data file(s) (e.g. `gunzip *.jsonl.gz` on macOS/Linux). This step is optional, since some database services can build tables directly from gzipped data files.
- Build the table in your SQL-like database service using the specified schema.
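If you would rather inspect a table in Python before loading it into a database, the steps above can be sketched with the standard library alone. This is a minimal sketch, not an official patCit tool: the function names `extract_table`, `load_schema` and `iter_rows` are hypothetical, and it assumes each archive holds exactly one `.json` schema file next to one or more `.jsonl.gz` chunks, as described above.

```python
import gzip
import json
import tarfile
from pathlib import Path


def extract_table(tar_path, out_dir):
    """Untar one table archive and return the extracted file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Only extract archives you trust (here: the ones downloaded from Zenodo).
    with tarfile.open(tar_path) as archive:
        archive.extractall(out_dir)
    return sorted(p for p in out_dir.rglob("*") if p.is_file())


def load_schema(paths):
    """Parse the table schema, assumed to be the only .json file in the archive."""
    # Path("x.jsonl.gz").suffix is ".gz", so data chunks are never matched here.
    schema_path = next(p for p in paths if p.suffix == ".json")
    return json.loads(schema_path.read_text(encoding="utf-8"))


def iter_rows(paths):
    """Yield one dict per JSON line from every .jsonl.gz chunk.

    The chunks are read while still compressed, so no gunzip step is needed.
    """
    for path in paths:
        if path.name.endswith(".jsonl.gz"):
            with gzip.open(path, "rt", encoding="utf-8") as handle:
                for line in handle:
                    if line.strip():
                        yield json.loads(line)
```

Calling `extract_table` on a downloaded archive, then passing the returned paths to `load_schema` and `iter_rows`, gives you the schema and a lazy stream of rows. Streaming row by row keeps memory use flat even when a table is split into many chunks.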
BigQuery
In many cases, you don't need the whole dataset for your research. In order to avoid tedious filtering and post-processing on your local machine, we recommend that you adopt the following strategy:
- Query patCit using the BigQuery public release of the dataset. See our BigQuery exploration guide if you are new to BigQuery.
- Save the resulting table [1].

Here you go!
Example
Let's assume that you are interested in the ranking of journals by the number of articles cited by patents and published in the 1980s.
The related query is the following:
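For readers working from locally downloaded rows rather than BigQuery, the same ranking logic can be sketched in Python. This is not the BigQuery query from this guide: it is an illustrative equivalent, and the field names `journal` and `year` are hypothetical placeholders that you would replace with the names given in the table's schema file.

```python
from collections import Counter


def rank_journals(rows, start_year=1980, end_year=1989,
                  journal_key="journal", year_key="year"):
    """Rank journals by the number of cited articles published in a year range.

    `rows` is any iterable of dicts, one per cited article. The default key
    names are assumptions, not the actual patCit field names.
    """
    counts = Counter()
    for row in rows:
        journal = row.get(journal_key)
        year = row.get(year_key)
        # Skip rows with a missing journal or a non-integer year.
        if journal and isinstance(year, int) and start_year <= year <= end_year:
            counts[journal] += 1
    # most_common() returns (journal, count) pairs sorted by descending count.
    return counts.most_common()
```

Because the function only needs an iterable of dicts, it can consume rows streamed from the `.jsonl.gz` files without loading the whole table into memory.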
Run the query
Save the query
[1] You can save small tables (fewer than 16,000 rows) to the clipboard, locally, or to Google Sheets. You can save mid-size tables (less than 1 GB) to Google Drive. Larger tables have to be saved to BigQuery, then exported to Google Cloud Storage, from where you can download them locally.