Extract data
Preview with the Google Storage User Interface (optional)
Navigate to the epo-patentinformation bucket to preview the dataset.
Warning
The total size of the bulk data exceeds 200 GB. Do not attempt to download the full dataset through the UI; it will fail.
Download the dataset with gsutil
To download the EPO bulk dataset using the console:
- Install gsutil, the Google Cloud Command Line Interface (CLI) for interacting with Google Storage. See the Quickstart and Installation guide.
- Download the dataset to your/destination/folder:

      # -u specifies the billing project
      gsutil -u <your-billing-project> \
        -m cp -r gs://epo-patentinformation/ \
        <your/destination/folder>
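The download step above could be wrapped in a small script along the following lines. This is only a sketch: the function name download_epo, the variables BILLING_PROJECT and DEST, and the DRY_RUN switch are hypothetical names introduced here for illustration, not part of the original instructions. Only the gsutil command itself comes from the text above.

```shell
# Sketch: download the EPO bulk dataset with gsutil.
# BILLING_PROJECT and DEST are placeholders (assumptions); set them to your own values.
BILLING_PROJECT="${BILLING_PROJECT:-your-billing-project}"
DEST="${DEST:-your/destination/folder}"
BUCKET="gs://epo-patentinformation/"

download_epo() {
    # With DRY_RUN=1, only print the command instead of running it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo gsutil -u "$BILLING_PROJECT" -m cp -r "$BUCKET" "$DEST"
        return 0
    fi
    # Fail early if gsutil is not installed.
    if ! command -v gsutil >/dev/null 2>&1; then
        echo "gsutil not found; install it first" >&2
        return 1
    fi
    # Create a local destination folder if needed (skipped for gs:// destinations).
    case "$DEST" in
        gs://*) ;;
        *) mkdir -p "$DEST" ;;
    esac
    # -u sets the billing project, -m runs the copy in parallel, -r recurses.
    gsutil -u "$BILLING_PROJECT" -m cp -r "$BUCKET" "$DEST"
}
```

Calling `DRY_RUN=1 download_epo` prints the command without running it, which is a cheap way to check the billing project and destination before starting a transfer of more than 200 GB. Calling `download_epo` on its own starts the copy.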
Tip
If you are a frequent user of the Google Cloud Platform, you can set your/destination/folder to a Google Storage bucket URI (e.g. gs://...).
The rest of the pipeline can be executed from a compute instance with the bucket mounted; see the gcsfuse instructions.
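As a rough illustration of the mounting step, a bucket can be attached to a compute instance with gcsfuse along these lines. The function name mount_bucket, the variables BUCKET_NAME and MOUNT_POINT, and the DRY_RUN switch are hypothetical names chosen for this sketch. Note that gcsfuse takes the bare bucket name, without the gs:// prefix. The gcsfuse instructions remain the authoritative reference.

```shell
# Sketch: mount a Google Storage bucket with gcsfuse.
# BUCKET_NAME and MOUNT_POINT are hypothetical placeholders; replace them with your own values.
BUCKET_NAME="${BUCKET_NAME:-your-bucket-name}"
MOUNT_POINT="${MOUNT_POINT:-$HOME/epo-data}"

mount_bucket() {
    # With DRY_RUN=1, only print the command instead of running it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo gcsfuse --implicit-dirs "$BUCKET_NAME" "$MOUNT_POINT"
        return 0
    fi
    # Fail early if gcsfuse is not installed.
    if ! command -v gcsfuse >/dev/null 2>&1; then
        echo "gcsfuse not found; see the gcsfuse instructions" >&2
        return 1
    fi
    # The mount point must exist before mounting.
    mkdir -p "$MOUNT_POINT"
    # --implicit-dirs makes folders that exist only as object-name prefixes visible.
    gcsfuse --implicit-dirs "$BUCKET_NAME" "$MOUNT_POINT"
}
```

Once mounted, the bucket contents appear under the mount point like ordinary files, so later pipeline steps can read them with local paths.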