Extract data

Preview with the Google Storage User Interface (optional)

Navigate to the epo-patentinformation bucket to preview the dataset.

Warning

The total size of the bulk data exceeds 200 GB. Do not attempt to download the full dataset through the UI; it will fail.
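The bucket can also be inspected from the command line once gsutil is installed (installation is covered in the next section). The sketch below is an illustration rather than part of the original instructions: `preview_bucket` is a hypothetical helper name, and its argument is a placeholder for your own billing project, passed with `-u` as in the download command of this guide.

```shell
# Hypothetical helper: list the bucket and report its total size.
# $1 is a placeholder for your own billing project ID.
preview_bucket() {
  billing_project="$1"
  bucket="gs://epo-patentinformation"

  # List the top-level contents of the bucket.
  gsutil -u "$billing_project" ls "$bucket/"

  # Summarize the total size in human-readable units.
  gsutil -u "$billing_project" du -s -h "$bucket"
}
```

Call it as `preview_bucket <your-billing-project>`. The size summary has to enumerate every object, so it may take a while on a bucket of this size.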

Download the dataset with gsutil

To download the EPO bulk dataset from the command line:

  • Install gsutil, the Google Cloud command-line tool for interacting with Google Storage (see the Quickstart and Installation guide).
  • Download the dataset to your/destination/folder:
# -u sets the billing project; -m runs the copy in parallel; -r copies recursively
gsutil -u <your-billing-project> \
  -m cp -r gs://epo-patentinformation/ \
  <your/destination/folder>
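Because the dataset exceeds 200 GB, it can be worth checking the destination before starting the transfer. The following is a minimal sketch, not part of the original instructions: `has_free_space` and `has_gsutil` are hypothetical helper names, and the threshold you pass is your own estimate (200 is only a lower bound taken from the warning above).

```shell
# Hypothetical helper: succeed only if the filesystem holding directory $1
# has at least $2 GiB available.
has_free_space() {
  dest="$1"
  required_gb="$2"
  # df -P prints one POSIX-format line per filesystem; with -k, column 4
  # is the available space in KiB.
  avail_kb="$(df -Pk "$dest" | awk 'NR==2 {print $4}')"
  avail_gb=$((avail_kb / 1024 / 1024))
  [ "$avail_gb" -ge "$required_gb" ]
}

# Hypothetical helper: succeed only if gsutil is on the PATH.
has_gsutil() {
  command -v gsutil >/dev/null 2>&1
}
```

For example, `has_gsutil && has_free_space your/destination/folder 200 || echo "not ready to download"` prints a message when either check fails. The destination folder must already exist for the disk check to work.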

Tip

If you use the Google Cloud Platform frequently, you can set your/destination/folder to a Google Storage bucket URI (e.g. gs://...). The rest of the pipeline can then be executed from a compute instance with the bucket mounted; see the gcsfuse instructions.
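The bucket-based workflow described in this tip could look roughly like the sketch below. It is an assumption-laden illustration: `copy_to_bucket` and `mount_bucket` are hypothetical helper names, and the billing project, bucket name, and mount directory are placeholders you supply. The `--implicit-dirs` flag is included on the assumption that objects were copied with gsutil, which does not create directory placeholder objects.

```shell
# Hypothetical helper: copy the dataset straight into your own bucket.
# $1: billing project ID, $2: destination bucket name (both placeholders).
copy_to_bucket() {
  gsutil -u "$1" -m cp -r gs://epo-patentinformation/ "gs://$2/"
}

# Hypothetical helper: mount bucket $1 at local directory $2 with gcsfuse.
# --implicit-dirs makes folder-like prefixes visible as directories.
mount_bucket() {
  mkdir -p "$2" && gcsfuse --implicit-dirs "$1" "$2"
}
```

On the compute instance, `mount_bucket <your-bucket> <mount-dir>` would expose the bucket as a local directory. On Linux, it can be unmounted with `fusermount -u <mount-dir>`.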