Load¶
BigQuery offers a convenient way to query and analyze large amounts of data.
Info
In case you are new to BigQuery, you might want to:
- Take the Google BigQuery Quickstart
- Learn more about bq load
Data schema¶
To load a table into BigQuery, you need to specify its schema. CreateSchema.py
(Python CLI) generates this schema for you.
Take care to set the --prepare-names / --no-prepare-names option to the same value you used when you serialized the data.
python bin/create-schema.py \
  --prepare-names \
  path/to/schema.json  # destination file
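Before loading, it can be worth sanity-checking the generated file. The sketch below assumes the file follows the JSON schema format accepted by bq load, that is, a JSON array of field objects with name, type, and optionally mode and nested fields. The helpers check_schema and check_schema_file are hypothetical and not part of the project's CLI.

```python
import json


def check_schema(fields, prefix=""):
    """Return a list of problems found in a BigQuery-style JSON schema.

    Assumption: `fields` is the parsed content of the schema file,
    i.e. a list of dicts with at least "name" and "type".
    """
    if not isinstance(fields, list):
        return [f"{prefix or 'schema'}: expected a list of fields"]

    problems = []
    for i, field in enumerate(fields):
        if not isinstance(field, dict):
            problems.append(f"{prefix}[{i}]: expected an object")
            continue
        name = field.get("name")
        # Fall back to the position when the field has no name.
        label = name if name else f"[{i}]"
        path = f"{prefix}{label}"
        if not name:
            problems.append(f"{path}: missing 'name'")
        if not field.get("type"):
            problems.append(f"{path}: missing 'type'")
        # RECORD (also written STRUCT) fields carry nested sub-fields.
        if str(field.get("type", "")).upper() in ("RECORD", "STRUCT"):
            problems.extend(check_schema(field.get("fields", []), f"{path}."))
    return problems


def check_schema_file(path):
    """Load a schema file and return the problems found in it."""
    with open(path, encoding="utf-8") as fh:
        return check_schema(json.load(fh))
```

Calling check_schema_file("path/to/schema.json") returns an empty list when none of these structural problems are found. It does not confirm that the schema matches the serialized data.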
Load table¶
Tip
For efficiency, copy the serialized files to a Google Cloud Storage bucket beforehand:
gsutil -m cp -r path/to/folder/ gs://your-bucket/
# Loading from gs://your-bucket/EP*.jsonl is recommended over a local path
bq load --source_format=NEWLINE_DELIMITED_JSON \
  --ignore_unknown_values \
  --replace \
  --max_bad_records=10 \
  project:dataset.table \
  path/to/EP*.jsonl \
  path/to/schema.json
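With --max_bad_records=10, the load job fails once more than ten records are rejected, so a quick local check of the serialized files can save a round trip. The sketch below only detects lines that are not valid JSON objects. BigQuery may reject records for other reasons, such as type mismatches with the schema. The helper count_bad_lines is hypothetical, and skipping blank lines is an assumption made here for convenience.

```python
import json


def count_bad_lines(lines):
    """Count non-empty lines that do not parse as a JSON object."""
    bad = 0
    for line in lines:
        if not line.strip():
            continue  # assumption: blank lines are ignored in this check
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            bad += 1
            continue
        # Each newline-delimited record is expected to be a JSON object.
        if not isinstance(record, dict):
            bad += 1
    return bad
```

The function accepts any iterable of strings, so an open file handle can be passed directly, for example `with open(path, encoding="utf-8") as fh: count_bad_lines(fh)`.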