2. Usage
2.1. General help
In order to get help for any cernopendata-client command, use the --help
option:
$ cernopendata-client --help
Usage: cernopendata-client [OPTIONS] COMMAND [ARGS]...
Options:
--help Show this message and exit.
Commands:
download-files Download data files belonging to a record.
get-file-locations Get a list of data file locations of a record.
get-metadata Get metadata content of a record.
list-directory List contents of a EOSPUBLIC Open Data directory.
verify-files Verify downloaded data file integrity.
version Return cernopendata-client version.
2.2. Selecting records
The data published on the CERN Open Data portal are organised in bibliographic
records. Each record is uniquely identified by a numerical record ID, for
example record 1. Moreover, some records are also assigned a Digital Object
Identifier (DOI), for example 10.7483/OPENDATA.CMS.A342.9982. Either of these
identifiers can be used in the various cernopendata-client commands to select
the record you are interested in. For example:
$ cernopendata-client <command> --recid 1
$ cernopendata-client <command> --doi 10.7483/OPENDATA.CMS.A342.9982
The available commands are described below.
2.3. Getting metadata
In order to get metadata information about a record, please use the get-metadata command:
$ cernopendata-client get-metadata --recid 1
{
"$schema": "http://opendata.cern.ch/schema/records/record-v1.0.0.json",
"abstract": {
"description": "<p>BTau primary dataset in AOD format from RunB of 2010</p> <p>This dataset contains all runs from 2010 RunB. The list of validated runs, which must be applied to all analyses, can be found in</p>",
"links": [
{
"recid": "1000"
}
]
},
"accelerator": "CERN-LHC",
"collaboration": {
"name": "CMS collaboration",
"recid": "450"
},
...
This will output a JSON document containing all the record metadata, such as title, authors, keywords, collision energy, etc. The JSON may also contain interesting physics information describing the dataset.
If you would like to extract only parts of the metadata, for example only the dataset title, or only the Global Tag information for CMS datasets, you can use the --output-value command-line option:
$ cernopendata-client get-metadata --recid 1 --output-value title
/BTau/Run2010B-Apr21ReReco-v1/AOD
$ cernopendata-client get-metadata --recid 1 --output-value system_details.global_tag
FT_R_42_V10A::All
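The dotted field name passed to --output-value reads naturally as a path into the nested JSON metadata. The following is a minimal Python sketch of that idea, not the client's actual implementation. The helper name select_field is hypothetical, and the sample record assumes that global_tag is nested under system_details, as the dotted name suggests.

```python
from typing import Any


def select_field(record: dict, path: str) -> Any:
    """Follow a dot-separated path such as 'system_details.global_tag'."""
    value: Any = record
    for key in path.split("."):
        value = value[key]  # raises KeyError if the field is absent
    return value


# Values copied from the two commands above for record 1; the nesting of
# global_tag under system_details is an assumption based on the dotted name.
record = {
    "title": "/BTau/Run2010B-Apr21ReReco-v1/AOD",
    "system_details": {"global_tag": "FT_R_42_V10A::All"},
}

print(select_field(record, "title"))
print(select_field(record, "system_details.global_tag"))
```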
If the output field produces a list of values, you may want to keep only certain values of interest. For example, you may want to output not all authors of a dataset record, but only the authors matching a particular affiliation or ORCID. You can use the --filter command-line option to achieve this:
$ cernopendata-client get-metadata --recid 329 --output-value authors.name
Adam-Bourdarios, Claire
Cowan, Glen
Germain, Cecile
Guyon, Isabelle
Kégl, Balázs
Rousseau, David
$ cernopendata-client get-metadata --recid 329 --output-value authors.name --filter affiliation='Orsay, LAL; Paris, IN2P3; Orsay' --filter orcid='0000-0001-7613-8063'
Rousseau, David
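The filtering step can be pictured as selecting, from a list of author entries, those whose fields equal every given key=value pair. The sketch below illustrates this in Python under two assumptions that the text above does not confirm: that several --filter options are combined with a logical AND (which is consistent with the example output), and that values are compared by exact string equality. The function name filter_values and the first author entry are placeholders invented for illustration.

```python
def filter_values(items, output_key, filters):
    """Return item[output_key] for every item matching all key=value filters."""
    # Split only on the first '=' so that values may themselves contain '='.
    wanted = dict(f.split("=", 1) for f in filters)
    return [
        item[output_key]
        for item in items
        if all(item.get(key) == value for key, value in wanted.items())
    ]


authors = [
    # Placeholder entry: name and affiliation are invented for illustration.
    {"name": "Example, Author", "affiliation": "Example Institute"},
    # Values taken from the command above.
    {
        "name": "Rousseau, David",
        "affiliation": "Orsay, LAL; Paris, IN2P3; Orsay",
        "orcid": "0000-0001-7613-8063",
    },
]

selected = filter_values(
    authors,
    "name",
    ["affiliation=Orsay, LAL; Paris, IN2P3; Orsay", "orcid=0000-0001-7613-8063"],
)
print(selected)
```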
2.4. Listing available data files
In order to get a list of data files belonging to a record, please use the get-file-locations command:
HTTP protocol
$ cernopendata-client get-file-locations --recid 5500
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/BuildFile.xml
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/HiggsDemoAnalyzer.cc
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/List_indexfile.txt
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/M4Lnormdatall.cc
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/M4Lnormdatall_lvl3.cc
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/demoanalyzer_cfg_level3MC.py
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/demoanalyzer_cfg_level3data.py
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/demoanalyzer_cfg_level4MC.py
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/demoanalyzer_cfg_level4data.py
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/mass4l_combine.pdf
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/mass4l_combine.png
This command will output URIs for all the files associated with the record ID
5500, using the HTTP protocol. Note that you can specify
--server https://opendata.cern.ch if you would like to use the HTTPS protocol
instead.
XRootD protocol
Note that you can use the --protocol xrootd command-line option if you would
rather see the equivalent XRootD endpoints for the files:
$ cernopendata-client get-file-locations --recid 5500 --protocol xrootd
root://eospublic.cern.ch//eos/opendata/cms/software/HiggsExample20112012/BuildFile.xml
root://eospublic.cern.ch//eos/opendata/cms/software/HiggsExample20112012/HiggsDemoAnalyzer.cc
root://eospublic.cern.ch//eos/opendata/cms/software/HiggsExample20112012/List_indexfile.txt
root://eospublic.cern.ch//eos/opendata/cms/software/HiggsExample20112012/M4Lnormdatall.cc
root://eospublic.cern.ch//eos/opendata/cms/software/HiggsExample20112012/M4Lnormdatall_lvl3.cc
root://eospublic.cern.ch//eos/opendata/cms/software/HiggsExample20112012/demoanalyzer_cfg_level3MC.py
root://eospublic.cern.ch//eos/opendata/cms/software/HiggsExample20112012/demoanalyzer_cfg_level3data.py
root://eospublic.cern.ch//eos/opendata/cms/software/HiggsExample20112012/demoanalyzer_cfg_level4MC.py
root://eospublic.cern.ch//eos/opendata/cms/software/HiggsExample20112012/demoanalyzer_cfg_level4data.py
root://eospublic.cern.ch//eos/opendata/cms/software/HiggsExample20112012/mass4l_combine.pdf
root://eospublic.cern.ch//eos/opendata/cms/software/HiggsExample20112012/mass4l_combine.png
The data files can be downloaded via XRootD protocol using the xrdcp command.
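As a rough illustration, the XRootD URIs printed above can be turned into xrdcp invocations of the form "xrdcp source destination". The Python sketch below only builds the argument lists and does not run anything. The helper name xrdcp_command is hypothetical, and the choice to store each file under its base name in the destination directory is an assumption.

```python
import posixpath
from urllib.parse import urlparse


def xrdcp_command(uri: str, dest_dir: str = ".") -> list:
    """Build an argument list of the form: xrdcp <source-uri> <local-path>."""
    # The last path component of the URI is used as the local file name.
    filename = posixpath.basename(urlparse(uri).path)
    return ["xrdcp", uri, posixpath.join(dest_dir, filename)]


uri = (
    "root://eospublic.cern.ch//eos/opendata/cms/software/"
    "HiggsExample20112012/BuildFile.xml"
)
command = xrdcp_command(uri)
print(" ".join(command))
```

If xrdcp is installed, the resulting list could be passed to subprocess.run(command, check=True) to perform the copy.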
File sizes and checksums
If you would like to know the file sizes and checksums in advance, you can use
the --verbose option:
$ cernopendata-client get-file-locations --recid 5500 --verbose
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/BuildFile.xml 305 adler32:ff63668a
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/HiggsDemoAnalyzer.cc 83761 adler32:f205f068
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/List_indexfile.txt 1669 adler32:46a907fc
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/M4Lnormdatall.cc 14943 adler32:af301992
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/M4Lnormdatall_lvl3.cc 15805 adler32:9d9b2126
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/demoanalyzer_cfg_level3MC.py 3741 adler32:cc943381
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/demoanalyzer_cfg_level3data.py 3689 adler32:1d3e2a43
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/demoanalyzer_cfg_level4MC.py 3874 adler32:9cbd53a3
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/demoanalyzer_cfg_level4data.py 3821 adler32:177b49c0
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/mass4l_combine.pdf 18170 adler32:19c6a6a2
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/mass4l_combine.png 93152 adler32:62e0c299
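If you want to process the verbose listing programmatically, each line can be split into its three columns. The sketch below is a minimal Python example, assuming the columns are separated by whitespace as shown above and that the size is given in bytes. The names FileEntry and parse_verbose_line are hypothetical.

```python
from typing import NamedTuple


class FileEntry(NamedTuple):
    uri: str
    size: int      # presumably bytes
    checksum: str  # for example "adler32:ff63668a"


def parse_verbose_line(line: str) -> FileEntry:
    """Split one '--verbose' output line into URI, size and checksum."""
    uri, size, checksum = line.split()
    return FileEntry(uri, int(size), checksum)


# First two lines of the listing above.
listing = """\
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/BuildFile.xml 305 adler32:ff63668a
http://opendata.cern.ch/eos/opendata/cms/software/HiggsExample20112012/HiggsDemoAnalyzer.cc 83761 adler32:f205f068
"""

entries = [parse_verbose_line(line) for line in listing.splitlines()]
total_size = sum(entry.size for entry in entries)
print(total_size)
```

Summing the sizes in this way gives an estimate of the disk space needed before starting a download.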
2.5. Downloading data files
In order to download data files belonging to a record, please use the download-files command. The command can download files over HTTP, HTTPS or XRootD protocols and verify the file checksums.
HTTP protocol
By default the download-files command uses HTTP protocol:
$ cernopendata-client download-files --recid 5500
==> Downloading file 1 of 11
-> File: ./5500/BuildFile.xml
-> Progress: 0/0 KiB (100%)
==> Downloading file 2 of 11
-> File: ./5500/HiggsDemoAnalyzer.cc
-> Progress: 81/81 KiB (100%)
==> Downloading file 3 of 11
-> File: ./5500/List_indexfile.txt
-> Progress: 1/1 KiB (100%)
==> Downloading file 4 of 11
-> File: ./5500/M4Lnormdatall.cc
-> Progress: 14/14 KiB (100%)
==> Downloading file 5 of 11
-> File: ./5500/M4Lnormdatall_lvl3.cc
-> Progress: 15/15 KiB (100%)
==> Downloading file 6 of 11
-> File: ./5500/demoanalyzer_cfg_level3MC.py
-> Progress: 3/3 KiB (100%)
==> Downloading file 7 of 11
-> File: ./5500/demoanalyzer_cfg_level3data.py
-> Progress: 3/3 KiB (100%)
==> Downloading file 8 of 11
-> File: ./5500/demoanalyzer_cfg_level4MC.py
-> Progress: 3/3 KiB (100%)
==> Downloading file 9 of 11
-> File: ./5500/demoanalyzer_cfg_level4data.py
-> Progress: 3/3 KiB (100%)
==> Downloading file 10 of 11
-> File: ./5500/mass4l_combine.pdf
-> Progress: 17/17 KiB (100%)
==> Downloading file 11 of 11
-> File: ./5500/mass4l_combine.png
-> Progress: 90/90 KiB (100%)
==> Success!
The command will download files into a local directory called 5500, named
after the record ID input parameter.
By default the download will be carried out over the HTTP protocol. If you
would like to use the HTTPS protocol instead, please specify
--server https://opendata.cern.ch.
Note that you can also download files from another server, for example from our
Quality Assurance server, by using --server http://opendata-qa.cern.ch.
XRootD protocol
If you have installed the client with XRootD support, you can use the
--protocol xrootd command-line option to use that protocol instead of
HTTP/HTTPS:
$ cernopendata-client download-files --recid 5500 --protocol xrootd
==> Downloading file 1 of 11
-> File: ./5500/BuildFile.xml
==> Downloading file 2 of 11
-> File: ./5500/HiggsDemoAnalyzer.cc
==> Downloading file 3 of 11
-> File: ./5500/List_indexfile.txt
==> Downloading file 4 of 11
-> File: ./5500/M4Lnormdatall.cc
==> Downloading file 5 of 11
-> File: ./5500/M4Lnormdatall_lvl3.cc
==> Downloading file 6 of 11
-> File: ./5500/demoanalyzer_cfg_level3MC.py
==> Downloading file 7 of 11
-> File: ./5500/demoanalyzer_cfg_level3data.py
==> Downloading file 8 of 11
-> File: ./5500/demoanalyzer_cfg_level4MC.py
==> Downloading file 9 of 11
-> File: ./5500/demoanalyzer_cfg_level4data.py
==> Downloading file 10 of 11
-> File: ./5500/mass4l_combine.pdf
==> Downloading file 11 of 11
-> File: ./5500/mass4l_combine.png
==> Success!
Select download engine
You can specify the download engine with the --download-engine option.
requests and pycurl are the two supported download engines for the HTTP
protocol. xrootd is the only supported download engine for the XRootD protocol.
$ cernopendata-client download-files --recid 5500 --filter-name BuildFile.xml --download-engine pycurl
==> Downloading file 1 of 1
-> File: ./5500/BuildFile.xml
-> Progress: 0/0 KiB (100%)
==> Success!
Filter by name
A dataset may consist of thousands of files. You can use powerful filtering options to download only certain files matching your criteria.
For example, you can download only the files whose names exactly match a given file name using the --filter-name option. Several names can be given as a comma-separated list:
$ cernopendata-client download-files --recid 5500 --filter-name BuildFile.xml
==> Downloading file 1 of 1
-> File: ./5500/BuildFile.xml
-> Progress: 0/0 KiB (100%)
==> Success!
$ cernopendata-client download-files --recid 5500 --filter-name BuildFile.xml,List_indexfile.txt
==> Downloading file 1 of 2
-> File: ./5500/BuildFile.xml
-> Progress: 0/0 KiB (100%)
==> Downloading file 2 of 2
-> File: ./5500/List_indexfile.txt
-> Progress: 1/1 KiB (100%)
==> Success!
Filter by regular expression
You can download all files matching a certain regular expression using the --filter-regexp option:
$ cernopendata-client download-files --recid 5500 --filter-regexp py$
==> Downloading file 1 of 4
-> File: ./5500/demoanalyzer_cfg_level3MC.py
-> Progress: 3/3 KiB (100%)
==> Downloading file 2 of 4
-> File: ./5500/demoanalyzer_cfg_level3data.py
-> Progress: 3/3 KiB (100%)
==> Downloading file 3 of 4
-> File: ./5500/demoanalyzer_cfg_level4MC.py
-> Progress: 3/3 KiB (100%)
==> Downloading file 4 of 4
-> File: ./5500/demoanalyzer_cfg_level4data.py
-> Progress: 3/3 KiB (100%)
==> Success!
Filter by range
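You can also download files from a specified range (i-j) using the --filter-range option. File positions are counted from 1, both ends of the range are included, and several ranges can be given as a comma-separated list: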
$ cernopendata-client download-files --recid 5500 --filter-range 1-4
==> Downloading file 1 of 4
-> File: ./5500/BuildFile.xml
-> Progress: 0/0 KiB (100%)
==> Downloading file 2 of 4
-> File: ./5500/HiggsDemoAnalyzer.cc
-> Progress: 81/81 KiB (100%)
==> Downloading file 3 of 4
-> File: ./5500/List_indexfile.txt
-> Progress: 1/1 KiB (100%)
==> Downloading file 4 of 4
-> File: ./5500/M4Lnormdatall.cc
-> Progress: 14/14 KiB (100%)
==> Success!
$ cernopendata-client download-files --recid 5500 --filter-range 1-2,5-7
==> Downloading file 1 of 5
-> File: ./5500/BuildFile.xml
==> Downloading file 2 of 5
-> File: ./5500/HiggsDemoAnalyzer.cc
==> Downloading file 3 of 5
-> File: ./5500/M4Lnormdatall_lvl3.cc
==> Downloading file 4 of 5
-> File: ./5500/demoanalyzer_cfg_level3MC.py
==> Downloading file 5 of 5
-> File: ./5500/demoanalyzer_cfg_level3data.py
==> Success!
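The range selection shown above can be reproduced with a few lines of Python. This is a minimal sketch, not the client's implementation. It assumes that positions are 1-based and inclusive and refer to the order in which get-file-locations lists the files, which is consistent with the output above. The function names are hypothetical.

```python
def parse_ranges(spec: str) -> list:
    """Expand a specification such as '1-2,5-7' into [1, 2, 5, 6, 7]."""
    positions = []
    for part in spec.split(","):
        start, end = (int(number) for number in part.split("-"))
        positions.extend(range(start, end + 1))  # both ends included
    return positions


def select_by_range(files: list, spec: str) -> list:
    """Pick files by 1-based position."""
    return [files[position - 1] for position in parse_ranges(spec)]


# File names of record 5500 in the order listed earlier in this document.
files = [
    "BuildFile.xml",
    "HiggsDemoAnalyzer.cc",
    "List_indexfile.txt",
    "M4Lnormdatall.cc",
    "M4Lnormdatall_lvl3.cc",
    "demoanalyzer_cfg_level3MC.py",
    "demoanalyzer_cfg_level3data.py",
    "demoanalyzer_cfg_level4MC.py",
    "demoanalyzer_cfg_level4data.py",
    "mass4l_combine.pdf",
    "mass4l_combine.png",
]

for name in select_by_range(files, "1-2,5-7"):
    print(name)
```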
Filter by combining multiple selectors
You can combine multiple filters in the same download command. Here are several examples:
$ cernopendata-client download-files --recid 5500 --filter-regexp py --filter-range 1-2
==> Downloading file 1 of 2
-> File: ./5500/demoanalyzer_cfg_level3MC.py
-> Progress: 3/3 KiB (100%)
==> Downloading file 2 of 2
-> File: ./5500/demoanalyzer_cfg_level3data.py
-> Progress: 3/3 KiB (100%)
==> Success!
$ cernopendata-client download-files --recid 5500 --filter-regexp py --filter-range 1-2,4-4
==> Downloading file 1 of 3
-> File: ./5500/demoanalyzer_cfg_level3MC.py
==> Downloading file 2 of 3
-> File: ./5500/demoanalyzer_cfg_level3data.py
==> Downloading file 3 of 3
-> File: ./5500/demoanalyzer_cfg_level4data.py
==> Success!
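The outputs above suggest that, when filters are combined, the range is applied to the list of files that remain after the regular expression filter, not to the full file list. The Python sketch below mimics that order of operations. It is an illustration under that assumption, using Python's re.search semantics for the regular expression, and the function name filter_files is hypothetical.

```python
import re


def filter_files(files, regexp=None, ranges=None):
    """Apply the regular expression first, then 1-based inclusive ranges."""
    selected = list(files)
    if regexp is not None:
        selected = [name for name in selected if re.search(regexp, name)]
    if ranges is not None:
        keep = set()
        for part in ranges.split(","):
            start, end = (int(number) for number in part.split("-"))
            keep.update(range(start, end + 1))
        # Positions refer to the already filtered list, not the original one.
        selected = [
            name for position, name in enumerate(selected, start=1)
            if position in keep
        ]
    return selected


# File names of record 5500 in the order listed earlier in this document.
files = [
    "BuildFile.xml",
    "HiggsDemoAnalyzer.cc",
    "List_indexfile.txt",
    "M4Lnormdatall.cc",
    "M4Lnormdatall_lvl3.cc",
    "demoanalyzer_cfg_level3MC.py",
    "demoanalyzer_cfg_level3data.py",
    "demoanalyzer_cfg_level4MC.py",
    "demoanalyzer_cfg_level4data.py",
    "mass4l_combine.pdf",
    "mass4l_combine.png",
]

for name in filter_files(files, regexp="py", ranges="1-2,4-4"):
    print(name)
```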
2.6. Verifying files
If you have downloaded the data files for a certain record and would like to verify their integrity, and to check whether there have been any critical updates on the CERN Open Data portal side, you can use the verify-files command:
$ cernopendata-client verify-files --recid 5500
==> Verifying number of files for record 5500...
-> Expected 11, found 11
==> Verifying file BuildFile.xml...
-> Expected size 305, found 305
-> Expected checksum adler32:ff63668a, found adler32:ff63668a
==> Verifying file HiggsDemoAnalyzer.cc...
-> Expected size 83761, found 83761
-> Expected checksum adler32:f205f068, found adler32:f205f068
==> Verifying file List_indexfile.txt...
-> Expected size 1669, found 1669
-> Expected checksum adler32:46a907fc, found adler32:46a907fc
==> Verifying file M4Lnormdatall.cc...
-> Expected size 14943, found 14943
-> Expected checksum adler32:af301992, found adler32:af301992
==> Verifying file M4Lnormdatall_lvl3.cc...
-> Expected size 15805, found 15805
-> Expected checksum adler32:9d9b2126, found adler32:9d9b2126
==> Verifying file demoanalyzer_cfg_level3MC.py...
-> Expected size 3741, found 3741
-> Expected checksum adler32:cc943381, found adler32:cc943381
==> Verifying file demoanalyzer_cfg_level3data.py...
-> Expected size 3689, found 3689
-> Expected checksum adler32:1d3e2a43, found adler32:1d3e2a43
==> Verifying file demoanalyzer_cfg_level4MC.py...
-> Expected size 3874, found 3874
-> Expected checksum adler32:9cbd53a3, found adler32:9cbd53a3
==> Verifying file demoanalyzer_cfg_level4data.py...
-> Expected size 3821, found 3821
-> Expected checksum adler32:177b49c0, found adler32:177b49c0
==> Verifying file mass4l_combine.pdf...
-> Expected size 18170, found 18170
-> Expected checksum adler32:19c6a6a2, found adler32:19c6a6a2
==> Verifying file mass4l_combine.png...
-> Expected size 93152, found 93152
-> Expected checksum adler32:62e0c299, found adler32:62e0c299
==> Success!
Note that you can also verify each file “just in time” as it is being downloaded by using the --verify option:
$ cernopendata-client download-files --recid 5500 --filter-range 1-4 --verify
==> Downloading file 1 of 4
-> File: ./5500/BuildFile.xml
==> Verifying file BuildFile.xml...
-> Expected size 305, found 305
-> Expected checksum adler32:ff63668a, found adler32:ff63668a
==> Downloading file 2 of 4
-> File: ./5500/HiggsDemoAnalyzer.cc
==> Verifying file HiggsDemoAnalyzer.cc...
-> Expected size 83761, found 83761
-> Expected checksum adler32:f205f068, found adler32:f205f068
==> Downloading file 3 of 4
-> File: ./5500/List_indexfile.txt
==> Verifying file List_indexfile.txt...
-> Expected size 1669, found 1669
-> Expected checksum adler32:46a907fc, found adler32:46a907fc
==> Downloading file 4 of 4
-> File: ./5500/M4Lnormdatall.cc
==> Verifying file M4Lnormdatall.cc...
-> Expected size 14943, found 14943
-> Expected checksum adler32:af301992, found adler32:af301992
==> Success!
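The adler32 values in the outputs above are Adler-32 checksums, which Python can compute with the standard zlib module. The sketch below shows how a size and checksum comparison of this kind could be done by hand. It is not the client's own code, the function names are hypothetical, and the eight-digit lowercase hexadecimal formatting is an assumption based on the values shown above.

```python
import os
import zlib


def format_adler32(value: int) -> str:
    """Format a checksum like the values shown above, e.g. 'adler32:ff63668a'."""
    return f"adler32:{value & 0xFFFFFFFF:08x}"


def adler32_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the Adler-32 checksum of a file without loading it whole."""
    checksum = 1  # Adler-32 starting value
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            checksum = zlib.adler32(chunk, checksum)
    return format_adler32(checksum)


def verify_file(path: str, expected_size: int, expected_checksum: str) -> bool:
    """Return True if both the size and the checksum match the expectations."""
    return (
        os.path.getsize(path) == expected_size
        and adler32_of_file(path) == expected_checksum
    )
```

For example, after downloading record 5500, verify_file("./5500/BuildFile.xml", 305, "adler32:ff63668a") would repeat the first check shown above.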
2.7. Listing directories
The CERN Open Data files are hosted on the EOSPUBLIC data storage service. In order to get a list of files belonging to a certain EOSPUBLIC directory, please use the list-directory command:
$ cernopendata-client list-directory /eos/opendata/cms/validated-runs/Commissioning10
Commissioning10-May19ReReco_7TeV.json
Commissioning10-May19ReReco_900GeV.json
The list-directory command uses XRootD protocol to list data files and hence it is available only when you install the XRootD flavour. Please see the Installation documentation for more details.
Iterate recursively
Note that you can use the --recursive command-line option if you would like to
also iterate through all the subdirectories of the given directory:
$ cernopendata-client list-directory /eos/opendata/cms/validated-runs --recursive
Commissioning10-May19ReReco_7TeV.json
Commissioning10-May19ReReco_900GeV.json
Cert_190456-208686_8TeV_22Jan2013ReReco_Collisions12_JSON.txt
Cert_160404-180252_7TeV_ReRecoNov08_Collisions11_JSON.txt
Cert_136033-149442_7TeV_Apr21ReReco_Collisions10_JSON_v2.txt
Iterate recursively with timeout
If you would like to list a directory that contains a large number of files,
you can specify the --timeout option in order to exit after a certain amount of
time. The default timeout is 60 seconds.
$ cernopendata-client list-directory /eos/opendata/cms/Run2010B/BTau/AOD --recursive --timeout 30
CMS_Run2010B_BTau_AOD_Apr21ReReco-v1_0000_file_index.json
CMS_Run2010B_BTau_AOD_Apr21ReReco-v1_0000_file_index.txt
CMS_Run2010B_BTau_AOD_Apr21ReReco-v1_0001_file_index.json
CMS_Run2010B_BTau_AOD_Apr21ReReco-v1_0001_file_index.txt
CMS_Run2010B_BTau_AOD_Apr21ReReco-v1_0002_file_index.json
CMS_Run2010B_BTau_AOD_Apr21ReReco-v1_0002_file_index.txt
CMS_Run2010B_BTau_AOD_Apr21ReReco-v1_0003_file_index.json
CMS_Run2010B_BTau_AOD_Apr21ReReco-v1_0003_file_index.txt
CMS_Run2010B_BTau_AOD_Apr21ReReco-v1_0004_file_index.json
CMS_Run2010B_BTau_AOD_Apr21ReReco-v1_0004_file_index.txt
CMS_Run2010B_BTau_AOD_Apr21ReReco-v1_0005_file_index.json
CMS_Run2010B_BTau_AOD_Apr21ReReco-v1_0005_file_index.txt
..
2.8. More information
For more information about all the available cernopendata-client commands
and options, please see CLI API.