2015-02-16

Accessing the ESA Sentinel Mission Data with Python and OData

I've had a couple of enquiries now about how to access the OData feeds on the ESA Sentinel mission science data hub. Sentinel 1 is the first of a new group of satellites in the Copernicus programme to monitor the Earth. That's about all I know I'm afraid. This data is not pretty desktop pictures (though doubtless there are some pretty pictures buried in there somewhere) but raw scientific data from instruments currently orbiting the Earth.

The source code described here is available in the samples directory on GitHub, you must be using the latest Pyslet from master for this script to enable the metadata override technique used here.


The data hub advertises access to the data through OData (version 1) but my Python library, Pyslet, was not able to access the feeds properly: hence the enquiries.

Turns out that the data feeds use a concept called containment in OData. The model of OData is one of entity sets (think SQL tables) with relations between them modelled by navigation properties. There's one particular use case that doesn't work very well in this scenario but seems popular. Given an entity (think table row or record) people want to add arbitrary key-value pairs. The ESA's data model does this by creating 'sub-tables' which define collections of attributes that hang off of each entity. The attribute name is the key in these collections. This doesn't really work in OData v1 (or v2) because these attribute values should still be entities in their own right and therefore they need a unique key and an entity set definition to contain them.

This isn't the only schema I've seen that attempts to do something like this either, SAP have published a similar schema suggesting that some early Java tools exposed OData this way.

The upshot is that you get nasty errors when you try and load these services with Pyslet. It complains of a rather obscure condition concerning (possibly multiple) unbound principals. When I wrote that error message, I didn't expect anyone to ever actually see it.

There's a proper way to do containment in earlier versions of OData, described in Containment is Coming with OData v4 which explains how to use composite keys. As the name of the article suggests though, this is written with hindsight after a better solution has been found for this use case in OData v4.

The fix for the ESA data feed is to download and edit a copy of the advertised metadata to get around the errors reported by Pyslet and then to initialise your OData client using this modified schema instead. It isn't a perfect fix, as far as Pyslet knows those attributes really are unique and do reside in their own entity set but it doesn't really matter for the purposes of using the OData client. You can navigate and formulate queries without tripping over data inconsistencies.

I've written a little script that I've added to Pyslet's sample code directory to illustrate the technique, along with a fixed up metadata file. The result is a little UNIX-style utility for downloading products from the ESA data hub:

$ ./download.py --help
Usage: download.py [options]

Options:
  -h, --help            show this help message and exit
  -u USER, --user=USER  user name for basic auth credentials
  -p PASSWORD, --password=PASSWORD
                        password for basic auth credentials
  -v                    increase verbosity of output up to 3x
  -c, --cert            download and trust site certificate

The data is available via https and requires a user name and password (you'll have to register on the data hub site but it's free to do so). To make it easier to set up the trust aspect I've added a -c option to download the site certificate and store it. If you don't have the site certificate you'll get an error like this:

ERROR:root:scihub.esa.int: closing connection after error failed to build secure connection to scihub.esa.int

Subsequent downloads verify that the site certificate hasn't changed: a bit like the way ssh offers to store a fingerprint the first time you connect to a remote host. Only use the -c option if you trust the network you are running on (you can use Firefox or some other 'trusted' browser to download the certificate too of course).

The password is optional, if you don't provide it you'll be prompted to enter it using Python's getpass function for privacy.

You pass the product identifiers as command line arguments, here is an example of a successful first-time run:

$ ./download.py -c -u swl10 8bf64ff9-f310-4027-b31f-8e95dd9bbf82
Password: 
ERROR:root:Entity set Attributes has more than one unbound principal
dropping mutliplicity of Attribute_Node to 0..1.  Continuing
ERROR:root:Entity set Attributes has more than one unbound principal
dropping mutliplicity of Attribute_Product to 0..1.  Continuing
S1A_EW_GRDM_1SDH_20150207T084156_20150207T084218_004515_0058AE_3051 150751068

After running this command I had a scihub.esa.int.crt file (from the -c option) and a 150MB zip file downloaded to the current directory.

If you run with -vv to provide a bit more information you can see the OData magic in operation:

./download.py -vv -u swl10 8bf64ff9-f310-4027-b31f-8e95dd9bbf82
Password: 
INFO:root:Sending request to scihub.esa.int
INFO:root:GET /dhus/odata/v1/ HTTP/1.1
INFO:root:Connected to scihub.esa.int with DHE-RSA-AES256-SHA, TLSv1/SSLv3, key length 256
INFO:root:Finished Response, status 401
INFO:root:Resending request to: https://scihub.esa.int/dhus/odata/v1/
INFO:root:Sending request to scihub.esa.int
INFO:root:GET /dhus/odata/v1/ HTTP/1.1
INFO:root:Connected to scihub.esa.int with DHE-RSA-AES256-SHA, TLSv1/SSLv3, key length 256
INFO:root:Finished Response, status 200
WARNING:root:Entity set Attributes has an unbound principal: Nodes
WARNING:root:Entity set Attributes has an unbound principal: Products
ERROR:root:Entity set Attributes has more than one unbound principal
dropping multiplicity of Attribute_Node to 0..1.  Continuing
ERROR:root:Entity set Attributes has more than one unbound principal
dropping multiplicity of Attribute_Product to 0..1.  Continuing
INFO:root:Sending request to scihub.esa.int
INFO:root:GET /dhus/odata/v1/Products('8bf64ff9-f310-4027-b31f-8e95dd9bbf82') HTTP/1.1
INFO:root:Connected to scihub.esa.int with DHE-RSA-AES256-SHA, TLSv1/SSLv3, key length 256
INFO:root:Finished Response, status 200
S1A_EW_GRDM_1SDH_20150207T084156_20150207T084218_004515_0058AE_3051 150751068
INFO:root:Sending request to scihub.esa.int
INFO:root:GET /dhus/odata/v1/Products('8bf64ff9-f310-4027-b31f-8e95dd9bbf82')/$value HTTP/1.1
INFO:root:Connected to scihub.esa.int with DHE-RSA-AES256-SHA, TLSv1/SSLv3, key length 256
INFO:root:Finished Response, status 200

As you can see, the fixed up metadata still generates error messages but these are no longer critical and the client is able to interact with the service.

I was given this product identifier as an example of something small to test with. I haven't researched what the data actually represents but the resulting zip file does contain a 'quick_look' image:

No comments:

Post a Comment