2014-11-14

Basic Authentication, SSL and Pyslet's HTTP/OData client

Pyslet is my Python package for Standards in Learning Education and Training. It packages up the core of my QTI migration script code in a form that makes it easier for other developers to use. Earlier this year I released Pyslet to PyPI and moved development to GitHub to make it easier for people to download, install and engage with the source code.

Note: this article was updated on 2017-05-24 with a code correction (see comments for details).

Warning: The code in this article will work with the latest Pyslet master from GitHub, and with any distribution from pyslet-0.5.20141113 onwards. At the time of writing the version on PyPI has not been updated!

A recent issue that came up concerns Pyslet's HTTP client. The client is the base class for Pyslet's OData client. In my own work I often use this client to access OData feeds protected with HTTP's basic authentication but I've never properly documented how to do it. There are two approaches...

The simplest way, and the way I used to do it, is to override the client object itself and add the Authorization header at the point where each request is queued.

from pyslet.http.client import Client

class MyAuthenticatedClient(Client):

    # add an __init__ method to set some credentials 
    # in the client

    def queue_request(self, request):
        # add in the authorization credentials
        if (self.credentials is not None and
                not request.has_header("Authorization")):
            request.set_header('Authorization',
                               str(self.credentials))
        # always queue the request, whether or not a header was added
        super(MyAuthenticatedClient, self).queue_request(request)

This works OK but it forces the issue a bit and will result in the credentials being sent to all URLs, which you may not want. The credentials object should be an instance of pyslet.http.auth.BasicCredentials which takes care of correctly formatting the header. Here is some sample code to create that object:

from pyslet.http.auth import BasicCredentials
from pyslet.rfc2396 import URI

credentials = BasicCredentials()
credentials.userid = "user@example.com"
credentials.password = "secretPa$$word"
credentials.protectionSpace = URI.from_octets(
    'https://www.example.com/mypage').get_canonical_root()

With the above code, str(credentials) returns the string: 'Basic dXNlckBleGFtcGxlLmNvbTpzZWNyZXRQYSQkd29yZA==' which is what you'd expect to pass in the Authorization header.
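To see where that string comes from, here is a minimal sketch that rebuilds the header value using only the standard library. It follows the rule for the Basic scheme (base64 of the userid, a colon, then the password). The function name basic_header is my own invention for illustration and is not part of Pyslet.

```python
import base64


def basic_header(userid, password):
    # hypothetical helper, not part of Pyslet: join the userid and
    # password with a colon and base64-encode the result
    raw = ("%s:%s" % (userid, password)).encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


header = basic_header("user@example.com", "secretPa$$word")
print(header)
```

This is handy for checking a header by hand, but in real code it is safer to let BasicCredentials do the formatting.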

To make this code play more nicely with the HTTP standard I added some core support to the HTTP client itself, so you don't need to override the class anymore. The HTTP client now has a credential store and an add_credentials method. Once credentials have been added, the following happens when a 401 response is received:

  1. The client iterates through any received challenges
  2. Each challenge is matched against the stored credentials
  3. If matching credentials are found then an Authorization header is added and the request resent
  4. If the request receives another 401 response the credentials are removed from the store and we go back to (1)

This process terminates when there are no more credentials that match any of the challenges or when a code other than 401 is received.
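The loop described above can be sketched in a few lines of plain Python. This is a simplified simulation and not Pyslet's actual code: the names find_match, request_with_credentials and fake_server are hypothetical, a challenge is reduced to a realm string, and matching on the realm alone is an assumption made to keep the example short.

```python
def find_match(challenges, store):
    # steps 1 and 2: walk the challenges and look for stored
    # credentials that answer one of them
    for realm in challenges:
        for cred in store:
            if cred["realm"] == realm:
                return cred
    return None


def request_with_credentials(send, store):
    # send(authorization) stands in for the real HTTP exchange and
    # returns a (status, challenges) pair
    status, challenges = send(None)
    while status == 401:
        cred = find_match(challenges, store)
        if cred is None:
            # no stored credentials match any challenge: give up
            break
        # step 3: resend with an Authorization header
        status, challenges = send(cred["header"])
        if status == 401:
            # step 4: these credentials failed, so forget them
            store.remove(cred)
    return status


def fake_server(authorization):
    # hypothetical server that accepts exactly one header value
    if authorization == "Basic good":
        return 200, []
    return 401, ["example"]


store = [{"realm": "example", "header": "Basic bad"},
         {"realm": "example", "header": "Basic good"}]
status = request_with_credentials(fake_server, store)
print(status, len(store))
```

Because failed credentials are removed on each pass, the loop always terminates: either a request succeeds or the store runs out of matches.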

If the matching credentials are BasicCredentials (and that's the only type Pyslet supports out of the box!), then some additional logic gets activated on success. RFC 2617 says that for basic authentication, a challenge implies that all paths "at or deeper than the depth of the last symbolic element in the path field" fall into the same protection space. Therefore, when credentials are used successfully, Pyslet adds the path to the credentials using BasicCredentials.add_success_path. Next time a request is sent to a URL on the same server with a path that meets this criterion the Authorization header will be added pre-emptively.

If you want to pre-empt the 401 handling completely then you just need to add a suitable path to the credentials before you add them to the client. So if you know your credentials are good for everything in /website/~user/ you could continue the above code like this:

credentials.add_success_path('/website/~user/')

That last slash is really important: if you leave it off, everything in '/website/' will be added to your protection space, which is probably not what you want.
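To illustrate why the slash matters, here is a rough sketch of the RFC 2617 rule quoted earlier. It is my own simplification, not Pyslet's implementation, and the helper names success_prefix and covered are hypothetical. The idea is that the last symbolic element of the path is dropped and everything at or below what remains is treated as covered.

```python
def success_prefix(path):
    # hypothetical helper: keep everything up to and including the
    # last slash, dropping the last symbolic element of the path
    return path[:path.rfind("/") + 1]


def covered(request_path, success_path):
    # a request path is covered if it sits at or below the prefix
    return request_path.startswith(success_prefix(success_path))


with_slash = success_prefix("/website/~user/")
without_slash = success_prefix("/website/~user")
print(with_slash)
print(without_slash)
print(covered("/website/~other/index.html", "/website/~user"))
```

Without the trailing slash, '~user' is itself the last symbolic element, so the prefix collapses to '/website/' and another user's pages would be treated as covered too.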

SSL

If you're going to pass basic auth credentials around you really should be using HTTPS. Python makes it a bit tricky to use HTTPS and be sure that you are using a trusted connection. Pyslet tries to make this a little bit easier. Here's what I do.

  1. With Firefox, go to the site in question and check that SSL is working properly
  2. Export the certificate from the site in PEM format and save to disk, e.g., www.example.com.crt
  3. Repeat for any other sites I want my Python script to work with.
  4. Concatenate the files together and save them to, say, 'certificates.pem'
  5. Pass this file name to the HTTP (or OData) client constructor.

from pyslet.http.client import Client

my_client = Client(ca_certs='certificates.pem')
my_client.add_credentials(credentials)

In this code, I've assumed that the credentials were created as above. To be really sure you are secure here, try grabbing a file from a different site or, even better, generate a self-signed certificate and use that instead. (The master version of Pyslet currently has such a certificate ready made in unittests/data_rfc2616/server.crt.) Now pass that file for ca_certs and check that you get SSL errors! If you don't, something is broken and you should proceed with caution, or you may just be on a Mac (see the notes in "Is Python's SSL module correctly validating certificates..." for details). And don't pass None for ca_certs, as that tells the ssl module not to check at all!
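Step 4 of the recipe (concatenating the exported files) is easy to script. The following is a minimal sketch using only the standard library. The function names concatenate_pem and count_certificates are hypothetical, and the demonstration writes placeholder files to a temporary directory rather than using real certificates.

```python
import os
import tempfile


def concatenate_pem(paths, out_path):
    # join the exported files into one bundle, making sure each one
    # ends with a newline so the PEM markers stay on their own lines
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as f:
                data = f.read()
            out.write(data if data.endswith("\n") else data + "\n")


def count_certificates(path):
    # count the PEM BEGIN markers as a quick sanity check
    with open(path) as f:
        return f.read().count("-----BEGIN CERTIFICATE-----")


# demonstration with two placeholder files (not real certificates)
tmp = tempfile.mkdtemp()
names = []
for host in ("www.example.com", "www.example2.com"):
    name = os.path.join(tmp, host + ".crt")
    with open(name, "w") as f:
        f.write("-----BEGIN CERTIFICATE-----\nplaceholder\n"
                "-----END CERTIFICATE-----")
    names.append(name)
bundle = os.path.join(tmp, "certificates.pem")
concatenate_pem(names, bundle)
print(count_certificates(bundle))
```

Counting the markers is only a sanity check that nothing was lost when joining the files. It says nothing about whether the certificates themselves are valid.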

If you don't like messing around with the certificates, and you are using a machine and network that is pretty trustworthy and from which you would happily do your internet banking then the following can be used to proxy for the browser method:

import ssl
import pyslet.rfc2396 as uri

certs = []
for s in ('https://www.example.com', 'https://www.example2.com', ):
    # add other sites to the above tuple as you like
    url = uri.URI.from_octets(s)
    # get_server_certificate returns the certificate as a PEM string
    certs.append(ssl.get_server_certificate(url.get_addr(),
                 ssl_version=ssl.PROTOCOL_TLSv1))
# write the file once, after all the certificates have been fetched
with open('certificates.pem', 'w') as f:
    f.write(''.join(certs))

Passing the ssl_version is optional above, but the default setting in many Python installations will use the discredited SSLv3 or worse and your server may refuse to serve you. I know mine does! Set it to a protocol you trust.

Remember that you'll have to do this every so often because server certificates expire. You can always grab the certificate authority's certificate instead (and thereby trust a whole slew of sites at once) but if you're going that far then there are better recipes for finding and re-using the built-in machine certificate store anyway. The beauty of this method is that you can self-sign a server certificate you trust and connect to it securely with a Python client without having to mess around with certificate authorities at all, provided you can safely courier the certificate from your server to your client that is! If you are one of the growing number of people who think the whole trust thing is broken anyway since Snowden then this may be an attractive option.

With thanks to @bolhovsky on GitHub for bringing the need for this article to my attention.