2014-05-26

Adding OData support to Django with Pyslet: First Thoughts

A couple of weeks ago I got an interesting tweet from @d34dl0ck, here it is:

This got me thinking, but as I know very little about Django I had to do a bit of research first. Here's my read-back of what Django's data layer does in the form of a concept mapping from OData to Django. In this table the objects are listed in containment order and the use case of using OData to expose data managed by a Django-based website is assumed. (See below for thoughts on consuming OData in Django as if it were a data source.)

OData ConceptDjango ConceptPyslet Concept
DataServices Django website: the purpose of OData is to provide access to your application's data-layer through a standard API for machine-to-machine communication rather than through an HTML-based web view for human consumption. Instance of the DataServices class, typically parsed from a metadata XML file.
Schema No direct equivalent. In OData, the purpose of the schema is to provide a namespace in which definitions of the other elements take place. In Django this information will be spread around your Python source code in the form of class definitions that support the remaining concepts. Instance of the Schema class, typically parsed from a metadata XML file.
EntityContainer The database. An OData service can define multiple containers but there is always a default container - something that corresponds closely with the way Django links to multiple databases. Most OData services probably only define a single container and I would expect that most Django applications use the default database. If you do define custom database routers to map different models to different databases then that information would need to be represented in the corresponding Schema(s). In Pyslet, an EntityContainer is defined by an instance of the EntityContainer class but this instance is handed to a storage layer during application startup and this storage layer class binds concrete implementations of the data access API to the EntitySets it contains.
EntitySet Your model class. A model class maps to a table in the Django database. In OData the metadata file contains the information about which container contains an EntitySet and the EntityType definition in that file contains the actual definitions of the types and field names. In contrast, in Django these are defined using class attributes in the Python code. Pyslet sticks closely to the OData API here and parses definitions from the metadata file. As a result an EntitySet instance is created that represents this part of the model and it is up to the object responsible for interfacing to the storage layer to provide concrete bindings.
Entity An instance of a model class. An instance of the Entity object, typically instantiated by the storage object bound to the EntitySet.

Where do you start?

Step 1: As you can see from the above table, Pyslet depends fairly heavily on the metadata file so a good way to start would be to create a metadata file that corresponds to the parts of your Django data model you want to expose. You have some freedom here but if you are messing about with multiple databases in Django it makes sense to organise these as separate entity containers. You can't create relationships across containers in Pyslet which mirrors the equivalent restriction in Django.

Step 2: You now need to provide a storage object that maps Pyslet's DAL onto the Django DAL. This involves creating a sub-class of the EntityCollection object from Pyslet. To get a feel for the API my suggestion would be to create a class for a specific model initially and then, once this is working, consider how you might use Python's built-in introspection to write a more general object.

To start with, you don't need to do too much. EntityCollection objects are just like dictionaries but you only need to override itervalues and __getitem__ to get some sort of implementation going. There are simple wrappers that will (inefficiently) handle ordering and filtering for you to start with so itervalues can be very simple...

def itervalues(self):
    return self.OrderEntities(
        self.ExpandEntities(
        self.FilterEntities(
        self.entityGenerator())))

All you need to do is write the entityGenerator method (the name is up to you) and yield Entity instances from your Django model. This looks pretty simple in Django, something like Customer.objects.all() where Customer is the name of a model class would appear to return all customer instances. You need to yield an Entity object from Pyslet's DAL for each customer instance and populate the property values from the fields of the returned model instance.

Implementing __getitem__ is probably also very easy, especially when you are using simple keys. Something like Customer.objects.get(pk=1) and then a similar mapping to the above seems like it would work for implementing basic resource look up by key. Look at the in-memory collection class implementation for the details of how to check the filter and populate the field values, it's in pyslet/odata2/memds.py.

Probably the hardest part of defining an EntityCollection object is getting the constructor right. You'll want to pass through the Model class from Django so that you can make calls like the above:

def __init__(self,djangoModel,**kwArgs):
    self.djangoModel=djangoModel
    super(DjangoCollection,self).__init__(**kwArgs)

Step 3: Load the metadata from a file, then bind your EntityCollection class or classes to the EntitySets. Something like this might work:

import pyslet.odata2.metadata as edmx
doc=edmx.Document()
with open('DjangoAppMetadata.xml','rb') as f:
    doc.Read(f)
customers=doc.root.DataServices['DjangoAppSchema.DjangoDatabase.Customers']
# customers is an EntitySet instance
customers.Bind(DjangoCollection,djangoModel=Customer)

The Customer object here is your Django model object for Customers and the DjangoCollection object is the EntityCollection object you created in Step 2. Each time someone opens the customers entity set a new DjangoCollection object will be created and Customer will be passed as the djangoModel parameter.

Step 4: Test that the model is working by using the interpreter or a simple script to open the customers object (the EntitySet) and make queries with the Pyslet DAL API. If it works, you can wrap it with an OData server class and just hook the resulting wsgi object to your web server and you have hacked something together.

Post hack

You'll want to look at Pyslet's expression objects and figure out how to map these onto the query objects used by Django. Although OData provides a rich query syntax you don't need to support it all, just reject stuff you don't want to implement. Simple queries look like they'd map to things you can pass to the filter method in Django fairly easily. In fact, one of the problems with OData is that it is very general - almost SQL over the web - and your application's data layer is probably optimised for some queries and not others. Do you want to allow people to search your zillion-record table using a query that forces a full table scan? Probably not.

You'll also want to look at navigation properties which map fairly neatly to the relationship fields. The Django DAL and Pyslet's DAL are not miles apart here so you should be able to create NavigationCollection objects (equivalent to the class you created in Step 2 above) for these. At this point, the power of OData will begin to come alive for you.

Making it Django-like

I'm not an expert on what is and is not Django like but I did notice that there is a Feed concept for exposing RSS in Django. If the post hack process has left you with a useful implementation then some sort of OData equivalent object might be a useful addition. Given that Django tends to do much of the heavy lifting you could think about providing an OData feed object. It probably isn't too hard to auto-generate the metadata from something like class attributes on such an object. Pyslet's OData server is a wsgi application so provided Django can route requests to it you'll probably end up with something that is fairly nicely integrated - even if it can't do that out of the box it should be trivial to provide a simple Django request handler that fakes a wsgi call.

Consuming OData

Normally you think of consuming OData as being easier than providing it but for Django you'd be tempted to consider exposing OData as a data source, perhaps as an auxiliary database containing some models that are externally stored. This would allow you to use the power of Django to create an application which mashed up data from OData sources as if that data were stored in a locally accessible database.

This appears to be a more ambitious project: Django non-rel appears to be a separate project and it isn't clear how easy it would be to intermingle data coming form an OData source with data coming from local databases. It is unlikely that you'd want to use OData for all data in your application. The alternative might be to try and write a Python DB API interface for Pyslet's DAL and then get Django treating it like a proper database. That would mean parsing SQL, which is nasty, but it might be the lesser of two evils.

Of course, there's nothing stopping you using Pyslet's builtin OData client class directly in your code to augment your custom views with data pulled from an external source. One of the features of Pyslet's OData client is that it treats the remote server like a data source, keeping persistent HTTP connections open, managing multi-threaded access and and pipelining requests to improve throughput. That should make it fairly easy to integrate into your Django application.

No comments:

Post a Comment