Debugging unsortable problems in Python

Working in Python 2.6.1 on my Mac, I recently noticed the following behaviour while debugging the QTI migration code:

>>> 'z'<('a','b')
True
>>> ('a','b')<u'a'
True
>>> u'a'<'z'
True

These three comparisons, between a string, a tuple and a unicode string, all return True, so the values form a loop: 'z' < ('a','b') < u'a' < 'z'. They demonstrate how easy it is to create an unsortable list out of basic immutable objects, such as might be used as keys in a dictionary.

This might look a bit esoteric, but I'm only writing this blog post because I caught a bug caused by the incorrect assumption that lists of strings, tuples and unicode strings sort predictably. I was representing XML attribute names as tuples if an attribute had a defined namespace, and the names were then used as keys into a dictionary. Note that 'a' and u'a' can be used interchangeably in Python 2.6 when looking up an entry in a dictionary, so it was easy to go one step further, grab the list of keys, sort them and assume that the result would be predictable. Not so.
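
One way to avoid relying on Python 2's mixed-type ordering is to give sorted() an explicit key that turns every attribute name into the same shape. The sketch below is written for Python 3 and is not the code from the QTI migration tool; the helper attribute_sort_key and the sample attributes are hypothetical, and it assumes plain names are strings while namespaced names are (namespace URI, local name) tuples.

```python
# Hypothetical example data: plain attribute names are strings,
# namespaced ones are (namespace URI, local name) tuples.
XML_NS = "http://www.w3.org/XML/1998/namespace"

attrs = {
    "identifier": "q1",
    (XML_NS, "lang"): "en",
    "title": "Question 1",
}

def attribute_sort_key(name):
    """Map a plain name to ('', name) so every key is a (namespace, local) pair."""
    if isinstance(name, tuple):
        return name
    return ("", name)

# Every key is now a tuple of two strings, so no comparison ever
# mixes a string with a tuple and the result no longer depends on
# the order in which the dictionary hands back its keys.
print(sorted(attrs, key=attribute_sort_key))
```

Names without a namespace sort first because the empty string is less than any namespace URI.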

The order of the keys returned by the keys() method of a dictionary is not defined, and because the comparisons above are not transitive, sorting the resulting list gives different results depending on its initial order.
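
To see why the initial order matters, the sketch below hard-codes the three Python 2 results from the start of this post into an old-style comparison function and sorts the same three keys starting from two different orders. It is written for Python 3 (using functools.cmp_to_key); the string labels and the names cyclic_cmp and sort_keys are hypothetical stand-ins, not real Python 2 objects.

```python
from functools import cmp_to_key

# Hypothetical labels standing in for the Python 2 keys 'z', ('a','b') and u'a'.
STR_Z, TUPLE_AB, UNICODE_A = "str 'z'", "tuple ('a','b')", "unicode u'a'"

# Each pair (x, y) in this set means "x < y" in Python 2.
LESS_THAN = {
    (STR_Z, TUPLE_AB),      # 'z' < ('a','b')
    (TUPLE_AB, UNICODE_A),  # ('a','b') < u'a'
    (UNICODE_A, STR_Z),     # u'a' < 'z'
}

def cyclic_cmp(x, y):
    """Old-style cmp function built from the table; it is not transitive."""
    if x == y:
        return 0
    return -1 if (x, y) in LESS_THAN else 1

def sort_keys(keys):
    return sorted(keys, key=cmp_to_key(cyclic_cmp))

# Same three keys, two starting orders, two different "sorted" results.
print(sort_keys([STR_Z, TUPLE_AB, UNICODE_A]))
print(sort_keys([TUPLE_AB, UNICODE_A, STR_Z]))
```

Each call hands back its input unchanged, because every adjacent pair already looks "in order" to the comparison, so two lists holding the same keys end up sorted differently.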

It took me a while to find anyone else struggling with a similar problem, but I took great solace in Incomparable Abominations. That post deals with changes from Python version 1 to version 2.

I believe that Python 3 is doing two things to address the problem I'm having. Firstly, the sloppy lack of distinction between strings and unicode strings is being cleaned up. The transition will be painful (and mean more work getting the QTI migration tool working on Python 3-based systems), but it will prevent the type of comparison loop above. Secondly, comparisons are being tightened to stop different types comparing unpredictably: a (unicode) string and a tuple will not be comparable in future (the comparison raises a TypeError), so I would catch bugs like this one earlier.
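
For comparison, this small Python 3 sketch shows the tightened rules: ordering a string against a tuple raises a TypeError instead of quietly succeeding, and u'a' is just an ordinary str. The helper try_less_than is hypothetical and exists only to catch the exception; the exact error message varies between Python 3 versions.

```python
def try_less_than(a, b):
    """Return a < b, or a short error string if Python 3 refuses to order the two types."""
    try:
        return a < b
    except TypeError as exc:
        return "TypeError: %s" % exc

print(try_less_than('z', ('a', 'b')))   # prints a TypeError message
print(try_less_than(('a', 'b'), u'a'))  # prints a TypeError message
print(try_less_than(u'a', 'z'))         # True: u'a' is an ordinary str in Python 3
print(type(u'a') is type('a'))          # True: there is no separate unicode type
```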

So a better future awaits, but why do the comparisons give the results they do in Python 2? The answer is almost poetic. Objects of different types usually sort by their class name. The comparison of a string and a unicode string is the exception: provided the string is 7-bit clean, it is assumed to be ASCII and compared with the unicode string character by character. We can reveal the class names using the interpreter:

>>> 'z'.__class__.__name__
'str'
>>> ('a','b').__class__.__name__
'tuple'
>>> u'a'.__class__.__name__
'unicode'

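As you can see, the type names start with the alphabetic sequence 's', 't', 'u', so 'str' < 'tuple' < 'unicode'. The string/unicode exception then closes the loop, because u'a' < 'z' when compared as text.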
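
The rule can be modelled in a few lines. This is a rough Python 3 sketch, not CPython 2's actual implementation: it represents a Python 2 str as bytes and a Python 2 unicode string as str, and covers only those two types plus tuple (real Python 2 also treats numbers and None specially). The helper py2_less_than and the PY2_TYPE_NAME table are hypothetical.

```python
# Python 2 type names for the Python 3 types used to stand in for them.
PY2_TYPE_NAME = {bytes: "str", str: "unicode", tuple: "tuple"}

def py2_less_than(a, b):
    """Approximate Python 2's a < b for str (bytes here), unicode (str here) and tuple."""
    # str vs unicode: decode the byte string as ASCII and compare as text
    # (non-ASCII bytes raise UnicodeDecodeError in this model).
    if isinstance(a, bytes) and isinstance(b, str):
        return a.decode("ascii") < b
    if isinstance(a, str) and isinstance(b, bytes):
        return a < b.decode("ascii")
    # Same type: the ordinary comparison (nested mixed types are not modelled).
    if type(a) is type(b):
        return a < b
    # Different types: fall back to comparing the Python 2 type names.
    return PY2_TYPE_NAME[type(a)] < PY2_TYPE_NAME[type(b)]

z, ab, a = b"z", ("a", "b"), "a"
print(py2_less_than(z, ab))  # True: 'str' < 'tuple'
print(py2_less_than(ab, a))  # True: 'tuple' < 'unicode'
print(py2_less_than(a, z))   # True: u'a' < 'z' compared as text
```

All three comparisons come out True, reproducing the loop from the start of this post.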


Semantic Markup in HTML

A few days ago I spotted an interesting link from the BBC about the use of semantic markup.

This page got me thinking again about something I had also blogged about on my Questionmark blog. One of the problems we experienced during the development of QTI was the issue of 'presentation'. In QTI, the presentation element refers to the structured material and interaction points that make up the question. However, to many people the word 'presentation' means the form of markup used at the point the question is delivered.

I always found this confusion difficult. Clearly people don't present the XML markup to candidates, so the real fear was that QTI would allow question authors to specify things that should be left open until the delivery system knows the method of presentation.

For some people, using a language like HTML implies that you have crossed this line.  But the BBC page on using HTML to hold semantic markup is heartening to me because I think that QTI would be better bound directly into HTML too.

HTML has been part of QTI since the early days (when you had to choose between HTML and RTF for marking up material).  With QTI version 2 we made the integration much closer.  However, XHTML was in its infancy and work to make the XHTML schema more flexible through use of XML Schema and modularisation of the data model was only just getting going.  As a result, QTI contains a clumsy profile of XHTML transformed into the QTI namespace itself.

In fact, XHTML and XML Schema have not played so well together, and HTML5 takes the format in a new technical direction as far as the binding is concerned. For QTI, this may become an obstacle to the rapid building of innovative applications that are also standards compliant.

But bindings are much less important than information. I always thought that QTI XML would be transformed directly into HTML for presentation by server-side scripts or, if required, by pre-processing with XSLT to make HTML-based packages. That hasn't really happened, so I wondered whether it was harder than I had assumed.

However, I did a little research and had no difficulty transforming the simple QTI XML examples from QTI version 2 into XHTML 5 and back again, using data-* attributes to augment the basic HTML markup. I'll post the transform I used if there is interest, so please add a comment or reply to this post.
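
The transform itself isn't shown here, but the general idea can be sketched. The following Python 3 code, using only xml.etree.ElementTree, maps each QTI element to a plain HTML element, records the original element name in a data-qti attribute, and copies each QTI attribute into a data-* attribute, converting camelCase names to the hyphenated form that HTML's dataset convention expects; the reverse function rebuilds the QTI XML. The element mapping, the data-qti attribute and the function names are hypothetical choices for illustration, not the transform described in this post, and only simple camelCase attribute names are handled.

```python
import re
import xml.etree.ElementTree as ET

QTI_NS = "http://www.imsglobal.org/xsd/imsqti_v2p1"

# Hypothetical mapping; any QTI element not listed becomes a <div>.
QTI_TO_HTML = {"prompt": "p"}

def to_data_attr(name):
    """maxChoices -> data-max-choices (the HTML dataset naming rule)."""
    return "data-" + re.sub(r"([A-Z])", lambda m: "-" + m.group(1).lower(), name)

def from_data_attr(name):
    """data-max-choices -> maxChoices."""
    return re.sub(r"-([a-z])", lambda m: m.group(1).upper(), name[len("data-"):])

def qti_to_html(elem):
    local = elem.tag.split("}")[-1]  # strip the QTI namespace
    out = ET.Element(QTI_TO_HTML.get(local, "div"), {"data-qti": local})
    for key, value in elem.attrib.items():
        out.set(to_data_attr(key), value)
    out.text, out.tail = elem.text, elem.tail
    for child in elem:
        out.append(qti_to_html(child))
    return out

def html_to_qti(elem):
    out = ET.Element("{%s}%s" % (QTI_NS, elem.get("data-qti")))
    for key, value in elem.attrib.items():
        if key.startswith("data-") and key != "data-qti":
            out.set(from_data_attr(key), value)
    out.text, out.tail = elem.text, elem.tail
    for child in elem:
        out.append(html_to_qti(child))
    return out

qti = ET.fromstring(
    '<choiceInteraction xmlns="%s" responseIdentifier="RESPONSE" maxChoices="1">'
    '<prompt>Pick one</prompt>'
    '<simpleChoice identifier="A">Yes</simpleChoice>'
    '<simpleChoice identifier="B">No</simpleChoice>'
    '</choiceInteraction>' % QTI_NS
)
html = qti_to_html(qti)
print(ET.tostring(html, encoding="unicode"))
back = html_to_qti(html)  # round trip back to QTI
```

Because every QTI attribute survives as a data-* attribute, nothing is lost on the way to HTML, and scripts in the browser can still read the values through the dataset API.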