Ashes 2006 warm up matches: Round 1 – Pakistan

England haven’t lost a series since Sri Lanka in 2003 (almost exactly 2 years ago). Now they’re going to Lahore missing Strauss and Jones and with Giles in need of surgery. Suddenly we’ve got an inexperienced middle order, are unsure as to our best bowling attack and are one nil down with one to play in a difficult tour location playing a team that are playing well above their game (at least on recent form). Looks like that unbeaten run may be about to end.

This is made all the worse by the fact that our next stop on this Ashes warm up run is India, followed by Sri Lanka and Pakistan back in England. Suddenly it’s not looking so easy, particularly if key players keep being either injured or out of form.

Contrast that with the Australians. The introduction of Hussey (one of their finest batsmen and inexplicably ignored for the Ashes). The re-introduction of Symonds (Watson can say what he likes about being the Australian Flintoff, but we all know that Symonds has that honour if he can only stay off the sauce before matches).

Combine these very wise team changes with an uncompetitive series at home against the Windies and (bar the South Africans) a dolly run up to the Ashes and you get the strong feeling that the Aussies will be entering the next Ashes rebuilt and reinvigorated and probably without a loss to their name.

We, however, if current form is anything to go by, may well have suffered some hard losses and are likely to have been reminded that we are only world beaters when the best 11 aren’t injured and are firing. It’s a pessimistic view and I hope it’s wrong, but unless Jimmy Anderson finds form (and actually gets to play a match or two) and Cook and others are tested (and found out) quickly, there’s a horrible possibility that we’ll look more like the 2002/3 tourists than the 2005 victors.

BeautifulSoup

So then. The first of an ever increasing number of technical posts, I fear… Having chucked in the world of Microsoft Office and mobile telephones for the more edifying (if rather brain melting) world of vim, Python, Zope and Plone, I am a) going to have less time to browse cack on the web and b) going to have considerably more of my brain consumed by technical issues, so it is a fair risk that offmessage will slowly turn into yet another Python blog. Sorry about that.

Anyway, new job = new projects and the first was a real stinker: taking a web site with some 16,000 pages, all written in old-school non-semantic, non-CSS, non-XHTML, tables-for-layout HTML, and migrating the content to sexy new fully CSS/XHTML compliant pages. Yeah. Nice. What a start. Particularly as I’ve never actually written Python in anger before; a couple of CherryPy test sites and some scripts for stuff inside Plone, but nothing of any size or scale.

Luckily there is a thing called BeautifulSoup that is designed specifically to cope with poorly formed HTML and allow you to grab the content, regardless of the quality, as long as you can find some rule to traverse the mess. Very powerful, very robust and very clever (and with some very nice class names if that’s your cup of tea…)

Two gotchas I’ve found so far… Although a tag object exposes its attributes as dictionary keys, it does not support the .has_key() method, so:

>>> from BeautifulSoup import BeautifulSoup
>>> foo = '<a href="thingy.html">linky linky</a>'
>>> bar = BeautifulSoup(foo)
>>> print bar.a['href']
thingy.html
>>> print bar.a.has_key('href')
Null

Took me a fair old while to work that one out, I can say… Patch for BeautifulSoup.py here if you so desire. It doesn’t handle the 'href' in bar.a scenario, but it does provide .has_key() and stops you resorting to KeyError exceptions or any other hacks.
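
If you would rather not patch the library, the KeyError hack can at least be hidden in one place. This is only a rough sketch, not the patch mentioned above: has_attr is a made-up helper name, FakeTag is a hypothetical stand-in so the example runs without BeautifulSoup installed, and it assumes that tag[name] raises KeyError when the attribute is missing.

```python
def has_attr(tag, name):
    """Return True if tag[name] succeeds, False if it raises KeyError.

    Hypothetical helper, not part of BeautifulSoup. It assumes that
    looking up a missing attribute with tag[name] raises KeyError.
    """
    try:
        tag[name]
    except KeyError:
        return False
    return True


class FakeTag(object):
    """Hypothetical stand-in for a tag object, for demonstration only."""

    def __init__(self, attrs):
        self._attrs = dict(attrs)

    def __getitem__(self, key):
        # Behaves like a dictionary lookup: KeyError if the key is absent.
        return self._attrs[key]


link = FakeTag({'href': 'thingy.html'})
print(has_attr(link, 'href'))
print(has_attr(link, 'title'))
```

With a real tag you would call has_attr(bar.a, 'href') in the same way, provided the KeyError assumption holds for your version of the library.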

With the second one it is easier to see what’s going on, although I don’t have a patch for it (the work-arounds are too easy). In certain instances when it comes across & it will automatically treat the following text as an HTML special character (so in my example R&D turned into R&D;). I couldn’t work out exactly which cases caused this to happen, but it appears to be somewhere around assignment of a string to Tag.string.
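
One possible work-around, sketched here under assumptions rather than lifted from anything in BeautifulSoup: escape any bare ampersand as &amp; before the string goes anywhere near a tag. The function name escape_bare_ampersands and the regular expression are my own inventions, and I have not checked the result against every code path in the library, so treat it as a starting point.

```python
import re

# Matches an ampersand that does NOT begin something shaped like an entity:
# a named reference (&amp;), a decimal one (&#38;) or a hex one (&#x26;).
BARE_AMPERSAND = re.compile(
    r'&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)'
)


def escape_bare_ampersands(text):
    """Replace bare ampersands with &amp;, leaving entity-shaped text alone.

    Hypothetical helper, not part of BeautifulSoup.
    """
    return BARE_AMPERSAND.sub('&amp;', text)


print(escape_bare_ampersands('R&D'))
print(escape_bare_ampersands('R&amp;D'))
```

Text that already looks like an entity is left untouched, so running the function twice over the same string does no further damage.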

One last thought…

…on the Sony rootkit saga. This time from Bruce Schneier over at Wired. He asks us to consider the real story: How come our AV software didn’t pick this up in the first place?

Who are the security companies really working for? It’s unlikely that this Sony rootkit is the only example of a media company using this technology. Which security company has engineers looking for the others who might be doing it? And what will they do if they find one? What will they do the next time some multinational company decides that owning your computers is a good idea?

We all know what happens to people who infringe copyright…

I’ll stop posting about the Sony thing soon, really I will. But this latest little episode throws the whole thing into yet sharper focus… first4internet (makers of the XCP software used by Sony) have infringed the LGPL by using DVD Jon’s code without correct attribution. So. There’s one kind of copyright for you, and another for us? Is that how it works?

Oh. And then of course there’s this one too…

Sony DRM uninstaller ‘worse than rootkit’