Thursday, 25 November 2010

I LIKE TO EXCEL!

  
In a single transaction yesterday I downloaded 5,364,510 cells of data on what people do for a living and where they do it in the whole of England, Wales and Scotland.

That's a lot of cells. It breaks Microsoft Excel, so you have to build it into an Access database or, preferably, SQL, or whatever.

The official data agency that keeps this data for download and analysis couldn't cope with my demands (they place a limit of 1m cells on any single download), so I had to perform the request in 7 chunks. Why the limit? It doesn't take a brain surgeon to split the request and then rebuild it at the other end, but it is mighty annoying to have to do so.
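For the curious, the split-and-rebuild chore can be sketched in a few lines of Python. This is a minimal illustration only, assuming the request can be broken up by area and that each area's cell count is known before downloading. The area names, counts and the functions `plan_chunks` and `rebuild` are hypothetical and have nothing to do with the agency's actual download service. Because areas are kept whole, a download can need more chunks than the total divided by the limit would suggest.

```python
from typing import Dict, List

# Hypothetical per-download cap, taken from the 1m-cell limit mentioned above.
CELL_LIMIT = 1_000_000


def plan_chunks(cells_per_area: Dict[str, int], limit: int = CELL_LIMIT) -> List[List[str]]:
    """Greedily group whole areas into requests that each stay within the cell limit."""
    chunks: List[List[str]] = []
    current: List[str] = []
    current_cells = 0
    for area, cells in cells_per_area.items():
        if cells > limit:
            # An area bigger than the cap would itself need splitting; not handled here.
            raise ValueError(f"area {area!r} alone exceeds the limit")
        if current and current_cells + cells > limit:
            # Adding this area would break the cap, so close off the current request.
            chunks.append(current)
            current, current_cells = [], 0
        current.append(area)
        current_cells += cells
    if current:
        chunks.append(current)
    return chunks


def rebuild(results: List[List[tuple]]) -> List[tuple]:
    """Stitch the rows returned by each chunked request back into one table."""
    combined: List[tuple] = []
    for rows in results:
        combined.extend(rows)
    return combined


# Three areas of 600 cells each under a 1,000-cell cap need three requests, not two.
print(plan_chunks({"A": 600, "B": 600, "C": 600}, limit=1000))
```

Rebuilding is then just a matter of appending each chunk's rows in turn, which in practice would mean loading them into the same Access or SQL table.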

But now that I've got my hands on the data I can do amazing things with it when I blend it into my mapping software (there's the SQL bit) and output that to Crystal Reports! These are things that, to my knowledge, no other private-sector economist in the country has yet done. Which is good. And keeps Polko a very busy boy.


The problem, though, is that this data is not good enough.

This is the problem with a lot of UK data, to be honest. As one of the leading lights (?) of European economies we should have great data collection, drilling down into all sorts of very specific areas and allowing a really deep analysis of what is happening at any one time.

The data I have is dated 2008. The 'new' 2009 data is set to be released some time during December, and I will again draw the whole of it into my web. But, come on. It shouldn't take 12 months after data is collected to collate it, check it, set it in the correct format for people to download, etc. Not in the 21st century. I'd bet that if the government were to outsource this data collection to a private company (and they are already out there), it would be 'on the shelves' and ready to use within half that time. Maybe less.

In fact, in a conversation I had with the Office for National Statistics last week I was told 'there are issues' with the data release, and they hinted at it being late.

The data I have shows pretty well where people work, what type of business they work in, whether they are male or female and whether they work full or part time. But why can I not then get data that shows me how old they are? A simple addition, but it might be useful?

Like I said, outsource this stuff

There are companies out there that collect millions of credit card and banking transactions on a daily basis, and I can buy summaries, for any area, of how much people are spending, on what and where, along with their age and even the type of house they live in. This data is updated monthly, and I can have September's figures right now if I want them.

Think about it: This guy's salary is your tax.
 
A tired 12 month old dataset that a group of civil servants have been using for a pillow in a darkened basement room in Whitehall is just not good enough.
 
