There’s data, and there’s data

Enigma machine, by Paul Clarke

I’m enjoying the latest flowerings of open data, and the recent quality posts from Ingrid Koehler and Steph Gray on what it all might mean, as well as the quality action from Rewired State and others to demonstrate it in practice. (ooh, I just spotted that a reel of my photos is running on the Rewired State home page – thanks guys)

We’re getting a better understanding of what data actually is now that we’re seeing more of the things that were previously tucked away.

I’ll add my own observations: it helps me, at least, when thinking about complicated things to break them down a bit. My suggestion is to think in terms of four broad types:

1. Historical data

What’s happened in the past: how organisations and people have performed – what’s been said in meetings – what’s been spent – where the pollution has been – how children performed in tests…

2. Planning data

What’s projected to happen, or will shape what will happen: this and next year’s budget – legislation in progress – consultations – proposed housing developments – manifestos…

3. Infrastructural data

The building blocks of useful services. Boring stuff, doesn’t change that often, but when it does, it needs to be swiftly and accurately updated: postcodes – boundaries – base maps – contact directories – opening hours – organisation structures – “find my nearest…”

4. Operational data

The real-time stuff; what’s happening NOW: where’s my train/bus? – crime in progress – emergency information – school closures – traffic reports – happening in your area today…

These are not unrelated: what’s happened in the past will often guide what’s planned for the future. Today’s operational information becomes tomorrow’s history. And so on. There’s plenty of overlap. They’re intended as concepts, not hard definitions. The types can also be combined in every way conceivable: that’s part of the point of releasing the data in the first place.

I’m deliberately drawing no great distinction here between ‘information’ and ‘data’: the latter is a structured, interpretable incarnation of the former. That’s another set of issues in itself. I’ve also skipped over questions of interpretation and spin – this is a blog post, not a chapter of my book 😉 And I’ve omitted “personal data” as a type – this is woven through all areas and carries with it its own baggage. I’m thinking more about the basics of function and purpose. Which leads on to usefulness. Which, as I’ve said before, is the test of whether all this is taking us in the right direction.

“Useful to whom” does of course vary by type: 1 and 2 are great for those holding public service to account (press, public, whoever). 2 is for those who will make change happen. 3 will benefit ordinary people in day-to-day life (and I’m careful here not to imply that these ordinary people ever have to see ‘data’ or an ‘e-service’ themselves: their local paper, toddler group, or community centre noticeboard are all valid intermediaries here). 4 will do things for the e-enabled – the mobile generation, the data natives – as well as for places that can serve an offline public (screens in train stations, visuals at bus stops).

As a practical suggestion, I would love to see some of the current initiatives to build data repositories and improve access to data recognise that these distinctions exist. A little more signposting about the type of data that’s being released may help to highlight which types are being overlooked. For as we know, opening up the narrative helps to drive the change itself.

And how are we doing against these four types?

Pretty good on historical (it’s quite easy to dump old files online); weak on the future planning stuff (trickier, because if there’s no means of action accompanying the data, will publishing do anything other than frustrate?); getting there on infrastructural (though licensing, linking and standards offer the greatest challenges); struggling on operational (contractual, accuracy, standards).

That’s a one-line summary. What do you think? Where should we be putting more effort?

10 Comments

Tim Davies

June 4, 2010 / 12:48 pm

Hey Paul

An insightful post. Where should we be putting more effort? Well – you’re spot on that signposting people to what sort of data a ‘repository’ is offering them is key – and working hard to improve the user-experience of data repositories, which I’m not sure is all that great right now…

The other big place to focus effort is, I think, on building up the capacity of different groups to be mediators of data. There are some great groups of mediators of data in the community around Rewired State etc, but often things are still a long way from getting to toddler groups and community noticeboards around the country.

One other focus area: making the data social, or at least adding a social layer around the datasets. Right now a lot of people are downloading datasets and puzzling through the same challenges, working out how to make sense of them. Whilst with COINS we’re seeing some great collaboration going on (e.g. http://pad.okfn.org/coins) – if you don’t know where to look for those collaborative opportunities, or you’re not plugged into the right London-centric networks, you can be left on your own working with data. Creating ways for data to act as a point of contact between the civil servants generating it and the people using it would be great, too. Right now the data.gov.uk-style sites operate as a bit of a funnel – and connecting back to data custodians is not easy…
Open data requires responsible reporting… : Tim's Blog

June 4, 2010 / 4:55 pm

[…] should involve a clear link back to the dataset. If that was combined with some of the points Paul Clarke noted (and my comment on that post picks up on) around improving the user-friendly nature of data-stores, then simple steps might move us closer to […]
Peter Jordan

June 4, 2010 / 11:29 pm

A very useful post, Paul; and though you don’t use the label, a cogent argument for metadata. Signposting things like type of data, scope, validity and provenance would help us understand what’s being released and help developers understand what data they are mashing → linked data and all that.

Also liked your “‘data’ … is a structured, interpretable incarnation of [information]”. Helpful when having discussions about opening up ‘content’.
Principles for, and Practicalities of, Open Public Data « OUseful.Info, the blog…

June 30, 2010 / 10:33 am

[…] As to what data might be opened up, particularly by local councils, Paul Clarke identifies several different classes of data (There’s data, and there’s data): […]
Trilobyte

July 7, 2010 / 7:53 am

Don’t you mean there are data, and there are data?
- Anonymous
  
  July 7, 2010 / 10:39 am
  
  No. The use of the word as a ‘dimensionless’ singular (like ‘information’) is now widespread. The BBC style guide confirms this. Also see http://dbennison.wordpress.com/2010/03/14/data-…
  
  In very specific settings, such as the science/academic world, data is more likely to be plural – but there’s no clear-cut right and wrong here.
Mark Golledge

April 8, 2012 / 10:32 am

A really interesting post, Paul. You might want to have a look at the Yu and Robinson article on ‘The New Ambiguity of Open Government’. I think they are right that we are seeing a blurring of what we mean by open government (for accountability purposes) and open data (for making people’s lives better, e.g. more informed choices, easier ways to access public services). The emphasis, particularly in Local Government, seems initially to have been on the former (publishing £500+ spending, senior manager salaries), and I wonder how much this focus on the former rather than the latter is affecting the speed of the movement at a local level.

I think the challenge is that the same data could be used both for accountability purposes and for improved delivery. Take school data – I might be interested in looking at educational results to help me make an informed decision about where to send my children to school… but I might also use them to challenge service delivery.

An interesting debate….
Future metadata – mark

July 10, 2012 / 7:10 am

[…] In Paul Clarke’s excellent post on four categories of data, he suggests the following breakdown of data: […]
mark heseltine

July 10, 2012 / 7:38 am

I’ve blogged a bit in response to this, suggesting a different aspect to data taxonomy http://markheadroom.wordpress.com/2012/07/10/future-metadata/
Paul

July 10, 2012 / 10:15 am

Thanks Mark – just to note, my second category is very much about things that haven’t happened yet – critically, those that can still be changed (especially if they’re opened up to scrutiny).

But I like the way you’ve extended the concept to data types that might not currently be anticipated; nice thinking.
