An interesting piece appeared on the Guardian data blog on Friday. It describes a wealth of new data being released relating to court and conviction information.

The database shows sentencing in 322 magistrates and crown courts in England and Wales. Defendants’ names are excluded but details such as age, ethnicity, type of offence and sentence are not. Any computer user can analyse aspects such as how many white people were sent to jail for driving offences.

All good stuff. There’s definitely value to be gained from this type of analysis. It’s being released as a database (hopefully with a commitment to regular ongoing publication), and it brings consistency to often haphazard arrangements for making data available. These are positive moves, and should be welcomed.


Transparency campaigner William Perrin, who advises the Ministry of Justice on opening up its data, says the release is a big step: “Publishing the details of each sentence handed down in each court is a great leap forward for transparency in the UK, for which MoJ should be warmly praised. Courts have to be accountable to the local populations they serve.” But he, like some campaigners, believes the MoJ should go further, releasing the names of defendants. “The data published is anonymised, flying in the face of hundreds of years of tradition of open courts and public justice.

“The MoJ need to have an open and public debate about the conflict between the central role in our society of open public courts where you can hear the name and details of offenders read out in public and crude misapplication of data protection.”

My concern lies with the consequences of releasing the names of individuals, as proposed here, in a completely accessible and reusable way.

William draws a parallel between the act of reading out names in public court and publishing them on the Internet. (Disclosure: William and I both sit on the Transparency Sector Panel in MoJ.)

Were it a simple parallel, with the same consequences, I’d be pretty comfortable with the principle of release, too. But I see one very big difference: raw content on the Internet is (almost always) indexed by search engines. And search engines have very, very long memories. The (only) two things that the Internet has fundamentally changed are the ease with which information can be found, and the duration and extent over which it persists–as I’ve banged on about on this blog before.

So, this proposal (if taken at face value) would lead to a couple of consequences which might not be wholly desirable: firstly, a name would quite feasibly, if entered into a search engine, throw up information about an offence and the consequent sentencing for an indefinite time. What implications does that have for rehabilitation of offenders? If your conviction has been spent, and your potential employer does a quick check and finds that the only thing you’ve ever been noted for on the Internet is… Well, would that feel just to you?

Ah, I hear you say–but look at court reporting now: those journalists that do manage to get intelligible information out of a clerk so they can write their pieces accurately end up with their content being indexed (paywalls permitting), and the Google ghosts will be there to do their haunting anyway. Yes. They will. But this is an issue of scale and ease, not principle. Journalists today, even those with perfect information, exercise some choice over what they choose to print. Maybe this is just because of space constraints, maybe there are other factors at play. But the “release everything for reuse” stance would dramatically increase this scale of publication.

You may say that this is a good thing: along similar lines as “nothing to hide, nothing to fear”, this extra hangover from a criminal’s downfall may be a very positive thing for society. Another deterrent to criminality, maybe? I don’t know about that, but I do know that we then face a reappraisal of what we mean by rehabilitation as a direct consequence of data release.

And, as William says, that needs proper public debate.

But it’s not just a matter of scale. We find, when public data is released en masse, that new business opportunities spring up. Imagine the entrepreneur who gathers all data on convictions and charges for their own employee check service. They might adhere to principles of time limitation on their data. They might not. They might mash-up this data set with other information. They might not. They might put profit before principle.

We attempt to control such reuse of information with regulation, but on the Internet, it gets very much harder to make this stick in practice. Again, we risk changing the landscape of what it means to be convicted, by releasing data like this.

I’m fascinated by how even something like the current Data Protection Act relates to the indexing of personal information within search engines. Surely, almost by definition, the end purpose of such indexing cannot be known, and therefore Principle 2 (Personal data shall be obtained only for one or more specified and lawful purposes, and shall not be further processed in any manner incompatible with that purpose or those purposes–source: ICO) must surely be creaking already?

So, I’m not so keen on making it indexable. Can this be avoided? Is there a middle ground which acknowledges the shambles that is the current practice in courts–with some prepared to supply information in machine-readable format, others insisting on hand-written notes being passed, and some seemingly actively obstructive in providing information?

I think there might be. There are some “government” datasets which, although they could be released for reuse, aren’t. For fairly good reasons. The database of car registrations, for example. I suspect we’d consider it a bad thing if a road rage incident could be easily followed up with some bricks through windows on the basis of typing in the offending registration plate when you got home.

Similarly, we have a curious set of “frictions” in place to allow us to have an electoral roll which is at the same time both “publicly viewable” (provided you go to a library) and searchable online only if you pay up a good chunk of cash. A big hmmm from me to that latter part, by the way, but you can read much more on electoral roll issues here.

And the way that this data is structured is also important: so that we can’t, for example, easily go online, type in an address down the road, get a full list of occupants’ names and pop round there with all sorts of social engineering stories designed to make trouble/extract money/dig for further info/groom/be very creepy. Again, I’d suggest we do this for good reasons, and we know how to build machinery to keep this equilibrium in our society.

We may solve the problem through choosing carefully the format for release, the means by which it’s referenced, and even to whom it’s released. Yes, I know, those wretched privileged accessors again (just like the Police, DVLA, local authorities, credit agencies etc etc etc.) Always a subject to warm the temperature in open data discussions!

But I’m not arguing for wilful obfuscation of this data, merely putting forward some of the alternative perspectives to “everything, raw, now”. We do need this public debate, and we need to be reasonably confident that we’re getting a net societal benefit from whatever action we take.

Let’s tread carefully here–just because you can, doesn’t always mean you should.

  1. I am wholly, absolutely and completely against the release of non-anonymised conviction data in a frictionless environment. The status quo is, for the most part, that you have to know who the person is and that they’re going to be in court before you can get at that information (journalism tempers that, but in most cases not to a significant degree). This works well, and doesn’t usually run counter to the law regarding rehabilitation of offenders.

    On the “deterrent” angle: I don’t buy it. From what I gather, a good proportion of offenders don’t believe they’ll get caught (which is part of why deterrents only work to a certain extent, even in places where the “ultimate deterrent” is employed), and a chunk weren’t applying fully rational critical reasoning when committing the offence. There’s probably a fair degree of overlap here, too.

    There’s another angle, too: headline offences are a broad brush, and the “seriousness” (to most people’s eyes, including an employer’s) depends upon a whole swathe of factors. There is perhaps a tendency when discussing criminal acts to assume that all crimes are treated equal — you’re a criminal, you deserve what you get — but in reality (and as reflected by the fact that sentencing varies massively), all crimes are not equal in the slightest.

    Take “assault” for example: did you go around to somebody’s house and give them a kicking, or did you foolishly knock some guy out in retaliation after he threw a punch at you as you left a city-centre pub with extended drinking hours?

  2. I wrote a response to this on my blog because it’s far too long to go here, but in short, thank you for writing this, I had missed this initially, and I am recoiling in horror at the very idea. Talk about tarring someone with a brush for life.

  3. Remember that anyone processing offender data down the line would need to bear in mind the provisions of the 1974 Rehabilitation of Offenders Act, which treats convictions as “spent” after set periods depending on the severity of sentence passed, severely restricting the publication of details. (Though there is a public interest defence in the event of a “spent” offender suing for libel.)

    Obviously a profit-before-principle outfit selling old conviction data could operate in another jurisdiction – but surely most potential customers would be within reach of English law?

  4. Very important piece – and it’s important to understand that even well ‘anonymised’ data is almost certainly ‘de-anonymisable’. Studies by computer scientists over the last decade or so have demonstrated this convincingly – and THIS data sounds likely only to be anonymised on a very cursory level.

    This kind of data should never, in my opinion, be released. If the data is useful, the aggregation and analysis needs to be done before the data is released onto the Internet, and only the aggregations and analysis released. Even then, care needs to be taken because ‘de-aggregation’ may be possible.

    Amongst other things, this highlights the need for care – and to consider how there might really be a need for a ‘right to be forgotten’, and that reactions like that of the ICO that it just might all be too difficult should be avoided.

  5. It makes no difference if your conviction is spent or not these days as every employer seems to want a CRB check. As these list everything in your past, even if you make one mistake, prospective employers will still see it years down the line and hold it against you.

  6. The idea that a conviction is spent is pretty much untrue anyway – the internet only highlights the problem.

    The way I work around this is to be totally upfront and honest about mine. And then let people see what I do, how I behave and what opinions I hold.

    Those who have a problem will always have a problem. And those whose minds can be changed will change them, in my experience.

    Having a criminal conviction is always a life sentence in practice when it comes to being accepted back into society – it just depends on what kind and who makes up that society.

  7. Thanks, Paul, for getting a debate going – a few things to bear in mind.

    When found guilty and sentenced in court (the list published is sentences handed down) people surrender some freedoms – to spend money or time in the way they choose. Implicit in that for hundreds of years has also been a loss of privacy in an open court system. Application of data protection without a debate alters that balance.

    Remember the victim perspective – part of their sense of retribution may well be the perpetrator being labelled as a criminal (I’ve been a victim of crime many times and that’s my perspective).

    The British system is very open – I ran the basic issues about publishing data on the internet past a reformist prison governor and a policy maker from a prison reform charity. They were both bewildered that names shouldn’t be on the internet in our open system.

    The riots and large-scale publishing of names of offenders in the national press illustrated just how broken the old system, dating back to the 1974 Rehabilitation of Offenders Act, is. More modern anti-discrimination legislation tends to put rights in the hands of those discriminated against, rather than trying to mask at the outset the potential for discrimination.

    In an era of citizen publishing, ‘journalistic’ exemptions to FOI seem archaic.

    Finally – look at the picture in the round – a community should know who has been found guilty of crimes committed in the area. In an internet era, the moment you open public galleries it’s a very rapid slope to just publishing stuff on the web (journalists can now tweet from the gallery).

  8. If you’re a celebrity, of course, the information already gets logged in a publicly accessible database. Coincidentally, and prompted by an entirely different Guardian article, I blogged about this myself last week. Not sure if the comments here accept links, but if so, here’s the post:


    In the long run, I think that opposition to publishing the data will simply crumble. At the moment, anyone can publish anything they know (other than material subject to reporting restrictions), and the majority of court verdicts do get published in the local press. The argument in favour of courts putting their own decisions on the web isn’t about exposing criminals, or adding to the deterrent effect of a conviction, it’s simply about allowing an already-accepted principle of open justice to be exercised via contemporary technology.

  9. William: you seem to be advocating that any sentence for a conviction, should and does carry with it—in effect—an absolute life sentence of being branded a criminal. Is that REALLY what you’re arguing?

    While it is true that modern anti-discrimination legislation tends towards redress rather than outright prevention, I don’t think anybody is particularly convinced that it’s been especially successful in achieving its aims.

  10. I understand the points being made, but I agree with William Perrin (above) that part of being a convicted criminal is giving up some of your rights (in this case, to privacy).

    As a thought experiment: if I were hugely rich, I could employ a number of people to sit in court, write down the (intentionally-public) individual sentencing data (probably in far more detail than the government would ever release), and publish it online. Were I a newspaper baron I’d not just have them publish the boring data, but also editorialise it. This would surely be legal – it used to happen back when we had a proper local press industry, after all.

    It feels very odd that we’re saying that it would be inappropriate for the government to release information they hold when anyone sufficiently-motivated and well-resourced could create it themselves. It doesn’t feel very compatible with the concept of openness and transparency.

  11. A conviction for a crime is a fact. You can’t change that fact. So yes, it will stay with you for the rest of your life, and beyond.

    The rehabilitation of offenders act isn’t about pretending that people haven’t been convicted when they have. It’s about ensuring that old convictions are not used to discriminate in areas where they are not relevant. Trying to eliminate discrimination against reformed criminals by pretending that they were never convicted in the first place is a bit like trying to eliminate racial discrimination by pretending that black people are actually white.

  12. Mark, whilst I agree with what you say about a conviction being a fact, and hence not changeable, there’s a difference between knowing and accepting that and actually facilitating the potentially discriminatory use of that information. For me, that’s the danger here – making the data available in a simple, raw and easy to use format verges on asking for the information to be misused, particularly if the availability of that data is trumpeted as a triumph for ‘open data’. Simple, practical barriers can make misuse of data less widespread.

  13. Part of the issue is that data dissemination hasn’t kept up with social mobility. Not all that long ago, in generational terms, if Mr Reformed-Thief applied for a job at Associated Widgets then the recruitment officer knew he had a criminal record because Mr Reformed-Thief had lived in the area all his life and it had been in the papers when he was convicted. But a combination of increased social mobility and diminishing reach of the local press means that, in many cases, such information is simply obscured. It isn’t private, it isn’t confidential, it’s just not anywhere that many people can access it.

    I think Jon C’s point is a really good one. There may be cases where certain data has restricted public access. For example, information from the DVLA record of registered keepers is accessible if you can show good cause, but otherwise it’s inaccessible. In other cases, though, there is no legal restriction on access, and anyone can obtain the information for any purpose whatsoever – access is “applicant blind”, to use the phrase related to Freedom of Information. And I agree with Jon that access to such data should not be price-rationed or effort-rationed (which is essentially a proxy for price anyway, since the rich can afford to pay for others to make the effort).

    And the key point here is that we have already decided that court records fall into the latter category. That has been the case for centuries. It is a fundamental aspect of the British legal system. Other than in certain, limited instances where reporting restrictions are imposed, anyone, anywhere has the right to know what went on in any court in the land. Traditionally, the exercise of that right has been via the press, who act as the proxy for those unable to attend court in person. In the past, that had to do, because it was all we had. But that isn’t all we have now. Now, we can do better than the press used to be able to do. And there’s no reason not to do it.

  14. Doesn’t this whole debate just cloud the real issue? That employers are inappropriately judging people on the basis of their spent convictions?

    It seems to me that some or many employers will find out about convictions (spent or otherwise) as part of vetting potential employees, no matter whether this data is anonymised. While I completely agree that artificial friction can be a useful mechanism, it seems a little Canutian in this case.

    Wouldn’t it be better to establish more firmly that spent really does mean spent, and punish employers who discriminate inappropriately, the same way we do for gender/race/disability/etc? In which case, the major problem caused by the frictionless availability of this data goes away? And which might over time serve to change people’s attitudes towards ex-offenders?

  15. Can’t we do both? I agree about the establishment of firm principles and rules, and perhaps laws, concerning the use of data (in Germany, for example, you’re not allowed to access Facebook data about applicants during the recruitment process) and at the same time keep a little friction in the process? I’m not suggesting artificial friction – though there are arguments for that too – but just not removing all the friction that currently exists, let alone adding impetus and even kudos to the use of this data.

  16. I must admit that what worries me are the underlying unclarities about the function of the criminal justice system. William wrote:

    “Courts have to be accountable to the local populations they serve”

    That isn’t right, surely. Courts are instruments of the state. Someone who has broken the law has committed an offence against the state. The criminal justice system is how the state attempts to force compliance with its laws.

    Many criminal offences have victims, and one good reason for criminalising those acts is to protect citizens from the harms that come from them. (But notice that many acts which cause harm to others are not criminalised, and for good reasons.) Consequently offences often have differential sentences depending upon how much harm they cause. But that does not entail that even one purpose of the system is to enable the victim to take lawful ‘revenge’ on the perpetrator of the crime, merely that to protect the citizen from severe harm, such crimes have to be more strongly discouraged.

    It is, of course, true that psychologically it often helps a victim of crime to come to terms with what has happened to her or him if they know the perpetrator has been punished. Our brains may be hard-wired to like retribution, but that neither makes it morally justifiable nor the appropriate basis of a criminal justice system.

    Even if the moral justification for punishment was retribution (which is not very plausible), it would be retribution *by the state*. That is part of what the rule of law means.

    Let me give an extension of the principle of publishing names which hopefully shows how cautious we should be in waving principles of justice around. Think of the class of crimes in which there would be no conviction without at least a witness statement from the victim. That is a pretty large class. We can then re-describe the victim as also being an ‘accuser’ in those cases. With a few notable exceptions, principles of justice require the name of that accuser to be read out in court as well as the name of the accused. So why not include that name in the database too? I take it we would all be uncomfortable with that.

    The reason we need the anonymised data is to check that justice is blind, that the criminal justice system is performing its function. That does not extend to publishing the names of individuals (though, incidentally, it would perhaps justify publishing an anonymous unique id, so we could track patterns of re-offending against types of punishment etc.).

  17. There are a whole multitude of uses for research purposes that this data set provides – socially, geographically etc. None of those uses, though, that I can think of would have any need to know the identity of a defendant.

    As a Genealogist, I can see the interest on an historic basis and the Oldbaileyonline site, for example, provides for this interest in a small way, but it is academic ‘nosiness’ all the same. It serves no other purpose than to feed curiosity, with any positive ‘hit’ going some way to ‘describing the whole character’ of the ancestor concerned, regardless of the nature of the crime.

    God forbid that anyone be ‘identified’ by their crime for the rest of their life. And just as I begin to think of exceptions to that statement, I realise that it is my personal opinion, beliefs and priorities which form them, which is why criminal justice is ‘done’ by the State and we must accept its decisions or seek legal redress, not take the law into our own hands.

    To give access to defendants’ names in this data would be pandering to social nosiness and make everyone who has ever committed a crime, no matter what it may be, potentially the victim of a whole variety of social agendas.

  18. We need to stop talking about “data” being “available on the internet” as if this itself were a process with a significant amount of friction. Today the internet is in your pocket, not back home or at the office where your “computer” is. Tomorrow the internet will be in your glasses. The day after tomorrow the internet will be in your neural implant.

    We’re moving towards total informational awareness. If you want to switch on the “criminal convictions” layer on your glasses you’ll be able to see the records of the people in front of you in the street, in the pub, at the school gate, in the workplace. Perhaps this layer will be on by default. Face recognition mashed up with criminal records data in real time. Offenders might as well be walking around with their criminal records on sandwich boards.

    This is beyond the “life sentence” of shame or notoriety that some have referred to above. Yes, even now there’s a possibility that someone could Google your name and unearth your past if it’s been reported in a local newspaper or on a blog. But that’s a different thing from being able to instantly and effortlessly know that your new boss has a fraud conviction from 15 years ago and that the parishioner in the next pew has a string of drink driving convictions.

    No-one can deny, as others have mentioned above, that a criminal conviction in a court is a historical fact. Yet the principles behind open justice seem little to do with lifelong notoriety for offenders. As a society we want to ensure that justice is done at the time of the hearing. This is important for victims and those directly concerned with cases to know who committed the offences and how they were punished. But it’s also important for defendants and the wider community to ensure that the trial and sentencing process is transparent and fair. Defendants are named because we don’t want people disappearing into the prison system without public accountability. Sentencing is public because we want to ensure that sentences are lawful, consistent and justified. Open justice protects offenders and innocent defendants alike from judicial abuse. It protects all of us. Yet none of these purposes require individual offenders to be identifiable in perpetuity.

    Concerns about making criminal records data effectively ubiquitous go beyond its likely impact on the rehabilitation of offenders, though as others have mentioned, that requires careful consideration of its own. We also need to consider the impact on public order. We should be wary of the economic impact of leaving large numbers of people effectively unemployable, not just because employers might discriminate during the hiring process but because they would know that any employee with a criminal record in a public-facing role would be immediately exposed to the organisation’s customers and clients. Most importantly, we need to remember the social and psychological value of trust extended to strangers as an implicit courtesy rather than as an informed and calculated favour. It’s not so much whether we should be able to escape our pasts as whether we should be able to escape other people’s.

  19. Two more open (quasi-)judicial processes which generate “historical facts” but where it might not be helpful to be able to perpetually identify the people involved or their relatives:

    – patients “sectioned” to hospital under the Mental Health Act
    – verdicts and the details of cases in Coroners’ Courts.

    In both cases the public interest is clearly served by having open processes at the time of the hearings but should you really be able to find out which of your Facebook friends have been sectioned and which ones have lost close relatives in tragic circumstances?

