Data matching

November 13, 2011

I’ve written before, from a practitioner’s perspective, about the inherent difficulties of identifying individuals, or even individual properties, across multiple UK government computer systems. Having been involved in the National Land & Property Gazetteer (NLPG) exercise from the outset, I am aware that even with a standard for recognising, labelling and addressing static structures such as houses, there are issues that can take a long time to settle. When it comes to pinning down individuals, without the benefit of an identity card or similar compulsory marking system, the task is going to be very hard. After all, the LLPG/NLPG saga has been going on for more than a decade and still isn’t perfect!

There is a vision within UK central government to move to a system of individual electoral registration. Currently one individual in each property is expected to take responsibility for ensuring that everyone eligible to vote at that address is put on the Electoral Register. This is a very people-intensive process: forms are delivered to every known residence within each local authority area and are then repeatedly chased for completion as part of keeping the Register up to date.

On 4 November 2011 the UK Parliament’s Political and Constitutional Reform Committee issued its Tenth Report, on Individual Electoral Registration and Electoral Administration. Among its conclusions was that ‘Data matching can only be a success if local authorities are provided with the information they need in a timely and helpful way’. However, the general approach towards any sort of compulsion with regard to registration remains highly relaxed.

Whilst various legal requirements are in place for local authorities to hold address data, these still lack consistency of approach, which adds to the cost of managing computer systems and their interfaces. It had been hoped that the requirement for a single LLPG would standardise this. However, although legislation requires systems to hold addresses for Council Tax, Business Rates (NNDR), Elections, Environmental Health, Social Services and so on, these systems are likely to be supplied by different software companies. The Unique Property Reference Number (UPRN) may provide a link between them once they are all matched, but doing that matching work in the first place requires effort that cannot be afforded in these hard times. This is all further complicated by the base legislation, under which different individuals and different addresses potentially have different status within their respective laws.
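To illustrate why the UPRN only helps once the matching work has been done, here is a minimal Python sketch. The two record lists, their field names and values, and the `link_by_uprn` helper are all hypothetical, not taken from any real council system or supplier product. The point is simply that a shared key makes linking trivial, while any record still lacking a UPRN falls out of the join and needs the expensive manual matching described above.

```python
# Hypothetical extracts from two separate council systems. Field names and
# UPRN values are made up for illustration; real systems differ by supplier.
council_tax = [
    {"uprn": "100010001234", "address": "1 High St", "band": "C"},
    {"uprn": "100010005678", "address": "2 High St", "band": "D"},
    {"uprn": None, "address": "Flat A, 3 High Street", "band": "A"},  # never matched
]
electoral_register = [
    {"uprn": "100010001234", "elector": "J Smith"},
    {"uprn": "100010009999", "elector": "A Jones"},
]


def link_by_uprn(left, right):
    """Join two record lists on UPRN.

    Returns (linked, unlinked): linked holds merged records that share a UPRN;
    unlinked holds left-hand records with no UPRN or no counterpart on the right.
    """
    # Index the right-hand records by UPRN, skipping any without one.
    index = {}
    for rec in right:
        if rec["uprn"] is not None:
            index.setdefault(rec["uprn"], []).append(rec)

    linked, unlinked = [], []
    for rec in left:
        matches = index.get(rec["uprn"], []) if rec["uprn"] else []
        if matches:
            for m in matches:
                linked.append({**rec, **m})  # merge fields from both systems
        else:
            unlinked.append(rec)
    return linked, unlinked


linked, unlinked = link_by_uprn(council_tax, electoral_register)
print(len(linked), len(unlinked))  # 1 2
```

Note that "unlinked" mixes two different cases: a property that genuinely has no electors, and a record that was never given a UPRN. Only the second represents the outstanding matching effort, and telling them apart is itself part of that work.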

This will be further confused by divergent projects across government that rely on individual identity management, with little apparent programme management to stop each of them doing its own thing. The anti-ID card lobby have little to fear whilst personal identity applications continue to breed, and the £10 million promised by Francis Maude will not go far.


Open strategy

December 3, 2009

So, the Conservative Party have leaked a copy of the draft Government IT Strategy! I’d been privy to an early draft through the Local CIO Council and hadn’t really thought anything in it was worth shouting about. In fact I’m not sure that another government would do anything much different, apart from branding and terminology. Whilst I am a strong believer in “co-production” (my dissertation relates to it), I’m not a believer in crowdsourcing per se, which is apparently the Conservative Party’s rationale for the leak. It’s a bit like mob rule or, even worse, minority rule or oligarchy. I had wondered if it had been a deliberate leak on John Suffolk’s part, but I gather from the Cabinet Office that this was not the case; they do, however, insist it was an early draft and that the feedback will be very useful!

This leak is the latest Conservative version of a Conservative ICT non-strategy, and not far removed from their earlier rallying cries around “open source”, “open gov” and “open data”. On the W3C e-government group, someone recently posted a list of alternative “definitions” for such data, and here they are, with credit to Winchel “Todd” Vincent III of xmlLegal. This may be developed as part of the group’s work, but the original is his.

“Unavailable: You simply cannot get the data.  Data is cost prohibitive to publish. There may be security or privacy reasons not to publish.  Or, simply, no one ever thought to publish the data.

Not Translated: Data is available, but exists in a different language than the end user’s language.

Paper: Data is available, but it is only available on paper.

Free: Data is available at no cost and without restrictions.

Fee Based: Data is available, but only for a fee.
— Public: Fee Based: Government provides data for a fee.
— Private: Fee Based: Private company provides data for a fee.

Copyright: Data is available (in some way) but there are copyright restrictions on republication or reuse.

Copyright with License: Data is available (in some way), there is a copyright, but also a license that allows some use (other than all rights reserved).

Public Domain: Data is available (in some way) and is in the public domain, so there are no restrictions on use of the data.

Electronic: Data is available electronically.

Electronic: Web Browser or Paper-Like Electronic Document Format: Data is available but only via a web browser or an electronic document format and not in an easily parsed format (where Images/Graphics, HTML, XHTML, PDF, Word, and Word Perfect do not count as easily parsed formats).

Electronic: Structured Format: Data is available electronically and in a structured format.  A structured format would include delimited text, spreadsheet, XML, and the like.

Electronic: Structured Format: Schema: Data is available electronically and in a structured format.  Additionally, there is a schema available that defines the structured format.

— Government Schema: A government promulgates the schema. The schema may or may not be in the public domain.

— Standards Body Schema: A recognized standards body promulgates the schema.  Schema is licensed under a “copyleft” (perpetual, free, but with restrictions not to modify) or similar license (typical of W3C, OASIS, but not all “recognized” standards bodies).

— Private Schema:  A private company promulgates the schema.  The schema may or may not have licensing restrictions associated with it.

Electronic: Browser/Viewer: Electronic data, whether structured or not, is available only via a web browser or other viewer for viewing.

Electronic: Download: Electronic data, whether structured or not, is available to download.  Here, download means a “manual” download. Some manual user input must be done to download the data (e.g., downloading a spreadsheet or structured text file via an HTTP link or FTP) to the user’s local machine.

Electronic: Web Service: Electronic data, typically structured, is available via a web service (meant in a generic way, not specific to a technology) for machine consumption.  There is some standard, specification, or documented publication rules, such that machines can reliably access the data on an ongoing basis.  The point here is not the format of the data, but the reliability and availability of the connection to the data, so that machines can get to the data feed without human intervention.

Each of these qualities makes the data more or less “open” or “accessible” as a practical matter.  There are many combinations of these that one could put together.”

Anyone in the UK who remembers the recent history of the National Land and Property Gazetteer (NLPG) will recall the local property data gathered at great expense, with great effort spent cleansing it. The authorities who have spent large sums of money are now likely to see it given away. There is currently an effort to match this data with the Electoral Register: the Co-ordinated Online Record of Electors (CORE) project. One of the issues around property data in recent times has been the Royal Mail’s resistance, as producer of the Postcode Address File (PAF), to allowing any fee-free use of it. So local authorities are expected to give away expensively cleansed data so that private organisations may profit. If that is the Conservative plan, we will see it given away like North Sea oil, public transport, British Gas and so on. The argument is that public money paid for it, so the public should have it; but what if they have to pay twice?