Open data manual

February 2, 2012

Whilst I’ve blogged more than 40 times on the subject of open data, I don’t believe I’ve covered the Open Data Manual. A hat-tip to jacques.raybaut at europa-eu-audience.typepad.com! The manual outlines what one should expect of open data, whether publishing it or using it.

Coincidentally, the UK Government published the summary of the feedback on its open data consultation on 30 January 2012. The consultees include Socitm, which was rather critical of the proposals. A key point made in the response was that “Socitm believes that open data issues need to be treated within a broader approach to information management and evidence-based decision-making”. Unfortunately, this general (and very important) point does not appear to be captured in the report.

So, we’ll see what comes next…


It works both ways!

September 6, 2011

With the central government pressure on local government to be transparent and provide open data (thank you Mr Pickles), it’s about time someone considered it the whole way round. If local government is providing data for people to do whizzy things with, as in FixMyStreet or FixMyTransport, shouldn’t this be reciprocated, with those taking data providing feedback or reciprocal data in a suitably open format rather than in emails? Similarly, if we are providing all this open data at the government’s behest, shouldn’t that be an end to form-filling? Rather than endless submissions to government departments every year, can’t they just suck in a CSV or XML file living on a website?

When I was writing my dissertation there were many ways of looking at e-government, for example G2C (government to citizen), with C2G being the reverse. There was also G2G: if we push out data for central government and they suck it in, we can then take back their data feeds in electronic form. Such a solution would prevent a lot of double keying across the country, once the schemas are agreed and developers have built a single, or very few, sets of interfaces.
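The harvesting side of that G2G idea can be sketched very simply: a department pulls in a published CSV return and checks it against the agreed schema, instead of asking for the same figures to be re-keyed into a form. This is only a rough sketch, and the column names and figures below are invented for illustration:

```python
import csv
import io

# A hypothetical local-authority return, published as a CSV file at a
# known URL rather than re-keyed into a departmental form. In practice
# the text would be fetched over HTTP; here it is inlined for clarity.
FEED = """authority,period,households_assisted
Exampleshire,2011-Q2,1042
Exampleshire,2011-Q3,987
"""

# The agreed schema: column names fixed in advance between central and
# local government, so every authority's feed parses the same way.
EXPECTED_COLUMNS = ["authority", "period", "households_assisted"]

def harvest(feed_text):
    """Parse one published CSV return, rejecting it if the columns
    do not match the agreed schema."""
    reader = csv.DictReader(io.StringIO(feed_text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError("feed does not match the agreed schema")
    return list(reader)

rows = harvest(FEED)
print(len(rows))  # one record per reporting period
```

Once the schema is agreed, the same few lines of harvesting code serve every authority in the country, which is where the saving on double keying comes from.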

Let’s see an end to open data as a one-way street and build it up to at least being bi-directional, if not a motorway!


Open, and better, data

August 7, 2011

Open data is frequently promoted as a ‘good thing’, rather in the sense of the Sellar & Yeatman classic “1066 and All That”, where something is either a ‘good thing’ or a ‘bad thing’. As Raka Banerjee of the World Bank explains in a July 2011 blog post, “Open data is not enough”, open data that is inaccurate or biased is a ‘bad thing’: rather than merely being of little use, such data can actually cause harm when used by statisticians and researchers to inform policy.

Scientists are normally quite clear about data quality, but when open data becomes part of a demand culture, unless those supplying it are aware of and sensitive to the outcomes that may result from its use, citizens are in more danger from the production of the data than from its absence. About a year ago I posted on the topic of “Council Web Costs”, following a newspaper report based on Freedom of Information data, where the person requesting it had limited knowledge of either web development or local government. The resulting figures were unhelpful, to say the least.

Imagine a similar context where health policy was being decided on data extracted in the same way: not only would money be wasted by investing in the wrong places, but underinvestment might occur where support was urgently needed. Open data is only a ‘good thing’ when we are assured that the data is good, and that is the job of both the requestor and the supplier.


Open strategy

December 3, 2009

So, the Conservative Party have released a leaked copy of the draft Government IT Strategy! I’d been privy to an early draft through the Local CIO Council and hadn’t really thought anything was worth shouting about. In fact, I’m not really sure that another government would do much differently, apart from branding and terminology. Whilst I am a strong believer in “co-production” (my dissertation relates to it), I’m not a believer in crowdsourcing per se; it’s a bit like mob rule or, even worse, minority rule or oligarchy, which is apparently the Conservative Party’s rationale for the leak. I had wondered whether it had been a deliberate leak on John Suffolk’s part, but I gather from the Cabinet Office that this was not the case; however, they do insist it was an early draft and that the feedback will be very useful!

This leak is the latest instalment in the Conservatives’ ICT non-strategy, and not far removed from their earlier rallying cries around “open source”, “open gov” and “open data”. On the W3C e-government group, someone recently posted a list of alternative “definitions” for such data; here they are, with credit to Winchel “Todd” Vincent III of
<xmlLegal> http://www.xmllegal.org/. This may be developed as part of the group’s work, but the original is his.

“Unavailable: You simply cannot get the data.  Data is cost prohibitive to publish. There may be security or privacy reasons not to publish.  Or, simply, no one ever thought to publish the data.

Not Translated: Data is available, but exists in a different language than the end user’s language.

Paper: Data is available, but it is only available on paper.

Free: Data is available at no cost and without restrictions.

Fee Based: Data is available, but only for a fee.
— Public: Fee Based: Government provides data for a fee.
— Private: Fee Based: Private company provides data for a fee.

Copyright: Data is available (in some way) but there are copyright restrictions on republication or reuse.

Copyright with License: Data is available (in some way), there is a copyright, but also a license that allows some use (other than all rights reserved).

Public Domain: Data is available (in some way) and is in the public domain, so there are no restrictions on use of the data.

Electronic: Data is available electronically.

Electronic: Web Browser or Paper-Like Electronic Document Format: Data is available but only via a web browser or an electronic document format and not in an easily parsed format (where Images/Graphics, HTML, XHTML, PDF, Word, and Word Perfect do not count as easily parsed formats).

Electronic: Structured Format: Data is available electronically and in a structured format.  A structured format would include delimited text, spreadsheet, XML, and the like.

Electronic: Structured Format: Schema: Data is available electronically and in a structured format.  Additionally, there is a schema available that defines the structured format.

— Government Schema: A government promulgates the schema. The schema may or may not be in the public domain.

— Standards Body Schema: A recognized standards body promulgates the schema.  Schema is licensed under a “copyleft” (perpetual, free, but with restrictions not to modify) or similar license (typical of W3C, OASIS, but not all “recognized” standards bodies).

— Private Schema:  A private company promulgates the schema.  The schema may or may not have licensing restrictions associated with it.

Electronic: Browser/Viewer: Electronic data, whether structured or not, is available only via a web browser or other viewer for viewing.

Electronic: Download: Electronic data, whether structured or not, is available to download.  Here, download means a “manual” download. Some manual user input must be done to download the data (e.g., downloading a spreadsheet or structured text file via an HTTP link or FTP) to the user’s local machine.

Electronic: Web Service: Electronic data, typically structured, is available via a web service (meant in a generic way, not specific to a technology) for machine consumption.  There is some standard, specification, or documented publication rules, such that machines can reliably access the data on an ongoing basis.  The point here is not the format of the data, but the reliability and availability of the connection to the data, so that machines can get to the data feed without human intervention.

Each of these qualities makes the data more or less “open” or “accessible” as a practical matter.  There are  many combinations of these that one could put together.”
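The gap between the “browser-only” and “structured” categories in the list above can be illustrated with a small sketch: the same single fact published as a web page for humans and as structured XML for machines. The authority name and figure here are invented:

```python
import re
import xml.etree.ElementTree as ET

# One fact, published two ways.
HTML_PAGE = "<p>Exampleshire spent <b>£1,042</b> on web development.</p>"
XML_FEED = '<spend authority="Exampleshire" category="web" amount="1042"/>'

# Browser-oriented data: recoverable only by fragile screen-scraping,
# which breaks the moment the page layout changes.
scraped = re.search(r"<b>£([\d,]+)</b>", HTML_PAGE).group(1).replace(",", "")

# Structured data: each field is addressable by name, with no guesswork.
elem = ET.fromstring(XML_FEED)
parsed = elem.get("amount")

print(scraped, parsed)  # the same figure, but only one route is reliable
```

Both routes recover the number, but only the structured feed lets a machine do so reliably on an ongoing basis, which is exactly the distinction the “Web Service” category is drawing.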

Anybody in the UK who remembers the recent history of the National Land and Property Gazetteer (NLPG) will recall the local property data expensively gathered, with great effort spent cleansing it. The authorities who spent large sums of money are now likely to find this data being given away. There is a current effort to match this data with the Electoral Register through the Coordinated Online Register of Electors (CORE) project. One of the issues around property data in recent times has been resistance from the Royal Mail, which produces the Postcode Address File (PAF), to allowing any fee-free use of it. So local authorities are expected to give away data that has been expensively cleansed so that private organisations may profit – if that is the Conservative plan, to give it away like North Sea Oil, public transport, British Gas and so on. The comment is that public money paid for it, so the public should have it – but what if they have to pay twice?