How Open Data Principles challenge INSPIRE, why data streams are the new gold, and the future of data catalogues when search engines implement dataset search.

In mid-September the Netherlands and Flanders organised this year’s edition of the INSPIRE conference in Antwerp, Belgium. GeoCat participated as a presenter and sponsor. This blog post shares some thoughts on topics discussed at the conference.

How Open Data Principles challenge INSPIRE
Back in 2007, when INSPIRE was designed, the main goal of the programme was to improve cooperation between layers of government. These days some consider its open data and transparent government aspects to be INSPIRE’s highest value. One can doubt whether the current regulation fits that shifted perspective. The current regulation requires organisations to harmonise their data models to European models. In practice many harmonise their data at the end of the pipeline, to satisfy the regulation. In my talk on Wednesday I challenged this approach. In my opinion harmonisation should take place before data acquisition, because we frequently notice loss of data quality and context in the harmonisation process. If implemented at the end of the pipeline, the approach challenges the first two principles of open government data: data should be primary and complete.

Unfortunately, a lot of software used in public administration cannot operate on app-schema GML as used in the context of INSPIRE. This limitation requires some form of transformation between the operational level and the data publication. To still facilitate early harmonisation, the harmonisation concept should be split into semantic and technical harmonisation.

  • Semantic harmonisation (maintain similar types and properties, use common codelists, match data types) should be implemented as early in the data process as possible.
  • Technical harmonisation (transformation of tabular data to app-schema GML) can be postponed until the publication phase.

A success indicator for any harmonisation process is whether the transformation can also be traversed in the other direction. Assess how you could import an INSPIRE data publication from a neighbouring state/region/municipality into your operational databases. Aspects to assess in such a process are:

  • Are data aspects missing or semantically misaligned?
  • Do objects still have the same identification, and are links between different registrations still operational?
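
The two checks above can be sketched as a small round-trip test. This is only an illustration under assumptions of my own, not part of any INSPIRE tooling: the record layout, the field names `id` and `refs`, and the function `round_trip_report` are hypothetical, and links are only resolved within the re-imported set.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def round_trip_report(original: List[Record], reimported: List[Record]) -> Dict[str, list]:
    """Compare operational records with the same records after export and re-import.

    Assumes (hypothetically) that every record has an "id" and optionally a
    "refs" list holding the ids of linked objects.
    """
    orig_by_id = {rec["id"]: rec for rec in original}
    back_by_id = {rec["id"]: rec for rec in reimported}

    # Objects whose identifier did not survive the round trip
    lost_ids = sorted(set(orig_by_id) - set(back_by_id))

    # Attributes present in the source but missing after the round trip
    missing_attributes = []
    for rec_id, rec in orig_by_id.items():
        back = back_by_id.get(rec_id)
        if back is None:
            continue
        for key in rec:
            if key not in back:
                missing_attributes.append((rec_id, key))

    # Links that point to an object that is no longer present
    broken_links = []
    for rec_id, back in back_by_id.items():
        for ref in back.get("refs", []):
            if ref not in back_by_id:
                broken_links.append((rec_id, ref))

    return {
        "lost_ids": lost_ids,
        "missing_attributes": sorted(missing_attributes),
        "broken_links": sorted(broken_links),
    }


# Made-up example data: one object and one attribute get lost on the way
original = [
    {"id": "NL.1", "name": "Station A", "accuracy_m": 0.1, "refs": ["NL.2"]},
    {"id": "NL.2", "name": "Station B", "accuracy_m": 0.5, "refs": []},
]
reimported = [
    {"id": "NL.1", "name": "Station A", "refs": ["NL.2"]},
]
report = round_trip_report(original, reimported)
```

An empty report would suggest the publication can be traversed back without loss. Any entry points to a data aspect, identifier or link that did not survive the harmonisation.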

As an alternative to current practice, I am in favour of having entities publish data as primary and complete as possible, and additionally publish the same dataset harmonised according to the INSPIRE models and encodings. In my talk on Wednesday I presented an alternative approach in which organisations would publish their data as-is, combined with a set of rules, so that any data user (or intermediary) would be able to set up a transformation process to convert the data to the INSPIRE models. Such a scenario would fit the open data principles better, while keeping the added value of harmonised datasets Europe-wide.
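
To make the idea of "data as-is plus a set of rules" a bit more tangible, here is a minimal sketch of what such a rule set could look like. Everything in it is an assumption for illustration: the rule format, the source field names, the codelist values and the `NL.EXAMPLE.` prefix are made up, and in practice a rule set would more likely be expressed in an existing mapping language than in Python.

```python
from typing import Any, Dict, List

# Hypothetical rule set, published next to the as-is dataset.
# It is plain data, so it could just as well be shipped as JSON.
RULES: List[Dict[str, Any]] = [
    {"source": "object_id", "target": "inspireId", "prefix": "NL.EXAMPLE."},
    {"source": "naam", "target": "name"},
    {
        "source": "status_code",
        "target": "status",
        # Illustrative codelist mapping, not an official INSPIRE codelist
        "codelist": {"A": "functional", "B": "disused"},
    },
]


def apply_rules(record: Dict[str, Any], rules: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Convert one as-is record to the harmonised model described by the rules."""
    harmonised: Dict[str, Any] = {}
    for rule in rules:
        if rule["source"] not in record:
            continue  # source field absent, nothing to map
        value = record[rule["source"]]
        if "codelist" in rule:
            # A KeyError here signals a code the rule set does not cover
            value = rule["codelist"][value]
        if "prefix" in rule:
            value = f'{rule["prefix"]}{value}'
        harmonised[rule["target"]] = value
    return harmonised


# The as-is record keeps fields that the harmonised model has no place for
as_is = {
    "object_id": 42,
    "naam": "Gemaal Noord",
    "status_code": "A",
    "internal_note": "checked 2018",
}
harmonised = apply_rules(as_is, RULES)
```

Note that `internal_note` is only available in the as-is publication. The harmonised record is derived from it, not the other way around.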

Data streams are the new gold
I was triggered by a keynote by TomTom’s Alain de Taeye. Apparently TomTom exchanges datasets with public entities, in return adding value to those datasets from its abundant databases. Besides the philosophical aspects of the open data constraints introduced by this type of data trading, his story clearly shows the value of real-time streaming government data. Alain’s presentation seemed a bit off topic within the scope of INSPIRE, with its history of batched datasets, some first published up to a year after acquisition.

However… times seem to be changing. With the upcoming release of Annex III we’ll see an increase in publications of (near) real-time environmental monitoring data, which will probably also stimulate a higher update frequency for Annex I datasets (to prevent misalignment). It is good to know that INSPIRE is ready to accept streaming datasets, with its support for Sensor Observation Services and the upcoming SensorThings IoT API (thank you OGC and Kathi).

It has always struck me that our national spatial catalogue hardly advertises any SOS and SensorThings datasets. I expect the level of standardisation and implementation in the sensor and real-time streams domain within Dutch government is still low. Another aspect is that the sensor community may not consider registering their streams in a spatial catalogue; some are actually pushing for a separate sensor registry in the Netherlands.

The future of data catalogues when search engines implement dataset search
On Wednesday JRC organised a special session around the recent release of Google Dataset Search. It was quite an interesting discussion about the future of data catalogues in case search engines take over this role. In the Geo for Web experiment organised by Geonovum, we’ve already proven that dataset search is a sensible use case for search engines, and that it can be facilitated quite easily from our highly standardised domain by adopting some best practices. It is very good to see dataset search materialised and to find some of our published datasets in it. Personally I feel the search engines will not make catalogues disappear. To put it even more strongly: currently it’s the catalogues that facilitate discovery by search engines.
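
One of the practices that lets a catalogue facilitate discovery by search engines is embedding schema.org `Dataset` markup in each record’s landing page. Below is a minimal sketch of that idea. The catalogue record layout (`title`, `abstract`, `bbox` and so on), the example.org URLs and the function `to_schema_org` are assumptions for illustration. Only the schema.org types and property names are taken from the schema.org vocabulary.

```python
import json
from typing import Any, Dict


def to_schema_org(record: Dict[str, Any]) -> Dict[str, Any]:
    """Map a (hypothetical) catalogue record to a schema.org Dataset description."""
    south, west, north, east = record["bbox"]
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": record["title"],
        "description": record["abstract"],
        "url": record["landing_page"],
        "keywords": record["keywords"],
        "spatialCoverage": {
            "@type": "Place",
            "geo": {
                "@type": "GeoShape",
                # lower corner then upper corner, each as "lat lon"
                "box": f"{south} {west} {north} {east}",
            },
        },
        "distribution": [
            {
                "@type": "DataDownload",
                "contentUrl": record["download_url"],
                "encodingFormat": record["format"],
            }
        ],
    }


# Made-up catalogue record; all values are placeholders
record = {
    "title": "Example monitoring stations",
    "abstract": "Locations of example monitoring stations.",
    "landing_page": "https://example.org/catalogue/record/123",
    "keywords": ["monitoring", "INSPIRE"],
    "bbox": (50.7, 3.3, 53.6, 7.2),  # south, west, north, east
    "download_url": "https://example.org/download/stations.gml",
    "format": "application/gml+xml",
}

jsonld = to_schema_org(record)

# Snippet a catalogue could embed in the HTML landing page of the record
html_snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(jsonld, indent=2)
    + "\n</script>"
)
```

The catalogue stays the authoritative source here. The markup is just another encoding of the metadata it already holds, exposed in a form crawlers understand.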

JRC indicated it will start a test bed to further investigate alignment between INSPIRE and the search engines. A serious aspect for that group to consider is the adoption of WFS 3 as a download service. WFS 3 is built from the ground up around the Spatial Data on the Web Best Practices, while keeping core WFS concepts such as feature types and features. By following the best practices, WFS 3 will be easy for search engines to crawl. WFS 3 (as presented by Clemens) is moving fast. The first initiatives (for the base standard) started in early 2018 and it is currently in public review. Extensions to support data models are in preparation. WFS 3 brings a new awareness to OGC. Rumour has it that other standards may follow the same design principles. STAC is such an initiative within WFS 3, which could be the start of CSW 4. Other related news from OGC is the beta NamingAuthority, which introduces persistent identifiers for most of the OGC concepts.

Next year?
The INSPIRE conference apparently skips a year. The next edition will be organised in 2020 in Dubrovnik. Your alternative for 2019 could be the global FOSS4G conference, organised from 26 to 31 August in Bucharest. We’ll make sure there is an INSPIRE track at that conference to continue our discussion.
