The deadline for adopting the EU General Data Protection Regulation (GDPR) is approaching. In this article I’ll explore whether the open source project GeoNetwork can facilitate the implementation of GDPR at organisations.

GDPR is the new EU data protection regulation that organisations have to comply with from 25 May 2018. GDPR requires organisations to evaluate, for each dataset containing privacy-sensitive information, their policy on protection, retention and necessity.

GeoNetwork is a web application for registering datasets to facilitate their discovery and the assessment of their relevance. The underlying goal is to increase efficiency in society by facilitating data exchange. Concretely: preventing data redundancy and ensuring that the most relevant available data is used.

By its nature a product like GeoNetwork is a key player in the privacy domain. A primary goal of a catalogue is to increase re-use of data outside the use cases for which the data was originally collected. And exactly this aspect is the worst nightmare when the data contains privacy-sensitive information. A key aspect of privacy is that a person agrees to share their data for a single purpose only. Reusing that data for other purposes is forbidden (with some exceptions in the crime-fighting/national security domain).

Since products like GeoNetwork are at the heart of discovery and assessment, it is important that the privacy aspects of datasets are published as part of the discovery and assessment information, so that potential users are instantly informed of the usage limitations of datasets. GDPR actually requires organisations to set up a registry of privacy aspects related to the datasets they are collecting. I hereby invite organisations to use GeoNetwork as such a registry. There are some important benefits:

  • Privacy information is managed in the same place as the other metadata of the dataset.
  • A good part of the required privacy information overlaps with existing metadata.
  • By adopting the standardised metadata exchange protocols (CSW/RSS/OAI) and metadata schemas (iso19115/DCAT), privacy information can easily be shared with authorities and (if relevant) a wider audience.

Which brings us to the question: how can GeoNetwork facilitate registering the information required by GDPR? There are two potential approaches. The first is to embed the privacy information inside the already available catalogue record containing the general metadata of the dataset (title, abstract, author, date), using the commonly used iso19115 schema. The alternative is to create a separate record holding the GDPR information and link that record to the dataset record. The table below contains a potential mapping of GDPR fields to iso19115 fields for the first approach.

| GDPR field | iso19115 field |
|---|---|
| Responsible organisation | organisation with role processor / custodian |
| Partner organisations that cooperate on the collection | organisation with role custodian or user (to be discussed) |
| Organisations that use the data | organisation with role user |
| The official responsible for data protection | None of the available ISO roles matches this role; I suggest extending the ISO codelist with a role dataprotectionResponsible |
| Purpose of collection | gmd:purpose |
| Type of persons on which data is collected | gmd:environmentDescription and/or a keyword from a codelist of types of persons (to be discussed) |
| Type of privacy-sensitive data which is collected | A keyword from a codelist of types of privacy-sensitive data (to be discussed) |
| Retention policy (date of removal of the data) | Date is usually relative to the date of collection for a subset of the data (to be discussed) |
| Type of users of the data | A keyword from a codelist of types of users (to be discussed) |
| Data protection policy & measures | securityConstraints/handlingDescription (to be discussed) |
| Indicate a resource is relevant for GDPR | domainConsistency/result/conformanceResult/specification/title=GDPR (to be discussed) |
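
To make the mapping a bit more tangible, here is a minimal Python sketch that reads a few of the GDPR fields back out of a record in the ISO 19139 XML encoding of iso19115. It only uses the Python standard library, not any GeoNetwork API. The sample record is made up and heavily trimmed (it is not schema-valid), and the function names and the keys of the returned dictionary are my own assumptions, as is the choice to read the roles from the contacts of the dataset.

```python
import xml.etree.ElementTree as ET

# Namespaces of the ISO 19139 XML encoding of iso19115.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Made-up, heavily trimmed record: not schema-valid, for illustration only.
SAMPLE_RECORD = """\
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:purpose>
        <gco:CharacterString>Billing of waste collection fees</gco:CharacterString>
      </gmd:purpose>
      <gmd:pointOfContact>
        <gmd:CI_ResponsibleParty>
          <gmd:organisationName>
            <gco:CharacterString>Example Municipality</gco:CharacterString>
          </gmd:organisationName>
          <gmd:role><gmd:CI_RoleCode codeListValue="custodian"/></gmd:role>
        </gmd:CI_ResponsibleParty>
      </gmd:pointOfContact>
      <gmd:pointOfContact>
        <gmd:CI_ResponsibleParty>
          <gmd:organisationName>
            <gco:CharacterString>Example Billing Service</gco:CharacterString>
          </gmd:organisationName>
          <gmd:role><gmd:CI_RoleCode codeListValue="user"/></gmd:role>
        </gmd:CI_ResponsibleParty>
      </gmd:pointOfContact>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>
"""


def organisations_by_role(record):
    """Group organisation names by the role code of their contact entry."""
    roles = {}
    for party in record.iterfind(".//gmd:CI_ResponsibleParty", NS):
        name = party.findtext(
            "gmd:organisationName/gco:CharacterString", default="", namespaces=NS
        )
        code = party.find("gmd:role/gmd:CI_RoleCode", NS)
        role = code.get("codeListValue") if code is not None else "unknown"
        roles.setdefault(role, []).append(name.strip())
    return roles


def gdpr_summary(xml_text):
    """Collect some GDPR fields following the mapping in the table above."""
    record = ET.fromstring(xml_text)
    roles = organisations_by_role(record)
    return {
        # Responsible organisation: role processor / custodian.
        "responsible_organisations": roles.get("processor", []) + roles.get("custodian", []),
        # Organisations that use the data: role user.
        "organisations_using_data": roles.get("user", []),
        # Purpose of collection: gmd:purpose.
        "purpose_of_collection": record.findtext(
            ".//gmd:purpose/gco:CharacterString", default=None, namespaces=NS
        ),
    }


if __name__ == "__main__":
    for field, value in gdpr_summary(SAMPLE_RECORD).items():
        print(field, "=", value)
```

The rows marked "to be discussed" are left out on purpose: until a codelist or field is agreed on, there is nothing stable to read.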

The first approach in particular, which embeds the GDPR fields inside the iso19115 record, has a very low impact on existing catalogues and data practices. Data custodians are already used to creating metadata; they only need to fill out some extra privacy-related fields. An approach to implementing GDPR in GeoNetwork could look like this:

Similar to how INSPIRE uses the quality report DomainConsistency to indicate which INSPIRE regulations a dataset or service conforms to, a DomainConsistency report for GDPR can be introduced. When this report is present in a record, the GDPR-related fields and validations can be activated automatically, so the custodian is aware of the fields that need to be filled in before a dataset can be registered.
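
As a rough illustration of such a conditional validation, the Python sketch below checks a record for a DomainConsistency report whose specification title is "GDPR" and, only if that report is present, lists the GDPR fields that are still empty. This is plain standard-library Python and not how GeoNetwork itself implements validation. The set of required fields, the helper names and the sample records are assumptions of mine, and the sample XML is trimmed and not schema-valid.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Title of the specification cited by a DomainConsistency report.
SPEC_TITLE = (
    ".//gmd:DQ_DomainConsistency/gmd:result/gmd:DQ_ConformanceResult"
    "/gmd:specification/gmd:CI_Citation/gmd:title/gco:CharacterString"
)

# Assumed minimal set of fields to require once a record is flagged for GDPR.
REQUIRED_WHEN_GDPR = {
    "purpose of collection": ".//gmd:purpose/gco:CharacterString",
    "data protection measures": (
        ".//gmd:MD_SecurityConstraints/gmd:handlingDescription/gco:CharacterString"
    ),
}


def is_gdpr_relevant(record):
    """True if any DomainConsistency report cites a specification titled GDPR."""
    return any(
        (title.text or "").strip() == "GDPR"
        for title in record.iterfind(SPEC_TITLE, NS)
    )


def missing_gdpr_fields(record):
    """Return labels of required GDPR fields that are absent or empty."""
    if not is_gdpr_relevant(record):
        return []  # the extra validation is only activated by the GDPR report
    missing = []
    for label, path in REQUIRED_WHEN_GDPR.items():
        value = record.findtext(path, default="", namespaces=NS)
        if not value.strip():
            missing.append(label)
    return missing


# Made-up, trimmed sample records (not schema-valid), built from fragments.
RECORD_START = """\
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:identificationInfo><gmd:MD_DataIdentification>
    <gmd:purpose>
      <gco:CharacterString>Billing of waste collection fees</gco:CharacterString>
    </gmd:purpose>
  </gmd:MD_DataIdentification></gmd:identificationInfo>
"""

GDPR_REPORT = """\
  <gmd:dataQualityInfo><gmd:DQ_DataQuality><gmd:report><gmd:DQ_DomainConsistency>
    <gmd:result><gmd:DQ_ConformanceResult><gmd:specification><gmd:CI_Citation>
      <gmd:title><gco:CharacterString>GDPR</gco:CharacterString></gmd:title>
    </gmd:CI_Citation></gmd:specification></gmd:DQ_ConformanceResult></gmd:result>
  </gmd:DQ_DomainConsistency></gmd:report></gmd:DQ_DataQuality></gmd:dataQualityInfo>
"""


def sample_record(with_gdpr_report):
    """Build a sample record with or without the GDPR conformance report."""
    report = GDPR_REPORT if with_gdpr_report else ""
    return ET.fromstring(RECORD_START + report + "</gmd:MD_Metadata>")


if __name__ == "__main__":
    print(missing_gdpr_fields(sample_record(True)))   # ['data protection measures']
    print(missing_gdpr_fields(sample_record(False)))  # []
```

The point of the design is that a record without the GDPR report is never bothered with the extra checks, so existing non-privacy-related metadata keeps validating as before.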

A relevant question is whether (all of) these fields should be accessible to the general public. If not, mechanisms should be set up that prevent these fields from being exposed publicly. A common solution is a dual setup of GeoNetwork: one node in a protected (intranet) environment and one facing the outside. An automated process (harvester) is set up that duplicates content from the inside node to the outside node. The harvesting process includes an ‘anonymisation’ step in which sensitive fields are removed.
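
What such an anonymisation step does can be sketched in a few lines of Python. This is a stand-in written with the standard library only, not GeoNetwork's harvester configuration, and the list of paths treated as sensitive is a hypothetical choice (here: the description of the protection measures). The sample record is made up and trimmed, so it is not schema-valid.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
NS = {"gmd": GMD, "gco": GCO}

# Keep the usual prefixes when the record is serialised again.
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)

# Hypothetical list of elements that must not leave the intranet node.
SENSITIVE_PATHS = [
    ".//gmd:MD_SecurityConstraints/gmd:handlingDescription",
]

# Made-up, trimmed internal record (not schema-valid).
INTERNAL_RECORD = """\
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:identificationInfo><gmd:MD_DataIdentification>
    <gmd:purpose>
      <gco:CharacterString>Billing of waste collection fees</gco:CharacterString>
    </gmd:purpose>
    <gmd:resourceConstraints><gmd:MD_SecurityConstraints>
      <gmd:classification>
        <gmd:MD_ClassificationCode codeListValue="restricted"/>
      </gmd:classification>
      <gmd:handlingDescription>
        <gco:CharacterString>Stored encrypted, access limited to the billing team</gco:CharacterString>
      </gmd:handlingDescription>
    </gmd:MD_SecurityConstraints></gmd:resourceConstraints>
  </gmd:MD_DataIdentification></gmd:identificationInfo>
</gmd:MD_Metadata>
"""


def anonymise(xml_text, sensitive_paths=SENSITIVE_PATHS):
    """Return a copy of the record with all sensitive elements removed."""
    root = ET.fromstring(xml_text)
    # ElementTree has no parent pointers, so build a child -> parent lookup.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for path in sensitive_paths:
        for element in root.findall(path, NS):
            parent_of[element].remove(element)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # The public copy keeps the purpose and classification, but not the measures.
    print(anonymise(INTERNAL_RECORD))
```

Removing whole elements, rather than blanking their text, keeps the public record free of empty fields that would hint at what was withheld.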

I’m looking forward to seeing implementations of such a setup in the near future and am eager to hear your thoughts about this.

Paul.