Integrated occurrence data

Making accessible all data about when and where any given named organism has been recorded

The GBIO provided the following summary:

Documenting the distribution of species in time and space may be viewed as the ‘weather observations’ of biodiversity: the fundamental underpinning for any accurate model of existing patterns and trends. All of the data sources outlined in focus area B (Data) will contribute species occurrence records in some form or another, but the key elements of every record need to be brought together in a more readily accessible and usable form to enable efficient discovery and use.

Many national, regional and thematic efforts such as the Ocean Biogeographic Information System (OBIS) and VertNet are already mobilizing significant quantities of data on specimens and biodiversity observations. Mature data standards such as Darwin Core and the Access to Biological Collection Data (ABCD) schema have enabled GBIF to index and organize data from thousands of such sources. The GBIF data portal already offers access to more than 400 million species occurrence records.

In the short term, national, regional and thematic networks that already handle occurrence data should work to link their data to global networks. Global activities such as GBIF should enhance their processes to improve understanding and fitness-for-use of all mobilized data, to provide support for data on species abundance and sampling events, and to ensure that all contributors receive appropriate acknowledgement and feedback for their work. In the medium term, additional sources of data on species occurrence, including published materials (B1), sequences and genomes (B4) and automated and remote-sensed observations (B5), should also be integrated. In the long term, globally-connected networks should continuously and automatically process all new observations and samples of biodiversity.

Since GBIC in 2012, multiple infrastructures have continued to develop as aggregators of specimen data and of all categories of evidence for species occurrence. GBIF now holds around one billion data records and is positioned to scale significantly further. OBIS and VertNet remain highly significant networks focused on high-quality data mobilisation in their respective areas. National initiatives such as the Atlas of Living Australia (now being replicated as a series of Living Atlases in other countries) and iDigBio in the US are organising species occurrence data at national or regional scale and augmenting these records with relevant national taxonomic or trait information and geospatial and environmental layers. On a smaller scale, project communities use tools such as Symbiota to aggregate data from multiple institutions to address particular research questions.

However, significant challenges remain in relation to the GBIO vision of Integrated Occurrence Data. No single aggregator offers access to all digitally available species occurrence records. Each performs its own set of tests and transformations on the raw data records and provides its own interfaces and services for users to access them. The same data record receives different identifiers in different systems, and there is no robust mechanism for users to annotate a record so that the annotation propagates wherever that record is referenced. The proliferation of aggregators also makes it difficult for data publishers to measure the subsequent use and impact of their data.

In recent months, these aggregators have started discussing how to resolve these issues. Technical options exist to solve most of the problems. The remaining challenges lie primarily in coordinating governance, establishing trust, and avoiding changes that may stifle future innovation.

GBIC2 will include a working session to explore the challenges that may impede progress in these areas. The goal will not be to develop a multi-year roadmap and a set of fully refined priorities for Integrated Occurrence Data as a component of the biodiversity informatics landscape, although recommendations from the session will be taken forward for this purpose. The key goal in the context of GBIC2 is to understand the nature of the impediments that limit progress in this area towards seamless, unified access to and use of occurrence data, and towards continuous, stable management of these data as a global resource. This will enable the GBIC2 workshop to consider how best to address these impediments within the governance and planning processes of an international coordination mechanism.