Possible lessons from Apache, ELIXIR and GA4GH

Donald Hobern, GBIF Secretariat, 11 March 2019

An important step in establishing an alliance for biodiversity knowledge is to explore possible governance models, mechanisms for a collaborative community to develop shared implementation plans, and ways to facilitate and acknowledge contributions of institutions and individuals to delivering services and products.

During the GBIC2 workshop, attendees were presented with examples from two other communities, the Apache Software Foundation (ASF) and the European ELIXIR consortium. Attention was also drawn to the model of the Global Alliance for Genomics and Health (GA4GH).

There are significant differences between these examples. ASF focuses on supporting collaborative open source software projects, ELIXIR works to increase synergy between European genomics facilities, and GA4GH seeks to support much broader interactions and collaborations around practices, standards, tools and services for the genomics and health communities. Despite these differences, several aspects of each of these three initiatives merit consideration in the context of the new alliance.

Models for alliance membership

ASF has a membership comprising individual software professionals rather than institutions. The focus is on more than 350 open source initiatives (existing projects, special committees and incubating projects), each involving a subset of the members and other contributors. Many individuals contribute to Apache projects on behalf of their employer institutions and companies, but their role and status within ASF depend on their ability to contribute as individuals and on the quality and character of their personal contributions. Individuals gain access to contribute to projects, and take on increased responsibility and access permissions, based on a proven record as reliable contributors in tune with the culture of ASF and of the project.

ELIXIR is an intergovernmental partnership between 23 member states on behalf of more than 180 of their research organizations. It supports collaboration and joint planning between institutions that share largely similar needs and goals.

GA4GH has a membership comprising 578 diverse organizations that align with the alliance’s goals. These include research infrastructures, hospitals, NGOs, commercial entities and funding bodies.

The vision for an alliance for biodiversity knowledge most closely matches the model followed by GA4GH. Understanding the history and current governance arrangements of GA4GH will certainly assist the biodiversity informatics community with developing the new alliance.

However, the alliance for biodiversity knowledge also needs an inclusive model for individuals to contribute to software development activities and to a wide range of databasing work that depends on the knowledge and skill of individual contributors. These databasing activities include work to develop or maintain key reference datasets (e.g. nomenclatural databases, Catalogue of Life, collection catalogues, gazetteers), work to agree and enhance standards and vocabularies (e.g. Darwin Core, taxon-specific extensions to Darwin Core) and work to review, annotate, curate and correct publicly accessible data. All of these activities will benefit from, and may depend on, the alliance offering a community-based approach to establish trust levels for particular contributors and to give these individuals appropriate recognition for their contributions. A shared culture and reputation model similar to those used by ASF could help the alliance to meet all of these needs.

It therefore seems reasonable for the alliance for biodiversity knowledge to explore two separate but complementary models for its community. On one level, diverse organizations, including universities, natural history collections, research infrastructures, taxonomic societies, intergovernmental partnerships (e.g. CBD, IPBES), NGOs and other stakeholders, should be able to contribute to planning, decision-making and funding of the alliance mission. On another level, the alliance should deliver a human network that supports effective cooperation of researchers, software developers, citizen scientists, conservationists and others in developing tools and standards and in contributing and managing data.

Work streams and driver projects

GA4GH developed GA4GH Connect: A 5-year Strategic Plan for the 2017-2022 period. This contains a Vision statement for 2022 and defines six Technical Work Streams and two Foundational Work Streams. Each Work Stream includes a Motivation and Mandate and identifies Existing Standards and Proposed Solutions. The Technical Work Streams deal with technologies, standards and analytic services considered critical, and the Foundational Work Streams deal with data security and ethics.

GA4GH partners propose existing or planned investments to be identified as Driver Projects that contribute to the Mandates for the Work Streams. Driver Projects must involve contributions from multiple partners and must each contribute to more than one Work Stream.

Conceptually, the Work Streams and Driver Projects form a matrix of activities, with individual partners contributing to the Driver Projects as a way to advance the goals of the Work Streams. This landscape of activity is represented publicly as a Strategic Roadmap.
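The matrix described above, together with the two eligibility conditions for Driver Projects, can be illustrated with a minimal Python sketch. It is not drawn from GA4GH tooling: the class name, the function name and all stream, project and partner labels are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class DriverProject:
    """Hypothetical record of a Driver Project (not a GA4GH data model)."""
    name: str
    partners: FrozenSet[str]
    work_streams: FrozenSet[str]

    def is_eligible(self) -> bool:
        # Conditions stated in the text: contributions from multiple
        # partners, and contribution to more than one Work Stream.
        return len(self.partners) >= 2 and len(self.work_streams) >= 2


def build_matrix(projects: Iterable[DriverProject],
                 work_streams: Iterable[str]) -> Dict[str, List[str]]:
    """Map each Work Stream to the names of the projects advancing it."""
    projects = list(projects)
    return {
        ws: sorted(p.name for p in projects if ws in p.work_streams)
        for ws in work_streams
    }


# Placeholder data, invented purely for illustration.
work_streams = ["Stream A", "Stream B", "Stream C"]
projects = [
    DriverProject("Project X",
                  frozenset({"Partner 1", "Partner 2"}),
                  frozenset({"Stream A", "Stream B"})),
    DriverProject("Project Y",
                  frozenset({"Partner 2", "Partner 3"}),
                  frozenset({"Stream B", "Stream C"})),
    # Single partner and single stream: fails both conditions.
    DriverProject("Project Z",
                  frozenset({"Partner 1"}),
                  frozenset({"Stream C"})),
]

eligible = [p for p in projects if p.is_eligible()]
matrix = build_matrix(eligible, work_streams)
```

In this sketch only Project X and Project Y qualify, and the resulting mapping shows at a glance which Work Streams are covered by more than one project and which depend on a single one.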

The alliance for biodiversity knowledge may benefit from a similar model for community activity around shared needs and goals. A set of work streams could be developed via community consultation based on some or all of the GBIO components. By agreeing a vision and strategic plan for these work streams, the alliance can offer a framework for partners to come together and develop driver projects to advance the shared vision. Such a model may require little central control beyond support for the strategic planning process and clear mechanisms for partners to communicate with one another about the driver projects and to facilitate involvement and contributions by other partners.

Project incubation

ASF supports open source communities as they establish projects. ASF projects follow a standard lifecycle, beginning with the Apache Incubator, which ensures that proposed projects align with community practice, particularly around legal standards and guiding principles. New projects (“podlings”) pass through an incubation phase while the project community develops and demonstrates its viability. Projects that achieve this and are approved within the ASF governance structure graduate to become Apache Projects. Ultimately, these may reach the end of their lifecycle and be archived to the Apache Attic.
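The lifecycle described above can be read as a small state machine. The following Python sketch is a simplification that models only the stages and transitions named in the text (podling, graduated project, Attic). The names `Stage`, `ALLOWED` and `advance` are hypothetical and do not correspond to any ASF tooling, and real ASF procedures involve votes and reviews that are not modelled here.

```python
from enum import Enum


class Stage(Enum):
    PODLING = "incubating podling"
    PROJECT = "Apache Project"
    ATTIC = "archived in the Apache Attic"


# Transitions mentioned in the text: graduation from incubation,
# then archiving at the end of the lifecycle. This is an assumption
# of the sketch, not a complete statement of ASF policy.
ALLOWED = {
    Stage.PODLING: {Stage.PROJECT},
    Stage.PROJECT: {Stage.ATTIC},
    Stage.ATTIC: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Return the target stage if the transition is allowed, else raise."""
    if target not in ALLOWED[current]:
        raise ValueError(
            f"cannot move from {current.name} to {target.name}"
        )
    return target


# A podling graduates and is eventually archived.
stage = Stage.PODLING
stage = advance(stage, Stage.PROJECT)
stage = advance(stage, Stage.ATTIC)
```

Making the permitted transitions explicit is the point of the sketch: an activity cannot be treated as an established project without first passing through incubation.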

The alliance for biodiversity knowledge will benefit from good lifecycle management for projects and other activities carried out in the context of the alliance. An incubation model that allows credibility and viability to be established, and that ensures each activity is communicated to and reviewed by the global community, will increase buy-in and ownership for alliance activities and avoid the mistake of treating untested ideas as though they are already part of the solution.

Core data resources

ELIXIR operates an evaluation process to identify Core Data Resources “of fundamental importance to the wider life-science community and the long-term preservation of biological data”. A 2017 webinar explained the selection process and initial outcomes. Relevant services are nominated by ELIXIR members and then evaluated by the community.

The output from this process is a list of data resources considered critical for the work of the life-science community. Designation of a data resource as critical does not immediately lead to additional or more sustainable funding, but clearly acts as a signal to governments and funding bodies of the wide importance attached to the resource.

A similar model, operating via community review and consensus, could help the alliance for biodiversity knowledge to identify and highlight data resources, services, tools and other components that are critical to the operation of the distributed global infrastructure for biodiversity informatics. These are the components that most require a clearly documented sustainability strategy and coordinated efforts to secure stable funding.