Summaries of our Data Use Terms for Open Data and Restricted-Use Data are also available. Note that only the full Data Use Terms are used to interpret and arbitrate use.
These Terms were updated on 4 Aug 2024 and are the current and valid version.
These Terms constitute the legal agreement between you (a “User”) and Pathoplexus (“we”, “us”, “our”) (of Pathoplexus, Basel-Stadt, Switzerland), governing how Restricted-Use Data can be used and shared onward, as well as how it must be attributed when used.
This document is governed by the Pathoplexus Values and should be interpreted in light of the Pathoplexus Values. The Executive Board can modify and make changes to this document, in line with the purpose and commitments of the Pathoplexus Values, via 2/3 majority vote of the entire Board. If the Board has 5 members, this is interpreted as 4/5 votes in favor.
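The voting threshold can be checked with a little arithmetic. The sketch below assumes that a 2/3 majority of the entire Board means the smallest whole number of votes that is at least two-thirds of all Board members. The function name `votes_required` is illustrative and not part of these Terms.

```python
def votes_required(board_size: int) -> int:
    """Smallest whole number of votes that is at least 2/3 of the Board.

    Assumption: the threshold is rounded up to the next whole vote.
    """
    if board_size < 1:
        raise ValueError("board_size must be at least 1")
    # Integer ceiling of 2 * board_size / 3, avoiding floating point.
    return (2 * board_size + 2) // 3


print(votes_required(5))  # 4, matching the 4/5 example in the text
```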
Pathoplexus aims to encourage data sharing by providing multiple options for sequence sharing, with submitters choosing whether to immediately provide data under open conditions, or to stipulate that its use is restricted for a period of time to mitigate concerns of “scooping” and use without appropriate acknowledgement. Pathoplexus believes that ethical use of all data is critical. This data is only available due to the hard work of data generators. Rapid data sharing, which allows the rapid assessment of pathogen characteristics and dynamics, will only be possible if trust is maintained that shared data will be used and acknowledged appropriately. By engaging in considerate, ethical, and fair use of the data shared with the community, Pathoplexus users can play an active role in building a community that fosters more data sharing.
As used in Data Use Terms, the following terms shall have the following meanings:
1.1 The term “User” shall mean everyone who accesses the web service (https://pathoplexus.org) in any form
1.2 The term “Submitter” shall mean those who submit data to Pathoplexus
1.3 The term “Submitting Group” shall mean a group that a Submitter has submitted sequences on behalf of, and has control over all sequences submitted on behalf of that Submitting Group
1.4 The term “Curator” shall mean those who have elevated access in order to help detect and correct errors in the data
1.5 The term “SeqSet” shall mean collections of sequences indicated by their accession numbers, which provide unique identifiers that can be used to reference that set of sequences.
Users can submit sequences for supported pathogens with associated, non-sensitive metadata (see the metadata fields we accept, and what data is sensitive), either without use restrictions (herein called “Open Data” for brevity) or with use restrictions (“Restricted-Use Data”) - both types of data together are referred to here as “Pathoplexus Data” or the “Data”.
The Data are made available to all Users, at no cost, on condition of acceptance of the Data Use Terms (upon accessing the Data in any form, Users agree to the Data Use Terms). The Data Use Terms are further explicitly linked to within the metadata.
Restricted-Use Data can only be used within the conditions of the Data Use Terms (see below). Open Data is not subject to these terms, but should still be used ethically: data generators should be acknowledged and collaborations should be sought in some circumstances (see Open Data below). Users are required to read the Data Use Terms in detail and note applicable restrictions and expectations of notification, offers of collaboration, and acknowledgement that should be followed.
Open Data submitted to Pathoplexus is immediately submitted to INSDC on behalf of the original Submitters, where it becomes additionally available on all INSDC platforms.
Restricted-Use Data is displayed in Pathoplexus with a clear indication that its use is restricted, and is held under embargo at INSDC so that it is not accessible through the INSDC databases until the expiration of the Restricted-Use period of up to one year. Immediate embargoed submission to INSDC allows Submitters to obtain accession numbers to be used in publication, whilst keeping their data subject to the Pathoplexus terms of use. After the Restricted-Use period ends, Restricted-Use Data becomes Open Data.
Pathoplexus expects correct acknowledgement and crediting of Open Data, via SeqSets and DOIs at a minimum, and through collaboration and co-authorship with sequence Submitters where appropriate. In particular, all efforts should be made to avoid “scooping” others’ work. This may include situations where you use extensive data from a country or region without involving and including any authors from that region, or where you publish analyses that may preclude the original data generators from publishing on their own data.
Publications and preprints using any form of data from Pathoplexus must provide the accession numbers for the sequences used. Pathoplexus strongly encourages using SeqSets (see section 4.4). It is also recommended to additionally list the INSDC accessions, to ensure data can be easily traced on both platforms.
Data from Pathoplexus can be shared onward but the Data Use Terms must be clearly communicated, and any data distributed should include the Data Use Terms columns (dataUseTerms, dataUseTermsRestrictedUntil and dataUseTermsUrl) intact in the metadata. If displayed on a website or in another database, each sequence should have a direct link to the original sequence page on Pathoplexus, and display the Pathoplexus accession or link to an INSDC database and display the INSDC accession, if available on INSDC.
Restricted-Use Data can only be used under the Data Use Terms outlined herein.
Data can remain ‘Restricted-Use’ for up to one year after submission. Submitters and Submitting Groups can set a shorter restricted-use period at submission, or choose to end, or shorten, the Restricted-Use period at any time. After this period ends, the data becomes Open Data.
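For Users working with downloaded metadata, the status of a record can be checked programmatically. The sketch below is illustrative only. It assumes that the `dataUseTerms` column holds the value `RESTRICTED` for Restricted-Use Data and that `dataUseTermsRestrictedUntil` holds an ISO date (YYYY-MM-DD); neither is confirmed by these Terms. It also takes the cautious reading that the restriction still applies on the end date itself. The function name `is_restricted` is hypothetical.

```python
from datetime import date


def is_restricted(record: dict, today: date) -> bool:
    """Return True if a metadata record is still under Restricted-Use terms.

    Assumptions (not confirmed by the Terms): the value "RESTRICTED" marks
    Restricted-Use Data, and dataUseTermsRestrictedUntil is an ISO date.
    """
    if record.get("dataUseTerms") != "RESTRICTED":
        return False
    until = record.get("dataUseTermsRestrictedUntil")
    if not until:
        # No end date recorded: stay cautious and treat as restricted.
        return True
    # Conservative choice: the restriction still applies on the end date.
    return today <= date.fromisoformat(until)


record = {"dataUseTerms": "RESTRICTED", "dataUseTermsRestrictedUntil": "2025-03-01"}
print(is_restricted(record, date(2025, 2, 1)))  # True
print(is_restricted(record, date(2025, 3, 2)))  # False
```

When in doubt, the sequence page on Pathoplexus remains the authoritative source for a record's current status.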
Data from Pathoplexus can be used for unpublished and un-preprinted work, such as, but not limited to: graphical representations, blog posts, social media, public health and governmental reports, and web programs and applications (in this case, see if 4.2.4 Third Party Data Sharing, below, is applicable).
In scientific publications and preprints, Restricted-Use data can often only be used with explicit permission of the Submitting Group. It is vital that you read the conditions of use below.
Pathoplexus believes it is important that people who generate the sequences have the opportunity to complete and publish the analysis they intend with the context and expertise they possess. Thus, we provide guidelines to prevent Users from “telling the submitters’ story” when using others’ data. To aid in this interpretation we have created the categories of “Focal Set” and “Background Set”.
Users utilizing data from Pathoplexus in publications and preprints must create a SeqSet (see section 4.4) containing the Pathoplexus accession numbers and generate a DOI, dividing their sequence and metadata into two groups:
For a more detailed description of how to divide data into “Focal” and “Background Sets”, see 4.2.3 ‘Deciding’, below.
The requirements for using Restricted-Use data in a Focal or Background Set differ - please read carefully.
When deciding whether data should be part of the Focal Set, the intent of a Focal Set should be interpreted broadly: this is data without which the work would not be possible. Data that is part of a Focal Set is thus data that is critical to the analysis - it could not be removed or replaced with a randomly selected similar set without changing the results significantly and thus should be acknowledged appropriately.
Focal Set - if you answer yes to any of these, this data should be part of your “Focal Set”:
The intent of data in a Background Set is to provide context to the Focal Set. Including data in the Background Set implies that this data could be replaced with other data to a reasonable degree without impacting the analysis. Any data for which this is not true should be part of the Focal Set.
Background Set - if you answer yes to all of these, the data may be acceptable to include in a “Background Set”:
If unsure whether data should be in the Focal or Background Set, it is best practice to consider the data part of a Focal Set.
Restricted-Use Data from Pathoplexus can be shared onward but the Data Use Terms must be clearly communicated, and any data distributed must include the Data Use Terms columns (dataUseTerms, dataUseTermsRestrictedUntil and dataUseTermsUrl) intact in the metadata. If displayed on a website or in another database, each sequence must have a direct link to the original sequence page on Pathoplexus and display the Pathoplexus accession.
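Before redistributing metadata, it can help to confirm that the three Data Use Terms columns are still present. The following is a minimal sketch, assuming the metadata is a tab-separated file with a header row. The function name `missing_terms_columns`, the sample accession, and the sample URL are hypothetical placeholders.

```python
import csv
import io

# Column names as listed in the Data Use Terms.
REQUIRED_COLUMNS = ("dataUseTerms", "dataUseTermsRestrictedUntil", "dataUseTermsUrl")


def missing_terms_columns(tsv_text: str) -> list[str]:
    """Return the Data Use Terms columns absent from a TSV header row."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, [])  # empty input yields an empty header
    return [name for name in REQUIRED_COLUMNS if name not in header]


# Hypothetical sample in which one required column was dropped.
sample = (
    "accession\tdataUseTerms\tdataUseTermsUrl\n"
    "PP_EXAMPLE\tRESTRICTED\thttps://example.org/terms\n"
)
print(missing_terms_columns(sample))  # ['dataUseTermsRestrictedUntil']
```

An empty result means all three columns survived. The check covers column names only and does not verify that the values were left unmodified.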
The focus of Pathoplexus is on open availability of data. If Users share Restricted-Use Pathoplexus data onward as a database or as part of a database it must be under the same circumstances as it has been provided to User: without access restrictions. An exception is made for private use for small groups or labs with up to 200 users. If Users wish to use Pathoplexus data in an access-restricted database with more than 200 users, Users must contact Pathoplexus to request permission (help@pathoplexus.org). We generally support onward sharing for collaborative use (such as access within an institution for their employees, or sharing within a collaboration for joint downstream analysis) and encourage you to reach out. Requests will be considered and decided by the Executive Board.
In some cases there are a large number of data submitters that have contributed to the database and in these cases, analyses and applications that use all data from the database can consider this data as “Background Set” without explicit involvement of the submitters. In other cases, where only a small number of groups have contributed, it would not be ethical to use all data without contacting and involving the submitting groups.
If a large number of submitters have contributed to the data, and you can answer yes to one of the scenarios below, it may be appropriate to treat your entire dataset as a “Background Set” and acknowledge it appropriately.
Examples of what would be considered appropriate use of All Data:
Please see our how-to here for how to create a SeqSet.
If you are producing a publication or preprint, you MUST cite the DOI in the References section of your manuscript (as if it were another paper or resource) so that it is documented by CrossRef and the paper can be linked to the sequences used.
(Particularly for Editors, Reviewers, & Readers)
We appreciate and value the efforts of publishers to encourage and promote the ethical behavior of publishing scientists, by checking for any restrictions on data they are publishing, as well as the geographical distribution of the data - particularly for the focal set.
Anyone can easily check a SeqSet by following the pathoplexus.org SeqSet link or the DOI link provided to you.
Editors and reviewers are always encouraged to reach out to authors in order to better understand their choice in sequences, authors, and focal/background split. Having written guidelines guiding data use is relatively new, and many authors may genuinely misunderstand or misinterpret them, and be happy to rectify issues that fall foul of the Data Use Terms.
Things to consider for publishers, reviewers, and others checking DOI sets:
For a Focal Set:
For a Background Set: