Citing Data from Pathoplexus

Thank you for using Pathoplexus data! Proper citation protects the hard work of data generators and helps sustain a culture of open, rapid sharing. The requirements depend on whether you’re using Open Data, Restricted-Use Data, or all available data for a virus.

Not sure which applies to you? Each sequence on Pathoplexus displays its data use status clearly.

Open Data

Open Data can be used freely. It is not subject to defined terms of use, but we strongly encourage good scientific practice:

  • Always list the accession numbers for sequences used in your work.
  • Create a SeqSet and generate a DOI — this is the best way to permanently and precisely cite the data you used. (How to create a SeqSet)
  • Collaborate with data generators, especially when using extensive data from a specific region or country.

Data sharing depends on trust. Even when working with Open Data, please use the data ethically and credit those who generated it.

Restricted-Use Data

Restricted-Use Data has binding requirements. Using it means you’ve agreed to the Data Use Terms.

For unpublished work (reports, blog posts, social media, web tools)

  • Provide a list of accession numbers used, or create a SeqSet (recommended if using more than 5 sequences).
  • Link back to Pathoplexus so that submitters can be identified and credited.

For publications and preprints

You must first determine whether the Restricted-Use Data you are using belongs to the Focal Set or the Background Set of your analysis. Both are defined in the Data Use Terms (section 4.2.3). Please read the definitions carefully, and when in doubt, treat the data as Focal.

If Restricted-Use Data is in your Focal Set, you must do all three of the following:

  1. Include authorship from the Submitting Group — or obtain and include a written Authorship Waiver as a supplemental document.
  2. Create a SeqSet, generate a DOI, and cite the DOI as a reference in your manuscript. (How to create a SeqSet)
  3. Add to your Acknowledgements: “We confirm that we have adhered to the Data Use Terms of Pathoplexus.”

If Restricted-Use Data is in your Background Set, you must:

  • Create a SeqSet, generate a DOI, and cite it as a reference in your manuscript.

Using All Data (e.g. global analyses or real-time dashboards)

If you’re using the entire Pathoplexus dataset for a broad global analysis (e.g., tracking global mutation frequencies with no regional focus), the full dataset may be treated as a Background Set. (See the Data Use Terms, section 4.3 for more details.)

For a publication or preprint: Create a SeqSet, generate a DOI, and cite it as a reference.

For a website or real-time application: A SeqSet is sufficient — you don’t need a DOI, since the data on a live site will change over time. Update your SeqSet regularly to reflect the data being used.

Quick reference

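| Situation                                    | SeqSet     | DOI          | Authorship             |
|----------------------------------------------|------------|--------------|------------------------|
| Open Data (any use)                          | Encouraged | Encouraged   | Consider collaborating |
| Restricted-Use, Background Set, publication  | Required   | Required     | Not required           |
| Restricted-Use, Focal Set, publication       | Required   | Required     | Required (or waiver)   |
| All Data, publication                        | Required   | Required     | Not required           |
| All Data, live website                       | Required   | Not required | N/A                    |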
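
If you want to check requirements programmatically (for example in an analysis pipeline), the quick reference above can be encoded as a small lookup. The following is only an illustrative sketch: the function name, the key labels ("open", "restricted", "all", "publication", "live_website") and the Requirements class are hypothetical and are not part of any Pathoplexus tool or API. The Data Use Terms remain the authoritative source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Requirements:
    """One row of the quick reference (hypothetical container)."""
    seqset: str
    doi: str
    authorship: str


# Values copied from the quick reference; the tuple keys
# (data_type, role, output) are hypothetical labels chosen for this sketch.
QUICK_REFERENCE = {
    ("open", None, "any"): Requirements(
        "Encouraged", "Encouraged", "Consider collaborating"),
    ("restricted", "background", "publication"): Requirements(
        "Required", "Required", "Not required"),
    ("restricted", "focal", "publication"): Requirements(
        "Required", "Required", "Required (or waiver)"),
    ("all", None, "publication"): Requirements(
        "Required", "Required", "Not required"),
    ("all", None, "live_website"): Requirements(
        "Required", "Not required", "N/A"),
}


def citation_requirements(data_type: str, output: str,
                          role: Optional[str] = None) -> Requirements:
    """Return the quick-reference row for a given situation."""
    if data_type == "open":
        # The Open Data row applies to any kind of use.
        return QUICK_REFERENCE[("open", None, "any")]
    if data_type == "all":
        # The full dataset is treated as a Background Set; no role is needed.
        role = None
    elif data_type == "restricted" and output == "publication" and role is None:
        # When in doubt, treat Restricted-Use Data as Focal.
        role = "focal"
    key = (data_type, role, output)
    if key not in QUICK_REFERENCE:
        raise KeyError(
            f"No quick-reference row for {key}; consult the Data Use Terms")
    return QUICK_REFERENCE[key]


if __name__ == "__main__":
    print(citation_requirements("restricted", "publication", role="background"))
    print(citation_requirements("all", "live_website"))
```

Situations that the quick reference does not list, such as Restricted-Use Data in unpublished work, deliberately raise an error in this sketch instead of guessing; for those cases, follow the section text and the Data Use Terms.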

How to Create a SeqSet

Full instructions are in our documentation: Generating & using SeqSets

Key point: For publications and preprints, the DOI must be cited in your References section (as you would cite a paper), not just mentioned in the text. This allows the citation to be indexed by Crossref and linked to your manuscript.

Questions? Contact us at help@pathoplexus.org or consult the full Data Use Terms.