Frequently Asked Questions (FAQ)

Can’t find your question below? For questions about how to use Pathoplexus, or what certain terms mean, please see our Documentation. For questions about our governance, please see our Governance pages. If you still can’t find an answer, feel free to reach out to us at hello@pathoplexus.org!

Questions about Pathoplexus

What makes Pathoplexus different?

Pathoplexus offers flexible data-sharing options: users can choose to share their data openly, or with time-limited protections to help ensure proper attribution and credit. Pathoplexus integrates smoothly with existing INSDC-member databases (NCBI, ENA, and DDBJ), enabling data that’s ‘open’ on Pathoplexus to also appear on INSDC, and INSDC data to be accessed through Pathoplexus. Pathoplexus is built on the latest tools for filtering, searching, and accessing data, making data sharing and analysis more accessible (through both the website and API) and fostering a connected, collaborative global research community.

How can I get involved?

To become part of the scientific community that helps drive Pathoplexus, you can join PHA4GE and be part of the Data Repositories Working Group. You can also contribute code and feature suggestions to Loculus, the custom-built open-source software that powers Pathoplexus.

Who is behind Pathoplexus? Who funds it?

Pathoplexus is a transparent, non-profit association with members from 14 countries across 5 continents, and an Executive Board from around the globe. Pathoplexus’s members and Executive Board are committed to running Pathoplexus according to our Values. Pathoplexus is proud to work closely with PHA4GE, an international group working to establish better standards for public health and bioinformatics. PHA4GE was also key in helping to develop Pathoplexus from the ground up.

Pathoplexus is a community-driven project. Its initial development was powered almost entirely by donated time and computational resources from the pathogen genomics community. Although we have now begun to receive grants and donations, we are still heavily reliant on voluntary contributions of time and code (see our members and development team).

The software that underpins Pathoplexus, Loculus, is an open-source academic project. You can find out more about Loculus, as well as how it is funded, here.

How does Pathoplexus fit in among existing pathogen sequence sharing databases?

Pathoplexus is designed to be complementary to the existing pathogen sequence database ecosystem. Data from Pathoplexus, as long as it is used and acknowledged according to the Data Use Terms, can be analyzed alongside any other dataset (if the other dataset also permits this), and always retains a link to the original source (where applicable), so that mixing data and deduplication are fully supported.

In addition, Pathoplexus is purposefully complementary to INSDC-member databases, as all data in Pathoplexus eventually goes to INSDC. For Open Data submitted to Pathoplexus, this means Pathoplexus can be used as simply another way to send data to INSDC, as it is passed on immediately. For Restricted-Use Data submitted to Pathoplexus, Pathoplexus serves as a ‘temporary protected home’ before it eventually becomes Open.

Pathoplexus sequences are annotated with cross-references to the corresponding INSDC and GISAID accessions if available.

What is Pathoplexus doing about issues around pathogen access and benefits?

While developing Pathoplexus, we discussed at length the issue of countries and regions sharing sequences but not receiving an equitable share of the benefits that can be derived from those sequences (e.g. vaccines). This topic is also currently under debate globally, with efforts to develop pathogen access and benefit sharing (“PABS”) agreements. After much consideration, we don’t feel a single database is the place to try to fix this inequity - but we do want to be part of the eventual solution. This is why we commit to adhering to future consensus-driven international PABS agreements.

How can I try Pathoplexus out?

If you don’t have sequences to upload for the pathogens we currently support - or just want to try out Pathoplexus before deciding if you want to use it - you can always use our Demo Instance! Our Demo Instance works just like the ‘real’ Pathoplexus, but is wiped regularly and no data is sent onward to INSDC.

It’s perfect for trying out Pathoplexus or testing your API requests. Do note that since it is wiped regularly, you will have to make a new account and group - but it’s ok if these aren’t as detailed as your ‘real’ accounts. Remember that the Demo Instance is public, so don’t upload data that you can’t or don’t want to share. If you’d like to try out Pathoplexus but don’t have any data to hand, you can use our example data for the pathogens we support!

(See our Docs for more information on how to do things like create an account, upload sequences, and more in Pathoplexus!)

Questions about the pathogens we support

I’d like my virus of interest to be on Pathoplexus, how can I ask for it to be added?

If communities that work on a particular virus believe it would be helpful to add this to Pathoplexus, we’d love to hear about it! We’re keen to add viruses while working with those who study those viruses, so that we can ensure it’s of maximum value to the community. We may not have the resources to add additional viruses immediately, so we ask for your patience while we try to get funding to build up and support our development.

However, we still want to build a list of viruses that the community is keen to see on Pathoplexus. Please search our GitHub Issues to see if anyone has already proposed the virus you’d like to suggest - if so, comment to support their proposal! If not, please create a new issue, outlining why you think it would be a great addition, and if possible, listing others in the community who support adding that virus!

How do you choose the pathogens to include?

In future, we aim to prioritize viruses that have a high public health interest and currently have a less-than-ideal sequence sharing situation, for any reason. For example, the community may not be sharing much data because of fear of ‘scooping,’ or they may find uploading the data too difficult. Alternatively, it could currently be fragmented and shared in different places, and Pathoplexus could be a way to bring it together.

We are also keen to add pathogens where there’s support in that pathogen community - where the community feels like having the pathogen on Pathoplexus will be a benefit.

Finally, we will also consider the technical difficulty of including a new virus in prioritization. For example, multi-segmented viruses require more work to ensure we’re matching up segments correctly, and some viruses may be more difficult to write robust quality-control metrics for. However, none of this rules out adding a new virus completely - it may just have to wait a bit longer until we have sufficient resources!

Questions about data

Can I use the data on Pathoplexus?

Yes! Pathoplexus is designed to be used by everyone, and so all data is accessible. However, Pathoplexus does have restrictions on how some data (“Restricted-Use Data”) can be used, particularly in publications and preprints, and has requirements on how all Restricted-Use Data is acknowledged.

You can find out more about these protections and how you can use data by reading our Data Use Terms. We also have summaries on how you can use Open Data and Restricted-Use Data.

What is Pathoplexus’s stance on commercial use?

There are ongoing international negotiations on establishing a pathogen access and benefit sharing system (PABS), which Pathoplexus has engaged with, but which have not yet concluded. Until such a system exists, Pathoplexus leaves questions of commercial use to existing regulations and legislation.

Pathoplexus’s Restricted Use terms are designed to regulate use in academic publications and preprints, to ensure that submitters feel secure that their academic priority to publish will be respected and that they will not be “scooped”. In this model, all sequences are accessible to everyone, but some have time-limited conditions on how they can be used academically. We believe that this open-access, restricted-use model could serve as inspiration for a future PABS system where sequences are accessible publicly, but with any use for the development of medical countermeasures conferring benefit-sharing obligations.

The current Pathoplexus Restricted Use terms do not address commercial PABS matters, and are not intended as a solution to this issue.

How can I contribute data to Pathoplexus?

We’ve tried to make sharing your data as easy and flexible as possible!

You can upload data to any of our supported pathogens by first creating an account and then submitting your sequences on the website or via the API (useful for computational pipelines). At submission, you can choose whether you’d like your data to be protected for up to one year, or open immediately. Once the data is open, it also appears on INSDC-member databases.

Can I contribute my wastewater sequencing data to Pathoplexus?

Pathoplexus currently does not accept uploads of raw sequencing reads. We recommend submitting your raw reads to one of the INSDC databases. However, you may submit consensus sequences derived from wastewater sequencing data to Pathoplexus.

If you do, please enter waste water (which corresponds to [ENVO:00002001]) in the environmentalMaterial metadata field when submitting your data.

For some organisms on Pathoplexus, the host metadata field is required. Since wastewater samples do not have a clearly defined host organism, please enter unidentified in the host field for organisms where host is required.
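If you prepare your metadata with a script, the wastewater conventions above could be applied along the following lines. This is only a rough sketch in Python: the field names environmentalMaterial and host come from the answer above, but the submissionId column, the helper names, and the tab-separated layout are assumptions made for illustration, so please check our Docs for the actual submission format.

```python
import csv
import io


def wastewater_metadata_row(submission_id, host_required):
    """Build one metadata row for a wastewater-derived consensus sequence."""
    row = {
        # "submissionId" is a hypothetical column name used for illustration.
        "submissionId": submission_id,
        # "waste water" corresponds to ENVO:00002001.
        "environmentalMaterial": "waste water",
    }
    if host_required:
        # Wastewater samples have no clearly defined host organism.
        row["host"] = "unidentified"
    return row


def rows_to_tsv(rows):
    """Serialise metadata rows to tab-separated text with a header line."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        delimiter="\t",
        restval="",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, rows_to_tsv([wastewater_metadata_row("sample-1", host_required=True)]) produces a header line followed by one row with waste water and unidentified filled in.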

Where does my data go when I submit it to Pathoplexus?

When you submit your data to Pathoplexus, it gets securely stored in our database, hosted on AWS in Europe (under GDPR).

If you’ve chosen for your data to be open straight away, it will be submitted to the European Nucleotide Archive (ENA). It can take up to a week for data to appear on ENA, due to processing delays on their side.

It will then be synchronised across all INSDC-member databases (i.e. GenBank and DDBJ) in a short time, and will continue to be available on Pathoplexus.

If you’ve selected the Restricted-Use data terms, your data will not be submitted to the INSDC until it becomes Open.

Should I submit my data to both INSDC-member databases (GenBank, ENA, etc.) and Pathoplexus?

No, you should not submit your data to both INSDC and Pathoplexus, as it may result in your data being duplicated in both places. If you submit to INSDC, we will pull your data into Pathoplexus, so there’s no need to submit it here! If you submit to Pathoplexus, the data will go to INSDC when you specify (immediately, if you choose to make the data Open), so there’s no need to upload it to INSDC yourself - we’ll take care of that!

Since we keep a record of all the data we pass onto INSDC, we ensure we don’t duplicate it - but we can’t do this if users upload to both places separately!

I originally submitted my data to GISAID. Can I now submit it to Pathoplexus as well?

Yes, you can, as long as you have not shared your sequences with INSDC databases (GenBank, ENA, DDBJ). In contrast to Pathoplexus, GISAID does not submit data to the INSDC on your behalf. So unless you yourself submitted your sequences to the INSDC, you can submit them to Pathoplexus. To ensure data integrity, we encourage you to add your sequences’ GISAID Isolate ID (EPI_ISL) to the gisaidIsolateId metadata field when you submit your sequences to Pathoplexus.

How is data use restricted in Pathoplexus?

When you submit your data to Pathoplexus, you have the option to restrict how it can be used for a limited time, or make it fully open straight away. If you choose to keep your data restricted in how it can be used, it will have these protections for up to a year, giving you time to publish your research. After this period, or if you choose to share it openly immediately, your data will be released on international databases (INSDC-member databases).
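For those scripting their submissions, the "up to a year" limit can be checked before uploading. The Python sketch below is illustrative only: it assumes that "a year" means the same calendar date one year after submission, and the function names are hypothetical. The Data Use Terms remain the authoritative definition of the restricted period.

```python
from datetime import date


def latest_open_date(submission_date):
    """Latest date a restriction may run to, assuming 'one year' = same date next year."""
    try:
        return submission_date.replace(year=submission_date.year + 1)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; fall back to 28 February.
        return submission_date.replace(year=submission_date.year + 1, day=28)


def is_valid_restriction(submission_date, restricted_until):
    """True if the restricted period ends on or after submission and within one year."""
    return submission_date <= restricted_until <= latest_open_date(submission_date)
```

Under these assumptions, a submission made on 1 January 2024 could be restricted until 1 January 2025 at the latest.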

If you want to use data from Pathoplexus, it’s critical you familiarize yourself with our Data Use Terms, so you know how you can use sequences and how you must acknowledge them.

Where does Pathoplexus get its data?

We get our data in two ways: by ingesting (‘pulling’) open data from INSDC-member databases, and from Pathoplexus users who upload data to us directly.

Pathoplexus ingests data from INSDC-member databases (specifically, from NCBI Datasets) for all the pathogens it supports. We do this automatically at regular intervals, but always preserve the link back to the INSDC source. You can easily tell that a sequence originated from INSDC if the ‘Submitting group’ is Automated Ingest from INSDC/NCBI Virus.

Users can also submit data to us directly, which we eventually pass on to the INSDC network. All Open data is directly submitted, and Restricted data will be submitted after the Restricted-Use period lapses and it becomes Open. You can easily tell if Open data was submitted to us directly by seeing if the ‘Submitting group’ is anyone other than Automated Ingest from INSDC/NCBI Virus.
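If you are working with downloaded metadata, the ‘Submitting group’ rule above can be applied programmatically. The following Python sketch assumes you have already extracted the submitting group name as a string; the function name and the returned labels are hypothetical and not part of any Pathoplexus API.

```python
from collections import Counter

# Group name used for sequences pulled in from INSDC, as described above.
INSDC_INGEST_GROUP = "Automated Ingest from INSDC/NCBI Virus"


def sequence_origin(submitting_group):
    """Classify a sequence by its 'Submitting group' value."""
    if submitting_group.strip() == INSDC_INGEST_GROUP:
        return "ingested from INSDC"
    return "submitted directly to Pathoplexus"


def count_origins(submitting_groups):
    """Count how many sequences fall into each origin category."""
    return Counter(sequence_origin(group) for group in submitting_groups)
```

Any group name other than the automated ingest group is treated as a direct submission, mirroring the rule described above.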

Questions about our code

What is the code underlying Pathoplexus? What is Loculus?

Pathoplexus is an instance of Loculus, a more general software package for sharing pathogen sequence data. This means that Pathoplexus runs on Loculus code, with specific features, personalization, and most importantly, surrounding governance, that make it ‘Pathoplexus.’ All of the Pathoplexus code is open-source and you can view it here.

Loculus was designed at the same time as Pathoplexus, but is intended to be a flexible, customizable generic pathogen sequence-sharing database. For example, a lab might use Loculus to store the samples they sequence locally and be able to easily search and access them, or a university may have a Loculus instance to gather all the sequences they generate together in one place. Alternatively, someone could create another Loculus instance to serve bacterial pathogens, much like Pathoplexus!

All of the Loculus code is also open-source, and you can view it here.

Questions about our website

Where do the images of pathogens on the front page come from?

We’re incredibly grateful to NIAID for providing fantastic images of pathogens. You can check out their incredible Flickr account to see more great images.

The images we use from NIAID are:

CCHF
Dengue (credit shared with the CDC)
Ebola Sudan
Ebola Zaire
RSV
Measles

The images we use from the CDC are:

We are grateful to use this West Nile Virus image from Cynthia Goldsmith at USCDCP.

We are grateful to use this HMPV image from Paul Chan.

We are grateful to use this Andes Virus image, with light alterations to remove arrows and regrade colors.