INSDC Submission

Open sequences are made publicly available through the INSDC databases (ENA, DDBJ, and NCBI). To do this, we upload sequences to ENA on your behalf. After a successful submission to ENA, which can take up to 48 hours, we return the ENA accessions of your sequences.

When submitting to ENA, we use the institution listed on your group page as your center name. ENA uses the center name as an identifier to facilitate the recognition and attribution of your sequences within the INSDC.

We urge users not to upload their sequences to INSDC independently, to prevent data duplication. However, if Pathoplexus ever ceases to exist and you need to modify your data, you can use the center name to identify your group and request a sequence revision.

ENA Submission

To submit your sequences to ENA, we need to create a Project, a Sample, and an Assembly on your behalf. See ENA’s metadata model for more information.

In ENA, Projects contain general information about your group and the organism being sequenced. We create one Project per group and organism. Samples contain metadata, and Assemblies contain the actual sequences. We create one Sample and one Assembly per sequence.
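The grouping described above can be sketched in a few lines of Python. This is only an illustration of the one-Project-per-group-and-organism, one-Sample-and-Assembly-per-sequence rule; the `Project` class, the `plan_ena_objects` function, and the record layout are hypothetical and do not reflect the actual submission code.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical stand-in for an ENA Project (one per group and organism)."""
    group: str
    organism: str
    samples: list = field(default_factory=list)      # one entry per sequence
    assemblies: list = field(default_factory=list)   # one entry per sequence


def plan_ena_objects(sequences):
    """Group (group, organism, sequence_id) records into planned ENA objects."""
    projects = {}
    for group, organism, sequence_id in sequences:
        key = (group, organism)
        if key not in projects:
            projects[key] = Project(group, organism)
        # Each sequence gets its own Sample and its own Assembly.
        projects[key].samples.append(f"sample:{sequence_id}")
        projects[key].assemblies.append(f"assembly:{sequence_id}")
    return projects


# Illustrative input: group names, organisms, and IDs are made up.
plan = plan_ena_objects([
    ("lab-a", "organism-1", "seq1"),
    ("lab-a", "organism-1", "seq2"),
    ("lab-a", "organism-2", "seq3"),
])
```

In this sketch, a group that submits sequences for two organisms ends up with two Projects, and each Project holds as many Samples and Assemblies as it has sequences.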

Citing your Sequences

If you would like to cite your sequences in a publication, you can use your BioProject accession (starting with PRJ), BioSample accession (starting with SAM), and Genome Assembly accession (starting with GCA).
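As a rough aid for telling the three accession types apart, the prefixes above can be checked with a small helper. The function name `accession_kind` is hypothetical, the check looks only at the prefix and does not validate the full accession format, and the example accessions are made-up placeholders rather than real records.

```python
# Prefixes as stated in the text above.
ACCESSION_PREFIXES = {
    "PRJ": "BioProject",
    "SAM": "BioSample",
    "GCA": "Genome Assembly",
}


def accession_kind(accession):
    """Return the accession type implied by its prefix, or None if unknown."""
    normalized = accession.strip().upper()
    for prefix, kind in ACCESSION_PREFIXES.items():
        if normalized.startswith(prefix):
            return kind
    return None


# Placeholder accessions for illustration only.
kinds = [accession_kind(a) for a in ("PRJ0000000", "SAM0000000", "GCA_000000000.1", "XYZ1")]
```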

Mapping of Pathoplexus Metadata Fields to ENA Metadata Fields

To facilitate data standardization, we map our metadata to the ENA virus pathogen reporting standard checklist, using PHA4GE’s official mapping.

| ENA Sample-related Fields | Loculus Fields |
| --- | --- |
| subject exposure | exposureEvent |
| type exposure | exposureEvent |
| hospitalisation | hostHealthState==Hospital |
| illness symptoms | signsAndSymptoms |
| collection date | sampleCollectionDate |
| geographic location (country and/or sea) | geoLocCountry |
| geographic location (region and locality) | geoLocAdmin1 |
| sample capture status | purposeOfSampling |
| host disease outcome | hostHealthOutcome |
| host common name | hostNameCommon |
| host age | hostAge |
| host health state | hostHealthState |
| host sex | hostGender |
| host scientific name | hostNameScientific |
| isolate | specimenCollectorSampleId |
| collecting institution | sequencedByOrganization, authorAffiliations |
| receipt date | received date |
| isolation source host-associated | anatomical material, anatomical part, body product |
| isolation source non-host-associated | environmental site, environmental material |
| authors | authors |

| ENA Assembly-related Fields | Loculus Fields |
| --- | --- |
| ASSEMBLY_TYPE | default=ISOLATE |
| PROGRAM | sequencingInstrument, default=Unknown |
| PLATFORM | sequencingProtocol, default=Unknown |
| COVERAGE | depthOfCoverage, default=1 |
| MOLECULETYPE | NaN |
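To make the role of the defaults concrete, here is a minimal sketch of how the assembly-related rows above could be applied to a single metadata record. The names `ASSEMBLY_FIELD_MAP` and `to_ena_assembly_fields` are hypothetical, and the fallback rule (use the default when the Loculus field is missing or empty) is an assumption, not a description of the actual implementation. MOLECULETYPE is left out because the table lists no usable source field or default for it.

```python
# ENA assembly field -> (Loculus source field or None, default value),
# taken from the assembly-related rows of the mapping above.
ASSEMBLY_FIELD_MAP = {
    "ASSEMBLY_TYPE": (None, "ISOLATE"),
    "PROGRAM": ("sequencingInstrument", "Unknown"),
    "PLATFORM": ("sequencingProtocol", "Unknown"),
    "COVERAGE": ("depthOfCoverage", 1),
}


def to_ena_assembly_fields(record):
    """Map a Loculus metadata record (a dict) to ENA assembly fields."""
    result = {}
    for ena_field, (loculus_field, default) in ASSEMBLY_FIELD_MAP.items():
        value = record.get(loculus_field) if loculus_field else None
        # Assumption: missing or empty values fall back to the default.
        result[ena_field] = default if value in (None, "") else value
    return result


# Illustrative record with only one of the source fields filled in.
example = to_ena_assembly_fields({"sequencingInstrument": "example-instrument"})
```

With this input, PROGRAM takes the supplied value while ASSEMBLY_TYPE, PLATFORM, and COVERAGE fall back to their defaults.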