Contributing to the Database
A user account is needed to contribute to the database. No user account is needed to view or download data from the database. To sign up for a free account, select the log in link at the top of any page then follow the link to the registration page.
Adding a Compound Family
To add a compound family, select the 'Add' button in the top menu of any page then select 'Add a Compound Family' from the list of links.
First, enter the name of the family you wish to add. If the name is similar in spelling to another family in the database, a checkbox list will appear and ask you if you want to proceed with the current name, or if the compound family you are trying to add is already present in the database under an alternate spelling.
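The site does not document how it decides that two names are "similar in spelling". Purely as an illustration of the idea, the sketch below uses Python's standard difflib module to flag existing names that are close to a new one. The function name, the sample family names, and the 0.8 cutoff are all assumptions made for this example, not the database's actual method.

```python
import difflib

def similar_families(new_name, existing_names, cutoff=0.8):
    """Return existing names whose spelling is close to new_name (case-insensitive)."""
    # Map lowercase forms back to the original spelling for display.
    lowered = {name.lower(): name for name in existing_names}
    matches = difflib.get_close_matches(
        new_name.lower(), list(lowered), n=5, cutoff=cutoff
    )
    return [lowered[m] for m in matches]

# Hypothetical family names, for illustration only.
existing = ["Erythromycin", "Rapamycin", "Bleomycin"]
print(similar_families("Erythromicin", existing))
```

A non-empty result corresponds to the case where the checkbox list would appear; an empty result means the name can be used as entered.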
On the 'Pathway Type' screen, you can select the pathways that are used to produce the compound.
The 'Select Image' screen allows you to query the ChemSpider database for an image of the structure. Alternatively, an image can be uploaded or generated from a SMILES string.
On the 'Synonyms' screen, it is possible to add synonyms for the compound family. The ChemSpider database can be queried to view a list of possible synonyms from which you can choose those that are applicable.
On the 'Relationships' screen, related compound families can be selected.
The compound family has now been added. You can view its details page, add a cluster for that family, or restart the process to add a new family!
Adding a Cluster
To add a cluster, select the 'Add' button at the top of any page, then select the 'Add Cluster' link. Choose the cluster's compound family from the dropdown list. If the compound family is not present in the list, add it first, then come back to add the cluster. Enter a valid GenBank/RefSeq ID in the textbox; only GenBank/RefSeq DNA sequence records will be accepted. Finally, if the cluster is located within a larger sequence, select the 'Partial Sequence' radio button and indicate the start and end positions of the cluster in the file.
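To make the 'Partial Sequence' start and end positions concrete, here is a minimal sketch of how such a pair selects a region of a record. It assumes the positions are 1-based and inclusive, as in GenBank feature coordinates; the page itself does not state the convention, and the function name and toy sequence are hypothetical.

```python
def extract_region(sequence, start, end):
    """Return the sub-sequence between 1-based, inclusive start and end positions."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError(
            f"positions {start}..{end} fall outside 1..{len(sequence)}"
        )
    # Python slices are 0-based and end-exclusive, so only the start is shifted.
    return sequence[start - 1:end]

# Toy sequence for illustration; a real record would be far longer.
record = "ATGCCGTTAGGCTAA"
print(extract_region(record, 4, 9))
```

Checking the positions against the record length before submitting helps avoid a start or end value that falls outside the sequence.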
Adding a Non-Cluster Sequence
Large sequences, such as genomes, can be submitted to the database for analysis by antiSMASH, after which the PKS/NRPS domains will be extracted and saved into the sequence repository. All that is needed is a valid GenBank ID for a GenBank DNA record.
Sequences can be downloaded from several different places:
- Sequence Repository Downloads Page
Sequences that match a variety of criteria can be downloaded in bulk from the Sequence Repository Downloads Page. Users can choose to filter the sequences by phylum, pathway type(s), domain type, and domain characteristics (for A/AT and KR domains). For pathway types, there is a choice between 'Match All' and 'Match Any'. The 'Match All' option returns only sequences from clusters that match all of the selected pathway types, while the 'Match Any' option returns sequences from clusters that match any of the selected pathway types. All of the sequences are output in a FASTA file. Please note that it may take 15-20 minutes before the file is ready to download.
- Cluster Details Page
Sequences for a given cluster can be downloaded from its Details page. The sequences for each domain can be downloaded individually or all of the domains in the cluster can be downloaded together in a zip file.
- Sequence Repository Details Page
Sequences can also be downloaded from the Sequence Repository Details Page. This page lists all of the clusters with sequences in the Sequence Repository. These include clusters that are not in the main database as they were extracted from large sequences such as genomes and are therefore not linked to a compound family. The download options are similar to the Cluster Details page. Individual domains can be downloaded or all of the extracted domains in the cluster can be downloaded at once in a zip file.
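The difference between the 'Match All' and 'Match Any' options on the Sequence Repository Downloads Page can be sketched with set operations. This is only an illustration of the filtering logic as described above, not the site's implementation; the function name, the cluster names, and the pathway type labels are hypothetical.

```python
def filter_clusters(clusters, selected, mode="any"):
    """Return names of clusters whose pathway types satisfy the match mode."""
    selected = set(selected)
    if mode == "all":
        # 'Match All': the cluster must include every selected pathway type.
        def keep(types):
            return selected <= set(types)
    elif mode == "any":
        # 'Match Any': the cluster must share at least one selected type.
        def keep(types):
            return bool(selected & set(types))
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return [name for name, types in clusters.items() if keep(types)]

# Hypothetical clusters and pathway type labels, for illustration only.
clusters = {
    "cluster_A": {"PKS"},
    "cluster_B": {"PKS", "NRPS"},
    "cluster_C": {"NRPS"},
}
print(filter_clusters(clusters, ["PKS", "NRPS"], mode="all"))
print(filter_clusters(clusters, ["PKS", "NRPS"], mode="any"))
```

With both types selected, 'Match All' keeps only the cluster that carries both labels, while 'Match Any' keeps every cluster that carries at least one of them.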