Accepted Input Formats

3dpredict/Ab currently supports three input formats: fasta, Excel and PDB.
All three are described in more detail in this section of the documentation.

Fasta

The fasta file should contain a header line starting with a '>' followed by the molecule name and a designation of heavy or light chain. The molecule identifier itself should not contain a | or a _ character, because these characters separate the molecule ID from the chain designation in the header.

Valid molecule ID/name

Antibodyname1-project272

Invalid molecule ID/name

Antibody_Name_1_project272
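
The rule above can be checked before a file is uploaded. The following is a minimal sketch in Python; the function name is_valid_molecule_id is hypothetical and not part of 3dpredict/Ab.

```python
# Characters the documentation disallows inside a molecule ID.
FORBIDDEN = ("|", "_")


def is_valid_molecule_id(molecule_id: str) -> bool:
    """Return True if the ID is non-empty and contains no '|' or '_'."""
    return bool(molecule_id) and not any(ch in molecule_id for ch in FORBIDDEN)


print(is_valid_molecule_id("Antibodyname1-project272"))    # True
print(is_valid_molecule_id("Antibody_Name_1_project272"))  # False
```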

Fv/Fab/Ig

>IDNumber1_VL
QIVLTQSPAIMSASPGEKVTMTCSASEQSVSFMNWYKQKSGTSPKRWTYDTSDLASGVPA
RFGGSGSGTSYSLTISSMEAEDAATYYCQQWSNKSHVFGSGTKLEINR
>IDNumber1_VH
EVQLQQSGAELARPGASVKMSCKASGYEFDSYTMHWVKQRPGQGLEWIGYIRPQSGYTHY
NQKLKDKAKLTADKSSSSAYMQLSSLTHEDSAVYYCARSQDFDDGFAYWGQGTLVTVSS

In the example above, IDNumber1 is the molecule ID and the suffixes _VL and _VH designate the chain type. The header line is followed by a newline character; the sequence can then be given on a single line or wrapped at a fixed width over multiple lines.
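
To illustrate that single-line and wrapped sequences are equivalent, here is a minimal sketch of a fasta reader in Python that joins wrapped lines. The function read_fasta is a hypothetical helper for checking your own files; it is not how 3dpredict/Ab itself reads the input.

```python
def read_fasta(text: str) -> list[tuple[str, str]]:
    """Return (header, sequence) pairs; wrapped sequence lines are joined."""
    records: list[tuple[str, list[str]]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            records.append((line[1:], []))  # new record, header without '>'
        elif records:
            records[-1][1].append(line)  # sequence line of the current record
        else:
            raise ValueError("sequence data found before the first '>' header")
    return [(header, "".join(parts)) for header, parts in records]


example = """>IDNumber1_VL
QIVLTQSPAIMSASPGEKVTMTCSASEQSVSFMNWYKQKSGTSPKRWTYDTSDLASGVPA
RFGGSGSGTSYSLTISSMEAEDAATYYCQQWSNKSHVFGSGTKLEINR
"""
for header, sequence in read_fasta(example):
    print(header, len(sequence))
```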

Other valid header formats are for example:

>IDNumber1_Light
...
>IDNumber1_Heavy
...

or:

>IDNumber1|Light
...
>IDNumber1|Heavy
...
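
The header formats shown above all split into a molecule ID and a chain designation at the first | or _ character. A minimal sketch of that split in Python follows; parse_header is a hypothetical name, and the behaviour for headers without a separator (returning None as designation) is an assumption made for illustration.

```python
import re
from typing import Optional, Tuple


def parse_header(header: str) -> Tuple[str, Optional[str]]:
    """Split a fasta header into (molecule_id, chain_designation).

    The split happens at the first '|' or '_'. A header without either
    character yields None as designation (assumption for illustration).
    """
    text = header.lstrip(">").strip()
    parts = re.split(r"[|_]", text, maxsplit=1)
    molecule_id = parts[0]
    designation = parts[1] if len(parts) > 1 else None
    return molecule_id, designation


print(parse_header(">IDNumber1_VL"))     # ('IDNumber1', 'VL')
print(parse_header(">IDNumber1|Heavy"))  # ('IDNumber1', 'Heavy')
print(parse_header(">IdNumber1"))        # ('IdNumber1', None)
```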

If you select Ig as the antibody format in the application, you only need to provide a single instance of the heavy & light chain sequences. 3dpredict/Ab will model the other arm using the same sequences.

Heavy & Light chain

Specifying which sequence is the heavy chain and which is the light chain is not required; 3dpredict/Ab determines this automatically.

scFv

These antibody formats have only a single chain, so the fasta header needs just a molecule identifier, for instance:

>IdNumber1
DIQLTQSPASLAVSLGQRATISCKASQSVDYDGDSYLNWYQQIPGQPPKLLIYDASNLVS
GIPPRFSGSGSGTDFTLNIHPVEKVDAATYHCQQSTEDPWTFGGGTKLEIKGGGGSGGGG
SGGGGSQVQLQQSGAELVRPGSSVKISCKASGYAFSSYWMNWVKQRPGQGLEWIGQIWPG
DGDTNYNGKFKGKATLTADESSSTAYMQLSSLASEDSAVYFCARRETTTVGRYYYAMDYW
GQGTTVTVSS

VHH

VHH sequences follow the same principles as scFvs: a single chain sequence is provided, such as:

EVQLQQSGAELARPGASVKMSCKASGYEFDSYTMHWVKQRPGQGLEWIGYIRPQSGYTHY
NQKLKDKAKLTADKSSSSAYMQLSSLTHEDSAVYYCARSQDFDDGFAYWGQGTLVTVSS

Only the part of the header before the first pipe or underscore character is used as the molecule identifier here.

Bispecifics

To predict the structure and properties of a bispecific antibody, you need to provide the heavy & light chain sequences of both arms of the antibody, 4 sequences in total.

>IDNumber1_VL
DIQMTQSPSSLSASVGDRVTITCKASAAVGTYVAWYQQKPGKAPKLLIYSASYRKRGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCHQYYTYPLFTFGQGTKLEIK
>IDNumber1_VH
QVQLVQSGAEVKKPGASVKVSCKASGYTFTEFGMNWVRQAPGQGLEWMGWINTKTGEATYVEEFKGRVTFTTDTSTSTAYMELRSLRSDDTAVYYCARWDFAYYVEAMDYWGQGTTVTVSS
>IDNumber1_VL
QAVVTQEPSLTVSPGGTVTLTCGSSTGAVTTSNYANWVQEKPGQAFRGLIGGTNKRAPGTPARFSGSLLGGKAALTLSGAQPEDEAEYYCALWYSNLWVFGGGTKLTVL
>IDNumber1_VH
QVQLVQSGAEVKKPGASVKVSCKASGYTFTEFGMNWVRQAPGQGLEWMGWINTKTGEATYVEEFKGRVTFTTDTSTSTAYMELRSLRSDDTAVYYCARWDFAYYVEAMDYWGQGTTVTVSS

The first two sequences (A & B) are the heavy & light chain pair of the first arm, whereas the last two (C & D) are the pair of the second arm. Instead of A, B, C & D you can also use H1, L1 and H2, L2 for heavy and light chains 1 and 2, or VL and VH as in the example above. In every case, both sequences of one arm must follow each other.

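Invalid format examples

In the following example the two light chains come first and the two heavy chains last, so the sequences of one arm do not follow each other: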

>IDNumber1_VL
DIQMTQSPSSLSASVGDRVTITCKASAAVGTYVAWYQQKPGKAPKLLIYSASYRKRGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCHQYYTYPLFTFGQGTKLEIK
>IDNumber1_VL
QAVVTQEPSLTVSPGGTVTLTCGSSTGAVTTSNYANWVQEKPGQAFRGLIGGTNKRAPGTPARFSGSLLGGKAALTLSGAQPEDEAEYYCALWYSNLWVFGGGTKLTVL
>IDNumber1_VH
QVQLVQSGAEVKKPGASVKVSCKASGYTFTEFGMNWVRQAPGQGLEWMGWINTKTGEATYVEEFKGRVTFTTDTSTSTAYMELRSLRSDDTAVYYCARWDFAYYVEAMDYWGQGTTVTVSS
>IDNumber1_VH
QVQLVQSGAEVKKPGASVKVSCKASGYTFTEFGMNWVRQAPGQGLEWMGWINTKTGEATYVEEFKGRVTFTTDTSTSTAYMELRSLRSDDTAVYYCARWDFAYYVEAMDYWGQGTTVTVSS
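
The ordering rule for bispecifics can be checked on the chain designations alone. The sketch below is written in Python and assumes that the order of the records is what distinguishes the valid from the invalid example. The names HEAVY, LIGHT, chain_type and arms_are_paired are hypothetical, and the A/B/C/D designations are left out because the documentation does not state which letter is heavy or light.

```python
# Designations taken from the header examples in this section (case-insensitive).
HEAVY = {"VH", "HEAVY", "H1", "H2"}
LIGHT = {"VL", "LIGHT", "L1", "L2"}


def chain_type(designation: str) -> str:
    """Map a header designation to 'heavy' or 'light'."""
    key = designation.upper()
    if key in HEAVY:
        return "heavy"
    if key in LIGHT:
        return "light"
    raise ValueError(f"unknown chain designation: {designation}")


def arms_are_paired(designations: list[str]) -> bool:
    """True if four designations form two consecutive heavy/light pairs."""
    if len(designations) != 4:
        return False
    types = [chain_type(d) for d in designations]
    # Records 1+2 are the first arm, records 3+4 the second arm.
    return all({types[i], types[i + 1]} == {"heavy", "light"} for i in (0, 2))


print(arms_are_paired(["VL", "VH", "VL", "VH"]))  # True
print(arms_are_paired(["VL", "VL", "VH", "VH"]))  # False
```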

Excel

Instead of fasta files, you can also upload your sequences as an Excel file. Here the molecule ID and the sequences are provided in dedicated columns. The first row must contain the column names. For now, the columns must be ordered as follows: the molecule ID in the first column, then the heavy chain and the light chain in the following columns. For bispecifics, for instance, a further heavy & light chain pair follows in the subsequent columns.

As a result, a valid Excel spreadsheet for a sample Fv might look like this:

Molecule VH VL
MoleculeID1 QVKLQESGAELARPGASVKLSCKASGYTFTNYWMQWVKQRPGQGLDWIGAIYPGDGNTRYTHKFKGKATLTADKSSSTAYMQLSSLASEDSGVYYCARGEGNYAWFAYWGQGTTVTVSS DIELTQSPASLSASVGETVTITCQASENIYSYLAWHQQKQGKSPQLLVYNAKTLAGGVSSRFSGSGSGTHFSLKIKSLQPEDFGIYYCQHHYGILPTFGGGTKLEIK
MoleculeID2 QVQLQQSGGELAKPGASVKVSCKASGYTFSSFWMHWVRQAPGQGLEWIGYINPRSGYTEYNEIFRDKATMTTDTSTSTAYMELSSLRSEDTAVYYCASFLGRGAMDYWGQGTTVTVSS DIQMTQSPSSLSASVGDRVTITCRASQDISNYLAWYQQKPGKAPKLLIYYTSKIHSGVPSRFSGSGSGTDYTFTISSLQPEDIATYYCQQGNTFPYTFGQGTKVEIK

Bispecifics

As described above, for bispecifics make sure to place the heavy and light chain of each arm in successive columns, so that 3dpredict/Ab knows which arm contains which sequence pair.
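
The column order can be sketched as plain rows before they are written to a spreadsheet. The Python example below is a minimal sketch. The function build_rows and the column names VH1, VL1, VH2, VL2 for bispecifics are assumptions, since the documentation only shows Molecule, VH and VL for a single arm.

```python
def build_rows(molecules: dict[str, list[tuple[str, str]]]) -> list[list[str]]:
    """Build header and data rows; molecules maps ID -> [(heavy, light), ...]."""
    max_arms = max((len(arms) for arms in molecules.values()), default=0)
    header = ["Molecule"]
    for arm in range(1, max_arms + 1):
        suffix = "" if max_arms == 1 else str(arm)  # hypothetical naming scheme
        header += [f"VH{suffix}", f"VL{suffix}"]
    rows = [header]
    for molecule_id, arms in molecules.items():
        row = [molecule_id]
        for heavy, light in arms:
            row += [heavy, light]  # heavy and light of one arm stay adjacent
        row += [""] * (len(header) - len(row))  # pad molecules with fewer arms
        rows.append(row)
    return rows


# Placeholder sequences, shortened for readability.
rows = build_rows({"MoleculeID1": [("QVKLQ", "DIELT"), ("QVQLV", "QAVVT")]})
for row in rows:
    print(row)
```

The resulting rows could then be written to a worksheet with a spreadsheet library of your choice.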

PDB

3dpredict/Ab also accepts structure uploads in PDB, mmCIF and moe format (as well as the corresponding gzipped files).
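
A quick way to check a file name against these formats before uploading is sketched below in Python. The extension mapping (.pdb, .cif, .mmcif, .moe, optionally followed by .gz) and the function detect_structure_format are assumptions; the documentation does not list the extensions 3dpredict/Ab expects.

```python
from pathlib import Path

# Assumed file extensions for the formats named in the documentation.
STRUCTURE_FORMATS = {".pdb": "PDB", ".cif": "mmCIF", ".mmcif": "mmCIF", ".moe": "moe"}


def detect_structure_format(filename: str) -> tuple[str, bool]:
    """Return (format, is_gzipped); raise ValueError for unknown extensions."""
    suffixes = [s.lower() for s in Path(filename).suffixes]
    gzipped = bool(suffixes) and suffixes[-1] == ".gz"
    if gzipped:
        suffixes = suffixes[:-1]  # look at the extension in front of .gz
    if not suffixes or suffixes[-1] not in STRUCTURE_FORMATS:
        raise ValueError(f"unsupported structure file: {filename}")
    return STRUCTURE_FORMATS[suffixes[-1]], gzipped


print(detect_structure_format("fab.pdb"))     # ('PDB', False)
print(detect_structure_format("fab.cif.gz"))  # ('mmCIF', True)
```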

Clean up your structures

The structure should not contain any ligand or solvent molecules.
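
For PDB files, one simple way to do this is to drop all HETATM records, which hold ligands and water molecules. The Python sketch below shows the idea; strip_hetero_atoms is a hypothetical helper, it only handles the PDB text format, and it also removes modified residues stored as HETATM, so inspect the result before uploading.

```python
def strip_hetero_atoms(pdb_text: str) -> str:
    """Drop HETATM records, their ANISOU lines and all CONECT records."""
    kept = []
    skipping = False  # True while the last coordinate record was a HETATM
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "HETATM":
            skipping = True
            continue
        if record == "ATOM":
            skipping = False
        elif record == "ANISOU" and skipping:
            continue  # anisotropic data belonging to a removed HETATM
        elif record == "CONECT":
            continue  # connectivity records mostly refer to hetero atoms
        kept.append(line)
    return "\n".join(kept) + "\n"


sample = (
    "ATOM      1  N   GLU H   1      10.000  10.000  10.000  1.00 20.00           N\n"
    "HETATM  900  O   HOH H 301      12.000  12.000  12.000  1.00 30.00           O\n"
    "END\n"
)
print(strip_hetero_atoms(sample))
```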