3dpredict/Ab currently supports three input formats: fasta, Excel and PDB.
All three are described in more detail in this section of the documentation.
Fasta
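The fasta file should contain a header line starting with a '>' followed by the molecule name and a designation of heavy or light chain. The molecule identifier should not contain a | or a _ character, because 3dpredict/Ab will use only the part of the header before the first pipe or underscore character as the molecule identifier.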
Valid molecule ID/name
Antibodyname1-project272
Invalid molecule ID/name
Antibody_Name_1_project272
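The naming rule above can be checked before upload. The following is a minimal sketch, not part of 3dpredict/Ab; the function name `is_valid_molecule_id` is hypothetical and only the "no | and no _" rule from this section is encoded.

```python
def is_valid_molecule_id(molecule_id: str) -> bool:
    """Return True if the ID is non-empty and contains neither '|' nor '_'."""
    return bool(molecule_id) and not any(ch in molecule_id for ch in "|_")


print(is_valid_molecule_id("Antibodyname1-project272"))    # True
print(is_valid_molecule_id("Antibody_Name_1_project272"))  # False
```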
Fv/Fab/Ig
>IDNumber1_VL
QIVLTQSPAIMSASPGEKVTMTCSASEQSVSFMNWYKQKSGTSPKRWTYDTSDLASGVPA
RFGGSGSGTSYSLTISSMEAEDAATYYCQQWSNKSHVFGSGTKLEINR
>IDNumber1_VH
EVQLQQSGAELARPGASVKMSCKASGYEFDSYTMHWVKQRPGQGLEWIGYIRPQSGYTHY
NQKLKDKAKLTADKSSSSAYMQLSSLTHEDSAVYYCARSQDFDDGFAYWGQGTLVTVSS
In the example above, IDNumber1 is the molecule ID and the suffixes _VL and _VH designate the chain type. The header line ends with a newline character; the sequence follows, either on a single line or wrapped at a fixed width over multiple lines.
Other valid header formats are for example:
>IDNumber1_Light
...
>IDNumber1_Heavy
...
or:
>IDNumber1|Light
...
>IDNumber1|Heavy
...
If you select Ig as the antibody format in the application, you only need to provide a single instance of the heavy and light chain sequences. 3dpredict/Ab will model the other arm using the same sequences.
Specifying which sequence is heavy and which is light is not required; 3dpredict/Ab determines this automatically.
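To illustrate how the header formats above can be read, here is a small sketch of a fasta parser. It is an assumption about a reasonable client-side check, not the parser used by 3dpredict/Ab; the function name `parse_fasta` is hypothetical. It splits each header at the first `_` or `|` and joins sequences that are wrapped over several lines.

```python
import re


def parse_fasta(text):
    """Return a list of (molecule_id, chain_label, sequence) tuples."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Split the header once, at the first '_' or '|'.
            parts = re.split(r"[_|]", line[1:], maxsplit=1)
            chain_label = parts[1] if len(parts) > 1 else None
            records.append([parts[0], chain_label, ""])
        elif records:
            # Sequence lines are appended to the most recent header.
            records[-1][2] += line
        else:
            raise ValueError("sequence data found before the first header")
    return [tuple(record) for record in records]


example = """>IDNumber1_VL
QIVLTQSPAIMSASPGEKVTMTCSASEQSVSFMNWYKQKSGTSPKRWTYDTSDLASGVPA
RFGGSGSGTSYSLTISSMEAEDAATYYCQQWSNKSHVFGSGTKLEINR
>IDNumber1|Heavy
EVQLQQSGAELARPGASVKMSCKASGYEFDSYTMHWVKQRPGQGLEWIGYIRPQSGYTHY
NQKLKDKAKLTADKSSSSAYMQLSSLTHEDSAVYYCARSQDFDDGFAYWGQGTLVTVSS
"""

records = parse_fasta(example)
print([(molecule_id, label) for molecule_id, label, _ in records])
# [('IDNumber1', 'VL'), ('IDNumber1', 'Heavy')]
```

A header without a separator, as used for single-chain formats, yields `None` as chain label.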
scFv
This antibody format has only a single chain, so the fasta header only needs to contain a molecule identifier, for instance:
>IdNumber1
DIQLTQSPASLAVSLGQRATISCKASQSVDYDGDSYLNWYQQIPGQPPKLLIYDASNLVS
GIPPRFSGSGSGTDFTLNIHPVEKVDAATYHCQQSTEDPWTFGGGTKLEIKGGGGSGGGG
SGGGGSQVQLQQSGAELVRPGSSVKISCKASGYAFSSYWMNWVKQRPGQGLEWIGQIWPG
DGDTNYNGKFKGKATLTADESSSTAYMQLSSLASEDSAVYFCARRETTTVGRYYYAMDYW
GQGTTVTVSS
VHH
VHH sequences follow the same principle as scFvs: a single-chain sequence is provided, such as:
EVQLQQSGAELARPGASVKMSCKASGYEFDSYTMHWVKQRPGQGLEWIGYIRPQSGYTHY
NQKLKDKAKLTADKSSSSAYMQLSSLTHEDSAVYYCARSQDFDDGFAYWGQGTLVTVSS
Only the part of the header before the first pipe or underscore character is used as the molecule identifier here.
Bispecifics
To predict the structure and properties of a bispecific antibody, you need to provide the heavy and light chain sequences of both arms of the antibody, four sequences in total.
>IDNumber1_VL
DIQMTQSPSSLSASVGDRVTITCKASAAVGTYVAWYQQKPGKAPKLLIYSASYRKRGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCHQYYTYPLFTFGQGTKLEIK
>IDNumber1_VH
QVQLVQSGAEVKKPGASVKVSCKASGYTFTEFGMNWVRQAPGQGLEWMGWINTKTGEATYVEEFKGRVTFTTDTSTSTAYMELRSLRSDDTAVYYCARWDFAYYVEAMDYWGQGTTVTVSS
>IDNumber1_VL
QAVVTQEPSLTVSPGGTVTLTCGSSTGAVTTSNYANWVQEKPGQAFRGLIGGTNKRAPGTPARFSGSLLGGKAALTLSGAQPEDEAEYYCALWYSNLWVFGGGTKLTVL
>IDNumber1_VH
QVQLVQSGAEVKKPGASVKVSCKASGYTFTEFGMNWVRQAPGQGLEWMGWINTKTGEATYVEEFKGRVTFTTDTSTSTAYMELRSLRSDDTAVYYCARWDFAYYVEAMDYWGQGTTVTVSS
Sequences A and B form the heavy and light chain pair of the first arm, whereas sequences C and D form the pair of the second arm. Instead of A, B, C and D you can also specify H1, L1 and H2, L2 for heavy and light chains 1 and 2, or VL and VH as in the example above, but you have to make sure that the two sequences of each arm follow each other.
Invalid format examples
>IDNumber1_VL
DIQMTQSPSSLSASVGDRVTITCKASAAVGTYVAWYQQKPGKAPKLLIYSASYRKRGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCHQYYTYPLFTFGQGTKLEIK
>IDNumber1_VL
QAVVTQEPSLTVSPGGTVTLTCGSSTGAVTTSNYANWVQEKPGQAFRGLIGGTNKRAPGTPARFSGSLLGGKAALTLSGAQPEDEAEYYCALWYSNLWVFGGGTKLTVL
>IDNumber1_VH
QVQLVQSGAEVKKPGASVKVSCKASGYTFTEFGMNWVRQAPGQGLEWMGWINTKTGEATYVEEFKGRVTFTTDTSTSTAYMELRSLRSDDTAVYYCARWDFAYYVEAMDYWGQGTTVTVSS
>IDNumber1_VH
QVQLVQSGAEVKKPGASVKVSCKASGYTFTEFGMNWVRQAPGQGLEWMGWINTKTGEATYVEEFKGRVTFTTDTSTSTAYMELRSLRSDDTAVYYCARWDFAYYVEAMDYWGQGTTVTVSS
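The ordering rule for bispecifics can be expressed as a simple check on the four chain labels in file order. This is only a sketch under assumptions: the function name `arms_are_paired` and the label table are hypothetical, the table lists just the labels mentioned in this section, and the actual validation in 3dpredict/Ab may differ. Labels that name a chain type (VL, VH, H1, ...) must alternate within an arm, and labels that name an arm (H1/L1, A/B, ...) must agree within an arm.

```python
# Hypothetical label table: label -> (chain type, arm number); None = not encoded.
KNOWN_LABELS = {
    "VH": ("H", None), "VL": ("L", None),
    "Heavy": ("H", None), "Light": ("L", None),
    "H1": ("H", 1), "L1": ("L", 1),
    "H2": ("H", 2), "L2": ("L", 2),
    "A": (None, 1), "B": (None, 1),
    "C": (None, 2), "D": (None, 2),
}


def arms_are_paired(labels):
    """Return True if records 1-2 and records 3-4 each form one arm."""
    if len(labels) != 4 or any(label not in KNOWN_LABELS for label in labels):
        return False
    for first, second in (labels[0:2], labels[2:4]):
        type_a, arm_a = KNOWN_LABELS[first]
        type_b, arm_b = KNOWN_LABELS[second]
        # Two heavy or two light chains cannot form one arm.
        if type_a and type_b and type_a == type_b:
            return False
        # Chains assigned to different arms must not be adjacent in a pair.
        if arm_a and arm_b and arm_a != arm_b:
            return False
    return True


print(arms_are_paired(["VL", "VH", "VL", "VH"]))  # True
print(arms_are_paired(["VL", "VL", "VH", "VH"]))  # False
```

The second call corresponds to the invalid example above, where both light chains are listed before the heavy chains.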
Excel
Instead of fasta files, you can also upload your sequences as an Excel file. Here the molecule ID and the sequence information are provided in dedicated columns. Each column must have a header row containing the column name. For now the columns must follow a fixed order: the molecule ID in the first column, the heavy chain and the light chain in the following columns, and, for bispecifics for instance, a further heavy and light chain pair in the columns after that.
As a result, a valid Excel spreadsheet for a sample Fv might look like this:
Molecule | VH | VL |
---|---|---|
MoleculeID1 | QVKLQESGAELARPGASVKLSCKASGYTFTNYWMQWVKQRPGQGLDWIGAIYPGDGNTRYTHKFKGKATLTADKSSSTAYMQLSSLASEDSGVYYCARGEGNYAWFAYWGQGTTVTVSS | DIELTQSPASLSASVGETVTITCQASENIYSYLAWHQQKQGKSPQLLVYNAKTLAGGVSSRFSGSGSGTHFSLKIKSLQPEDFGIYYCQHHYGILPTFGGGTKLEIK |
MoleculeID2 | QVQLQQSGGELAKPGASVKVSCKASGYTFSSFWMHWVRQAPGQGLEWIGYINPRSGYTEYNEIFRDKATMTTDTSTSTAYMELSSLRSEDTAVYYCASFLGRGAMDYWGQGTTVTVSS | DIQMTQSPSSLSASVGDRVTITCRASQDISNYLAWYQQKPGKAPKLLIYYTSKIHSGVPSRFSGSGSGTDYTFTISSLQPEDIATYYCQQGNTFPYTFGQGTKVEIK |
As with fasta input, for bispecifics make sure that the heavy and light chains of each arm are in successive columns, so that 3dpredict/Ab knows which arm contains which sequence pair.
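The column layout can be illustrated with a short sketch that groups the cells of each row into arms. It assumes the rows have already been read from the spreadsheet into lists (reading the Excel file itself is not shown), the function name `rows_to_arms` is hypothetical, and the sequences are placeholders rather than real data.

```python
def rows_to_arms(rows):
    """Map each molecule ID to a list of (heavy, light) pairs, one per arm."""
    header, *data = rows
    n_chain_columns = len(header) - 1
    if n_chain_columns < 2 or n_chain_columns % 2:
        raise ValueError("expected an ID column followed by heavy/light column pairs")
    molecules = {}
    for row in data:
        molecule_id, *chains = row
        chains = [cell for cell in chains if cell]  # ignore empty cells
        if not chains or len(chains) % 2:
            raise ValueError(f"{molecule_id}: chains must come in heavy/light pairs")
        # Successive columns (heavy, light) belong to the same arm.
        molecules[molecule_id] = [
            (chains[i], chains[i + 1]) for i in range(0, len(chains), 2)
        ]
    return molecules


# Placeholder strings stand in for real sequences.
rows = [
    ["Molecule", "VH", "VL", "VH", "VL"],
    ["MoleculeID1", "HEAVY-1", "LIGHT-1", "HEAVY-2", "LIGHT-2"],
    ["MoleculeID2", "HEAVY-3", "LIGHT-3", "", ""],
]
print(rows_to_arms(rows))
# {'MoleculeID1': [('HEAVY-1', 'LIGHT-1'), ('HEAVY-2', 'LIGHT-2')], 'MoleculeID2': [('HEAVY-3', 'LIGHT-3')]}
```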
PDB
3dpredict/Ab also allows you to upload structures in PDB, mmCIF and MOE format (as well as the corresponding gzipped formats).
The structure should not contain any ligand or solvent molecules.
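For PDB files, one way to spot ligand or solvent molecules before upload is to look for HETATM records. The sketch below is an assumption about a useful pre-check, not a feature of 3dpredict/Ab, and the function names are hypothetical. It only covers the PDB text format (not mmCIF or MOE), and HETATM records can also describe modified residues, so the result should be reviewed manually.

```python
import gzip


def open_structure(path):
    """Open a plain or gzipped structure file as text."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def hetero_residue_names(lines):
    """Collect residue names (PDB columns 18-20) from HETATM records."""
    return {line[17:20].strip() for line in lines if line.startswith("HETATM")}


# Two illustrative records with made-up coordinates: one protein atom, one water.
example_lines = [
    "ATOM      1  N   GLU A   1      11.104  13.207   2.100  1.00 20.00           N",
    "HETATM  900  O   HOH A 301      10.000  10.000  10.000  1.00 30.00           O",
]
print(hetero_residue_names(example_lines))  # {'HOH'}
```

For a file on disk, the same check would be `hetero_residue_names(open_structure(path))`; an empty set means no HETATM records were found.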