Intro
This tutorial is a quickstart guide showing how to upload a new compound dataset based on an SDF file. We'll use a demo dataset (BI - HSD17B13 DATASET) and walk you through the steps required to get the dataset uploaded and ready to use for creating SAR Reports.
Step 1 - File Upload
On the home page of the application, click on “New Dataset”.
Click on the SD File item (or drag and drop your file in the area) to upload your file.
A default name will be given to the dataset; feel free to edit it, along with the optional description.
Once done, click on Next.
Step 2 - Property configuration
During this step, you'll set up properties that will be used by default in the SAR Slides reports created on this dataset. This includes defining identifiers of your compounds, and configuring properties shown by default.
Configure compound identifiers
The first thing to do is to provide information on how compound identifiers are defined in your dataset.
Three options here:
- Property: identifiers are stored as a regular SDF property.
- SDF Header: identifiers are stored in the header (first line) of each SDF entry.
- Autogenerate: use this option if you don't have a unique identifier in your input file.
In this tutorial, you can keep the default value.
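To make the difference between the Property and SDF Header options concrete, here is a minimal sketch of where each identifier lives in an SDF entry. It is not the application's importer: the function names, the mode labels ("property", "header", "autogenerate") and the sample property name Compound_ID are hypothetical, and the sample record is purely illustrative.

```python
import re

# Illustrative one-entry SDF file; names and values are made up for this sketch.
SAMPLE_SDF = "\n".join([
    "MOL1",                      # header: first line of the entry
    "  sketch",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    "> <Compound_ID>",           # a regular SDF property
    "CPD-001",
    "",
    "$$$$",                      # end-of-entry delimiter
])

def split_sdf_entries(text):
    """Yield each SDF entry as a list of lines (the $$$$ delimiter is dropped)."""
    entry = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            yield entry
            entry = []
        else:
            entry.append(line)

def parse_entry(lines):
    """Return (header, properties) for one entry."""
    header = lines[0].strip() if lines else ""
    props = {}
    i = 1
    while i < len(lines):
        match = re.match(r">[^<]*<([^<>]+)>", lines[i])
        i += 1
        if match:
            values = []
            # A property value runs until the next blank line.
            while i < len(lines) and lines[i].strip():
                values.append(lines[i])
                i += 1
            props[match.group(1)] = "\n".join(values)
    return header, props

def pick_identifier(header, props, mode, property_name=None):
    """Hypothetical mirror of the three options; None means 'to be generated'."""
    if mode == "property":
        return props.get(property_name) or None
    if mode == "header":
        return header or None
    return None  # "autogenerate"

header, props = parse_entry(next(split_sdf_entries(SAMPLE_SDF)))
print(pick_identifier(header, props, "property", "Compound_ID"))
print(pick_identifier(header, props, "header"))
```

With this sample, the Property option would read the value stored under Compound_ID, while the SDF Header option would read the first line of the entry.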
When multiple compounds have the same identifier, we don't discard them. Instead, we add numbers to the initial identifiers (e.g. MOL1 and MOL1 (2)).
When an identifier value is blank, we will assign an automatically generated identifier. A warning will be shown if you use a property that has some missing values.
Duplicate structures will be kept.
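The identifier rules above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not the application's code: the function name and the AUTO- prefix for generated identifiers are assumptions, since the actual format of autogenerated identifiers is not specified here.

```python
from collections import Counter

def assign_identifiers(raw_ids, auto_prefix="AUTO-"):
    """Return one identifier per entry, in input order.

    raw_ids: identifier strings; None or blank means "missing".
    auto_prefix: hypothetical prefix for generated identifiers.
    """
    seen = Counter()
    auto_count = 0
    result = []
    for raw in raw_ids:
        ident = (raw or "").strip()
        if not ident:
            # Blank identifier: fall back to a generated one.
            auto_count += 1
            ident = f"{auto_prefix}{auto_count}"
        seen[ident] += 1
        occurrence = seen[ident]
        # Duplicates are kept; later occurrences get a numeric suffix.
        result.append(ident if occurrence == 1 else f"{ident} ({occurrence})")
    return result

print(assign_identifiers(["MOL1", "MOL1", "", "MOL2"]))
# ['MOL1', 'MOL1 (2)', 'AUTO-1', 'MOL2']
```

The sketch does not guard against a suffixed name such as MOL1 (2) already existing in the input, which a real importer would need to handle.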
Configure properties
Next, you'll need to select one or several properties and pre-configure them. Note that all the properties available in the SDF will be imported anyway, so you can still change your selection later on, once a SAR Slides report is created.
The intent behind this pre-configuration step is to define what will be shown by default in every new SAR Slides report created for this dataset. The example provided here illustrates the following end result:
"By default, every new MMP SAR Slide created on this dataset will show the best transformations regarding our main pIC50 endpoint, highlight transformations that have a significant effect in green or red, and show additional endpoints that are of interest for this dataset."
To get started, click on the Default Properties select item and pick the Enzymatic HSD17B13 pIC50 column:
It will be added to the list of properties. For each property you can (optionally) define some extra information, keeping in mind the intent described before:
- Property Name: You can optionally give it an alias using the edit button next to it. Below the name, the property type is displayed as automatically detected (you can change it as well), along with the number of valid, non-empty values.
- Direction: Are you trying to optimize this property towards high or low values? This information is currently used in MMP SAR Slides to (1) order transformations so that the best ones are shown by default, and (2) determine whether an activity change between two compounds is good or bad. Together with the threshold (see below), this translates into colored values reflecting the effect of a given transformation on the endpoint.
- Threshold Type: Thresholds are used to identify significant activity change between two molecules. Significant changes will be colored in green if the change is good, or red if the change is bad, depending on the directionality you defined.
  Two types of thresholds can be used:
  - Absolute: Use the absolute value of the difference between activities.
  - Fold: Use the ratio between the other compound's value and the reference activity value. Suited to non-linear activity scales, typically IC50s.
- Change Threshold: Value used to determine whether a change between two property values is significant or not.
- Main?: Defines the main property, or primary endpoint. This property will be used by default to sort fragments in SAR Galleries, so that the "best" transformations are shown first.
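As a rough illustration of the two threshold types, the sketch below tests whether a change is significant. It rests on assumptions that are not confirmed here: the function name is hypothetical, the comparison is taken to be inclusive (greater than or equal to the threshold), and the fold test is treated symmetrically, so that a 10-fold decrease counts as much as a 10-fold increase. The IC50 values in the usage lines are made up.

```python
def is_significant(reference, other, threshold_type, threshold):
    """Return True when the change from reference to other passes the threshold."""
    if threshold_type == "absolute":
        # Absolute: compare the size of the difference, whatever its sign.
        return abs(other - reference) >= threshold
    if threshold_type == "fold":
        if reference <= 0 or other <= 0:
            raise ValueError("fold thresholds need strictly positive values")
        ratio = other / reference
        # Assumption: increases and decreases are treated symmetrically.
        return max(ratio, 1.0 / ratio) >= threshold
    raise ValueError(f"unknown threshold type: {threshold_type!r}")

# Log-scale endpoint (pIC50): an absolute threshold is the natural choice.
print(is_significant(6.93, 8.05, "absolute", 1.0))   # True

# Non-linear endpoint (illustrative IC50 values in nM): use a fold threshold.
print(is_significant(100.0, 5.0, "fold", 10.0))      # True  (20-fold change)
print(is_significant(100.0, 50.0, "fold", 10.0))     # False (2-fold change)
```

Whether a significant change is then shown in green or red depends on the direction configured for the property.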
Suppose you have a pIC50 measuring the inhibitory effect of your compounds on your primary target. You would typically set the direction to Higher, the threshold type to "Absolute" and the change threshold to 1 (which corresponds to a 10-fold change), and select the Main radio button for this property.
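The link between an absolute threshold of 1 on a pIC50 and a 10-fold change follows from the standard definition of pIC50 as the negative base-10 logarithm of the IC50 expressed in mol/L. The subscripts a and b below are just labels for two compounds:

```latex
\mathrm{pIC}_{50} = -\log_{10}\!\left(\mathrm{IC}_{50}\right)
\quad\Longrightarrow\quad
\Delta\mathrm{pIC}_{50}
  = \mathrm{pIC}_{50}^{(b)} - \mathrm{pIC}_{50}^{(a)}
  = \log_{10}\!\left(\frac{\mathrm{IC}_{50}^{(a)}}{\mathrm{IC}_{50}^{(b)}}\right)

\Delta\mathrm{pIC}_{50} = 1
\quad\Longleftrightarrow\quad
\frac{\mathrm{IC}_{50}^{(a)}}{\mathrm{IC}_{50}^{(b)}} = 10
```

In other words, a gain of one pIC50 unit means compound b reaches the same inhibition at a concentration ten times lower than compound a.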
Check out what would happen by taking compound 27 (pIC50 = 6.93) as a reference compound:
We defined that higher values are good, with a threshold of 1. Since compound 38 has a pIC50 of 8.05, and compound 27 (reference) has a pIC50 of 6.93, the difference between these two is +1.12, which is greater than the threshold. The value for compound 38 ends up being colored in green: this is a good, significant (according to the threshold) change.
Note that we can display activity differences instead of pIC50 values, which makes the default coloring easier to interpret:
If you change the threshold value to 2, you end up with no coloring in this example, since no absolute activity change is greater than or equal to 2:
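The coloring logic of this example can be reproduced with a short sketch. The function name and the inclusive comparison are assumptions made for illustration; only the two pIC50 values quoted in this tutorial (compounds 27 and 38) are used.

```python
def color_for_change(reference, other, higher_is_better, threshold):
    """Return 'green', 'red', or None for an absolute threshold (sketch)."""
    diff = other - reference
    if diff == 0 or abs(diff) < threshold:
        return None  # change is not significant: no coloring
    improved = diff > 0 if higher_is_better else diff < 0
    return "green" if improved else "red"

pic50 = {"27": 6.93, "38": 8.05}   # values quoted in this tutorial
reference = pic50["27"]

diff = pic50["38"] - reference
print(f"{diff:+.2f}")                                            # +1.12
print(color_for_change(reference, pic50["38"], True, 1.0))       # green
print(color_for_change(reference, pic50["38"], True, 2.0))       # None
```

Swapping the roles of the two compounds (38 as the reference) would give a difference of -1.12 and, with a threshold of 1, a red value for compound 27.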
Note that all of this information can still be changed later, from within a SAR Slides report. Defining a relevant default configuration will nevertheless save you time: once done, it is applied to every new SAR Report created on the dataset.
Configure the form as follows:
Submit and next steps
Once you are all set, click on the Submit button at the bottom right. You should reach a page showing the progress of the import process. Note that the import process takes from a few seconds to a couple of minutes, depending on the size of your dataset and the nature of your molecules.
Once done, you can proceed to create a new SAR Slide, or to review your dataset content. Feel free to follow the tutorial explaining how to create a SAR Slide from a query compound.
Thanks!