Upload a new Dataset from File


Intro

This tutorial is a quickstart guide showing how to upload a new compound dataset from an SDF file. We'll use a demo dataset (BI - HSD17B13 DATASET) and walk you through the steps required to get the dataset uploaded and ready to use for creating SAR Reports.

Step 1 - File Upload

From the home page of the application, click on “New Dataset”.

Click on the SD File item (or drag and drop your file in the area) to upload your file.

image.png

A default name will be given to the dataset. Feel free to edit it, along with the optional description.

Once done, click on Next.

Step 2 - Property configuration

In this step, you'll set up the properties that will be used by default in the SAR Slides reports. This includes specifying how compound identifiers are defined in your input dataset, as well as selecting and configuring the default properties that will appear in the SAR Slides.

image.png

Configure compound identifiers

The first thing to do is to provide information on how compound identifiers are defined in your dataset.

image.png

Three options are available:

  • Property: identifiers are stored as a regular SDF property.
  • SDF Header: identifiers are stored in the header (first line) of each SDF entry.
  • Autogenerate: use this option if you don't have a unique identifier in your input file.
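To make the three options more concrete, here is a minimal Python sketch of where each kind of identifier would come from in an SDF file. It is not the application's implementation: the function names, the `AUTO-n` format, and the sample records (empty molblocks with made-up identifiers) are all assumptions chosen for illustration.

```python
import re

# A data item tag line looks like ">  <NAME>"; the value follows on the next lines.
PROP_TAG = re.compile(r"^>.*<(.+?)>")

def split_records(sdf_text):
    """Split SDF text into lists of lines, one list per '$$$$'-terminated record."""
    records, current = [], []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            if current:
                records.append(current)
            current = []
        else:
            current.append(line)
    return records

def parse_record(lines):
    """Return (header, properties) for one record; the header is its first line."""
    header = lines[0].strip() if lines else ""
    props, name = {}, None
    for line in lines:
        match = PROP_TAG.match(line)
        if match:
            name = match.group(1)
            props[name] = []
        elif name is not None:
            if line.strip() == "":
                name = None  # a blank line ends the current property value
            else:
                props[name].append(line)
    return header, {key: "\n".join(value) for key, value in props.items()}

def pick_identifier(header, props, mode, index, property_name=None):
    """Hypothetical helper mirroring the three options described above."""
    if mode == "property":
        return props.get(property_name, "").strip()
    if mode == "header":
        return header
    if mode == "autogenerate":
        return f"AUTO-{index}"  # assumed format, for illustration only
    raise ValueError(f"unknown mode: {mode}")

# Two made-up records with empty molblocks, just to show the file layout.
SAMPLE_SDF = "\n".join([
    "HEADER_ID_1", "  sketch", "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000", "M  END",
    ">  <ID>", "PROP_ID_1", "",
    ">  <pIC50>", "7.2", "",
    "$$$$",
    "HEADER_ID_2", "  sketch", "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000", "M  END",
    ">  <ID>", "PROP_ID_2", "",
    "$$$$",
])

parsed = [parse_record(record) for record in split_records(SAMPLE_SDF)]
for index, (header, props) in enumerate(parsed, start=1):
    print(pick_identifier(header, props, "header", index),
          pick_identifier(header, props, "property", index, "ID"),
          pick_identifier(header, props, "autogenerate", index))
```

If your identifiers appear on the first line of each entry, the SDF Header option is the likely match; if they appear under a `>  <...>` tag, choose Property.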

In this tutorial, you can keep the default value.

Duplicates and missing values

When multiple compounds have the same identifier, we don't discard them. Instead, we add numbers to the initial identifiers (e.g. MOL1 and MOL1 (2)).

When an identifier value is blank, we will assign an automatically generated identifier. A warning will be shown if you use a property that has some missing values.

Duplicate structures will be kept.
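The identifier rules above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not the application's code: the function name, the `AUTO-n` format for generated identifiers, and the numbering of a third duplicate as `(3)` are assumptions.

```python
def assign_identifiers(raw_ids):
    """Return one unique identifier per input value, keeping every compound."""
    seen = {}          # identifier -> number of times it has been encountered
    result = []
    auto_count = 0
    for raw in raw_ids:
        ident = (raw or "").strip()
        if not ident:
            # Blank or missing value: fall back to a generated identifier.
            auto_count += 1
            ident = f"AUTO-{auto_count}"  # assumed format, for illustration only
        seen[ident] = seen.get(ident, 0) + 1
        if seen[ident] == 1:
            result.append(ident)
        else:
            # Duplicates are kept and numbered rather than discarded.
            result.append(f"{ident} ({seen[ident]})")
    return result

print(assign_identifiers(["MOL1", "MOL1", "", "MOL2", None, "MOL1"]))
```

The sketch does not handle the corner case where an input identifier already looks like `MOL1 (2)`.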

Configure properties

Next, you'll need to select one or several properties. The selected properties are the ones that will be shown by default in new SAR Slides, so you should select the important ones here (typically the endpoints used to assess your target compound profile). Note that all the properties available in the SDF are imported in the background, so you can still change your selection later on, once a SAR Slides report is created.

To get started, click on the Default Properties select item and pick the Enzymatic HSD17B13 pIC50 column:

image.png

It will be added to the list of properties. For each property you select, you'll have to define some extra information:

image.png

  • Property Name: The name of the property, which you can optionally alias using the edit button next to it. Below the name, you'll see the automatically detected nature of the property (which you can also change), along with the number of valid, non-empty values.
  • Direction: Are you trying to optimize this property towards high or low values? This setting is currently used to determine whether a change between two compounds is good or bad. Together with the threshold, it drives the color coding that reflects the effect of a given transformation on the endpoint.
  • Threshold Type: Thresholds are used to identify significant activity changes between two molecules. Significant changes are colored red if the change is bad and green if the change is good (depending on the direction you defined).
    Two types of thresholds can be used:
    • Absolute: Uses the absolute value of the difference between activities.
    • Fold: Uses the ratio between the reference activity value and the other compound's value. Suited to non-linear activity scales, typically IC50s.
  • Change Threshold: Value used to determine whether a change between two property values is significant or not.
  • Main ?: Defines the main property, or primary endpoint. This property is used to sort fragments in SAR Galleries, so that, by default, the "best" transformations are shown first.

Example

Suppose you have a pIC50 that measures the inhibitory effect of your compounds on your primary target. You would typically set the direction to Higher, the threshold type to "Absolute", and the change threshold to 1 (which corresponds to a 10-fold change), and then toggle the Main radio button for this property.
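The way direction, threshold type, and change threshold combine can be sketched as follows. This is a hedged illustration rather than the application's actual logic: the function name, the lowercase option strings, the "neutral" label, and the use of `>=` for the threshold comparison are assumptions.

```python
def classify_change(reference, other, direction, threshold_type, threshold):
    """Classify the change from `reference` to `other` as 'good', 'bad' or 'neutral'."""
    if direction not in ("higher", "lower"):
        raise ValueError(f"unknown direction: {direction}")
    if threshold_type == "absolute":
        # Absolute: compare the size of the difference with the threshold.
        magnitude = abs(other - reference)
    elif threshold_type == "fold":
        # Fold: compare the ratio of the two values (both must be positive).
        if reference <= 0 or other <= 0:
            raise ValueError("fold thresholds need strictly positive values")
        magnitude = max(reference, other) / min(reference, other)
    else:
        raise ValueError(f"unknown threshold type: {threshold_type}")

    if magnitude < threshold:
        return "neutral"  # change too small to be considered significant
    increased = other > reference
    improved = increased if direction == "higher" else not increased
    return "good" if improved else "bad"

# pIC50 example from the text: direction Higher, Absolute threshold of 1.
print(classify_change(6.0, 7.5, "higher", "absolute", 1))   # good
print(classify_change(7.0, 6.5, "higher", "absolute", 1))   # neutral
print(classify_change(7.0, 5.5, "higher", "absolute", 1))   # bad

# The same kind of setup on a raw IC50 scale (nM): lower is better, 10-fold threshold.
print(classify_change(100.0, 10.0, "lower", "fold", 10))    # good

# Why a pIC50 change of 1 is a 10-fold change: pIC50 = -log10(IC50 in mol/L),
# so a difference of d pIC50 units is a factor of 10 ** d in IC50.
print(10 ** 1.0)  # 10.0
```

An absolute threshold on a logarithmic scale such as pIC50 plays the same role as a fold threshold on the underlying linear scale, which is why "Absolute" with a threshold of 1 is suggested for pIC50 values.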

Note that all of these settings can be changed later from within the SAR Slides. Defining a relevant default configuration will, however, save you time.

Configure the form as follows:

image.png

Submit and next steps

Once you are all set, click on the Submit button at the bottom right. You should reach a page showing the progress of the import process. Note that the import takes from a few seconds to a couple of minutes, depending on the size of your dataset and the nature of your molecules.

image.png

Once done, you can proceed to create a new SAR Slide, or to review your dataset content. Feel free to follow the tutorial explaining how to create a SAR Slide from a query compound.

Thanks!