Upload a new Dataset from File

Intro

This tutorial is a quickstart guide showing how to upload a new compound dataset from an SDF file. We'll use a demo dataset (BI - HSD17B13 DATASET) and walk you through the steps required to get the dataset uploaded and ready to use for creating SAR Reports.

Step 1 - File Upload

From the home page of the application, click on “New Dataset”.

Click on the SD File item (or drag and drop your file in the area) to upload your file.

image.png

A default name will be given to the dataset. Feel free to edit it, along with the optional description.

Once done, click on Next.

Upload limits

Dataset upload is limited to files of at most 200 MB or 20,000 compounds.
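If you want to check a file against these limits before uploading, a small local script can help. The sketch below is not part of the application: it assumes that "200Mb" means 200 MiB, that both limits must be satisfied, and that each compound record ends with the standard `$$$$` delimiter line of the SD file format. The function names are illustrative.

```python
import os

# Assumptions: the size limit is read as 200 MiB, and both limits must hold.
MAX_BYTES = 200 * 1024 * 1024
MAX_COMPOUNDS = 20_000


def count_sdf_records(lines):
    """Count records: each record in an SD file ends with a '$$$$' line."""
    return sum(1 for line in lines if line.strip() == "$$$$")


def check_upload_limits(path):
    """Return the file size, the record count, and whether both limits hold."""
    size = os.path.getsize(path)
    # Read line by line so large files are never loaded fully into memory.
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        compounds = count_sdf_records(handle)
    return {
        "size_bytes": size,
        "compounds": compounds,
        "within_limits": size <= MAX_BYTES and compounds <= MAX_COMPOUNDS,
    }
```

Calling `check_upload_limits("my_dataset.sdf")` (a hypothetical file name) returns a small dictionary that you can inspect before starting the upload.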

Step 2 - Property configuration

During this step, you'll set up properties that will be used by default in the SAR Slides reports created on this dataset. This includes defining identifiers of your compounds, and configuring properties shown by default.

Why this configuration step? While it can seem intimidating, taking a few seconds to perform this pre-configuration step has the following advantages in the current version of the application:
  • The main endpoint definition will be used in various places of the application to save you clicks: in particular, it will be used to show you the best transformations / R-Groups by default and to suggest relevant coloring schemes.
  • Other properties selected here will always be shown by default in every new SAR Slide. If you don't select any, you'll have to add them again and again for every new SAR Slide.
  • The main endpoint threshold (see below) is also used to generate suggested starting points when creating SAR Reports. If it is not defined, this aspect will be ignored.
We are currently working on ways to streamline these configuration aspects (templates, ...). Until then, this step remains relevant for the above-mentioned reasons.

image.png

Configure compound identifiers

The first thing to do is to provide information on how compound identifiers are defined in your dataset.

image.png

Three options are available:

  • Property: identifiers are stored as a regular SDF property.
  • SDF Header: identifiers are stored in the header (first line) of each molecular entry of the SDF file.
  • Autogenerate: use this option if your input file does not already contain a unique identifier.

In this tutorial, you can keep the default value.
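To see where each of the three options would look in a file, the following sketch reads identifiers from raw SD file text. It is an illustration only, not the application's import code: the function names, the `mode` values, and the `CPD-` prefix for autogenerated identifiers are hypothetical. It relies on two facts about the SD file format: the first line of each record is the header (molecule name) line, and a data item is written as a `> <NAME>` line followed by its value.

```python
import re


def split_records(text):
    """Split SD file text into records, dropping the '$$$$' delimiter lines."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(current)
            current = []
        else:
            current.append(line)
    return records


def header_id(record):
    """'SDF Header' option: the identifier is the first line of the record."""
    return record[0].strip() if record else ""


def property_id(record, name):
    """'Property' option: the identifier is the line after the '> <name>' tag."""
    tag = re.compile(r"^>.*<" + re.escape(name) + r">")
    for i, line in enumerate(record):
        if tag.match(line) and i + 1 < len(record):
            return record[i + 1].strip()
    return ""


def identifier(record, index, mode, prop=None):
    """Pick an identifier according to the chosen option."""
    if mode == "header":
        return header_id(record)
    if mode == "property":
        return property_id(record, prop)
    # 'Autogenerate' option: the 'CPD-' format is a hypothetical placeholder.
    return f"CPD-{index + 1}"
```

Both `header_id` and `property_id` return an empty string when the value is blank, which is the situation described under "Duplicates and missing values".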

Duplicates and missing values

When multiple compounds have the same identifier, we don't discard them. Instead, we add numbers to the initial identifiers (e.g. MOL1 and MOL1 (2)).

When an identifier value is blank, we will assign an automatically generated identifier. A warning will be shown if you use a property that has some missing values.

Duplicate structures are kept.
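The renaming behaviour described above can be sketched in a few lines. This is a minimal illustration, not the application's actual implementation: the `AUTO-` prefix for generated identifiers is a hypothetical format, and the sketch does not handle the corner case where an input file already contains a literal identifier such as `MOL1 (2)`.

```python
def resolve_identifiers(raw_ids, auto_prefix="AUTO-"):
    """Return one unique identifier per compound, keeping every compound.

    Blank identifiers get a generated value (hypothetical 'AUTO-<position>'
    format); repeated identifiers get a numeric suffix, e.g. 'MOL1 (2)'.
    """
    seen = {}       # identifier -> number of times encountered so far
    resolved = []
    for position, raw in enumerate(raw_ids, start=1):
        base = (raw or "").strip()
        if not base:
            base = f"{auto_prefix}{position}"
        seen[base] = seen.get(base, 0) + 1
        count = seen[base]
        # The first occurrence keeps its name; later ones are numbered.
        resolved.append(base if count == 1 else f"{base} ({count})")
    return resolved
```

For example, `resolve_identifiers(["MOL1", "MOL1", "", "MOL2"])` returns `["MOL1", "MOL1 (2)", "AUTO-3", "MOL2"]`: no compound is dropped, and every entry ends up with a distinct identifier.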

Configure properties

Next, you'll need to select one or several properties and pre-configure them. The goal of the pre-configuration step is to define what will be shown by default in every new SAR Slide created for this dataset. However, please note that all the properties available in the SDF will be imported anyway, so you can still change your selection later on, once a SAR Slide is created.

  • Click on the Default Properties select item and pick the Enzymatic HSD17B13 pIC50 column:
    image.png

It will be added to the list of properties. For each property, you can (optionally) define some extra information. You can read an in-depth explanation of these aspects in a companion article. Here, we'll aim for the following end result:

By default, every new MMP SAR Slide created on this dataset will show the best transformations regarding our main pIC50 endpoint, show additional endpoints that are of interest for this dataset, and highlight transformations that have a significant effect in green (good effect) or red (bad effect) for each property.

  • Configure the form as follows:

image.png

  • Once you are all set, click on the Submit button at the bottom right to trigger the import process.

Next steps

You should reach a page showing the progress of the import process. Note that the import process takes from a few seconds to a couple of minutes, depending on the size of your dataset and the nature of your molecules.

image.png

Once done, you can proceed to create a new SAR Slide, or to review your dataset content. Feel free to follow the tutorial explaining how to create a SAR Slide from a query compound.

Thanks!