1D - BasicVis

This notebook contains an index of common visualizations implemented in Oloren Chem Engine.

First, we’ll create example model, dataset, and representation objects for use in creating this index.

import olorenchemengine as oce

# We'll use a dataset of BACE (beta secretase enzyme) pIC50 values for this
# from a collection put together by MoleculeNet
dataset = oce.BACEDataset() + oce.ScaffoldSplit()
model = oce.BaseBoosting([oce.RandomForestModel(oce.DescriptastorusDescriptor("morgan3counts")),
Using backend: pytorch

Visualizing the chemical space of a dataset

This plot is useful for exploratory data analysis, e.g. seeing whether most compounds are similar or different, or whether there are clear subsets of compounds.

vis = oce.ChemicalSpacePlot(dataset, oce.DescriptastorusDescriptor("morgan3counts"), opacity = 0.4, dim_reduction = "tsne")
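As a numeric complement to reading the plot, the question of whether compounds are mostly similar or different can be sketched with pairwise Tanimoto similarity. This is a minimal illustration and not what `ChemicalSpacePlot` does internally: the fingerprints are made-up sets of "on" bit indices (the `morgan3counts` representation used above is a count vector, which is simplified here), and the names `tanimoto`, `fingerprints`, and `pairwise` are hypothetical.

```python
from itertools import combinations


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    # Two empty fingerprints are treated as identical by convention.
    return len(a & b) / union if union else 1.0


# Hypothetical fingerprints: each set lists the bit indices set for one compound.
fingerprints = {
    "cmpd_A": {1, 4, 7, 9},
    "cmpd_B": {1, 4, 7, 12},
    "cmpd_C": {2, 3, 15, 20},
}

# Similarity for every unordered pair of compounds.
pairwise = {
    (m, n): tanimoto(fingerprints[m], fingerprints[n])
    for m, n in combinations(fingerprints, 2)
}

# A low mean suggests a diverse set; a high mean suggests close analogues.
mean_similarity = sum(pairwise.values()) / len(pairwise)
```

In this toy data, `cmpd_A` and `cmpd_B` share most of their bits while `cmpd_C` shares none, which is the kind of subset structure that would appear as separate groups in a chemical space plot.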

Visualizing model errors by cluster

An additional visualization builds on the chemical space plot: it shows a dataset’s train/test split via the marker outlines and the magnitude of the residuals via the marker sizes. This is useful for examining model errors and discerning potential patterns across cores.

vis = oce.VisualizeDatasetSplit(dataset, oce.DescriptastorusDescriptor("morgan3counts"), model = model, opacity = 0.4)
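The idea of encoding residual magnitude as marker size can be sketched as follows. This is an assumption-laden illustration, not the scaling that `VisualizeDatasetSplit` actually uses: the function `marker_sizes`, its `min_size`/`max_size` parameters, and the linear scaling are all hypothetical, and the values are made up.

```python
def marker_sizes(y_true, y_pred, min_size=4.0, max_size=20.0):
    """Map absolute residuals linearly onto a marker-size range.

    Hypothetical helper: the largest residual gets max_size and a
    perfect prediction gets min_size.
    """
    residuals = [abs(t - p) for t, p in zip(y_true, y_pred)]
    largest = max(residuals, default=0.0)
    if largest == 0:
        # No error anywhere (or no data): every marker gets the minimum size.
        return [min_size] * len(residuals)
    return [min_size + (max_size - min_size) * r / largest for r in residuals]


# Made-up pIC50 values and predictions for three compounds.
y_true = [6.0, 7.5, 5.0]
y_pred = [6.0, 6.5, 7.0]
sizes = marker_sizes(y_true, y_pred)
```

With a mapping like this, compounds the model predicts poorly stand out visually, so clusters of large markers point at regions of chemical space where the model struggles.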

VisualizeFeatures - exploratory data analysis (EDA) tool

This tool visualizes how features correlate with a given property. It is compatible with all representations that define a name for each feature. The example below correlates various features from the LipinskiDescriptor set from RDKit with the pIC50 value.

Often, as below, there aren’t any strong correlations, but weak correlations can still provide insight into overall trends and can be combined into stronger overall models. Another useful way to read these graphs is to examine subsets of the plot (hovering to see the molecular structure) and see how those subsets vary, or do not vary, with changes in property value or feature value. This can also help determine what is not important.

vis = oce.VisualizeFeatures(dataset, features = [oce.LipinskiDescriptor()])
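To put a number on how strong or weak a feature/property relationship is, one option is the Pearson correlation coefficient. The sketch below is a plain-Python illustration under stated assumptions: `pearson_r` is a hypothetical helper, not part of Oloren Chem Engine, and the feature and pIC50 values are made up rather than taken from the BACE dataset.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant column; 0.0 is returned
        # here as a convention meaning "no usable signal".
        return 0.0
    return cov / sqrt(var_x * var_y)


# Made-up values: one descriptor column and the matching pIC50 values.
feature = [1.0, 2.0, 3.0, 4.0, 5.0]
pic50 = [5.0, 7.0, 6.0, 5.5, 7.5]
r = pearson_r(feature, pic50)
```

Values of `r` near 0 correspond to the weak trends described above, while values near 1 or -1 indicate a strong linear relationship. Pearson correlation only captures linear trends, so it complements inspecting the plot rather than replacing it.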