olorenchemengine.visualizations package#

Submodules#

olorenchemengine.visualizations.attribute_section module#

Module containing classes describing available attributes for visualizations.

These attributes are used to standardize the parameters of visualizations to make them more accessible (to the extent possible).

class olorenchemengine.visualizations.attribute_section.AttributeSection(attribute_name: str, attributes: List[BaseAttribute])#

Bases: BaseAttribute

Class to create an attribute section

Parameters:
  • attribute_name (str) – The name of the attribute section

  • attributes (list) – The list of attributes of type BaseAttribute

to_json()#

Convert the attribute to a json object

Returns:

The json representation of the object

class olorenchemengine.visualizations.attribute_section.BaseAttribute(name: str)#

Bases: ABC

Base class for all attributes for a visualization

Parameters:

name (str) – The name of the attribute

abstract to_json() list#

Convert the attribute to a json object

Returns:

The json representation of the object

class olorenchemengine.visualizations.attribute_section.ColorPicker(name: str)#

Bases: SimpleAttribute

Class to create a color picker attribute

Parameters:

name (str) – The name of the attribute

attribute_name = 'colorPicker'#
class olorenchemengine.visualizations.attribute_section.DatasetSelector(name: str)#

Bases: SimpleAttribute

Class to create a dataset selector attribute

Parameters:

name (str) – The name of the attribute

attribute_name = 'datasetSelector'#
class olorenchemengine.visualizations.attribute_section.InputNumber(name: str)#

Bases: SimpleAttribute

Class to create an input number attribute

Parameters:

name (str) – The name of the attribute

attribute_name = 'inputNumber'#
class olorenchemengine.visualizations.attribute_section.InputString(name: str)#

Bases: SimpleAttribute

Class to create an input string attribute

Parameters:

name (str) – The name of the attribute

attribute_name = 'inputString'#
class olorenchemengine.visualizations.attribute_section.InputThreshold(name: str, min: int = 0, max: int = 100)#

Bases: BaseAttribute

Class to create an input threshold attribute

Parameters:
  • name (str) – The name of the attribute

  • min (int) – The minimum value of the threshold

  • max (int) – The maximum value of the threshold

to_json()#

Convert the attribute to a json object

Returns:

The json representation of the object

class olorenchemengine.visualizations.attribute_section.ModelSelector(name: str)#

Bases: SimpleAttribute

Class to create a model selector attribute

Parameters:

name (str) – The name of the attribute

attribute_name = 'modelSelector'#
class olorenchemengine.visualizations.attribute_section.SimpleAttribute(name: str)#

Bases: BaseAttribute

A simple attribute for a visualization is an attribute whose only parameter is a name.

This describes all attributes except InputThreshold, which requires a minimum and maximum value alongside a name, and AttributeSection, which requires a list of attributes alongside a name.

to_json() list#

Convert the attribute to a json object

Returns:

The json representation of the object
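The attribute classes above compose: simple attributes carry only a name, InputThreshold adds bounds, and AttributeSection groups child attributes. The JSON layout produced by to_json is not specified in this reference, so the sketch below uses hypothetical stand-in classes (SketchAttribute, SketchSimple, SketchThreshold, SketchSection) and an assumed list-based layout purely to illustrate how a section might serialize its children in order. The "inputThreshold" tag is likewise an assumption.

```python
from abc import ABC, abstractmethod
from typing import List


class SketchAttribute(ABC):
    """Hypothetical stand-in for BaseAttribute (not the library class)."""

    def __init__(self, name: str):
        self.name = name

    @abstractmethod
    def to_json(self) -> list:
        """Return a JSON-serializable description of the attribute."""


class SketchSimple(SketchAttribute):
    """Stand-in for a SimpleAttribute subclass such as InputString."""

    attribute_name = "inputString"

    def to_json(self) -> list:
        # Assumed layout: [name, component type]
        return [self.name, self.attribute_name]


class SketchThreshold(SketchAttribute):
    """Stand-in for InputThreshold: a name plus min/max bounds."""

    def __init__(self, name: str, min: int = 0, max: int = 100):
        super().__init__(name)
        self.min, self.max = min, max

    def to_json(self) -> list:
        # Assumed layout and assumed "inputThreshold" tag
        return [self.name, "inputThreshold", {"min": self.min, "max": self.max}]


class SketchSection(SketchAttribute):
    """Stand-in for AttributeSection: serializes its children in order."""

    def __init__(self, name: str, attributes: List[SketchAttribute]):
        super().__init__(name)
        self.attributes = attributes

    def to_json(self) -> list:
        return [self.name, [a.to_json() for a in self.attributes]]


section = SketchSection("Plot options", [SketchSimple("Title"), SketchThreshold("Cutoff", 0, 10)])
print(section.to_json())
# ['Plot options', [['Title', 'inputString'], ['Cutoff', 'inputThreshold', {'min': 0, 'max': 10}]]]
```

Because the section simply delegates to each child's to_json, sections can in principle be nested inside other sections.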

olorenchemengine.visualizations.compounds module#

Visualizations for basic analysis of compounds and groups of compounds.

class olorenchemengine.visualizations.compounds.VisualizeMCS(smiles: List[str], *args, timeout=30, kekulize=False, completeRingsOnly=True, invert_colors=False, log=True, **kwargs)#

Bases: VisualizeCompounds

Visualize the maximum common substructure between a list of compounds.

Parameters:
  • smiles (List[str]) – The SMILES strings of the compounds to visualize.

  • timeout (int) – The timeout for the MCS calculation in seconds. Default 30.

  • invert_colors (bool) – Invert the colors of the MCS such that the MCS is red and the differences are green. Default False.

  • completeRingsOnly (bool) – Only consider complete rings in the MCS; parameter for rdkit.Chem.rdFMCS.FindMCS. Default True.

  • kekulize (bool) – Whether or not to kekulize molecules for display. Default False.

property JS_NAME#

Name of JavaScript file for visualization, needs to be in the scripts folder.

Returns:

Name of JavaScript file.

Return type:

str

olorenchemengine.visualizations.exploratory_analysis module#

These visualizations are intended for exploratory analysis.

class olorenchemengine.visualizations.exploratory_analysis.VisualizeFeatures(dataset: BaseDataset, features: list = [], log=True, **kwargs)#

Bases: BaseVisualization

Visualize how the selected features are correlated with the dataset.

Parameters:
  • dataset (BaseDataset) – the dataset to analyze

  • features (list) – The features to visualize. String elements are treated as column names in the dataset, and BaseVecRepresentation elements are used to convert structures to features; such representations should define the names property for better interpretability. In addition to these features, the feature_cols of the dataset are also visualized.
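As an illustration of what "correlated with the dataset" can mean, the sketch below computes the Pearson correlation of each feature column with a property column in plain Python. The column names and values are made up, and which statistic VisualizeFeatures actually displays is not stated here, so Pearson correlation is an assumption.

```python
from math import sqrt
from typing import Dict, List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def feature_correlations(columns: Dict[str, List[float]], prop: List[float]) -> Dict[str, float]:
    """Correlate every feature column with the property column."""
    return {name: pearson(values, prop) for name, values in columns.items()}


# Made-up example data: one feature rises with the property, one falls.
features = {"MolWt": [100.0, 200.0, 300.0, 400.0], "HBD": [4.0, 3.0, 2.0, 1.0]}
logp = [1.0, 2.0, 3.0, 4.0]
print(feature_correlations(features, logp))
```

A constant column has zero variance, so pearson would divide by zero; a production version would need to guard against that case.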

get_data()#

Get data for visualization in JSON-like dictionary.

olorenchemengine.visualizations.matched_pairs module#

class olorenchemengine.visualizations.matched_pairs.MatchedPairsHeatmap(*args, **kwargs)#

Bases: MatchedPairsTable

property JS_NAME#

Name of JavaScript file for visualization, needs to be in the scripts folder.

Returns:

Name of JavaScript file.

Return type:

str

get_data()#

Get data for visualization in JSON-like dictionary.

class olorenchemengine.visualizations.matched_pairs.MatchedPairsTable(dataset: BaseDataset, mode: str = 'features', annotations: Union[str, List[str]] = [])#

Bases: BaseVisualization

This visualization is intended to show matched pairs of molecules in a dataset which differ based on a set of feature columns defined in the dataset object.

property JS_NAME#

Name of JavaScript file for visualization, needs to be in the scripts folder.

Returns:

Name of JavaScript file.

Return type:

str

get_data()#

Get data for visualization in JSON-like dictionary.

matched_pairs(df, cols, dist=1)#
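matched_pairs(df, cols, dist=1) is listed without a description. One plausible reading, offered here only as an assumption, is that two compounds form a matched pair when they differ in exactly dist of the given feature columns. The sketch below implements that reading over a list of dict rows rather than a DataFrame; find_matched_pairs is a hypothetical helper, not part of the library.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def find_matched_pairs(rows: List[Dict], cols: Sequence[str], dist: int = 1) -> List[Tuple[int, int]]:
    """Return index pairs of rows differing in exactly `dist` of `cols` (assumed semantics)."""
    pairs = []
    for i, j in combinations(range(len(rows)), 2):
        # Count how many of the feature columns differ between the two rows.
        n_diff = sum(rows[i][c] != rows[j][c] for c in cols)
        if n_diff == dist:
            pairs.append((i, j))
    return pairs


rows = [
    {"smiles": "c1ccccc1C", "R1": "Me", "R2": "H"},
    {"smiles": "c1ccccc1CC", "R1": "Et", "R2": "H"},
    {"smiles": "c1ccc(O)cc1CC", "R1": "Et", "R2": "OH"},
]
print(find_matched_pairs(rows, ["R1", "R2"]))  # [(0, 1), (1, 2)]
```

This compares every pair of rows, so it scales quadratically with dataset size.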

olorenchemengine.visualizations.model_comparisons module#

Model comparisons is a set of visualizations that compare various models or their parameters. These can be used for hyperparameter tuning, benchmarking, or model selection.

class olorenchemengine.visualizations.model_comparisons.ModelOverTime(dataset: BaseDataset, model: BaseModel, *args, log=True, title='Model Performance over Time', yaxis_title='Error', xaxis_title='Date', **kwargs)#

Bases: CompoundScatterPlot

Visualizes the performance of a given model architecture over a length of time.

This visualization uses a date split: for each date i, it trains a model on the data from dates 1, …, i-1 and evaluates it on date i. Any existing splits in the dataset are ignored.

Parameters:
  • dataset (BaseDataset) – The dataset to use for the visualization

  • model (BaseModel) – The model to use for the visualization
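The expanding-window scheme described above (train on dates 1, …, i-1, evaluate on date i) can be sketched in plain Python. date_splits is a hypothetical helper over (date, record) pairs; it only illustrates the split logic, not how ModelOverTime trains or scores models.

```python
from datetime import date
from typing import Any, Iterator, List, Tuple


def date_splits(records: List[Tuple[date, Any]]) -> Iterator[Tuple[date, List[Any], List[Any]]]:
    """Yield (test_date, train_records, test_records) for each date after the first.

    Training data for date i is everything dated strictly before i.
    """
    dates = sorted({d for d, _ in records})
    for test_date in dates[1:]:  # the first date has no earlier data to train on
        train = [r for d, r in records if d < test_date]
        test = [r for d, r in records if d == test_date]
        yield test_date, train, test


records = [
    (date(2022, 1, 1), "A"),
    (date(2022, 1, 1), "B"),
    (date(2022, 2, 1), "C"),
    (date(2022, 3, 1), "D"),
]
for test_date, train, test in date_splits(records):
    print(test_date, train, test)
# 2022-02-01 ['A', 'B'] ['C']
# 2022-03-01 ['A', 'B', 'C'] ['D']
```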

get_data()#

Get data for visualization in JSON-like dictionary.

Returns:

Data for visualization.

Return type:

dict

class olorenchemengine.visualizations.model_comparisons.VisualizationModelManager_Bar(mm: BaseModelManager, top=5, log=True, **kwargs)#

Bases: BaseVisualization

Visualize the database of a model manager in a bar plot, with the model parameters and name displayed on hover.

Parameters:
  • mm (BaseModelManager) – The model manager to visualize

  • top (int) – The number of models to display. Default is 5.

property JS_NAME: str#

Name of JavaScript file for visualization, needs to be in the scripts folder.

Returns:

Name of JavaScript file.

Return type:

str

get_data(metric=None) dict#

Get the data for the visualization

Parameters:

metric (str) – The metric to use for the visualization

Returns:

The data for the visualization

Return type:

dict

render_ipynb(metric=None, *args, print_html_js=False, **kwargs) str#

Render the visualization in an IPython notebook.

Parameters:
  • metric (str) – The metric to use for the visualization

  • *args – Additional arguments to pass to the visualization

  • print_html_js (bool) – Whether to print the javascript code for the visualization

  • **kwargs – Additional arguments to pass to the visualization

Returns:

The html code for the visualization

Return type:

str

olorenchemengine.visualizations.visualization module#

class olorenchemengine.visualizations.visualization.BaseErrorWaterfall(model: BaseBoosting, x_data: Union[pd.DataFrame, np.ndarray], *y_data: Union[pd.Series, list, np.ndarray], normalization=False, log=True, **kwargs)#

Bases: BaseVisualization

Visualize the error waterfall for a base boosting model.

Parameters:
  • model (BaseBoosting) – Model to evaluate on. Must be a BaseBoosting model.

  • x_data (Union[pd.DataFrame, np.ndarray]) – Data to predict on using the model

  • y_data (Union[pd.Series, list, np.ndarray], optional) – True values to compare to. Defaults to None. If None, then the waterfall plot will be for residuals.

  • normalization (bool, optional) – Whether the data is normalized. Defaults to False.

get_data() dict#

Get data for visualization in JSON-like dictionary.

Returns:

Data for visualization.

Return type:

dict

class olorenchemengine.visualizations.visualization.BaseVisualization(log=True, **kwargs)#

Bases: BaseClass

Base class for all visualizations. Each visualization should implement a get_data method that returns a JSON-like dictionary of data to be used in the visualization. Each visualization should also have a corresponding JavaScript file (or one of a parent class) specified in JS_NAME that renders the visualization.

Each visualization class should also implement a get_attributes method that returns the attributes OAS uses on the visualization page. Through this method, the visualization specifies which sections, attributes, and component types the user needs to set.
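The contract described above (a get_data method returning JSON-like data, a JS_NAME naming the renderer script, and a get_attributes list for OAS) can be illustrated with a self-contained sketch. VisualizationSketch and HistogramSketch are hypothetical stand-ins written for this example; a real visualization would subclass olorenchemengine's BaseVisualization instead, and the script name below is made up.

```python
import json
from abc import ABC, abstractmethod


class VisualizationSketch(ABC):
    """Hypothetical stand-in mirroring the BaseVisualization contract."""

    @property
    @abstractmethod
    def JS_NAME(self) -> str:
        """Name of the JavaScript file that renders this visualization."""

    @abstractmethod
    def get_data(self) -> dict:
        """Return JSON-serializable data for the renderer."""

    @staticmethod
    def get_attributes() -> list:
        """Ordered list of user-settable attributes (empty by default)."""
        return []

    def data_json(self) -> str:
        # The data must survive JSON serialization to reach the JavaScript side.
        return json.dumps(self.get_data())


class HistogramSketch(VisualizationSketch):
    """Example subclass: bins property values into a histogram."""

    def __init__(self, values, bins=4):
        self.values, self.bins = list(values), bins

    @property
    def JS_NAME(self) -> str:
        return "histogram_sketch"  # hypothetical script name

    def get_data(self) -> dict:
        lo, hi = min(self.values), max(self.values)
        width = (hi - lo) / self.bins or 1.0  # avoid zero width when all values are equal
        counts = [0] * self.bins
        for v in self.values:
            idx = min(int((v - lo) / width), self.bins - 1)  # put the maximum in the last bin
            counts[idx] += 1
        return {"edges": [lo + k * width for k in range(self.bins + 1)], "counts": counts}


print(HistogramSketch([0, 1, 2, 3, 4, 5, 6, 7, 8]).data_json())
```

Keeping get_data free of rendering concerns is what lets the same data feed each of the render methods listed below, which all fall back to get_data when no data is passed.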

property JS_NAME: str#

Name of JavaScript file for visualization, needs to be in the scripts folder.

Returns:

Name of JavaScript file.

Return type:

str

classmethod from_attributes(attributes: dict) BaseVisualization#

Create visualization from attributes.

Parameters:
  • attributes (dict) – Attributes to be used in visualization.

  • cls (BaseVisualization) – Class of visualization to be created.

static get_attributes() list#

Return the attributes for the OAS visualization page as an ordered list; they are displayed in that order.

Returns:

List of attributes for OAS to display.

Return type:

list

get_data() dict#

Get data for visualization in JSON-like dictionary.

get_html(data_str: str, packages: str, **kwargs)#

Returns the HTML code for the visualization.

Parameters:
  • data_str (str) – JSON string of data to be used in visualization.

  • packages (str) – HTML code which loads requisite JavaScript packages specified in self.packages.

get_js(*args, **kwargs) str#

Render visualization to JavaScript string.

Parameters:
  • args (list) – Arguments to be passed to visualization.

  • kwargs (dict) – Keyword arguments to be passed to visualization.

Returns:

JavaScript code for visualization.

Return type:

str

Get link to visualization.

package_urls = {'d3': 'https://d3js.org/d3.v4.js', 'olorenrenderer': 'https://unpkg.com/olorenrenderer@1.0.0-c/dist/oloren-renderer.min.js', 'plotly': 'https://cdn.plot.ly/plotly-2.14.0.min.js', 'rdkit': 'https://unpkg.com/@rdkit/rdkit/dist/RDKit_minimal.js', 'smilesdrawer': 'https://unpkg.com/smiles-drawer@1.0.10/dist/smiles-drawer.min.js'}#
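get_html receives packages as HTML that loads the JavaScript libraries named in self.packages. Assuming those names are keys of package_urls, that HTML could be assembled as script tags as below; build_package_html is a hypothetical helper, and the real method may construct it differently.

```python
from html import escape
from typing import Dict, Iterable

# Two entries copied from package_urls above; the rest are omitted for brevity.
PACKAGE_URLS: Dict[str, str] = {
    "d3": "https://d3js.org/d3.v4.js",
    "plotly": "https://cdn.plot.ly/plotly-2.14.0.min.js",
}


def build_package_html(packages: Iterable[str], urls: Dict[str, str] = PACKAGE_URLS) -> str:
    """Return one <script> tag per requested package, in the order requested."""
    packages = list(packages)  # allow generators, which would otherwise be consumed twice
    missing = [p for p in packages if p not in urls]
    if missing:
        raise KeyError(f"unknown packages: {missing}")
    return "\n".join(f'<script src="{escape(urls[p])}"></script>' for p in packages)


print(build_package_html(["plotly", "d3"]))
```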
render(data: Optional[dict] = None, print_html_js=False, **kwargs) str#

Render visualization to generate a data_url string.

Parameters:
  • data (dict) – Data to be used in visualization. Optional, if not provided, data will be retrieved from get_data method.

  • print_html_js (bool) – Whether or not to print the JavaScript code for the visualization.

render_data_url(data: Optional[dict] = None, print_html_js=False, **kwargs) str#

Render visualization to data_url string.

Parameters:
  • data (dict) – Data to be used in visualization. Optional, if not provided, data will be retrieved from get_data method.

  • print_html_js (bool) – Whether or not to print the JavaScript code for the visualization.
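A data URL embeds a whole document in the URL itself, so an HTML visualization can be displayed without writing a file. The sketch below builds one with base64 encoding as described in RFC 2397; html_to_data_url is a hypothetical helper, and whether render_data_url uses exactly this encoding is an assumption.

```python
import base64


def html_to_data_url(html: str) -> str:
    """Encode an HTML document as a base64 data URL (RFC 2397)."""
    payload = base64.b64encode(html.encode("utf-8")).decode("ascii")
    return f"data:text/html;base64,{payload}"


url = html_to_data_url("<p>hi</p>")
print(url)  # data:text/html;base64,PHA+aGk8L3A+
```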

render_ipynb(data: Optional[dict] = None, print_html_js=False, **kwargs) str#

Render visualization to IPython notebook IFrame.

Parameters:
  • data (dict) – Data to be used in visualization. Optional, if not provided, data will be retrieved from get_data method.

  • print_html_js (bool) – Whether or not to print the JavaScript code for the visualization.

render_oas()#

Render for OAS

save_html(path)#
upload_oas()#
class olorenchemengine.visualizations.visualization.ChemicalSpacePlot(dataset: Union[BaseDataset, list, pd.Series, pd.DataFrame], rep: BaseCompoundVecRepresentation, *args, dim_reduction='tsne', smiles_col=None, title='Chemical Space Plot', log=True, **kwargs)#

Bases: CompoundScatterPlot

Visualize chemical space by calculating a vector representation and performing dimensionality reduction to 2 dimensions.

Parameters:
  • dataset (BaseDataset, pd.Series, list) – BaseDataset to be used in the visualization. Alternatively, a list or pd.Series, in which case it is treated as a list of structures.

  • rep (BaseCompoundVecRepresentation) – Representation to use for dimensionality reduction.

  • dim_reduction (str, optional) – Dimensionality reduction method to use. Default is ‘tsne’; other options include ‘pca’.

  • color (str, optional) – Column name to use for color. If it is ‘property_col’, the visualization will use the property_col of the dataset to color the markers. Default is None, meaning no variable coloring for the chemical space plot.

  • colorscale (str, optional) – Color scale to use for coloring the chemical space plot. Other options can be found

get_data(color_col: Optional[str] = None, size_col: Optional[str] = None, SMILES: Optional[str] = None) dict#

Get data for visualization in JSON-like dictionary.

Returns:

Data for visualization.

Return type:

dict

pca_df(chem_rep_list)#

Takes in a list containing the chemical representation of a collection of molecules and returns a dataframe containing the PCA reduction to 2 components.

Parameters:

chem_rep_list (list) – List of chemical representations of molecules.

Returns:

Dataframe containing the PCA reduction to 2 components.

Return type:

pandas.DataFrame

tsne_df(chem_rep_list)#

Takes in a list containing the chemical representation of a collection of molecules and returns a dataframe containing the t-SNE reduction to 2 components.

Parameters:

chem_rep_list (list) – List of chemical representations of molecules.

Returns:

Dataframe containing the t-SNE reduction to 2 components.

Return type:

pandas.DataFrame

class olorenchemengine.visualizations.visualization.CompoundScatterPlot(df, title: str = 'Compound Scatter Plot', xaxis_title: str = 'X axis', yaxis_title: str = 'y axis', x_col: str = None, y_col: str = None, smiles_col: str = None, kekulize: bool = True, color_col: str = None, colorscale: str = None, xaxis_type: str = 'linear', yaxis_type: str = 'linear', axesratio: float = None, xdomain: float = None, ydomain: float = None, xdtick: float = None, ydtick: float = None, xrange: float = None, yrange: float = None, opacity: float = 1, width: int = 800, height: int = 600, jitter: float = 0.0, log: bool = True, **kwargs)#

Bases: BaseVisualization

Scatter plot visualization, where molecules are displayed on hover on an x-y plane.

Parameters:
  • df (pd.DataFrame) – Dataframe to be used in visualization. Needs to have X, Y, and SMILES columns.

  • title (str) – Title of the visualization.

  • xaxis_title (str, optional) – Title for x-axis. Default is ‘X axis’.

  • yaxis_title (str, optional) – Title for y-axis. Default is ‘y axis’.

  • x_col (str, optional) – If specified, uses value as the column name for the x-axis value. Default is None.

  • y_col (str, optional) – If specified, uses value as the column name for the y-axis value. Default is None.

  • smiles_col (str, optional) – If specified, uses value as the column name for the molecule smiles. Default is None.

  • kekulize (bool, optional) – Whether or not to kekulize molecules for display. Default is True.

  • color_col (str, optional) – If specified, uses value as the column name for the color of the markers. Default is None.

  • xaxis_type (str, optional) – Type of x-axis. Default is ‘linear’, other options are ‘log’ or ‘date’.

  • yaxis_type (str, optional) – Type of y-axis. Default is ‘linear’, other options are ‘log’ or ‘date’.

  • axesratio (float, optional) – ratio of x-axis to y-axis lengths. Default is None, which allows it to be auto chosen by Plotly.

  • xdomain (float, optional) – domain for x-axis. Default is None, which allows it to be auto chosen by Plotly.

  • ydomain (float, optional) – domain for y-axis. Default is None, which allows it to be auto chosen by Plotly.

  • xdtick (float, optional) – tick interval for the x-axis. Default is None, which allows it to be auto chosen by Plotly.

  • ydtick (float, optional) – tick interval for the y-axis. Default is None, which allows it to be auto chosen by Plotly.

  • xrange (float, optional) – range of the x-axis. Default is None, which allows it to be auto chosen by Plotly.

  • yrange (float, optional) – range of the y-axis. Default is None, which allows it to be auto chosen by Plotly.

  • opacity (float, optional) – opacity of the markers. Useful for seeing the distribution of dense data. Default is 1.0.

  • width (int, optional) – width of the plot. Default is 800.

  • height (int, optional) – height of the plot. Default is 600.

property JS_NAME: str#

Name of JavaScript file for visualization, needs to be in the scripts folder.

Returns:

Name of JavaScript file.

Return type:

str

static get_attributes()#

List of Customizable attributes for Visualization to display.

Returns:

List of attributes for OAS to display.

Return type:

list

get_data(include_data=True) dict#

Get data for visualization in JSON-like dictionary.

Parameters:

include_data (bool) – Whether or not to include data in the visualization. If False, only the attributes will be returned.

Returns:

Data for visualization.

Return type:

dict

class olorenchemengine.visualizations.visualization.ModelPR(dataset: BaseDataset, model: BaseModel, *args, log=True, model_name=None, **kwargs)#

Bases: BaseVisualization

Visualize the Precision-Recall curve for a model.

Parameters:
  • dataset (Dataset) – Dataset to evaluate model on

  • model (BaseModel) – Model to evaluate on

  • model_name (str, optional) – Name of model. Default is None.

  • eval_set (str, optional) – Subset of dataset to visualize, either ‘train’, ‘test’, or ‘valid’. Default is ‘test’.
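A precision-recall curve is traced by sweeping a decision threshold over the model's predicted scores. The plain-Python sketch below computes one (recall, precision) point per distinct score; it illustrates the metric only and is not taken from ModelPR's implementation. pr_curve is a hypothetical helper, and it assumes binary labels with at least one positive.

```python
from typing import List, Tuple


def pr_curve(y_true: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """Return (recall, precision) points, one per distinct threshold, highest threshold first.

    A compound is predicted positive when its score is >= the threshold.
    """
    n_pos = sum(y_true)
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((tp / n_pos, tp / (tp + fp)))
    return points


print(pr_curve([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

Each threshold is one of the observed scores, so at least one compound is always predicted positive and the precision denominator is never zero.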

property JS_NAME: str#

Name of JavaScript file for visualization, needs to be in the scripts folder.

Returns:

Name of JavaScript file.

Return type:

str

static get_attributes()#

Return the attributes for the OAS visualization page as an ordered list; they are displayed in that order.

Returns:

List of attributes for OAS to display.

Return type:

list

get_data() dict#

Get data for visualization in JSON-like dictionary.

class olorenchemengine.visualizations.visualization.ModelROC(dataset: BaseDataset, model: BaseModel, *args, eval_set='test', log=True, model_name=None, **kwargs)#

Bases: BaseVisualization

Visualize the ROC curve for a model.

Parameters:
  • dataset (Dataset) – Dataset to evaluate model on

  • model (BaseModel) – Model to evaluate on

  • model_name (str, optional) – name of model. Default is None.

  • eval_set (str, optional) – Subset of dataset to visualize, either ‘train’, ‘test’, or ‘valid’. Default is ‘test’.
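The ROC curve plots the true positive rate against the false positive rate over the same kind of threshold sweep, and the area under it can be estimated with the trapezoidal rule. This is a minimal plain-Python sketch of the metric, not ModelROC's code; roc_points and auc are hypothetical helpers that assume both classes are present.

```python
from typing import List, Tuple


def roc_points(y_true: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points from (0, 0) to (1, 1)."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


pts = roc_points([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
print(pts, auc(pts))
```

For this toy data the area is 0.75, which matches the fraction of (positive, negative) pairs in which the positive compound receives the higher score.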

property JS_NAME: str#

Name of JavaScript file for visualization, needs to be in the scripts folder.

Returns:

Name of JavaScript file.

Return type:

str

static get_attributes()#

Return the attributes for the OAS visualization page as an ordered list; they are displayed in that order.

Returns:

List of attributes for OAS to display.

Return type:

list

get_data() dict#

Get data for visualization in JSON-like dictionary.

class olorenchemengine.visualizations.visualization.ModelROCThreshold(dataset: BaseDataset, model: BaseModel, *args, eval_set='test', log=True, model_name=None, **kwargs)#

Bases: BaseVisualization

Visualize the ROC curve for a model.

Parameters:
  • dataset (Dataset) – Dataset to evaluate model on

  • model (BaseModel) – Model to evaluate on

  • model_name (str, optional) – name of model. Default is None.

  • eval_set (str, optional) – Subset of dataset to visualize, either ‘train’, ‘test’, or ‘valid’. Default is ‘test’.

property JS_NAME: str#

Name of JavaScript file for visualization, needs to be in the scripts folder.

Returns:

Name of JavaScript file.

Return type:

str

static get_attributes()#

Return the attributes for the OAS visualization page as an ordered list; they are displayed in that order.

Returns:

List of attributes for OAS to display.

Return type:

list

get_data() dict#

Get data for visualization in JSON-like dictionary.

class olorenchemengine.visualizations.visualization.MorganContributions(smiles: str, dataset: BaseDataset, compound_id: str = 'The compound', radius=2, nbits=1024, *args, **kwargs)#

Bases: BaseVisualization

Visualize the Morgan fingerprint bit contributions of a molecule.

Because each bit of a Morgan fingerprint maps to a specific substructure, this visualization takes a compound and a dataset to calibrate on, and shows which substructures in the compound contribute most to the compound's predicted property value.

This is done by looking at the bits that are activated for the given compound, then systematically switching those bits off and on and examining how each bit affects the output of the model.

Parameters:
  • smiles (str) – smiles string of the molecule to visualize.

  • dataset (BaseDataset) – Dataset to calibrate the MorganContributions model on.

  • compound_id (str, optional) – Name used to refer to the compound. Default is ‘The compound’.

  • radius (int, optional) – Radius of the Morgan fingerprint. Default is 2.

  • nbits (int, optional) – Number of bits in the Morgan fingerprint. Default is 1024.
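The bit-switching procedure described above can be sketched without RDKit by treating the fingerprint as a set of on-bit indices and the model as any callable. bit_contributions, toy_model and its weights are hypothetical; the real class computes Morgan fingerprints with the given radius and nbits and calibrates its own model on the dataset.

```python
from typing import Callable, Dict, FrozenSet


def bit_contributions(on_bits: FrozenSet[int], predict: Callable[[FrozenSet[int]], float]) -> Dict[int, float]:
    """Contribution of each active bit = prediction with the bit on minus prediction with it off."""
    baseline = predict(on_bits)
    return {bit: baseline - predict(on_bits - {bit}) for bit in sorted(on_bits)}


# Toy linear "model": each bit adds a fixed weight (hypothetical values).
WEIGHTS = {3: 0.5, 17: -1.2, 256: 2.0}


def toy_model(bits: FrozenSet[int]) -> float:
    return sum(WEIGHTS.get(b, 0.0) for b in bits)


contribs = bit_contributions(frozenset({3, 17, 256}), toy_model)
ranked = sorted(contribs, key=contribs.get)  # bits ordered from most negative to most positive effect
print(contribs)
print("most negative bit:", ranked[0], "most positive bit:", ranked[-1])
```

Ranking the contributions in this way gives the bottom and top bits whose substructures would then be highlighted.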

property JS_NAME: str#

Name of JavaScript file for visualization, needs to be in the scripts folder.

Returns:

Name of JavaScript file.

Return type:

str

getSubstructSmi(mol, atomID, radius)#
static get_attributes()#

Return the attributes for the OAS visualization page as an ordered list; they are displayed in that order.

Returns:

List of attributes for OAS to display.

Return type:

list

get_data()#

Returns a dictionary of the data to be visualized.

get_html(data_str: str, packages)#

Returns the HTML code for the visualization.

Parameters:
  • data_str (str) – JSON string of data to be used in visualization.

  • packages (str) – HTML code which loads requisite JavaScript packages specified in self.packages.

get_substructures(bottom_3_bits, top_3_bits)#

Returns a list of the substructures of the molecule.

visualize_morganfp()#

Uses the model's predictions to return images of the substructures with the minimum and maximum effect when their bits are switched on or off.

Returns:

Tuple of two lists of images. (minimum effect, maximum effect)

class olorenchemengine.visualizations.visualization.ScatterPlot(df, log=True, **kwargs)#

Bases: BaseVisualization

Scatter plot visualization.

Parameters:

df (pd.DataFrame) – Dataframe to be used in visualization.

get_data() dict#

Get data for visualization in JSON-like dictionary.

Returns:

Data for visualization.

Return type:

dict

class olorenchemengine.visualizations.visualization.VisualizeADAN(dataset: BaseDataset, model: BaseModel, *args, rep: BaseCompoundVecRepresentation = None, dim_reduction: str = 'pls', explvar: float = 0.8, threshold: float = 0.95, **kwargs)#

Bases: CompoundScatterPlot

Visualize a model trained on a dataset by plotting predicted vs. true values, colored by the ADAN criteria.

Parameters:
  • dataset (Dataset) – Dataset to visualize.

  • model (BaseModel) – Model to use for prediction.

  • rep (BaseCompoundVecRepresentation) – Representation to use for dimensionality reduction.

  • threshold (float) – Threshold for ADAN visualization between 0.0 and 1.0. Roughly, a higher value corresponds to a stricter threshold, resulting in more compounds being marked as out-of-distribution by the ADAN criteria. Default is 0.95.

static get_attributes()#

List of Customizable attributes for Visualization to display.

Returns:

List of attributes for OAS to display.

Return type:

list

get_data(criterion='B') dict#

Get data for visualization in JSON-like dictionary.

Parameters:

criterion (str, optional) – The ADAN criterion used to color the compounds. Default is ‘B’.

Returns:

Data for visualization.

Return type:

dict

class olorenchemengine.visualizations.visualization.VisualizeCompounds(dataset: Union[BaseDataset, list, pd.Series, str, Chem.Mol], table_width: int = 1, table_height: int = 5, compound_width: int = 500, compound_height: int = 500, highlights=[], annotations=None, kekulize=True, box=False, shuffle=False, log=True, **kwargs)#

Bases: BaseVisualization

Visualizes the compounds in a set.

Parameters:
  • dataset (Union[BaseDataset, list, pd.Series]) – Compounds to be visualized. BaseDataset is preferred, but list and pd.Series will be converted to lists of structures to be rendered.

  • table_width (int, optional) – Number of compounds in each table row. Default is 1.

  • table_height (int, optional) – Number of rows in the table. Default is 5.

  • compound_width (int, optional) – Width of each compound image in the table. Default is 500.

  • compound_height (int, optional) – Height of each compound image in the table. Default is 500.

  • annotations (optional) – Names of columns in the dataset to be used to annotate the table. Only considered if dataset is a BaseDataset. Default is None.

  • kekulize (bool, optional) – Whether or not to kekulize molecules for rendering. Default is True.

  • highlights (list, optional) – Atoms to highlight in the compounds, given as a list of lists of the form [Atom Map Number, Color (hex)]. Default is [].

  • box (bool, optional) – Default is False.

  • shuffle (bool, optional) – Whether or not to shuffle the compounds. Default is False.

get_data()#

Get data for visualization in JSON-like dictionary.

class olorenchemengine.visualizations.visualization.VisualizeCounterfactual(smiles: str, model: BaseModel, perturbation_engine: PerturbationEngine = None, delta: Union[int, float, Tuple] = (-1, 1), n: int = 40, pca: bool = False, **kwargs)#

Bases: CompoundScatterPlot

Visualize a model’s counterfactuals in the chemical space around a given compound, plotted in Tanimoto similarity space.

Parameters:
  • smiles (str) – Given compound to visualize counterfactuals for.

  • model (BaseModel) – Model to evaluate counterfactuals on.

  • perturbation_engine (PerturbationEngine) – Engine to generate counterfactuals with. Default is SwapMutations with radius 1.

  • delta (int, float, or tuple, optional) – Margin that defines counterfactuals for regression models. Default is (-1,1).

  • n (int, optional) – Number of counterfactuals to generate. Default is 40.
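For two fingerprints viewed as sets of on-bits A and B, the Tanimoto similarity is |A ∩ B| / |A ∪ B|. The sketch below computes it on plain Python sets with made-up bit indices; which fingerprint VisualizeCounterfactual uses to build its similarity space is not stated here, and similarity_to_reference is a hypothetical helper.

```python
from typing import FrozenSet, List


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0  # two empty fingerprints treated as identical here


def similarity_to_reference(reference: FrozenSet[int], candidates: List[FrozenSet[int]]) -> List[float]:
    """Similarity of each candidate (e.g. a counterfactual) to the reference compound."""
    return [tanimoto(reference, c) for c in candidates]


ref = frozenset({1, 2, 3, 4})
print(similarity_to_reference(ref, [frozenset({1, 2, 3, 4}), frozenset({1, 2, 5}), frozenset({9})]))
# [1.0, 0.4, 0.0]
```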

get_data() dict#

Get data for visualization in JSON-like dictionary.

Returns:

Data for visualization.

Return type:

dict

class olorenchemengine.visualizations.visualization.VisualizeDatasetCV(*args, division: str = 'cv', **kwargs)#

Bases: VisualizeDatasetDivision

Visualize the dataset colored by the cross validation split.

Wraps VisualizeDatasetDivision

class olorenchemengine.visualizations.visualization.VisualizeDatasetCompounds(dataset: Union[BaseDataset, list, pd.Series, str, Chem.Mol], table_width: int = 1, table_height: int = 5, compound_width: int = 500, compound_height: int = 500, highlights=[], annotations=None, kekulize=True, box=False, shuffle=False, log=True, **kwargs)#

Bases: VisualizeCompounds

Alias for VisualizeCompounds

class olorenchemengine.visualizations.visualization.VisualizeDatasetDivision(dataset: BaseDataset, rep: BaseCompoundVecRepresentation, model: BaseModel = None, res_lim: float = None, opacity: float = 0.4, title=None, colors=['#4B27E8', '#E88233', '#E8CE05', '#10CDE8', '#BC4DD6', '#D6B258', '#91D62D', '#3862D6'], division: str = 'split', *args, **kwargs)#

Bases: ChemicalSpacePlot

Visualize where train and test compounds fall in a dimensionality-reduced space, coloring each compound by whether it is in the train or test set.

Parameters:
  • dataset (BaseDataset) – Dataset to visualize.

  • rep (BaseCompoundVecRepresentation) – Representation to use for dimensionality reduction.

  • model (BaseModel, optional) – Model to use for prediction in order to color the dataset split by residual value.

  • colorscale (str) – Color scale to use for coloring the compounds.

  • res_lim (float, optional) – Caps the visualized residual size at the specified value.

  • opacity (float) – Opacity of the markers.

  • colors (List[str]) – List of colors to use for coloring the compounds.

  • division (str) – Whether to use the split or cv column in the dataset to visualize the dataset division.

classmethod from_attributes(attributes: dict) BaseVisualization#

Replace the dataset with a dataframe with a SMILES column and construct the rep object

static get_attributes()#

List of Customizable attributes for Visualization to display.

Returns:

List of attributes for OAS to display.

Return type:

list

get_data() dict#

Get data for visualization in JSON-like dictionary.

Returns:

Data for visualization.

Return type:

dict

class olorenchemengine.visualizations.visualization.VisualizeDatasetSplit(*args, division: str = 'split', **kwargs)#

Bases: VisualizeDatasetDivision

Visualize the dataset colored by the train/test/val split.

Wraps VisualizeDatasetDivision

class olorenchemengine.visualizations.visualization.VisualizeError(dataset: Union[BaseDataset, list, pd.Series, np.ndarray], value: Union[int, float, np.ndarray], error: Union[int, float, np.ndarray], true=None, ci=None, box=False, points=True, title='Error Plot', xaxis_title=None, yaxis_title=None, width=800, height=600, **kwargs)#

Bases: BaseVisualization

Visualizes property values and errors for a compound.

Parameters:
  • dataset (Union[BaseDataset, list, pd.Series, np.ndarray]) – reference data to be plotted in the visualization

  • value (Union[int, float, np.ndarray]) – single property value for the target compound

  • error (Union[int, float, np.ndarray]) – single error value for the target compound

  • box (bool) – whether or not to include a boxplot in the visualization

  • points (bool) – whether or not to include the reference points in the visualization

  • xaxis_title (str) – Title for x-axis. If not set, will try to use the property column name if dataset is a BaseDataset.

  • yaxis_title (str) – Title for y-axis.

get_data()#

Get data for visualization in JSON-like dictionary.

class olorenchemengine.visualizations.visualization.VisualizeModelSim(dataset: BaseDataset, model: BaseModel, eval_set: str = 'test', mode: str = 'none', *args, log=True, **kwargs)#

Bases: CompoundScatterPlot

Visualize a model’s predicted vs. true values on a given dataset, with each point colored by the compound’s similarity to the train set.

Parameters:
  • dataset (BaseDataset) – Dataset on which to visualize model performance; only the subset specified by eval_set is plotted.

  • model (BaseModel) – Model to visualize.

  • eval_set (str) – Subset of dataset to visualize, either ‘train’, ‘test’, or ‘valid’.

  • x_transform (str) – Transform to apply to x-axis, either ‘quantile’ or ‘none’.
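The docstring does not state which similarity measure is used to color the points. One common choice is the maximum Tanimoto similarity between a compound’s fingerprint and the fingerprints of the training compounds. The sketch below shows that computation on fingerprints represented as sets of on-bit indices; the helpers tanimoto and similarity_to_train are hypothetical, and the choice of measure is an assumption rather than the library’s documented behavior.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints: define similarity as 0
    return len(a & b) / len(union)


def similarity_to_train(query: Set[int], train: List[Set[int]]) -> float:
    """Hypothetical helper: assumed definition of 'similarity to the train set'
    as the maximum Tanimoto similarity to any training compound."""
    return max((tanimoto(query, t) for t in train), default=0.0)
```

For instance, similarity_to_train({1, 2, 3}, [{1, 2}, {4, 5}]) is 2/3, driven by the closest training compound.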

static get_attributes()#

List of customizable attributes for the visualization to display.

Returns:

List of attributes for OAS to display.

Return type:

list

get_data() dict#

Get data for visualization in JSON-like dictionary.

Parameters:

include_data (bool) – Whether or not to include data in the visualization. If False, only the attributes will be returned.

Returns:

Data for visualization.

Return type:

dict

class olorenchemengine.visualizations.visualization.VisualizeModelSim2(dataset: BaseDataset, model: BaseModel, eval_set='test', var_name='sim', *args, log=True, **kwargs)#

Bases: CompoundScatterPlot

Visualize the relationship between a model’s error on each compound and a chosen variable.

Parameters:
  • dataset (BaseDataset) – Dataset to evaluate the model on.

  • model (BaseModel) – Model to evaluate.

  • eval_set (str) – Subset of dataset to visualize, either ‘train’, ‘test’, or ‘valid’. Default is ‘test’.

  • var_name (str) – Variable to plot on the x-axis: ‘sim’ (the compound’s similarity to the train set), ‘prop’ (the true property value), or ‘pred’ (the predicted property value). Default is ‘sim’.
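To make the var_name options concrete, the sketch below pairs each compound’s chosen x-axis variable with its error. The helper error_vs_variable is hypothetical, and plotting the absolute error |pred − true| on the y-axis is an assumption; the docstring only says the plot shows the model’s error.

```python
from typing import Dict, List, Tuple


def error_vs_variable(true: List[float], pred: List[float], sim: List[float],
                      var_name: str = "sim") -> List[Tuple[float, float]]:
    """Hypothetical helper: (x, error) pairs for the chosen var_name."""
    # The three x-axis options named in the var_name documentation
    options: Dict[str, List[float]] = {"sim": sim, "prop": true, "pred": pred}
    if var_name not in options:
        raise ValueError(f"var_name must be one of {sorted(options)}, got {var_name!r}")
    x = options[var_name]
    # Assumption: the y-axis shows the absolute error of each prediction
    return [(xi, abs(p - t)) for xi, p, t in zip(x, pred, true)]
```

Switching var_name only changes the x-coordinates; the error values on the y-axis stay the same.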

static get_attributes()#

List of customizable attributes for the visualization to display.

Returns:

List of attributes for OAS to display.

Return type:

list

get_data() dict#

Get data for visualization in JSON-like dictionary.

Parameters:

include_data (bool) – Whether or not to include data in the visualization. If False, only the attributes will be returned.

Returns:

Data for visualization.

Return type:

dict

class olorenchemengine.visualizations.visualization.VisualizeMoleculePerturbations(smiles: str, perturbation_engine: PerturbationEngine = None, rep: BaseVecRepresentation = None, idx: int = None, n: int = None)#

Bases: ChemicalSpacePlot

Visualize perturbations of a single molecule, generated by a PerturbationEngine, in a ChemicalSpacePlot.

Parameters:
  • smiles (str) – SMILES of molecule to perturb.

  • perturbation_engine (PerturbationEngine) – Perturbation engine, which provides the underlying algorithm for perturbing molecules. Default is SwapMutations(radius=0).

  • rep (BaseVecRepresentation) – Molecular vector representation used for dimensionality reduction.

olorenchemengine.visualizations.visualization.get_oas_compatible_visualizations()#

Returns a list of all available visualizations that are compatible with OAS, meaning that they can be created and specified in OAS. All visualizations can be shared via OAS.

olorenchemengine.visualizations.visualize_interpret module#

The visualize_interpret module contains visualizations that help explain why a model makes a particular prediction.

class olorenchemengine.visualizations.visualize_interpret.VisualizePredictionSensitivity(model: BaseModel, query_compound: str, radius: int = 2, n: int = 200, colorscale='viridis', bottom_quantile=0.75, top_quantile=0.95, nbins=3, log=True, **kwargs)#

Bases: BaseVisualization

Visualize the sensitivity of a model’s prediction to atom-level perturbations generated by the SwapMutations perturbation engine.

Parameters:
  • model (BaseModel) – The model to interpret.

  • query_compound (str) – The SMILES string of the compound to interpret.

  • radius (int) – The radius of the perturbations to generate with SwapMutations.

  • n (int) – The number of perturbations to generate centered around each atom.
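The docstring does not specify how per-atom sensitivity is aggregated, nor how bottom_quantile, top_quantile, and nbins are applied. The sketch below assumes one simple aggregation: the mean absolute change in prediction across the perturbations centered on each atom. The helper atom_sensitivity and its input format are hypothetical; the predictions themselves would come from running the model on SwapMutations perturbations.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def atom_sensitivity(base_pred: float,
                     perturbed: List[Tuple[int, float]]) -> Dict[int, float]:
    """Hypothetical helper: mean absolute change in prediction per atom.

    base_pred: model prediction for the unperturbed query compound
    perturbed: (atom_index, prediction) pairs, one per perturbation
               centered on that atom
    """
    deltas: Dict[int, List[float]] = defaultdict(list)
    for atom_idx, pred in perturbed:
        # Assumption: sensitivity is the size of the shift, regardless of sign
        deltas[atom_idx].append(abs(pred - base_pred))
    return {atom: mean(vals) for atom, vals in deltas.items()}
```

Atoms whose perturbations move the prediction most receive the highest scores; a visualization could then bin these scores into colors.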

get_data()#

Get data for visualization in JSON-like dictionary.

Module contents#