Visualizing Errors

This tutorial covers how to visualize the confidence intervals predicted by error models.

import olorenchemengine as oce
import pandas as pd
import numpy as np

dataset = oce.load('lipophilicity_dataset.oce')
model = oce.load('lipophilicity_model_rf.oce')

We start by building an error model on the training dataset and then fitting it on the validation dataset.

error_model = oce.kNNwRMSD1()
error_model.build(model, dataset.train_dataset[0], dataset.train_dataset[1])
error_model.fit(dataset.valid_dataset[0], dataset.valid_dataset[1], quantile=0.8)
420it [00:02, 176.82it/s]
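To give some intuition for what a quantile setting such as `quantile=0.8` can mean, here is a minimal, library-independent sketch. It is not how `kNNwRMSD1` is implemented, which this tutorial does not describe. It only assumes a simple calibration rule: take the absolute residuals on a validation set and report the nearest-rank quantile as an error bound. The function name `residual_quantile` and the toy values are hypothetical.

```python
import math


def residual_quantile(y_true, y_pred, quantile=0.8):
    """Nearest-rank quantile of the absolute residuals |y_true - y_pred|.

    Illustrative only: a generic way to turn validation errors into an
    error bound that covers roughly `quantile` of the validation points.
    """
    if not 0.0 < quantile <= 1.0:
        raise ValueError("quantile must be in (0, 1]")
    residuals = sorted(abs(t - p) for t, p in zip(y_true, y_pred))
    if not residuals:
        raise ValueError("need at least one validation point")
    # Nearest-rank definition: the smallest residual such that at least
    # `quantile` of all residuals are less than or equal to it.
    rank = max(1, math.ceil(quantile * len(residuals)))
    return residuals[rank - 1]


# Toy validation labels and predictions (made-up numbers).
y_valid = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.5, 2.0, 2.0, 4.25, 7.0]

print(residual_quantile(y_valid, y_hat, quantile=0.8))
```

A higher quantile gives a wider, more conservative bound, while a lower one gives a tighter bound that more validation points fall outside of.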

We are now ready to predict errors.

pred = model.predict(dataset.test_dataset[0])
err = error_model.score(dataset.test_dataset[0])
420it [00:02, 179.40it/s]

Visualizations can be performed with the VisualizeError class, a BaseVisualization object. It takes the reference dataset together with the predicted value and the error value for a single molecule; here we pass the first test molecule, pred[0] and err[0].

vis = oce.VisualizeError(dataset, pred[0], err[0])
vis.render_ipynb()

The green plot is a density plot of outputs from the reference dataset. The probable property value range for the target molecule is shaded in purple, and the predicted property value is the purple dividing line.
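As a rough sketch of how a shaded range like this relates to the two numbers passed in, the snippet below assumes the error value is a symmetric half-width around the prediction. That is an assumption made for illustration, and `VisualizeError` may compute the range differently. The helper names `interval_from_error` and `fraction_inside` and the reference values are hypothetical.

```python
def interval_from_error(pred, err):
    """Return (lower, upper), assuming `err` is a symmetric half-width."""
    half_width = abs(err)
    return pred - half_width, pred + half_width


def fraction_inside(reference_values, lower, upper):
    """Fraction of reference values that fall inside [lower, upper]."""
    inside = sum(lower <= v <= upper for v in reference_values)
    return inside / len(reference_values)


# Made-up reference property values standing in for the dataset outputs.
reference = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]

lower, upper = interval_from_error(pred=2.75, err=0.75)
print(lower, upper)                                # 2.0 3.5
print(fraction_inside(reference, lower, upper))    # 0.5
```

Comparing the interval against the reference values is what the plot does visually: a narrow range in a dense region of the distribution reads very differently from a wide range out in a sparse tail.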

You can omit the boxplot, the points displayed below the density plot, or both.

vis = oce.VisualizeError(dataset, pred[0], err[0], box=False, points=False)
vis.render_ipynb()