Representing Molecules

To build models on chemical databases, we need a way to represent chemical structures to those models. olorenchemengine ships with a wide array of the most performant representations available in the literature, in addition to our own proprietary representation, OlorenVec. The olorenchemengine.representations module contains the full list of available representations.

Transforming Molecules with Representations

The most basic thing we can do with a molecular representation is use it to transform a molecule into a machine-readable format.

import olorenchemengine as oce

# SMILES strings for two example molecules
acetaminophen = "CC(=O)NC1=CC=C(C=C1)O"
ibuprofen = "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O"

olorenvec_repr = oce.OlorenCheckpoint("default")
olorenvec_repr.convert(acetaminophen) # converts a single molecule
olorenvec_repr.convert([acetaminophen, ibuprofen]) # converts multiple molecules
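To make "machine-readable format" concrete, here is a minimal sketch that does not use olorenchemengine at all. It turns each SMILES string into a fixed-length vector of character counts. The names `ALPHABET` and `toy_convert` are hypothetical, and wrapping a single string in a list so that both calls return a list of vectors is a choice made for this sketch, not a statement about what `OlorenCheckpoint.convert` returns.

```python
from typing import List, Sequence, Union

# Hypothetical toy featurizer for illustration only; NOT part of olorenchemengine.
# Each position in the output vector counts one SMILES character.
ALPHABET = ("C", "N", "O", "=", "(", ")")


def toy_convert(smiles: Union[str, Sequence[str]]) -> List[List[float]]:
    """Return one fixed-length count vector per SMILES string."""
    if isinstance(smiles, str):
        smiles = [smiles]  # treat a single molecule as a batch of one
    return [[float(s.count(ch)) for ch in ALPHABET] for s in smiles]


acetaminophen = "CC(=O)NC1=CC=C(C=C1)O"
ibuprofen = "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O"

single = toy_convert(acetaminophen)              # one row
batch = toy_convert([acetaminophen, ibuprofen])  # two rows, same width
```

Whatever the featurizer, the property that matters to a downstream model is the one shown here: every molecule maps to a vector of the same length.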

Defining Models with Representations

All of the models defined in olorenchemengine.basics require a representation to be passed in as an additional input.

For example, this code block creates three RandomForestModel instances, each using a different underlying molecular representation.

olorenvec_model = oce.RandomForestModel(oce.OlorenCheckpoint("default"))
rdkit2d_model = oce.RandomForestModel(oce.DescriptastorusDescriptor("rdkit2dnormalized"))
morgan_model = oce.RandomForestModel(oce.DescriptastorusDescriptor("morgan3counts"))

Due to the structure of olorenchemengine, running a hyperparameter sweep over different representations is as simple as writing a for loop:

representations = [oce.OlorenCheckpoint("default"),
                   oce.DescriptastorusDescriptor("rdkit2dnormalized"),
                   oce.DescriptastorusDescriptor("morgan3counts")]

for representation in representations:
    model = oce.RandomForestModel(representation)
    # Evaluate model here...

Concatenating Representations

If you want to concatenate multiple representations together, you can just add them together!

# Parentheses let the expression span multiple lines
concatenated_representation = (
    oce.OlorenCheckpoint("default")
    + oce.DescriptastorusDescriptor("rdkit2dnormalized")
    + oce.DescriptastorusDescriptor("morgan3counts")
)
concatenated_model = oce.RandomForestModel(concatenated_representation)
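As a rough illustration of what "adding" representations means, the sketch below overloads `+` on a toy class so that the combined object concatenates the feature vectors of its parts. `ToyRepresentation` and the three example featurizers are hypothetical stand-ins written for this page; they are not olorenchemengine classes, and the library's own implementation may differ.

```python
from typing import Callable, List


class ToyRepresentation:
    """Hypothetical stand-in for a representation; not an olorenchemengine class."""

    def __init__(self, fn: Callable[[str], List[float]]):
        self.fn = fn

    def convert(self, smiles: str) -> List[float]:
        return self.fn(smiles)

    def __add__(self, other: "ToyRepresentation") -> "ToyRepresentation":
        # The sum is a new representation whose vector is the two
        # component vectors joined end to end.
        return ToyRepresentation(lambda s: self.convert(s) + other.convert(s))


length_repr = ToyRepresentation(lambda s: [float(len(s))])        # SMILES length
carbon_repr = ToyRepresentation(lambda s: [float(s.count("C"))])  # count of "C"
ring_repr = ToyRepresentation(lambda s: [float(s.count("1"))])    # count of "1"

combined = length_repr + carbon_repr + ring_repr
vector = combined.convert("CC(=O)NC1=CC=C(C=C1)O")
```

Because `+` returns another representation, sums can be chained, and the result can be used anywhere a single representation is expected.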

Defining New Representations

New representations can be defined by extending the olorenchemengine.representations.BaseCompoundVecRepresentation class:

class MyCustomRepresentation(oce.BaseCompoundVecRepresentation):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def _convert(self, smiles, y=None):
        # Placeholder: compute the feature vector for this SMILES string here.
        return vector_representation_of_smiles # should be a numpy array

my_representation = MyCustomRepresentation()
my_model = oce.RandomForestModel(my_representation)
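The subclassing pattern above can be tried without olorenchemengine installed. In the sketch below, `ToyBaseRepresentation` is a hypothetical base class that handles batching in `convert` and leaves the per-molecule work to `_convert`; it is an assumption about the general shape of the pattern, not a copy of `BaseCompoundVecRepresentation`. The subclass `HeteroatomFractionRepresentation` is likewise invented for illustration, and it counts uppercase characters naively (so "Cl" would be counted as a carbon).

```python
from typing import List, Sequence, Union


class ToyBaseRepresentation:
    """Hypothetical base class: convert() batches, subclasses supply _convert()."""

    def convert(self, smiles: Union[str, Sequence[str]]) -> List[List[float]]:
        if isinstance(smiles, str):
            smiles = [smiles]  # treat a single molecule as a batch of one
        return [self._convert(s) for s in smiles]

    def _convert(self, smiles: str, y=None) -> List[float]:
        raise NotImplementedError("subclasses must implement _convert")


class HeteroatomFractionRepresentation(ToyBaseRepresentation):
    """Toy features: number of C/N/O characters and the N+O share of them."""

    def _convert(self, smiles: str, y=None) -> List[float]:
        heavy = sum(smiles.count(atom) for atom in "CNO")
        hetero = smiles.count("N") + smiles.count("O")
        return [float(heavy), hetero / heavy if heavy else 0.0]


features = HeteroatomFractionRepresentation().convert(
    ["CCO", "CC(=O)NC1=CC=C(C=C1)O"]
)
```

The only method a new representation has to provide in this sketch is `_convert`, which maps one SMILES string to one vector; batching lives in the base class.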
