Quantitative interpretation explains machine learning models for chemical reaction prediction and uncovers bias

Although a plethora of machine learning models have been proposed in the literature for chemical reaction prediction, they suffer from being opaque black boxes. We develop a framework to attribute predicted reaction outcomes both to specific parts of reactants and to reactions in the training set. Using this framework, we identify "Clever Hans" predictions, where the correct prediction is reached for the wrong reason due to dataset bias.

Crowdsourcing drug discovery for pandemics

We launch the COVID Moonshot initiative, an open-science effort that aims to discover an oral antiviral against SARS-CoV-2 targeting the main protease. In less than a year, we have gone from a fragment screen to a lead compound showing potent antiviral activity and a favourable ADMET/selectivity profile.

Molecular Transformer unifies reaction prediction and retrosynthesis across pharma chemical space

We show that an attention-based machine translation model, the Molecular Transformer, tackles both reaction prediction and retrosynthesis by learning from the same dataset. Moreover, a model trained on publicly available data is able to make accurate predictions on proprietary molecules extracted from pharma electronic lab notebooks, demonstrating generalisability across chemical space.

Molecular Transformer: A model for uncertainty-calibrated chemical reaction prediction

Organic synthesis is one of the key stumbling blocks in medicinal chemistry. We treat reaction prediction as a machine translation problem between the "language" of reactants and reagents and the "language" of products. This approach outperforms all algorithms in the literature, achieving a top-1 accuracy above 90% on a common benchmark dataset. Our algorithm requires no handcrafted rules and accurately predicts subtle chemical transformations. Crucially, our model can accurately estimate its own uncertainty, with an uncertainty score that is 89% accurate at classifying whether a prediction is correct.
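
As a rough illustration of how a sequence-level uncertainty score can be used to classify predictions as correct or incorrect, the sketch below assumes the score is the product of the per-token probabilities of the predicted product string, and that a fixed threshold separates "trusted" from "untrusted" predictions. Both the scoring rule and the threshold are assumptions made for this example, as are the function names and the toy numbers; the summary above does not specify them.

```python
import math

def sequence_confidence(token_log_probs):
    """Confidence of a predicted sequence as the product of its token
    probabilities (assumed scoring rule), computed in log space."""
    return math.exp(sum(token_log_probs))

def threshold_accuracy(confidences, is_correct, threshold):
    """Fraction of predictions for which 'confidence >= threshold'
    agrees with whether the prediction was actually correct."""
    hits = sum((c >= threshold) == ok for c, ok in zip(confidences, is_correct))
    return hits / len(confidences)

# Hypothetical toy data: four predictions with made-up confidences.
toy_confidences = [0.9, 0.2, 0.8, 0.4]
toy_correct = [True, False, False, False]
toy_accuracy = threshold_accuracy(toy_confidences, toy_correct, threshold=0.5)
```

In practice the threshold would be chosen on held-out data rather than fixed by hand.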

Bayesian semi-supervised learning for uncertainty-calibrated prediction of molecular properties and active learning

Deep learning models are increasingly the method of choice in molecular property prediction. However, to replace costly and mission-critical experiments with models, a high mean accuracy is not enough: models need to reliably predict when they will fail. We developed a methodology based on Bayesian deep learning that robustly estimates model uncertainty and enables active learning starting from the low data limit.

Ligand biological activity predicted by cleaning positive and negative chemical correlations

Predicting ligand biological activity is a key challenge in drug discovery. A data-driven approach needs to overcome the challenge that the number of molecules known to be active or inactive is vastly smaller than the number of possible chemical features that might determine binding. We develop a framework using random matrix theory that discovers important chemical features by separating genuine correlations from undersampling noise. This method is used to prospectively discover four experimentally confirmed agonists of the human muscarinic acetylcholine receptor M1, a target for diseases such as Alzheimer’s disease and schizophrenia. Crucially, our method is interpretable and yields prospectively validated chemical insights on the binding modes of the M1 receptor.
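
One standard random-matrix way to separate signal from undersampling noise is to keep only those eigenvectors of the feature correlation matrix whose eigenvalues exceed the upper edge of the Marchenko-Pastur distribution, (1 + sqrt(p/n))^2 for n samples and p standardised features. The sketch below illustrates that general idea with NumPy; it is an assumption that this resembles the filtering step, and the function names and synthetic data are hypothetical rather than taken from the paper.

```python
import numpy as np

def mp_upper_edge(n_samples, n_features):
    """Upper edge of the Marchenko-Pastur spectrum for unit-variance noise."""
    gamma = n_features / n_samples
    return (1.0 + np.sqrt(gamma)) ** 2

def significant_directions(X):
    """Eigenpairs of the feature correlation matrix that lie above the
    Marchenko-Pastur edge, i.e. are unlikely to be sampling noise."""
    n_samples = X.shape[0]
    std = X.std(axis=0)
    keep = std > 0                      # drop constant feature columns
    Z = (X[:, keep] - X[:, keep].mean(axis=0)) / std[keep]
    corr = Z.T @ Z / n_samples          # sample correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)
    edge = mp_upper_edge(n_samples, Z.shape[1])
    mask = eigvals > edge
    return eigvals[mask], eigvecs[:, mask], edge

# Synthetic example (made-up data): 400 "molecules", 100 "features",
# with one shared factor planted in the first 20 features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 100))
X_demo[:, :20] += 3.0 * rng.normal(size=(400, 1))
vals_demo, vecs_demo, edge_demo = significant_directions(X_demo)
```

Eigenvectors that survive the cut can then be inspected feature by feature, which is what makes this kind of filtering interpretable.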

This site was last updated on 24th Mar 2021