Turning high-throughput structural biology into predictive inhibitor design
The significant increase in the throughput of structural biology provides an exciting source of data that characterizes protein–ligand interactions. We develop a new machine learning framework that extracts physically meaningful descriptors of protein–ligand complexes and relates them to bioactivity. Our framework is validated prospectively in the design of SARS-CoV-2 Mpro inhibitors.
Data-Driven Discovery of Molecular Photoswitches with Multioutput Gaussian Processes
Photoswitchable molecules display two or more isomeric forms that may be accessed using light. These molecules are used in applications such as energy storage, molecular electronics, and therapeutics. A key challenge is designing them to switch at desired wavelengths. We developed a new machine learning framework that predicts properties of photoswitches, and prospectively discovered novel molecules that satisfy a set of challenging photophysical profiles.
Quantitative interpretation explains machine learning models for chemical reaction prediction and uncovers bias
Although a plethora of machine learning models have been proposed in the literature for chemical reaction prediction, they remain opaque black boxes. We develop a framework to attribute predicted reaction outcomes both to specific parts of reactants and to reactions in the training set. Using this framework, we identify "Clever Hans" predictions, where the correct prediction is reached for the wrong reason due to dataset bias.
Molecular Transformer unifies reaction prediction and retrosynthesis across pharma chemical space
We show that an attention-based machine translation model, the Molecular Transformer, tackles both reaction prediction and retrosynthesis by learning from the same dataset. Moreover, a model trained on publicly available data is able to make accurate predictions on proprietary molecules extracted from pharma electronic lab notebooks, demonstrating generalisability across chemical space.
Molecular Transformer: A model for uncertainty-calibrated chemical reaction prediction
Organic synthesis is one of the key stumbling blocks in medicinal chemistry. We treat reaction prediction as a machine translation problem between the "language" of reactants-reagents and the "language" of products. This approach outperforms all algorithms in the literature, achieving a top-1 accuracy above 90% on a common benchmark dataset. Our algorithm requires no handcrafted rules, and accurately predicts subtle chemical transformations. Crucially, our model can accurately estimate its own uncertainty, with an uncertainty score that is 89% accurate in terms of classifying whether a prediction is correct.
Bayesian semi-supervised learning for uncertainty-calibrated prediction of molecular properties and active learning
Deep learning models are increasingly the method of choice in molecular property prediction. However, to replace costly and mission-critical experiments with models, a high mean accuracy is not enough: models need to reliably predict when they will fail. We developed a methodology based on Bayesian deep learning that robustly estimates model uncertainty and enables active learning starting from the low data limit.
Ligand biological activity predicted by cleaning positive and negative chemical correlations
Predicting ligand biological activity is a key challenge in drug discovery. A data-driven approach needs to overcome the challenge that the number of molecules known to be active or inactive is far smaller than the number of possible chemical features that might determine binding. We develop a framework using random matrix theory that discovers important chemical features by disentangling undersampling noise. This method is used to prospectively discover four experimentally confirmed agonists of the human muscarinic acetylcholine receptor M1, a target for diseases such as Alzheimer’s disease and schizophrenia. Crucially, our method is interpretable and yields prospectively validated chemical insights on the binding modes of the M1 receptor.
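As a rough illustration of how random matrix theory can separate signal from undersampling noise, the sketch below uses the standard Marchenko-Pastur bound. It is not the published method, whose details are not given here. It assumes the chemical features have been standardized to zero mean and unit variance, and that the eigenvalues of their sample correlation matrix have already been computed elsewhere. The function names `marchenko_pastur_edges` and `signal_eigenvalues` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def marchenko_pastur_edges(n_samples: int, n_features: int) -> Tuple[float, float]:
    """Return the (lower, upper) edges of the Marchenko-Pastur bulk.

    Assumption: features are pure noise with zero mean and unit variance,
    so correlation-matrix eigenvalues concentrate in [lower, upper].
    When n_features > n_samples the spectrum also has a point mass at zero.
    """
    if n_samples <= 0 or n_features <= 0:
        raise ValueError("n_samples and n_features must be positive")
    q = n_features / n_samples  # aspect ratio: features per molecule
    lower = (1.0 - math.sqrt(q)) ** 2
    upper = (1.0 + math.sqrt(q)) ** 2
    return lower, upper


def signal_eigenvalues(
    eigenvalues: Sequence[float], n_samples: int, n_features: int
) -> List[float]:
    """Keep only eigenvalues above the noise bulk's upper edge."""
    _, upper = marchenko_pastur_edges(n_samples, n_features)
    return [ev for ev in eigenvalues if ev > upper]


# Illustrative numbers only: 100 molecules described by 400 features.
edges = marchenko_pastur_edges(100, 400)
kept = signal_eigenvalues([0.5, 3.0, 9.5, 20.0], 100, 400)
```

Eigenvalues inside the bulk are statistically indistinguishable from sampling noise under these assumptions. Only the eigenvectors belonging to the retained eigenvalues would be treated as candidate chemical features.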