*Molecular Transformer unifies reaction prediction and retrosynthesis across pharma chemical space*

We show that an attention-based machine translation model, the Molecular Transformer, tackles both reaction prediction and retrosynthesis by learning from the same dataset. Moreover, a model trained on publicly available data makes accurate predictions on proprietary molecules extracted from pharma electronic lab notebooks, demonstrating generalisability across chemical space.

*Molecular Transformer: A model for uncertainty-calibrated chemical reaction prediction*

Organic synthesis is one of the key stumbling blocks in medicinal chemistry. A necessary yet unsolved step in planning synthesis is solving the forward problem: given reactants and reagents, predict the products. We treat reaction prediction as a machine translation problem between the "language" of reactants-reagents and the "language" of products. We develop the Molecular Transformer model, which outperforms all algorithms in the literature, achieving a top-1 accuracy above 90% on a common benchmark dataset. Our algorithm requires no handcrafted rules, and accurately predicts subtle chemical transformations. Crucially, our model can accurately estimate its own uncertainty, with an uncertainty score that is 89% accurate in terms of classifying whether a prediction is correct.
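The translation framing can be made concrete with a short sketch. Sequence models of this kind operate on tokenised SMILES strings; the regex below is an illustrative tokeniser in the style commonly used for such pipelines (an assumption for exposition, not the paper's exact code). The key property is that tokenisation is lossless, so a reaction becomes a "sentence" of chemically meaningful tokens.

```python
import re

# Multi-character tokens (bracket atoms, Cl, Br, two-digit ring closures)
# must be tried before single-character tokens.
SMILES_REGEX = re.compile(
    r"(\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOSPFIbcnosp]|\(|\)|\.|=|#|-|\+|/|\\|:|~|@|\?|>|\*|\$|\d)"
)

def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into model tokens; round-trips losslessly."""
    tokens = SMILES_REGEX.findall(smiles)
    assert "".join(tokens) == smiles, "tokeniser dropped characters"
    return tokens

# The "source sentence" for the model: reactants and reagents, e.g. aspirin.
src = tokenize("CC(=O)Oc1ccccc1C(=O)O")
```

A source sequence of reactant–reagent tokens is then translated into a target sequence of product tokens by a standard encoder–decoder transformer.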

*Bayesian semi-supervised learning for uncertainty-calibrated prediction of molecular properties and active learning*

Deep learning models are increasingly the method of choice for molecular property prediction. However, to replace costly and mission-critical experiments with models, high mean accuracy is not enough: a model also needs to reliably flag when it is likely to fail. We develop a methodology based on Bayesian deep learning that robustly estimates model uncertainty and enables active learning starting from the low-data limit.
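The workflow can be sketched in a few lines: draw predictions from an approximate posterior (e.g. via MC dropout or an ensemble), use their spread as the uncertainty estimate, and query the label of the most uncertain candidate. The toy numbers below are a stand-in illustration, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for samples from an approximate posterior: each row is one
# stochastic forward pass over the same four candidate molecules.
posterior_preds = rng.normal(
    loc=[0.2, 1.5, -0.3, 0.8],        # per-molecule "true" signal
    scale=[0.05, 0.05, 0.9, 0.05],    # molecule 2 is poorly determined
    size=(200, 4),
)

pred_mean = posterior_preds.mean(axis=0)   # point prediction
pred_std = posterior_preds.std(axis=0)     # epistemic uncertainty estimate

# Active learning: measure the candidate the model is least sure about.
query = int(np.argmax(pred_std))
```

Because the acquisition rule only needs posterior samples, the same loop works from the low-data limit upward.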

*Ligand biological activity predicted by cleaning positive and negative chemical correlations*

Predicting ligand biological activity is a key challenge in drug discovery. Although there is an increasing amount of activity data, a data-driven approach needs to overcome the challenge that the number of molecules known to be active or inactive is far smaller than the number of possible chemical features that might determine binding. We develop a framework using random matrix theory that discovers important chemical features by disentangling them from undersampling noise. We use this method to prospectively discover four experimentally confirmed agonists of the human muscarinic acetylcholine receptor M1, a target for diseases such as Alzheimer's disease and schizophrenia. Crucially, our method is interpretable and yields prospectively validated chemical insights on the binding modes of the M1 receptor.
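The random-matrix idea can be illustrated with a minimal sketch (synthetic data, not the paper's pipeline): eigenvalues of a feature covariance built from pure noise stay below the Marchenko–Pastur edge, so any eigenvalue above it signals genuine chemical correlation rather than undersampling noise.

```python
import numpy as np

rng = np.random.default_rng(1)
n_mols, n_feats = 500, 100           # molecules (samples) x chemical features

# Pure-noise features...
X = rng.normal(size=(n_mols, n_feats))
# ...plus one planted "binding mode": 10 features co-vary via a latent factor.
latent = rng.normal(size=n_mols)
X[:, :10] += 2.0 * latent[:, None]

C = X.T @ X / n_mols                 # sample covariance of the features
evals = np.linalg.eigvalsh(C)

# Marchenko-Pastur upper edge: noise eigenvalues fall below this threshold.
gamma = n_feats / n_mols
mp_edge = (1 + np.sqrt(gamma)) ** 2

signal_evals = evals[evals > 2 * mp_edge]   # conservative cut above the bulk
```

The single eigenvalue surviving the cut corresponds to the planted correlated feature set, which is what makes the approach interpretable: the associated eigenvector weights the features that drive it.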

*Inverse Ising inference by combining Ornstein-Zernike theory with deep learning*

Given samples, can we infer the underlying probability distribution? This question underlies many bioinformatics and chemoinformatics problems, yet it is in general computationally intractable. We show that there is a link between a class of inverse statistical problems and Ornstein–Zernike theory in liquid-state physics. Using this insight, we develop a simple algorithm that resolves the computational bottleneck. Our algorithm achieves state-of-the-art results in predicting the fitness landscape of the HIV Gag protein as well as protein–ligand binding affinity.
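The flavour of such inverse problems can be seen in the Gaussian analogue, where the mean-field-style inversion of pair correlations is exact: the interaction matrix is the precision matrix, so inferring it amounts to inverting the sample covariance. (This is a textbook baseline for illustration; the paper's Ornstein–Zernike plus deep-learning method goes beyond it.)

```python
import numpy as np

rng = np.random.default_rng(2)

# Ground-truth interactions: a nearest-neighbour chain (precision matrix).
n_spins = 5
J = np.eye(n_spins) * 2.0
for i in range(n_spins - 1):
    J[i, i + 1] = J[i + 1, i] = -0.5

# Draw samples from the model, then invert the empirical pair correlations.
cov_true = np.linalg.inv(J)
samples = rng.multivariate_normal(np.zeros(n_spins), cov_true, size=500_000)

C = np.cov(samples.T)                 # empirical pair correlations
J_inferred = np.linalg.inv(C)         # mean-field / OZ-style inversion
```

For binary (Ising) variables this inversion is only approximate, which is where closure relations and learned corrections become necessary.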

*Energy–entropy competition and the effectiveness of stochastic gradient descent in machine learning*

Finding parameters that minimise a loss function is at the core of many machine learning methods. The Stochastic Gradient Descent (SGD) algorithm is widely used and delivers state-of-the-art results for many problems. Nonetheless, SGD typically cannot find the global minimum, so its empirical effectiveness is mysterious. We show that the dynamics of SGD converge to a Langevin equation with anisotropic noise, which biases it towards wide minima. We show that wide minima are more Bayes-optimal than narrow minima in the overparameterised limit, analogous to entropy winning out over energy in the high-temperature regime.
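A one-dimensional toy (an illustration of the Langevin picture, not the paper's derivation) shows the mechanism: noisy gradient descent on a quadratic loss k·x²/2 reaches a stationary spread of roughly lr·σ²/(2k), so flatter minima (smaller k) hold a wider, higher-entropy cloud of iterates.

```python
import numpy as np

def sgd_trace(curvature, lr=0.01, noise=1.0, steps=100_000, seed=3):
    """Noisy gradient descent on the quadratic loss k*x^2/2."""
    rng = np.random.default_rng(seed)
    xi = noise * rng.normal(size=steps)       # minibatch gradient noise
    x, trace = 0.0, np.empty(steps)
    for t in range(steps):
        x -= lr * (curvature * x + xi[t])     # SGD step = drift + noise
        trace[t] = x
    return trace

sharp = sgd_trace(curvature=1.0)   # narrow minimum
wide = sgd_trace(curvature=0.1)    # wide minimum: larger stationary spread
```

In a multi-well landscape this extra spread translates into more time spent in, and easier discovery of, the wide basins.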

*Optimal design of experiments by combining coarse and fine data*

In many contexts it is extremely costly to acquire high-quality experimental measurements, yet it is much easier to carry out experiments that indicate whether a particular sample is above or below a given threshold. We derive an intuitive strategy, inspired by statistical physics, that combines both forms of measurement to yield accurate models.
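One simple way to combine the two data types (a hypothetical toy, not the paper's method) is a joint likelihood: a squared-error term for the few precise labels plus a logistic term for the many cheap above/below-threshold checks, fitted here for a one-parameter linear model by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(4)
true_w, tau = 2.0, 0.1                       # true slope; measurement noise scale

# A handful of expensive, precise measurements of y = w*x + noise...
x_fine = rng.uniform(-1, 1, size=5)
y_fine = true_w * x_fine + tau * rng.normal(size=5)

# ...and many cheap threshold checks: "is y above t?" at varying thresholds t.
x_coarse = rng.uniform(-1, 1, size=200)
t_coarse = rng.uniform(-3, 3, size=200)
z_coarse = (true_w * x_coarse + tau * rng.normal(size=200) > t_coarse).astype(float)

def sigmoid(u):
    return 0.5 * (1.0 + np.tanh(0.5 * u))    # overflow-safe logistic

# Minimise MSE(fine) + logistic NLL(coarse) over the single parameter w.
w, lr = 0.0, 1e-3
for _ in range(5000):
    grad_fine = 2.0 * np.sum((w * x_fine - y_fine) * x_fine)
    p = sigmoid((w * x_coarse - t_coarse) / tau)
    grad_coarse = np.sum((p - z_coarse) * x_coarse) / tau
    w -= lr * (grad_fine + grad_coarse)
```

The coarse measurements carry real information about where the response crosses each threshold, so the combined fit recovers the slope even though only five precise labels were used.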