Polymer design using genetic algorithm and machine learning

Data-driven or machine learning (ML) based methods have recently been used in materials science to provide rapid predictions of material properties. Although powerful and robust, these predictive models are still limited in their applicability to designing materials that meet target property or performance objectives.

Here, the authors employ a nature-mimicking optimization method, the genetic algorithm (GA), in tandem with ML-based predictive models to design polymers that meet practically useful but extreme property criteria (i.e., glass transition temperature Tg > 500 K and bandgap Eg > 6 eV). Analogous to nature, the characteristic properties of a polymer are assumed to be determined by the types and sequence of the chemical building blocks (or fragments) that constitute its monomer unit. Evolving polymers through the natural operations of crossover, mutation, and selection over 100 generations produces 132 new, chemically unique polymers with high Tg and Eg, compared with only 4 already known cases. The algorithm also selects and reveals chemical guidelines on which fragments make up polymers with extreme thermal and electrical performance. The approach is general and can be extended to design polymers with other property objectives.

Graphical abstract

Polymers have found enormous use in numerous applications, due to their versatility and the richness of their chemical diversity. The latter aspect also poses a challenge. The near-infinite chemical space spanned by polymers leads to a daunting search problem. Edisonian trial-and-error and intuition-based strategies may not be efficient, and run the risk of missing good solutions. Moreover, if such strategies use traditional experimental or computational routes, they may be time- and resource-intensive.

Machine learning (ML) based surrogate models, trained on available polymer-property datasets, can make instantaneous property predictions for a new polymer, and may alleviate the burden on time and resources. But such accelerated prediction options still leave open the challenge of accumulating a large and diverse candidate set of polymers for which predictions need to be made. It is completely unclear how one would make such a candidate set “complete” enough to not miss suitable and important candidates.

A more general and appropriate approach is to solve the “inverse problem”: given the desired property objectives, directly generate polymers that satisfy them, rather than screening a pre-defined candidate set. There have been past attempts at such designs, but they have been limited in the chemical space they explore, because they are constrained by the available choices of building units. More recently, ML-based generative models, such as variational autoencoders (VAEs) and generative adversarial networks (GANs), have also been used to solve the inverse problem. These models learn a mapping from a continuous latent space to the materials space; new materials with desired properties are then generated by solving the optimization problem in the latent space and mapping the solution back to materials. While this approach remains attractive for drug discovery, its application to periodic systems such as polymers is still in its infancy.

In this contribution, the authors set the goal of designing polymers with two extreme properties: high glass transition temperature (Tg) and high bandgap (Eg). A high Tg is desirable for polymers that must remain thermally stable at high temperatures. A high Eg is useful for polymers that must withstand high electric fields and display high dielectric strength. Together, these two properties are essential for several applications, including high-temperature, high-energy-density dielectrics. The difficulty of meeting these objectives becomes apparent from the literature on known polymers: only four out of ~12,000 reference polymers collected from the literature meet the target properties (Tg > 500 K and Eg > 6 eV), as illustrated in the property map below. In this map, the Tg and Eg values for the ~12,000 known polymers were estimated using the authors' past ML models [1].
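The two design criteria amount to a joint threshold on the ML-predicted properties, with both inequalities strict. The Python sketch below shows that screening step; the polymer names and predicted values are hypothetical placeholders rather than entries from the authors' dataset, and the helper name `meets_targets` is illustrative.

```python
# Minimal sketch: screen ML-predicted (Tg, Eg) pairs against the design criteria.
# Polymer names and predicted values are hypothetical placeholders.

TG_MIN_K = 500.0   # glass transition temperature threshold (K)
EG_MIN_EV = 6.0    # bandgap threshold (eV)


def meets_targets(tg_k: float, eg_ev: float) -> bool:
    """Return True only if both predicted properties exceed their thresholds."""
    return tg_k > TG_MIN_K and eg_ev > EG_MIN_EV


predictions = {
    "polymer_A": (520.0, 6.4),  # meets both criteria
    "polymer_B": (610.0, 4.1),  # high Tg, bandgap too low
    "polymer_C": (380.0, 7.2),  # high Eg, Tg too low
}

hits = [name for name, (tg, eg) in predictions.items() if meets_targets(tg, eg)]
print(hits)  # ['polymer_A']
```

A joint threshold like this explains why so few known polymers qualify: many polymers excel in one property, but very few clear both bars at once.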

Property map of glass transition temperature vs bandgap predicted by ML models.

Among 12,721 known polymers, only four meet the desired property objectives (Tg > 500 K and Eg > 6 eV). The Tg and Eg values are ML predictions. The fitness function used for the color code is defined as (normalized Tg) × (normalized Eg).
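Because the fitness function is a product of two normalized properties, a polymer scores high only when both Tg and Eg are high. The caption does not state how the normalization is done; the Python sketch below assumes min-max scaling to [0, 1] over the population being scored, which is an assumption, and uses made-up property values.

```python
import numpy as np


def minmax(x: np.ndarray) -> np.ndarray:
    """Scale values to [0, 1] (assumed normalization); a constant array maps to zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span


def fitness(tg: np.ndarray, eg: np.ndarray) -> np.ndarray:
    """Fitness = (normalized Tg) x (normalized Eg), using the assumed min-max scaling."""
    return minmax(tg) * minmax(eg)


tg = np.array([400.0, 450.0, 500.0, 550.0])  # hypothetical predicted Tg values (K)
eg = np.array([3.0, 7.0, 5.0, 7.0])          # hypothetical predicted Eg values (eV)
scores = fitness(tg, eg)
print(scores.round(3))  # the last polymer (highest Tg and highest Eg) scores 1.0
```

Under this assumed scaling, the polymer with the lowest Tg in the population scores zero regardless of its bandgap, so the product rewards balanced improvement in both properties rather than an extreme value in just one.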

Process to design polymers using genetic algorithm framework.

(a) Overall workflow of the iterative evolution of polymer generations. (b) Crossover and mutation create offspring polymers from a pair of parent polymers; polymers with four chemical building blocks (fragments) are shown for demonstration. (c) Offspring polymers mapped onto the property space of Tg vs bandgap Eg. In each iteration, the 10 offspring polymers with the highest fitness function values are selected as parents.
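One GA iteration, as outlined in this caption, can be sketched as: pair up parents, cross them over, mutate the offspring, and keep the 10 fittest offspring as the next parents. The Python sketch below is a minimal illustration, not the authors' implementation. The fragment library, mutation rate, number of offspring per generation, length cap `MAX_LEN`, and `toy_fitness` (a stand-in for the ML-predicted (normalized Tg) × (normalized Eg)) are all hypothetical. Each parent is cut at its own random position, which is one plausible way to obtain the length flexibility during crossover that the gene-strip figure caption mentions.

```python
import random
from typing import Callable, List

Polymer = List[str]  # a monomer unit as an ordered sequence of fragment tokens

# Hypothetical fragment library ('[*]' marks an open connection point).
FRAGMENTS = ["[*]CC[*]", "[*]c1ccc([*])cc1", "[*]O[*]",
             "[*]C(=O)[*]", "[*]N[*]", "[*]S(=O)(=O)[*]"]
GOOD = {"[*]c1ccc([*])cc1", "[*]S(=O)(=O)[*]"}  # hypothetical "good" fragments
MAX_LEN = 12  # hypothetical cap on the number of blocks per polymer


def toy_fitness(poly: Polymer) -> float:
    """Stand-in for the ML-based fitness: fraction of 'good' fragments (NOT a real model)."""
    return sum(f in GOOD for f in poly) / len(poly)


def crossover(p1: Polymer, p2: Polymer, rng: random.Random) -> Polymer:
    """Join the head of p1 to the tail of p2, cutting each parent at its own random point.

    Independent cut points let the child be shorter or longer than either parent.
    Both parents need at least 2 blocks; the child always keeps at least 2.
    """
    i = rng.randint(1, len(p1) - 1)
    j = rng.randint(1, len(p2) - 1)
    return (p1[:i] + p2[j:])[:MAX_LEN]


def mutate(poly: Polymer, rate: float, rng: random.Random) -> Polymer:
    """Replace each block with a random library fragment with probability `rate`."""
    return [rng.choice(FRAGMENTS) if rng.random() < rate else f for f in poly]


def next_generation(parents: List[Polymer], fitness: Callable[[Polymer], float],
                    rng: random.Random, n_offspring: int = 100,
                    n_parents: int = 10, rate: float = 0.1) -> List[Polymer]:
    """Create offspring by crossover and mutation, then keep the fittest as new parents."""
    offspring = []
    for _ in range(n_offspring):
        a, b = rng.sample(parents, 2)
        offspring.append(mutate(crossover(a, b, rng), rate, rng))
    offspring.sort(key=fitness, reverse=True)
    return offspring[:n_parents]


# Usage: start from 10 random 8-block polymers and evolve for 100 generations.
rng = random.Random(0)
parents = [[rng.choice(FRAGMENTS) for _ in range(8)] for _ in range(10)]
for _ in range(100):
    parents = next_generation(parents, toy_fitness, rng)
print("best toy fitness:", max(toy_fitness(p) for p in parents))
```

In a real workflow, `toy_fitness` would be replaced by ML-based Tg and Eg predictions for each offspring, combined into the fitness function described for the property map.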

Evolution of the fitness function, chemical diversity, and property predictions of polymers across the 100 GA iterations.

(a) Fitness function values for all polymers generated in every GA iteration. From each generation, the 10 offspring polymers with the highest fitness function values are selected as parent polymers. ‘Good’ fragments from these parents are passed on to the next generation, leading to the discovery of polymers with the desired properties in later generations. (b) 12,675 polymers projected onto a 2D principal component (PC) space, with the PCs computed from their polymer fingerprints. All polymers created during the 100 generations are shown as gray points; selected parents are color-coded by generation number. The regions occupied by polymers created in generations 1, 10, 50, and 100 are marked to visualize the convergence in chemical diversity as evolution proceeds. (c) Tg and (d) Eg predictions of polymers as a function of generation.
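Panel (b) relies on projecting high-dimensional polymer fingerprints onto their first two principal components. A minimal NumPy sketch of that projection via singular value decomposition follows; the random matrix is a hypothetical stand-in for real polymer fingerprints, and the authors' exact fingerprinting and preprocessing are not described here.

```python
import numpy as np


def pca_2d(fingerprints: np.ndarray) -> np.ndarray:
    """Project fingerprint vectors (n_polymers x n_features) onto the first two PCs."""
    centered = fingerprints - fingerprints.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # The sign of each PC is arbitrary; only relative positions matter for the map.
    return centered @ vt[:2].T


rng = np.random.default_rng(0)
fps = rng.random((200, 32))  # hypothetical stand-in fingerprints: 200 polymers, 32 features
coords = pca_2d(fps)
print(coords.shape)  # (200, 2)
```

Plotting `coords` for the parents of successive generations would show whether the selected polymers cluster into a shrinking region, which is the convergence in chemical diversity that the panel visualizes.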

Virtual gene strip of polymers and example of new polymer designs.

(a) Gene strip showing the cumulative occurrence of all fragments (chemical building blocks) over 100 generations of evolution. Nine fragments obtained from six hand-picked example polymer designs are labeled with their SMILES representations. (b) Positions of the 132 polymer designs generated during the 100 GA iterations on the map of Tg vs Eg, together with uncertainty estimates for the predicted Tg (UTg) and predicted Eg (UEg). Six hand-picked example polymers are highlighted with tags ‘G#-#’, where G# is the generation number and # is the parent index. (c) Gene strips and structures of the example polymers. The symbol ‘*’ marks an open position in the polymer chain or in a chemical building block. Polymers that meet the design criteria consist of 2–6 building blocks, even though the GA process was initiated with polymers containing 8 blocks. This reduction is owing to the flexibility in the segmentation position during crossover.
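The gene strip in panel (a) is, in essence, a running tally of how often each fragment appears as evolution proceeds. The Python sketch below counts fragment occurrences across a list of populations. Which polymers the original strip counts (all offspring or only the selected parents) is not stated, so that choice, the helper name `gene_strip`, and the toy fragment tokens are assumptions.

```python
from collections import Counter
from typing import Dict, List


def gene_strip(generations: List[List[List[str]]]) -> Dict[str, int]:
    """Cumulative fragment counts over all polymers in all given generations.

    Returns a dict ordered from most to least frequent fragment.
    """
    counts: Counter = Counter()
    for population in generations:   # one population per generation
        for polymer in population:   # one polymer = list of fragment tokens
            counts.update(polymer)
    return dict(counts.most_common())


# Hypothetical two-generation history (fragment tokens are placeholders).
history = [
    [["[*]CC[*]", "[*]O[*]"], ["[*]CC[*]", "[*]C(=O)[*]"]],
    [["[*]c1ccc([*])cc1", "[*]O[*]"], ["[*]c1ccc([*])cc1", "[*]c1ccc([*])cc1"]],
]
print(gene_strip(history))
```

Fragments that keep surviving selection accumulate the largest counts, which is how a tally of this kind can surface chemical guidelines about favorable building blocks.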

  1. Kim, C., Batra, R., Chen, L., Tran, H. & Ramprasad, R. (2021). Polymer design using genetic algorithm and machine learning. Computational Materials Science, 186, 110067. doi:10.1016/j.commatsci.2020.110067
