In this blog post, we address the difference between chemical informatics and materials informatics. We are often asked about it, and we hope this post provides an answer that is relatively concise yet comprehensive.
- Chemical informatics (CI), also called cheminformatics or chemoinformatics, refers to the use of physical chemistry theory together with computer and information science techniques (so-called "in silico" techniques) to address a range of descriptive and prescriptive problems in chemistry, including its applications to biology and related molecular fields.
- Materials informatics (MI) is a field of study that applies the principles of informatics to materials science and engineering to better understand the use, selection, development, and discovery of materials.
These definitions are vague and do not capture the essence of either field. They describe, at a surface level, what CI and MI include, but they are far from comprehensive.
Both chemical informatics and materials informatics have two aspects:
- The use of artificial intelligence and machine learning to analyze data and make inferences from existing databases (open data platforms, literature), high-throughput experimentation (sometimes called screening), and physics-based modeling and simulation (sometimes called simulation modeling).* The end goal is to improve efficiency (optimizing existing processes) and/or efficacy (creating new chemicals/materials and synthetic routes).
- The development of better data and IT infrastructure for data storage, metadata labeling, traceability, security, and machine-learning readiness.
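As a rough illustration of the infrastructure aspect, the hypothetical Python record below attaches metadata and a traceability fingerprint to each measurement while still exporting a flat, machine-learning-friendly row. The field names (`sample_id`, `instrument`, `operator`, and so on) and the use of a SHA-256 hash of the record as a fingerprint are our assumptions for illustration, not a standard schema or the design of any particular LIMS.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Measurement:
    """One measured property value plus the metadata needed to trace it.

    Hypothetical schema: all field names are illustrative placeholders.
    """
    sample_id: str
    property_name: str
    value: float
    unit: str
    instrument: str
    operator: str
    tags: tuple[str, ...] = ()

    def fingerprint(self) -> str:
        # Hash a canonical (key-sorted) JSON dump of every field, so any change
        # to the value or its metadata yields a different fingerprint.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def to_ml_row(self) -> dict[str, object]:
        # Flat row keyed by sample; the unit is kept in the column name so
        # values with different units are not silently mixed downstream.
        return {
            "sample_id": self.sample_id,
            f"{self.property_name} [{self.unit}]": self.value,
        }
```

For example, `Measurement("S-001", "melting_temp", 165.0, "degC", "DSC-1", "analyst_a").to_ml_row()` returns `{'sample_id': 'S-001', 'melting_temp [degC]': 165.0}`. Keeping rich metadata on the record but exporting only flat rows is one simple way to serve both traceability and machine-learning use from the same store.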
Under this definition, examples of improving efficiency include optimized design of experiments (DoE) and better data infrastructure for data entry, tracking, and analysis (such as laboratory information management systems [LIMS]). Examples of improving efficacy include using generative models, such as generative adversarial networks (GANs) or variational autoencoders (VAEs), to design new molecules or chemical reactions (CI) (Figure 1); designing new alloy compositions from a phase diagram built by high-throughput experimentation (MI); and predicting properties from existing materials property data (MI) (Figure 2).
Figure 1: Molecular structure generation using chemical informatics, with molecular structures (encoded as SMILES strings) as inputs. Source: ACS Cent. Sci. 2018, 4, 268−276.
Figure 2: Polymer property prediction using materials informatics, based on numeric data for seven properties. Source: J. Phys. Chem. C 2018, 122, 17575−17585.
CI and MI use very similar technologies. The fundamental difference is the entity being analyzed: CI looks at molecular structures, whereas MI looks at the hierarchical structures of materials (Figure 3 provides a visual guide to this hierarchy). A hypothetical example makes the distinction clearer. To predict the outcome of an organic reaction, data on commonly used reactions (molecular structures and their SMILES strings) are collected and fed into a deep neural network (DNN). With a modest dataset, the DNN can predict reaction outcomes with good confidence. This is a CI case.
Now suppose the goal is instead to optimize a formulation containing a dozen polymers and additives. Rather than molecular structures, you need the property data of these chemicals and the correlations among them. The data are now physical properties, such as melting temperature, Young's modulus, and solubility in a given solvent, and the result is a recipe of these chemicals. This is an MI case: you treat each polymer and additive as a whole, representing it as a data point in terms of these physical properties, and its molecular structure information is not treated as data.
Figure 3: Materials' structure hierarchy, from the atomic to the system level, and some of the major research questions and techniques. Source: Advanced Cooling Technologies.
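To make the contrast in the hypothetical reaction and formulation examples concrete, here is a minimal Python sketch. It is not how production CI or MI tools work: the bag-of-characters SMILES featurization is deliberately crude, and the ingredient names, property values, linear mixing rule, and exhaustive grid search are all hypothetical assumptions chosen for illustration. What it shows is the difference in what counts as data: in the CI function the molecular structure itself is the input, whereas in the MI functions each ingredient is only a row of physical properties and the output is a recipe. In a real MI workflow, `blend_properties` would typically be replaced by a model trained on measured formulation data.

```python
from collections import Counter
from itertools import product


# --- CI view: the data point is the molecule itself -------------------------
def smiles_char_counts(smiles: str) -> Counter:
    """Crude bag-of-characters featurization of a SMILES string.

    Real CI pipelines use proper tokenizers or molecular fingerprints; this
    only illustrates that the molecular structure is the model input.
    """
    return Counter(smiles)


# --- MI view: each ingredient is a record of physical properties -------------
# Hypothetical placeholder values; molecular structure is not part of the data.
INGREDIENTS = {
    "polymer_A": {"melting_temp_C": 165.0, "modulus_GPa": 1.5},
    "polymer_B": {"melting_temp_C": 130.0, "modulus_GPa": 0.8},
    "additive_C": {"melting_temp_C": 60.0, "modulus_GPa": 0.1},
}


def blend_properties(recipe: dict[str, float]) -> dict[str, float]:
    """Estimate blend properties as a fraction-weighted average.

    Linear mixing is a simplifying assumption; real formulations rarely obey it.
    """
    total = sum(recipe.values())
    prop_names = next(iter(INGREDIENTS.values())).keys()
    return {
        p: sum(frac / total * INGREDIENTS[name][p] for name, frac in recipe.items())
        for p in prop_names
    }


def best_recipe(target: dict[str, float], step: float = 0.1) -> dict[str, float]:
    """Grid-search mass fractions (summing to 1) closest to the target properties."""
    names = list(INGREDIENTS)
    n = round(1 / step)
    best, best_err = None, float("inf")
    # Enumerate integer counts for all but the last ingredient; the last one
    # takes whatever remains so the fractions always sum to 1.
    for counts in product(range(n + 1), repeat=len(names) - 1):
        last = n - sum(counts)
        if last < 0:
            continue
        recipe = {name: c / n for name, c in zip(names, (*counts, last))}
        props = blend_properties(recipe)
        # Relative squared error keeps properties with different units comparable.
        err = sum(((props[p] - t) / t) ** 2 for p, t in target.items())
        if err < best_err:
            best, best_err = recipe, err
    return best
```

With these made-up values, `best_recipe({"melting_temp_C": 147.5, "modulus_GPa": 1.15})` returns a 50/50 blend of `polymer_A` and `polymer_B` with no `additive_C`. The exhaustive grid is only practical for a handful of ingredients, because its size grows combinatorially; for a dozen components, a smarter search strategy (Bayesian optimization is one common choice) would be needed.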
Why the confusion?
The two terms are sometimes used interchangeably. CI is also much more widely used than MI, partly because of its applications in the pharmaceutical industry. Although MI's origins date back a couple of decades, its modern definition began to gain traction only within the past decade, mainly thanks to advances in AI and computing, as well as government initiatives and academic research in MI (most MI startups came out of research groups). MI emerged later largely because of materials' hierarchical structure. Take polymers, for example. To obtain enough data for inverse polymer design, you would need structure-property information for polymers with similar structures, the same information for common building blocks and even functional groups, and atomic- and electron-scale simulations.
The complexity arising from this structural hierarchy often leads to data scarcity in MI, which slows its progress. Today, it is common to treat materials as a whole when applying MI (as in the formulation optimization example, where each material is treated as a single entity and its hierarchical structural data is not considered). That said, we expect a trend toward MI that uses hierarchical structural data for a few selected material types (metal alloys, for example) over the next five to 10 years.