Machine Learning Clarifies Stress-Based Degradation of Biosimilars

A study of trastuzumab biosimilars and the reference product (Herceptin) under control and stress conditions illustrated the value of machine learning.

Machine learning shows promise as a complementary approach to chromatographic (mixture separation) techniques for assessing biosimilarity and stability, according to a recent study.

Investigators compared machine learning with chromatographic analysis in a study of 3 trastuzumab biosimilars and their reference product (Herceptin) under control and stress conditions. They concluded that the machine learning results correlated with the chromatographic data and revealed patterns elucidating the effects of pH and thermal stress conditions.

Trastuzumab, a monoclonal antibody to human epidermal growth factor receptor 2 (HER2), is approved as a treatment for metastatic breast cancer, early breast cancer, and metastatic gastric cancer. The investigators found that the biosimilars showed high similarity under control conditions, but “differences in degradation patterns were detected under…forced degradation conditions.”

First, physicochemical characteristics of the reference product and biosimilar trastuzumab products (approved for use in Egypt and referred to as B1, B2, and B3 in the study) were determined by size exclusion chromatography, cation exchange chromatography, and peptide mapping. The biologics were evaluated under control conditions and under pH and thermal stress. The investigators then used unsupervised machine learning techniques to find patterns in the chromatographic data.

Chromatographic Analysis

The authors said primary structure and size and charge variants are quality attributes expected to affect the quality, safety, and efficacy of biologic drugs including trastuzumab. These attributes were similar in the biosimilars and reference product under control conditions, the authors found.

Thermal and pH stress, the authors noted, “are among the most studied stress conditions in forced degradation studies due to their direct effect on the size and charge variant profiles of [monoclonal antibodies] mAbs through deamidation and oxidation.” Under thermal and pH stress, the investigators did find differences in the degradation of the different products.

Size variants

Based on size exclusion chromatography, B2 and B3 showed a tendency to form high- and low-molecular-weight variants under acidic and basic stress, and B2 showed 83% degradation by the 2-week time point under acidic stress. Under thermal stress, B3 showed the greatest degradation, 39% after 2 weeks.

Charge variants

Under acidic stress, the products varied from 19.9% degradation of the main variant of the reference product at 2 weeks to 93% for B2. Under basic stress, all samples showed a comparable increase in abundance of acidic variants. Under thermal stress, the charge variant distributions of B2 and B3 were similar to that of the reference product, while B1 showed a greater abundance of acidic variants.

Principal Component Analysis

The investigators used unsupervised machine learning techniques, which find patterns in data with no prior training or predefined subcategories. Principal component analysis (PCA) is a method for reducing complexity in high-dimensional data to a small number of components that explain the greatest percentage of the variance in the data set.

To identify patterns in the data, the authors plotted size exclusion chromatography and cation exchange chromatography data on 2-dimensional coordinates representing the 2 components (PC1 and PC2) that explained the most variance. Principal component analysis of chromatographic and peptide mapping data of the control samples showed no outliers, which the authors said supports biosimilarity of the products.
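The projection onto PC1 and PC2 can be illustrated with a minimal sketch. The article does not state which software, preprocessing, or input features the authors used, so everything here is an assumption: the function name pca_2d is hypothetical, the data are randomly generated stand-ins for relative peak areas, and the features are only mean-centered (not scaled) before a singular value decomposition.

```python
import numpy as np

def pca_2d(X):
    """Project the rows of X onto the first two principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # mean-center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)               # fraction of variance per component
    scores = Xc @ Vt[:2].T                        # sample coordinates on PC1 and PC2
    return scores, explained[:2]

# Hypothetical data (not from the study): rows are samples, columns are
# relative peak areas in percent for three chromatographic variants.
rng = np.random.default_rng(0)
control = rng.normal([1.0, 98.0, 1.0], 0.1, size=(4, 3))
stressed = rng.normal([8.0, 80.0, 12.0], 0.5, size=(4, 3))
X = np.vstack([control, stressed])

scores, explained = pca_2d(X)
print(scores.round(2))      # one (PC1, PC2) pair per sample
print(explained.round(3))   # share of total variance carried by PC1 and PC2
```

In this toy example the control and stressed samples fall on opposite sides of the PC1 axis, and the explained-variance fractions play the same role as the percentages the authors report for their components.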

The plot of control and acidic-stressed samples showed that the control samples were separated along the principal component 1 (PC1) axis, while the stressed samples were distributed along the PC2 axis. Samples of the same product were clustered “relevantly close to each other,” the authors said, and their PCA results on control and acidic-stressed samples suggested 41% of the variance in the data was due to the applied stress, and 25% was due to inherent differences in the chromatographic profiles of the products.

Clustering Analysis

The investigators also applied 2 clustering techniques, k-means and density-based spatial clustering of applications with noise (DBSCAN), to the data from the top 2 PCs from their principal component analysis. According to the authors, cluster analysis is “an unsupervised exploratory technique aiming to find natural grouping in data so that items in the same cluster are more similar to each other than to those from different clusters.”
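As a rough illustration of how the 2 clustering techniques behave on 2-dimensional PC scores, the sketch below implements simplified versions of both with NumPy. It is not the authors' implementation: the article does not report their software or parameters, the pc_scores values are invented, the eps and min_samples settings are arbitrary, and the k-means variant uses a deterministic farthest-first initialization (an assumption made here for reproducibility) rather than the random initialization common in library implementations.

```python
import numpy as np

def farthest_first(points, k):
    """Pick k starting centers: the first point, then the point farthest from those chosen."""
    centers = [points[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    return np.array(centers)

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm: alternate nearest-center assignment and center updates."""
    centers = farthest_first(points, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: grow clusters from core points; label -1 marks noise."""
    n = len(points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    labels = np.full(n, -1)
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        if len(neighbors[i]) < min_samples:
            continue                      # not a core point; noise unless reached later
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster       # core or border point joins this cluster
            if not visited[j]:
                visited[j] = True
                if len(neighbors[j]) >= min_samples:
                    queue.extend(neighbors[j])
        cluster += 1
    return labels

# Hypothetical PC1/PC2 scores for three products, three samples each.
pc_scores = np.array([
    [0.0, 0.0], [0.2, 0.1], [0.1, -0.1],
    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2],
    [10.0, 0.0], [10.2, 0.1], [9.9, -0.2],
])

km_labels = kmeans(pc_scores, k=3)
db_labels = dbscan(pc_scores, eps=1.0, min_samples=2)
print(km_labels)
print(db_labels)
```

One design difference is visible even in this toy case: k-means needs the number of clusters in advance, whereas DBSCAN infers it from point density and can leave isolated samples unassigned. In practice, library implementations such as scikit-learn's KMeans and DBSCAN would normally be preferred over hand-written versions.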

Due to the “inherent variability” and “large number of possible structural variants” of monoclonal antibodies, the authors said, machine learning–aided approaches have “great value” for assessing their critical quality attributes. They cited previous research using PCA to reveal patterns in data on the biosimilarity and stability of other biologics, namely recombinant human growth hormone and infliximab.

K-means clustering of the unstressed samples segregated the products into 3 clusters, with the reference product and B2 each forming their own cluster, and B1 and B3 allocated to the same cluster. DBSCAN segregated each product to its own cluster.

K-means clustering was able to separate control and pH-stressed samples into different clusters, although B2 control samples were clustered with the stressed reference product and B3 samples. Cluster analysis suggested B3 was most similar to the reference product under acidic stress, while B2 was most similar under thermal stress, and all products had a similar response to basic pH stress. The greatest variability between control samples was between the reference product and B2.

Finally, application of principal component and clustering analyses to the collective data set from all the applied chromatographic techniques supported biosimilarity of the products, the authors said. This principal component analysis identified no samples that were significantly different from the others; k-means identified 3 clusters (reference product, B1 + B3, and B2), and DBSCAN identified 4 clusters, one containing each product.

The authors concluded that their results supported the biosimilarity of the products analyzed, and “highlighted that regarding the charge and size profiles of the studied products, B2 showed higher variability (than B1 and B3) compared to HC [Herceptin] under both control and stress conditions.” They said that the chromatographic fingerprints and machine learning results “were correlated and were able to reveal patterns related to the effect of different stress conditions on the different investigated products.” They recommended that future studies explore other machine learning tools to interpret physicochemical data on biologic products.

For Further Reading

The European Medicines Agency reports on a pilot experiment in tailoring development of biosimilars, or eliminating unnecessary testing, and the World Health Organization develops guidelines to support the tailoring concept.


Shatat SM, Al-Ghobashy MA, Fathalla FA, Abbas SS, Eltanany BM. Coupling of trastuzumab chromatographic profiling with machine learning tools: a complementary approach for biosimilarity and stability assessment. J Chromatogr B Analyt Technol Biomed Life Sci. 2021;1184:122976. doi:10.1016/j.jchromb.2021.122976