The FDA has issued a long-awaited and highly anticipated guidance on establishing analytical testing of biosimilars to a reference product. This article summarizes the essential elements of the guidance, and it identifies pivotal changes and recommended study designs.
In May 2019, the FDA issued a long-awaited and highly anticipated guidance on establishing analytical testing of biosimilars to a reference product after it withdrew its guidance, Statistical Approaches to Evaluate Analytical Similarity of biosimilars, in June 2018.
The new draft guidance applies to proposed biosimilars and to other protein products, such as in vivo protein diagnostic products, and is intended to recommend to sponsors the scientific and technical information needed for the chemistry, manufacturing, and controls (CMC) portion of a marketing application for a proposed product submitted under section 351(k) of the Public Health Service (PHS) Act.
This article summarizes the essential elements of the guidance, and it identifies pivotal changes and recommended study designs. Also provided in this article is a listing of what the author believes to be deficiencies in the guidance that the FDA needs to address in order to enable faster development of biosimilars.
Section 351(k) of the PHS Act (42 USC 262[k]), added by the Biologics Price Competition and Innovation Act (BPCIA), sets forth the requirements for an application for a proposed biosimilar product or a proposed interchangeable product.
An application submitted under section 351(k) must contain, among other things, information demonstrating that the biological product is biosimilar to a reference product based upon data derived from:
The FDA has the discretion to determine that an element listed above is unnecessary in a 351(k) application, but not in the choice of the reference product, indications, dosage form, active drug form, and current good manufacturing practice compliance.
Analytical testing forms the core of the development of biosimilars, and the FDA had finalized guidance that described the statistical modeling of analytical similarity data, guidance that was universally followed by developers.
However, this guidance contained contradictions and inaccuracies pointed out by the author in the form of 2 citizen petitions, several publications, and testimonies to the FDA. The new draft guidance issued by the FDA replaces the earlier document, without making any reference to the earlier guidance, and includes several recommendations made by the author, as identified in this paper. Other recommendations that were not addressed by the FDA are also highlighted here, as are plausible options and solutions provided to help sponsors expedite the development of their biosimilar products.
In May 2019, the FDA also issued final guidance on the interchangeability status of biosimilars that included several recommendations made by the author. The FDA is expected to issue several additional guidance documents, as committed in its Biosimilars Action Plan, to make the development of biosimilars more rational without compromising their safety and efficacy.
It is worthwhile to note that the FDA now uses the term “analytical assessment,” which has a much broader scope than “analytical similarity” (as used in the withdrawn guidance), and emphasizes that even in those situations in which a biosimilar product does not fully match elements of biosimilarity, the sponsor can make a point to demonstrate why these differences are not critical to the safety and efficacy of the biosimilar product. This is a major shift in the thinking of the FDA; even though this option was available before, now the FDA clearly encourages sponsors to take a more scientific approach.
If differences between products are observed as part of the comparative analytical assessment (including the components of the assessment that were not included in the risk ranking), the sponsor may provide additional scientific information (a risk assessment and additional data) and a justification for why these differences do not preclude a demonstration that the products are highly similar.
In certain situations, changes to the manufacturing process of the biosimilar product may be needed to resolve differences observed in the comparative analytical assessment. Data should be provided to demonstrate that the observed differences were resolved by any manufacturing changes and that other quality attributes were not substantially affected. If other attributes were affected by the manufacturing change, data should be provided to demonstrate that the impact of the change has been evaluated and addressed.
Reference Product Attributes
The first step in an analytical assessment is a determination of the quality attributes that characterize the reference product in terms of its structural/physicochemical and functional properties.
These quality attributes are then ranked according to their risk to potentially impact activity, PK/PD, safety, efficacy, and immunogenicity. Finally, the attributes are evaluated using quantitative analysis, considering the risk ranking of the quality attributes, as well as other factors. It should be noted, however, that some attributes may be highly critical (eg, primary sequence) but not amenable to quantitative analysis.
The FDA recommends that sponsors develop a risk assessment tool to evaluate and rank the reference product’s quality attributes in terms of the potential impact on the mechanism or mechanisms of action and the function of the product. Certain quality evaluations of the reference product (eg, its degradation rates, which are determined from stability or forced degradation studies) generally should not be included in the risk ranking. However, these evaluations should still factor in the comparative analytical assessment of the proposed biosimilar and the reference product.
Development of the risk assessment tool should be informed by relevant factors, including:
The FDA recommends that an attribute that is high risk for any one of the performance categories (ie, activity, PK/PD, safety, efficacy, or immunogenicity) be classified as high risk. Ideally, the risk assessment tool should result in a list of attributes ordered by the risk to the patient. The risk scores for attributes should, therefore, be proportional to patient risk. The scoring criteria used in the risk assessment should be clearly defined and justified, and the risk ranking for each attribute should be justified with appropriate citations to the literature and data provided.
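The "high risk in any one category means high risk overall" rule can be sketched in a few lines of Python. This is only an illustration: the three risk levels, the category keys, the attribute names, and every score below are hypothetical placeholders, not values taken from the guidance, and a real risk assessment tool would need the justified scoring criteria described above.

```python
# Hypothetical three-level scale; the guidance does not prescribe a scale.
RISK_ORDER = {"low": 0, "moderate": 1, "high": 2}

# Performance categories named in the text (keys are illustrative).
CATEGORIES = ("activity", "pk_pd", "safety", "efficacy", "immunogenicity")


def overall_risk(scores):
    """Return the highest risk level assigned across all categories."""
    missing = [c for c in CATEGORIES if c not in scores]
    if missing:
        raise ValueError(f"missing categories: {missing}")
    return max((scores[c] for c in CATEGORIES), key=RISK_ORDER.__getitem__)


def rank_attributes(table):
    """Order attribute names from highest to lowest overall risk."""
    return sorted(
        table,
        key=lambda name: RISK_ORDER[overall_risk(table[name])],
        reverse=True,
    )


# Made-up scores for illustration only.
attributes = {
    "afucosylation": {"activity": "high", "pk_pd": "low", "safety": "low",
                      "efficacy": "moderate", "immunogenicity": "low"},
    "c_terminal_lysine": {"activity": "low", "pk_pd": "low", "safety": "low",
                          "efficacy": "low", "immunogenicity": "low"},
    "aggregates": {"activity": "low", "pk_pd": "moderate", "safety": "moderate",
                   "efficacy": "low", "immunogenicity": "high"},
}

for name in rank_attributes(attributes):
    print(name, overall_risk(attributes[name]))
```

Taking the maximum across categories is the simplest reading of the recommendation. A sponsor's actual tool might use numeric scores so that the ranking is proportional to patient risk.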
Protocols for Testing
The FDA identifies 4 stages at which the sponsor is expected to submit analytical data for review by the FDA:
If there is a manufacturing process change during development, it may be possible, with an adequate scientific justification, to use data generated from lots manufactured with a different process. However, data should be provided in the 351(k) Biologics License Application (BLA) to support comparability of drug substance and drug product manufactured with the different processes and/or scales.
A sponsor considering manufacturing changes after completing the initial comparative analytical assessment or after completing clinical studies intended to support a 351(k) application will need to demonstrate comparability between the pre- and post-change proposed product, and may need to conduct additional studies. The nature and extent of the changes may determine the extent of these additional studies. The comparative analytical studies should include a sufficient number of lots of the proposed biosimilar product used in clinical studies as well as from the proposed commercial process if the process used to produce the material used in the clinical studies is different.
However, the option of using the International Council on Harmonisation of Technical Requirements for Pharmaceuticals for Human Use’s (ICH’s) comparability protocol remains with the sponsor to make changes post-licensing. The main question is whether the FDA will allow smaller lots to be approved as clinical lots.
Minor modifications, such as N- or C-terminal truncations (eg, the heterogeneity of C-terminal lysine of a monoclonal antibody) that are not expected to change the product’s performance may be justified and should be explained by the sponsor. Possible differences between the chosen expression system (ie, host cell and the expression construct) of the proposed product and that of the reference product should be carefully considered because the type of expression system will affect the types of process- and product-related substances, impurities, and contaminants (including potential adventitious agents) that may be present in the protein product. For example, the expression system can have a significant effect on the types and extent of translational and posttranslational modifications that are imparted to the proposed product, which may introduce additional uncertainty into the demonstration that the proposed product is biosimilar to the reference product.
The FDA is gently reminding developers not to try out novel expression systems and to stick to the expression system used for the reference product.
If the manufacturing process used to produce the proposed product introduces different impurities or higher levels of impurities than those present in the reference product, additional pharmacological, toxicological, or other studies may be necessary. The FDA recommends removing impurity variations, rather than relying on proving the safety of impurities, in preclinical programs.
The process-related impurities in the proposed product are not expected to match those observed in the reference product and are not included in the comparative analytical assessment. The chosen analytical procedures should be adequate to detect, identify, and accurately quantify biologically significant levels of these impurities. In particular, results of immunological methods used to detect host cell proteins depend on the assay reagents and the cell substrate used.
Such assays should be validated using the product cell substrate and orthogonal methodologies to ensure accuracy and sensitivity. The safety of the proposed product with regard to adventitious agents or endogenous viral contamination should be ensured by screening critical raw materials and confirmation of robust virus removal and inactivation achieved by the manufacturing process.
The FDA now recognizes that, despite improvements in analytical techniques, current analytical methodology is not able to detect or characterize all relevant structural and functional differences between 2 protein products; this forms the basis of what the FDA now calls “assessment,” which may include testing of multiple attributes with several methods when comparing a proposed biosimilar to the reference product. Comprehensive physicochemical and functional studies may include the following:
A full characterization of the reference product, in addition to consideration of publicly available information, will form the basis of understanding the observed lot-to-lot variability derived from manufacturing conditions and from analytical assay variability. Factors that contribute to lot-to-lot variability in the manufacturing of a protein product include the source of certain raw materials (eg, growth medium, resins, or separation materials) and different manufacturing sites.
The FDA has further made it clear that, unlike routine quality control assays, tests used to characterize the product do not necessarily need to be validated; however, the tests used to characterize the product should be scientifically sound, fit for their intended use, and able to provide results that are reproducible and reliable.
The methods should be demonstrated to be of appropriate sensitivity and specificity to provide meaningful information as to whether the proposed product and the reference product are highly similar. The reason for not requiring validation comes from the testing protocols, in which the 2 products are tested side by side and serve as controls for each other. In the case of release testing, there is no such control, so the methods must be validated.
The FDA encourages the development of orthogonal quantitative methods to definitively identify any differences in product attributes. Based on the results of analytical studies assessing functional and physicochemical characteristics, including, for example, higher-order structure, post-translational modifications, and impurity and degradation profiles, the sponsor may have an appropriate scientific basis for a selective and targeted approach to subsequent animal and/or clinical studies to support a demonstration of biosimilarity.
It is advisable to apply more than 1 analytical procedure to evaluate the same quality attribute. Methods that use different physicochemical or biological principles to assess the same attribute are especially valuable because they provide independent data to support the quality of that attribute (eg, orthogonal methods to assess aggregation).
In addition, the use of complementary analytical techniques in a series, such as peptide mapping or capillary electrophoresis combined with mass spectrometry of the separated molecules, should provide a meaningful and sensitive method for comparing products.
It may be useful to compare differences in the quality attributes of the proposed product with those of the reference product using a meaningful, fingerprint-like analysis algorithm [1] that covers a large number of additional product attributes and their combinations with high sensitivity using orthogonal methods. Enhanced approaches in manufacturing science, as discussed in ICH Q8(R2), may facilitate production processes that can better match a reference product’s fingerprint.
Such a strategy could further quantify the overall similarity between 2 molecules and may lead to additional bases for a more selective and targeted approach to subsequent animal and/or clinical studies.
Multiple functional assays should, in general, be performed as part of the comparative analytical assessments. If a reference product exhibits multiple functional activities, sponsors should perform a set of appropriate assays designed to evaluate the range of relevant activities for that product.
For example, with proteins that possess multiple functional domains expressing enzymatic and receptor-mediated activities, sponsors should evaluate both activities to the extent that these activities are relevant to the product’s performance. For products for which functional activity can be measured by more than 1 parameter (eg, enzyme kinetics or interactions with blood clotting factors), the comparative characterization of each parameter between products should be assessed.
In vitro bioactivity assays may not fully reflect the clinical activity of the protein. For example, these assays generally do not predict the bioavailability (PK and biodistribution) of the product, which can affect PD and clinical performance. Also, bioavailability can be dramatically altered by subtle differences in glycoform distribution or other posttranslational modifications.
Thus, these limitations should be taken into account when assessing the robustness of the quality of data supporting biosimilarity and the need for additional information that may address residual uncertainties. Finally, functional assays are important in assessing the occurrence of neutralizing antibodies in nonclinical and clinical studies.
When binding is part of the activity attributed to the protein product, analytical tests should be performed to characterize the proposed product in terms of its specific binding properties (eg, if binding to a receptor is inherent to protein function, this property should be measured and used in comparative studies). Various methods, such as surface plasmon resonance, microcalorimetry, or classical Scatchard analysis, can provide information on the kinetics and thermodynamics of binding. Such information can be related to the functional activity and characterization of the proposed product's higher-order structure.
In the withdrawn guidance, the FDA had presented a complex 3-tier system, with tier 1 involving an equivalence interval, tier 2 involving equivalence range, and tier 3 involving physical matching of results.
The first 2 types of statistical modeling required selection of a minimum number of lots to satisfy the statistical criteria. The FDA continues to suggest that evaluation of multiple lots of a reference product and multiple lots of a proposed product is required to enable estimation of product variability across lots, and it continues to recommend using at least 10 reference product lots (acquired over a time frame that spans expiration dates of several years) in the analytical assessment to ensure that the variability of the reference product is captured adequately.
The final number of lots should be sufficient to provide adequate information regarding the variability of the reference product. In cases in which limited numbers of reference product lots are available (eg, for certain orphan drugs), alternate, flexible comparative analytical assessment plans should be proposed and discussed with the FDA.
The FDA recommends that a sponsor include at least 6 to 10 lots of the proposed product in the comparative analytical assessment, and these should include lots manufactured with the investigational- and commercial-scale processes.
They may include validation lots as well as product lots manufactured at different scales, including engineering lots. These lots should be representative of the intended commercial manufacturing process. To the extent possible, proposed biosimilar lots included in the comparative analytical assessment should be derived from different drug substance batches to adequately represent the variability of attributes inherent to the drug substance manufacturing process.
Extracted Drug Substance
If the drug substance has been extracted from the reference product to conduct analytical studies, the sponsor should describe the extraction procedure and provide support to show that the procedure itself does not alter relevant product quality attributes.
This undertaking would include consideration of an alteration or loss of the desired products and impurities and relevant product-related substances, and it should include appropriate controls to ensure that relevant characteristics of the protein are not significantly altered by the extraction procedure.
The specific lots of a reference product used in comparative analytical studies should be identified, together with their expiration dates, the time frames in which the lots were analyzed, and their use in other types of nonclinical and clinical studies. This information will be useful in justifying acceptance criteria to ensure product consistency, as well as to support the comparative analytical assessment of the proposed product and the reference product.
For all methods in which the result is reported relative to the reference standard, the assignment of the potency of 100% should include a narrow acceptable potency range and should ensure control over product drift.
For example, a sponsor should consider the use of a predetermined 2-sided confidence interval (CI) of the mean of the replicates, where the mean relative potency and the 95% CI are included within a sufficiently narrow range (eg, 90%-110%). There should be an evaluation across the history of multiple reference standard qualifications to address potential drift. A sponsor generally should not use a correction factor to account for any differences in, for example, potency or biological activity between reference standards.
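The confidence-interval check described above can be sketched as follows. This is a minimal illustration under stated assumptions: the replicate values are invented, the function names are hypothetical, and the interval is a standard Student t interval on the mean, which is one reasonable reading of "2-sided confidence interval of the mean of the replicates" rather than a method the guidance spells out. The 90%-110% limits are the example range quoted in the text.

```python
import math
import statistics

# Two-sided 95% Student t critical values for 1 to 10 degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def potency_ci(replicates):
    """Return (mean, lower, upper) for a 95% CI of the mean relative potency."""
    n = len(replicates)
    df = n - 1
    if df not in T_CRIT_95:
        raise ValueError("this sketch supports 2 to 11 replicates")
    mean = statistics.mean(replicates)
    half_width = T_CRIT_95[df] * statistics.stdev(replicates) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


def qualifies(replicates, low=90.0, high=110.0):
    """True when the whole 95% CI lies inside the acceptable potency range."""
    _, lower, upper = potency_ci(replicates)
    return low <= lower and upper <= high


# Invented relative potency results (% of the current reference standard).
candidate = [98.0, 101.5, 99.2, 102.3, 100.4, 97.8]
mean, lower, upper = potency_ci(candidate)
print(f"mean={mean:.1f}%  95% CI=({lower:.1f}%, {upper:.1f}%)  ok={qualifies(candidate)}")
```

Because the interval bounds, and not only the mean, must fall within the range, a noisy assay with few replicates can fail the check even when its mean is close to 100%.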
The Totality of Analytical Data
Acceptance criteria should be based on the totality of the analytical data and not simply on the observed range of product attributes of the reference product. This is because some product attributes act in combination to affect a product’s safety, purity, and potency profile; therefore, their potential interaction should be considered when conducting the comparative analytical assessment and setting specifications.
For example, for some glycoproteins, the content and distribution of tetra-antennary and N-acetyl lactosamine repeats can affect in vivo potency, and they should be evaluated together. The FDA emphasizes confirming the relationship between an attribute and the performance of the drug product (see ICH Q8[R2]) to help establish acceptance criteria.
Sponsors should account for all reference product lots acquired and characterized. The 351(k) BLA should include data and information from all reference product and proposed product lots that were evaluated in any manner, including the specific physicochemical, functional, animal, and clinical studies for which a lot was used.
When a lot is specifically selected for inclusion or exclusion from certain analytical studies, a justification should be provided. The date of the analytical testing, as well as the product expiration date, should be provided in the application.
In general, expired reference product lots should not be included in the comparative analytical assessment because lots analyzed beyond their expiration date could lead to results outside the range that would normally be observed in unexpired lots, which may result in overestimated reference product variability. Testing of lots past expiry may be acceptable if samples are stored under long-term conditions (eg, frozen at −80°C), provided that sponsors submit data and information demonstrating that storage does not impact the quality of the product.
The same type of information and data described above to be collected for reference product lots should also be provided on every manufactured drug substance and drug product lot of the proposed product.
Reference product and proposed product lots used in the clinical studies (eg, applicable PK and PD similarity studies and comparative clinical studies) should be included in the comparative analytical assessment.
If there is a suitable, publicly available, and well-established reference standard for the protein, a physicochemical and/or functional comparison of the proposed product with this standard may also provide useful information. However, while studies with such a reference standard may be useful, they are not sufficient to satisfy the BPCIA’s requirement to demonstrate the biosimilarity of the proposed product to the US-licensed reference product.
Once clinical lots of the proposed product have been manufactured, it is expected that 1 of these lots will be properly qualified (including bridging to previous reference standards) for use as a reference standard for release and stability, as well as comparative analytical testing. If possible, once an in-house reference standard is properly qualified, there should be sufficient quantities to use throughout the development of the proposed product. All lots of reference standards used during the development of a proposed product should be properly qualified. In addition to release testing methods, the qualification protocol for reference standards should include all analytical methods that report the result relative to the reference standard.
Non–US-Licensed Comparator Products
A sponsor intending to use a non–US-licensed comparator in certain studies should provide comparative analytical data and analysis for all pairwise comparisons (ie, US-licensed product versus proposed biosimilar product, non–US-licensed comparator product versus proposed biosimilar product, and US-licensed product versus non–US-licensed comparator product).
Combining data from the reference product and the non–US-licensed comparator product to determine acceptance criteria or to perform the comparative analytical assessment to the proposed product would not be acceptable to support a demonstration of the proposed product’s biosimilarity to the reference product.
For example, combining data from the reference product and non–US-licensed products may result in a larger range and broader similarity acceptance criteria than would be obtained by relying solely on data from reference product lots.
Drug Product Lots
Characterization studies of a proposed product should be performed on the most downstream intermediate best suited for the analytical procedures used. Whenever the finished drug product is best suited for a particular analysis, sponsors should analyze the finished drug product. If an analytical method more sensitively detects specific attributes in the drug substance, but the attributes it measures are critical and/or may change during the manufacture of the finished drug product, comparative characterization may be called for on both the extracted protein and the finished drug product.
The BPCIA allows the use of different inactive ingredients; however, different excipients in the proposed product should be supported by existing toxicology data for the excipient or by additional toxicity studies with the formulation of the proposed product. Excipient interactions, as well as direct toxicities, should be considered.
The new guidance removes the tier 1 testing and keeps the tier 2 and tier 3 testing, though without labeling them as such. One approach to data analysis is the use of descriptive quality ranges for assessing quantitative quality attributes of high and moderate risk, and the use of raw data/graphical comparisons for quality attributes with the lowest risk ranking or for those quality attributes that cannot be quantitatively measured (eg, primary sequence).
The acceptance criteria for the quality ranges (QR) method in the comparative analytical assessment should be based on the results of the sponsor’s own analysis of the reference product for a specific quality attribute. The QR should be defined as a range calculated by adding to and subtracting from the reference mean the value of standard deviation of the reference product multiplied by a factor, X.
The multiplier (X) should be scientifically justified for that attribute and discussed with the FDA.
Based on experience to date, methods such as tolerance intervals are not recommended for establishing the similarity acceptance criteria because a very large number of lots would be required in order to establish meaningful intervals. The sponsor can propose other methods of data analysis, including equivalence testing.
The objective of the comparative analytical assessment is to verify that each attribute, as observed in the proposed biosimilar and the reference product, has a similar population mean and similar population standard deviation. Comparative analysis of a quality attribute would generally support a finding that the proposed product is highly similar to the reference product when a sufficient percentage of biosimilar lot values (eg, 90%) fall within the QR defined for that attribute (previously labeled as tier 2).
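The quality range calculation and the lot-percentage check can be sketched as below. All lot values are invented, the function names are hypothetical, and the multipliers of 3.0 and 1.0 are arbitrary placeholders. The guidance leaves X to be scientifically justified per attribute and discussed with the FDA, and the 90% threshold is only the example figure quoted above.

```python
import statistics


def quality_range(reference_values, multiplier):
    """Return (low, high) = reference mean -/+ multiplier * reference SD."""
    mean = statistics.mean(reference_values)
    sd = statistics.stdev(reference_values)  # sample standard deviation
    return mean - multiplier * sd, mean + multiplier * sd


def fraction_within(biosimilar_values, low, high):
    """Fraction of biosimilar lot values that fall inside [low, high]."""
    inside = sum(1 for value in biosimilar_values if low <= value <= high)
    return inside / len(biosimilar_values)


def supports_similarity(reference_values, biosimilar_values, multiplier,
                        required_fraction=0.90):
    """True when enough biosimilar lots fall inside the quality range."""
    low, high = quality_range(reference_values, multiplier)
    return fraction_within(biosimilar_values, low, high) >= required_fraction


# Invented results for one quantitative attribute (arbitrary units).
reference_lots = [100.2, 98.7, 101.5, 99.4, 100.9, 97.8, 102.1, 99.9, 100.4, 98.9]
biosimilar_lots = [99.5, 101.2, 100.8, 98.3, 102.6, 100.1]

for x in (3.0, 1.0):  # placeholder multipliers, not recommended values
    low, high = quality_range(reference_lots, x)
    ok = supports_similarity(reference_lots, biosimilar_lots, x)
    print(f"X={x}: QR=({low:.2f}, {high:.2f})  supports similarity: {ok}")
```

Running the same lots through a smaller multiplier shows how a narrower range makes the criterion harder to meet for the same data.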
The FDA recommends that narrower acceptance criteria of the QR method in the comparative analytical assessment (eg, a lower X value) be applied to higher-risk quality attributes.
In addition to risk ranking, other factors should be considered in determining which type of quantitative data analysis should be applied to a particular attribute or assay. These include the nature, distribution, and abundance of the attribute, as well as the sensitivity and type of the assay.
Qualitative analyses of lower-risk attributes will include a side-by-side data presentation (eg, spectra, thermograms, and graphical representation of data) to allow for a visual comparison of the proposed product to the reference product (previously labeled as tier 3).
The new analytical assessment guidance, intended to replace the withdrawn guidance, provides clarification of several practices, yet leaves out many specifics for the sponsors to interpret.
Given below are the key findings and a listing of how the FDA’s guidance aligns with the author’s citizen petitions, publications, and testimonies that included recommendations and inquiries to the FDA.
The FDA will be issuing several new guidance documents under the Biosimilars Action Plan with the purpose of simplifying the development of biosimilars; the new guidance on analytical assessment left many open-ended considerations, and sponsors are encouraged to provide their views to the FDA as comments on this guidance.
1. Kozlowski S, Woodcock J, Midthun K, Sherman RB. Developing the nation's biosimilars program. N Engl J Med. 2011;365:385-388. doi: 10.1056/NEJMp1107285.