A team led by Professor Markus A. Lill at the University of Basel has published a comprehensive study examining the limitations of current artificial intelligence models in drug design. While AI programs like AlphaFold 3 and RoseTTAFold All-Atom have shown impressive performance in predicting protein structures and their complexes with small molecules, Lill and his colleagues demonstrate that these models primarily rely on pattern recognition rather than understanding the underlying physical principles of protein-ligand interactions. Their work highlights critical considerations for engineers, computational chemists, and pharmaceutical researchers who are increasingly integrating AI into drug discovery workflows.
Masters, M. R., Mahmoud, A. H., & Lill, M. A. (2025). Investigating whether deep learning models for co-folding learn the physics of protein-ligand interactions. Nature Communications, 16(1), 8854. https://doi.org/10.1038/s41467-025-63947-5
Proteins are central to biology and medicine. They can function as active pharmaceutical ingredients such as enzymes or antibodies, and they serve as target structures for therapeutic molecules. Understanding a protein’s three-dimensional structure is typically the first step in drug design, as the geometry and chemical environment of a binding site determine which molecules can interact effectively.
Traditionally, elucidating protein structures has been a labor-intensive and technically demanding process, often requiring X-ray crystallography or nuclear magnetic resonance spectroscopy. The introduction of machine learning and AI transformed this landscape. Models such as AlphaFold 2 and RoseTTAFold achieved remarkable accuracy in predicting protein folding from amino acid sequences, and work in this area was recognized with the 2024 Nobel Prize in Chemistry.
Professor Markus A. Lill at the University of Basel stated,
“The better solution would be to integrate the physicochemical laws into future AI models.”
The next step for these AI systems has been co-folding prediction, which aims to model proteins together with their ligands. This capability is particularly valuable for drug discovery, as it offers the potential to predict how small molecules might bind to target proteins without requiring extensive laboratory assays.
Despite apparently high success rates in benchmarks, Lill’s team set out to examine whether AI co-folding models truly capture the physical principles of binding. The concern was that models might perform well on known structures but fail to generalize to novel proteins or chemically altered ligands, which is precisely where new drug discovery efforts are most valuable.
To test this, the researchers systematically modified proteins and ligands. They altered amino acids in binding sites to change charge distributions, remove side chains, or introduce steric blocks. They also modified ligands to prevent docking or alter chemical properties. The AI models frequently predicted that the ligands still bound in their original positions, despite these deliberate disruptions. In over half of the cases, the predicted protein-ligand complexes remained virtually unchanged from the unmodified scenario.
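The binding-site perturbations described above can be illustrated with a minimal Python sketch. It is not the authors' protocol: the function name `mutate_binding_site`, the strategy labels, the charge-swap table, and the toy sequence are all hypothetical, chosen only to show how the three kinds of disruption (changed charge, removed side chain, steric block) might be encoded as sequence edits before a sequence is passed to a co-folding model.

```python
# Illustrative only: names, rules, and the toy sequence are assumptions,
# not the procedure used in the study.

# Swap acidic and basic residues to invert the local charge.
CHARGE_FLIP = {"D": "K", "E": "R", "K": "D", "R": "E"}


def mutate_binding_site(sequence, positions, strategy):
    """Return a copy of `sequence` with the 0-based `positions` mutated."""
    residues = list(sequence)
    for pos in positions:
        original = residues[pos]
        if strategy == "remove_side_chain":
            residues[pos] = "G"  # glycine has only a hydrogen as side chain
        elif strategy == "steric_block":
            residues[pos] = "F"  # phenylalanine adds a bulky aromatic ring
        elif strategy == "charge_flip":
            # Uncharged residues are left unchanged.
            residues[pos] = CHARGE_FLIP.get(original, original)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return "".join(residues)


wild_type = "MKDELRAG"      # toy sequence, not a real protein
binding_site = [2, 3, 5]    # hypothetical binding-site indices (D, E, R)

for strategy in ("remove_side_chain", "steric_block", "charge_flip"):
    print(strategy, mutate_binding_site(wild_type, binding_site, strategy))
```

Each mutated sequence would then be submitted to the model alongside the unchanged ligand. A physically grounded model would be expected to move or expel the ligand when the site is disrupted.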
This outcome demonstrates that the AI models do not inherently understand the physics of molecular interactions. Instead, they recognize patterns from the training dataset and apply these patterns even when the physical context has changed. Professor Lill summarizes, “Even the most advanced AI models do not really understand why a drug binds to a protein; they only recognize patterns that they have seen before.”
The study highlights that AI models struggle particularly with novel proteins that differ significantly from those in the training datasets. These are precisely the proteins that often represent the most interesting targets for innovative therapies. When encountering unknown or highly divergent proteins, the AI predictions frequently fail, underscoring the need for additional validation and more robust modeling approaches.
From an engineering perspective, this limitation has practical implications. Drug discovery pipelines that rely heavily on AI for early-stage screening may overestimate binding potential or overlook necessary adjustments in molecule design. Experimental validation and physics-based computational methods remain essential for confirming the viability of predicted interactions.
The researchers suggest that future AI systems could be improved by integrating explicit physical and chemical constraints. By combining pattern recognition with principles such as electrostatics, steric hindrance, and solvation effects, models could provide more accurate predictions for novel targets. Such integration would help ensure that AI predictions are not only computationally efficient but also scientifically reliable, enhancing the development of new therapeutic strategies.
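As a rough illustration of what two of these principles look like in code, the sketch below scores a predicted pose with a steric clash count and a simple Coulomb term. It is a post-hoc plausibility check, not a physics-informed model architecture and not something taken from the study. The atom representation, the 2.0 Å clash cutoff, and the dielectric constant of 4.0 are assumptions made for the example, and solvation is ignored entirely.

```python
import math

# Coulomb constant in kcal*Angstrom/(mol*e^2), as commonly used in
# molecular mechanics force fields.
COULOMB_KCAL = 332.06


def steric_clashes(protein_atoms, ligand_atoms, min_distance=2.0):
    """Count protein-ligand atom pairs closer than `min_distance` (Angstrom).

    The 2.0 Angstrom cutoff is an assumed, deliberately crude threshold.
    """
    return sum(
        1
        for p in protein_atoms
        for l in ligand_atoms
        if math.dist(p["xyz"], l["xyz"]) < min_distance
    )


def coulomb_energy(protein_atoms, ligand_atoms, dielectric=4.0):
    """Sum pairwise Coulomb energies in kcal/mol (assumed uniform dielectric)."""
    energy = 0.0
    for p in protein_atoms:
        for l in ligand_atoms:
            r = max(math.dist(p["xyz"], l["xyz"]), 1e-6)  # avoid division by zero
            energy += COULOMB_KCAL * p["charge"] * l["charge"] / (dielectric * r)
    return energy


# Toy example: one negatively charged protein atom, one positive ligand atom.
site = [{"xyz": (0.0, 0.0, 0.0), "charge": -1.0}]
ligand = [{"xyz": (3.0, 0.0, 0.0), "charge": 1.0}]
print(steric_clashes(site, ligand), round(coulomb_energy(site, ligand), 2))

# After a charge-flipping mutation the same pose becomes repulsive.
flipped_site = [{"xyz": (0.0, 0.0, 0.0), "charge": 1.0}]
print(round(coulomb_energy(flipped_site, ligand), 2))
```

In this toy case, a pose that is attractive for the original site becomes repulsive once the site charge is inverted. This is the kind of signal a pattern-matching model may ignore if it simply reproduces the pose it has seen before.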
This approach could also guide the design of new ligands for targets that have been difficult to study experimentally. By embedding physicochemical knowledge into AI architectures, computational predictions could become a more dependable tool for identifying drug candidates, reducing trial-and-error efforts in laboratories, and accelerating the overall discovery process.
The findings from Professor Lill’s team have several important consequences for the engineering of drug discovery systems:
First, they reinforce that current AI models, while useful, cannot replace experimental verification. Reliable pipelines will combine AI predictions with laboratory assays, molecular dynamics simulations, and free-energy calculations to validate binding.
Second, engineers and chemists designing AI-assisted workflows must account for the limitations of training data. The AI’s ability to generalize beyond known proteins is limited, and overreliance on predictions may introduce risk into the early stages of drug design.
Third, integrating physics into AI models represents a promising path forward. Models that explicitly consider the fundamental forces governing molecular interactions can provide predictions that are both accurate and generalizable. This integration could lead to more efficient discovery of novel therapeutic molecules and accelerate the pace at which experimental candidates progress toward clinical development.
Finally, the study emphasizes the need for robust validation protocols. Adversarial tests, like those performed in this study, could become a standard component of computational drug discovery, helping engineers assess the reliability of AI predictions before committing resources to experimental follow-up.
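A validation step of this kind might be sketched as follows. The harness compares the ligand pose predicted for the unmodified protein with the poses predicted for disrupted variants, and flags cases where the pose barely moves. Everything here is an assumption for illustration: `predict_pose` stands in for whatever co-folding model a pipeline uses, the two stub predictors are toy functions, poses are assumed to share one coordinate frame (for example after superimposing the proteins), and the 2.0 Å RMSD threshold is an arbitrary choice rather than a value from the study.

```python
import math


def pose_rmsd(pose_a, pose_b):
    """RMSD between two ligand poses with matching atom order, no realignment."""
    if not pose_a or len(pose_a) != len(pose_b):
        raise ValueError("poses must be non-empty and the same length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(pose_a, pose_b))
    return math.sqrt(squared / len(pose_a))


def adversarial_report(predict_pose, wild_type, mutants, ligand, threshold=2.0):
    """Flag mutants whose predicted ligand pose stays within `threshold` Angstrom."""
    reference = predict_pose(wild_type, ligand)
    report = {}
    for name, sequence in mutants.items():
        rmsd = pose_rmsd(reference, predict_pose(sequence, ligand))
        report[name] = {"rmsd": rmsd, "insensitive": rmsd < threshold}
    return report


# Hypothetical stand-ins for a real model, used only to exercise the harness.
def memorizing_predictor(sequence, ligand):
    return [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]  # ignores the sequence entirely


def responsive_predictor(sequence, ligand):
    shift = 5.0 * sequence.count("F")  # toy rule: bulky residues push the ligand away
    return [(shift, 0.0, 0.0), (1.5 + shift, 0.0, 0.0)]


mutants = {"steric_block": "MKFFLFAG"}  # toy disrupted binding site
for predictor in (memorizing_predictor, responsive_predictor):
    print(predictor.__name__,
          adversarial_report(predictor, "MKDELRAG", mutants, "toy-ligand"))
```

A prediction flagged as insensitive to a disruptive mutation is not necessarily wrong, but it is a reasonable candidate for physics-based rescoring or experimental follow-up before resources are committed.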
Professor Markus Lill and his team at the University of Basel have provided a crucial reality check for the use of AI in drug design. Their research demonstrates that despite impressive pattern recognition capabilities, current AI co-folding models do not inherently understand the physics of protein-ligand binding. For drug discovery engineers, the study highlights the importance of combining AI predictions with physical validation, experimental verification, and a deep understanding of molecular interactions.
By integrating physical principles into AI models and maintaining rigorous validation workflows, researchers can better leverage computational tools to explore novel proteins and design effective therapeutics. This work underscores that while AI has transformed protein structure prediction, the path to reliably designing new drugs still requires a balance of computation, engineering, and experimental science.

Adrian graduated with a master’s degree (1st Class Honours) in Chemical Engineering from Chester University along with Harris. His master’s research aimed to develop a standardised clean water oxygenation transfer procedure to test bubble diffusers that are currently used in the commercial wastewater industry. He has also undertaken placements in both the US and China, primarily within R&D departments, and is an associate member of the Institution of Chemical Engineers (IChemE).
 
  
 
