Dongxiao Zhang’s team at EIT developed EqGPT, an AI that integrates data and knowledge to autonomously discover and optimize new PDEs.

This research may challenge your previous perceptions.
It elevates algorithms from passive data analysts or imitators to active explorers and creators. Instead of merely digging out hidden patterns from data, it integrates existing knowledge to generate and optimize new theories and equations we have never conceived of, much like a true scientist.
A research team led by Academician Dongxiao Zhang from the Eastern Institute of Technology, Ningbo, has proposed an intelligent partial differential equation (PDE) discovery algorithm named EqGPT. This approach combines data-driven and knowledge-guided methodologies, enabling the autonomous generation and adaptive optimization of new equations.
On November 21 (Beijing time), the findings were published in Nature Communications, with Eastern Institute of Technology, Ningbo as the primary affiliation.
The study also contains a symbolic Easter egg: the research team designed a special computational case in which the calculation domain is carved into the shape of the letters EITech, making the equation visible within that outline.
Discovering hidden equations from the physical processes at the EITech boundary | Image provided by the research team
Partial differential equations (PDEs) are mathematical equations, traditionally derived by scientists from first principles, that describe how continuously changing physical quantities (such as temperature, fluid motion, or electromagnetic fields) vary across multiple dimensions. PDE discovery works in the opposite direction: by analyzing data, algorithms sift through a large library of candidate PDE terms to identify the most concise and accurate combination that describes the data dynamics, thereby uncovering physical laws hidden in the data.
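To make the library-sifting idea concrete, here is a minimal sketch of conventional sparse-regression PDE discovery (sequentially thresholded least squares). It is not EqGPT's method. The synthetic solution, the four-term candidate library, the `stlsq` helper, and the threshold value are all assumptions chosen for illustration.

```python
import numpy as np

# Synthetic data (assumed for illustration): an exact solution of u_t = 0.5 * u_xx.
x = np.linspace(0, 2 * np.pi, 64)
t = np.linspace(0, 1, 32)
X, T = np.meshgrid(x, t)
a, b = np.exp(-0.5 * T), np.exp(-2 * T)
u = a * np.sin(X) + b * np.sin(2 * X)
u_t = -0.5 * a * np.sin(X) - 2 * b * np.sin(2 * X)
u_x = a * np.cos(X) + 2 * b * np.cos(2 * X)
u_xx = -a * np.sin(X) - 4 * b * np.sin(2 * X)

# Candidate library: each column is one possible right-hand-side term.
names = ["u", "u_x", "u_xx", "u*u_x"]
theta = np.column_stack([u.ravel(), u_x.ravel(), u_xx.ravel(), (u * u_x).ravel()])


def stlsq(theta, target, threshold=0.05, iters=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    coef = np.linalg.lstsq(theta, target, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(coef) < threshold
        coef[small] = 0.0
        keep = ~small
        if not keep.any():
            break
        coef[keep] = np.linalg.lstsq(theta[:, keep], target, rcond=None)[0]
    return coef


coef = stlsq(theta, u_t.ravel())
discovered = {n: float(c) for n, c in zip(names, coef) if c != 0}
print("u_t =", " + ".join(f"{c:.3f}*{n}" for n, c in discovered.items()))
```

Because the derivatives here are computed analytically, the fit is essentially exact. With real measurements the derivatives would have to be estimated from noisy data, which is where fixed-library methods of this kind tend to struggle.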

The data-driven and knowledge-guided coupled partial differential equation discovery framework (EqGPT) | Image provided by the research team
The research team integrated the data-driven and knowledge-guided pathways through a generative model, which learns from equation structures summarized in mathematics books in order to generate and optimize new equations.
To enable the AI model to learn and understand equations, the research team proposed two core technologies: Generative Representation of Equation (GRE) and Scientifically Augmented Training (SAT).
In the generative representation, the team decomposed equations into a vocabulary of tokens, encoding operators and fundamental physical terms separately, along with start and end symbols, thereby uniformly converting PDEs into learnable sentences. The team also collected and organized 221 distinct PDE structures from mathematics books. Using commutativity, these were expanded into 7072 equation sentences, forming a PDE dataset.
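As a rough illustration of how an equation might become a sentence, the sketch below tokenizes a Burgers-type equation and enumerates its commutative reorderings. The article does not specify EqGPT's actual vocabulary or expansion rules, so the token names (`<s>`, `</s>`, `u_t`, `u_xx`), the nested-list input format, and the helpers `to_sentence` and `commutative_variants` are all hypothetical.

```python
from itertools import permutations, product

START, END = "<s>", "</s>"  # hypothetical start/end symbols


def to_sentence(lhs, terms):
    """Flatten lhs = sum of products into a token list with operator tokens."""
    tokens = [START, lhs, "="]
    for i, factors in enumerate(terms):
        if i > 0:
            tokens.append("+")
        for j, factor in enumerate(factors):
            if j > 0:
                tokens.append("*")
            tokens.append(factor)
    tokens.append(END)
    return tokens


def commutative_variants(lhs, terms):
    """All sentences obtained by reordering added terms and multiplied factors."""
    variants = set()
    for term_order in permutations(terms):
        factor_orders = [set(permutations(factors)) for factors in term_order]
        for choice in product(*factor_orders):
            variants.add(tuple(to_sentence(lhs, choice)))
    return sorted(variants)


# Burgers-type structure: u_t = u*u_x + u_xx (coefficients omitted).
burgers = [["u", "u_x"], ["u_xx"]]
print(" ".join(to_sentence("u_t", burgers)))

variants = commutative_variants("u_t", burgers)
for sentence in variants:
    print(" ".join(sentence))
print(len(variants), "equivalent sentences")  # 2 term orders x 2 factor orders = 4
```

Expanding each structure into its equivalent orderings is one plausible way a few hundred structures could grow into several thousand training sentences.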
In Scientifically Augmented Training (SAT), the team employed a generative model to learn the co-occurrence relationships and combinatorial patterns of equation terms from the PDE dataset. This enables the model to generate new equations in a free-form yet syntactically valid manner during sampling. A direct benefit of this approach is the early filtering out of a large number of expressions that are syntactically correct but physically implausible, focusing the search more on candidates likely to be meaningful.
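The idea of learning which terms tend to follow which can be sketched with a toy bigram model. This is a deliberately simple stand-in, not the generative model used in EqGPT. The four-sentence corpus, the token names, and the functions `train_bigram` and `sample` are assumptions for illustration only.

```python
import random
from collections import Counter, defaultdict

START, END = "<s>", "</s>"

# Tiny hypothetical corpus of equation sentences (coefficients omitted).
corpus = [
    "<s> u_t = u_xx </s>",
    "<s> u_t = u * u_x + u_xx </s>",
    "<s> u_t = u * u_x + u_xxx </s>",
    "<s> u_tt = u_xx </s>",
]
sentences = [line.split() for line in corpus]


def train_bigram(sentences):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for sent in sentences:
        for prev, nxt in zip(sent, sent[1:]):
            counts[prev][nxt] += 1
    return counts


def sample(counts, rng, max_len=20):
    """Generate one sentence by repeatedly sampling a likely next token."""
    sent = [START]
    while sent[-1] != END and len(sent) < max_len:
        options = counts[sent[-1]]
        tokens, weights = list(options), list(options.values())
        sent.append(rng.choices(tokens, weights=weights)[0])
    return sent


counts = train_bigram(sentences)
rng = random.Random(0)
samples = [sample(counts, rng) for _ in range(50)]
seen = {" ".join(s) for s in sentences}
novel = sorted({" ".join(s) for s in samples} - seen)
print("novel but well-formed sentences:", novel)
```

Every sampled sentence follows token transitions seen in the corpus, yet some combinations (for example a `u_tt` left-hand side with a nonlinear right-hand side) may never have appeared as a whole, which mirrors the free-form yet syntactically valid generation described above.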
Finally, the team constructed a Generation-Evaluation-Optimization closed loop: the model generates a series of new equations, their quality is evaluated against observed data, and the top-performing equations are used to fine-tune the generative model. This process makes the model more inclined to generate explicit equations that are both physically grammatical and capable of explaining the data, leading to rapid convergence.
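The closed loop can be sketched in a few lines, under heavy simplification. Here the "generative model" is just a table of per-term inclusion probabilities, and "fine-tuning" nudges that table toward the terms used by the top-scoring candidates. The synthetic data, the four-term library, the parsimony penalty, and the functions `generate`, `evaluate`, and `fine_tune` are all assumptions for illustration, not the procedure used in the paper.

```python
import random
import numpy as np

# Synthetic observations (assumed): an exact solution of u_t = 0.5 * u_xx.
x = np.linspace(0, 2 * np.pi, 64)
t = np.linspace(0, 1, 32)
X, T = np.meshgrid(x, t)
a, b = np.exp(-0.5 * T), np.exp(-2 * T)
library = {
    "u": a * np.sin(X) + b * np.sin(2 * X),
    "u_x": a * np.cos(X) + 2 * b * np.cos(2 * X),
    "u_xx": -a * np.sin(X) - 4 * b * np.sin(2 * X),
}
library["u*u_x"] = library["u"] * library["u_x"]
u_t = 0.5 * library["u_xx"]


def generate(probs, rng):
    """Generation: sample a candidate structure (a tuple of term names)."""
    return tuple(name for name, p in probs.items() if rng.random() < p)


def evaluate(terms):
    """Evaluation: least-squares misfit to the data plus a parsimony penalty."""
    if not terms:
        return float("inf")
    theta = np.column_stack([library[name].ravel() for name in terms])
    coef = np.linalg.lstsq(theta, u_t.ravel(), rcond=None)[0]
    mse = float(np.mean((theta @ coef - u_t.ravel()) ** 2))
    return mse + 1e-3 * len(terms)


def fine_tune(probs, elite, lr=0.5):
    """Optimization: move term probabilities toward their frequency in the elite."""
    for name in probs:
        freq = sum(name in terms for terms in elite) / len(elite)
        probs[name] = min(0.95, max(0.05, (1 - lr) * probs[name] + lr * freq))


rng = random.Random(0)
probs = {name: 0.5 for name in library}
best_terms, best_score = None, float("inf")
for round_idx in range(5):
    candidates = [generate(probs, rng) for _ in range(100)]
    ranked = sorted(candidates, key=evaluate)
    fine_tune(probs, ranked[:10])
    top_score = evaluate(ranked[0])
    if top_score < best_score:
        best_terms, best_score = ranked[0], top_score

print("best structure: u_t ~", " + ".join(best_terms))
print("term probabilities:", {k: round(v, 2) for k, v in probs.items()})
```

As the rounds proceed, the sampler concentrates on structures that explain the data with few terms, which is the convergence behavior the closed loop is meant to produce.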

Discovering the governing equation for wave breaking from real experimental data | Image provided by the research team
The research team found that the proposed intelligent PDE discovery algorithm could accurately recover known classical equations from sparse, noisy data. Furthermore, the algorithm demonstrated high adaptability to irregular, complex boundary conditions.
Additionally, the algorithm can be extended to high-dimensional data. It successfully discovered the underlying coupled saturation-pressure governing equations from simulation data of a three-dimensional oil-water two-phase displacement process, proving its applicability in multivariate, multi-physics coupled systems.
Building on this, the research team also successfully utilized the intelligent PDE discovery algorithm to discover a new equation related to wave breaking behavior from real-world wave flume experimental data. This equation not only aligns with the actual observed water surface elevation at the critical moment of wave breaking but also offers new physical insights. For instance, a novel nonlinear term containing a third-order spatial derivative that appears in the equation can explain some high-energy behaviors during the wave breaking process.
Postdoctoral researcher Hao Xu from the Eastern Institute of Technology, Ningbo, is the first author of the paper. Chair Professor Dongxiao Zhang and Assistant Professor Yuntian Chen, both at the Eastern Institute of Technology, Ningbo, are co-corresponding authors. Co-authors include Assistant Professor Rui Cao from Ocean University of China, Professor Adrian H. Callaghan from Imperial College London, Assistant Professor Tianning Tang from The University of Manchester, Dr. Mengge Du from Peking University, and Postdoctoral researcher Jian Li from the Eastern Institute of Technology, Ningbo. This research was supported by the National Natural Science Foundation of China.