With the help of an AI, researchers have succeeded in designing synthetic DNA that controls cells’ protein production. The technology could make the development and production of vaccines, drugs for severe diseases, and alternative food proteins much faster and significantly cheaper than today.
How our genes are expressed is a process that is fundamental to the functionality of cells in all living organisms. Simply put, the genetic code in DNA is transcribed to the molecule messenger RNA (mRNA), which tells the cell’s factory which protein to produce and in which quantities.
Researchers have put a lot of effort into trying to control gene expression because it can, among other things, contribute to the development of protein-based drugs. A recent example is the mRNA vaccine against COVID-19, which instructed the body’s cells to produce the same protein found on the surface of the coronavirus. The body’s immune system could then learn to form antibodies against the virus. Likewise, it is possible to teach the body’s immune system to fight cancer or other complex diseases if one understands the genetic code behind the production of specific proteins.
DNA Controls the Quantity of a Specific Protein
Most of today’s new drugs are protein-based, but the techniques for producing them are both expensive and slow, because it is difficult to control how the DNA is expressed. Last year, a research group at Chalmers University of Technology, led by Aleksej Zelezniak, associate professor of systems biology, took an important step in understanding and controlling how much of a protein is made from a certain DNA sequence.
“First it was about being able to fully ‘read’ the DNA molecule’s instructions. Now we have succeeded in designing our own DNA that contains the exact instructions to control the quantity of a specific protein,” says Zelezniak about the research group’s latest important breakthrough.
DNA Molecules Made-to-Order
The principle behind the new method is similar to when an AI generates faces that look like real people. By learning what a large selection of faces looks like, the AI can then create completely new but natural-looking faces. It is then easy to modify a face by, for example, saying that it should look older, or have a different hairstyle.
On the other hand, programming a believable face from scratch, without the use of AI, would have been much more difficult and time-consuming. Similarly, the researchers’ AI has been taught the structure and regulatory code of DNA. The AI then designs synthetic DNA, where it is easy to modify its regulatory information in the desired direction of gene expression.
Simply put, the AI is told how much expression of a gene is desired and then ‘prints’ the appropriate DNA sequence.
“DNA is an incredibly long and complex molecule. It is thus experimentally extremely challenging to make changes to it by iteratively reading and changing it, then reading and changing it again. This way it takes years of research to find something that works. Instead, it is much more effective to let an AI learn the principles of navigating DNA. What otherwise takes years is now shortened to weeks or days,” says first author Jan Zrimec, a research associate at the National Institute of Biology in Slovenia and past postdoc in Zelezniak’s group.
Efficient Development of Proteins
The researchers have developed their method in the yeast Saccharomyces cerevisiae, whose cells resemble mammalian cells. The next step is to use human cells. The researchers hope that their progress will have an impact on the development of new as well as existing drugs.
“Protein-based drugs for complex diseases or alternative sustainable food proteins can take many years and can be extremely expensive to develop. Some are so expensive that it is impossible to obtain a return on investment, making them economically nonviable. With our technology, it is possible to develop and manufacture proteins much more efficiently so that they can be marketed,” says Zelezniak.
The authors of the study are Jan Zrimec, Xiaozhi Fu, Azam Sheikh Muhammad, Christos Skrekas, Vykintas Jauniskis, Nora K. Speicher, Christoph S. Börlin, Vilhelm Verendel, Morteza Haghir Chehreghani, Devdatt Dubhashi, Verena Siewers, Florian David, Jens Nielsen and Aleksej Zelezniak. The researchers are active at Chalmers University of Technology, Sweden; National Institute of Biology, Slovenia; Biomatter Designs, Lithuania; Institute of Biotechnology, Lithuania; BioInnovation Institute, Denmark; and King’s College London, UK.
This article was written by Karin Wik, Chalmers University of Technology. For more information, contact Aleksej Zelezniak, Associate Professor, Systems Biology, Life Sciences, at aleksej.
Read the study: Controlling gene expression with deep generative design of regulatory DNA.
Overview
The document discusses a novel approach to controlling gene expression through the design of synthetic regulatory DNA using deep generative models, specifically generative adversarial networks (GANs). The authors detail their methodology for creating and optimizing these models to generate regulatory sequences that can achieve desired mRNA expression levels.
The study focuses on various components of gene regulatory regions, including promoters, untranslated regions (UTRs), and terminators. The authors trained multiple generative models using specific lengths of these regions, such as a 400 bp promoter, 100 bp 5' UTR, 250 bp 3' UTR, and a 250 bp terminator. They also explored shorter segments of the promoter, including an 80 bp proximal promoter region and the core promoter region, to enhance the model's effectiveness.
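As a rough illustration of how regions of these lengths could be laid out for a sequence model, the sketch below computes where each region would sit in one concatenated sequence and one-hot encodes a DNA string. The concatenation order, the omission of the coding region, and the names `REGION_LENGTHS`, `region_slices` and `one_hot` are assumptions made for this example, not details taken from the study.

```python
from typing import Dict, List

# Region lengths in base pairs, as listed in the overview.
# The ordering (promoter, 5' UTR, 3' UTR, terminator) is an assumption.
REGION_LENGTHS: Dict[str, int] = {
    "promoter": 400,
    "utr5": 100,
    "utr3": 250,
    "terminator": 250,
}

BASES = "ACGT"


def region_slices(lengths: Dict[str, int]) -> Dict[str, slice]:
    """Map each region name to its slice in the concatenated sequence."""
    slices: Dict[str, slice] = {}
    start = 0
    for name, length in lengths.items():
        slices[name] = slice(start, start + length)
        start += length
    return slices


def one_hot(seq: str) -> List[List[int]]:
    """Encode a DNA string as rows over (A, C, G, T).

    Characters outside ACGT become an all-zero row.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


if __name__ == "__main__":
    slices = region_slices(REGION_LENGTHS)
    total = sum(REGION_LENGTHS.values())
    print(f"total regulatory length: {total} bp")
    for name, sl in slices.items():
        print(f"{name}: positions {sl.start}-{sl.stop - 1}")
```

With the slices in hand, a single region can be pulled out of a full sequence with ordinary indexing, for example `sequence[slices["promoter"]]`.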
To assess the functionality of the generated sequences, the researchers performed mutagenesis on the promoter region while keeping the UTR and terminator intact. They controlled the mutation levels to avoid pushing the sequences outside biologically relevant boundaries, creating a total of 16.8 million sequence variants. This extensive dataset allowed them to analyze the impact of different mutation rates (1%, 2%, 5%, and 10%) on predicted expression levels, leading to the selection of variants that achieved the desired expression changes.
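The mutagenesis step can be sketched in a few lines. The example below substitutes a fixed fraction of positions inside the promoter only, leaving the rest of the sequence unchanged, at the four mutation rates mentioned above. It assumes substitution-only mutations, a rate defined as the fraction of promoter positions changed, and a random placeholder sequence; `mutate_region` is a hypothetical helper, and the study's actual procedure may differ.

```python
import random

BASES = "ACGT"


def mutate_region(seq: str, start: int, end: int, rate: float,
                  rng: random.Random) -> str:
    """Substitute round(rate * region length) positions in seq[start:end].

    Each chosen position receives a base different from the original,
    so the number of changed positions is exact. Positions outside
    [start, end) are never touched.
    """
    n_mut = round(rate * (end - start))
    positions = rng.sample(range(start, end), n_mut)
    chars = list(seq)
    for pos in positions:
        alternatives = [b for b in BASES if b != chars[pos]]
        chars[pos] = rng.choice(alternatives)
    return "".join(chars)


rng = random.Random(0)
# Placeholder data: a random 1000 bp sequence whose first 400 bp
# stand in for the promoter (not a real yeast sequence).
original = "".join(rng.choice(BASES) for _ in range(1000))

variants = {
    rate: mutate_region(original, 0, 400, rate, rng)
    for rate in (0.01, 0.02, 0.05, 0.10)
}

for rate, variant in variants.items():
    n_diff = sum(a != b for a, b in zip(original, variant))
    print(f"{rate:.0%} mutation rate: {n_diff} substitutions in the promoter")
```

In a full pipeline, each variant would then be scored by a trained expression predictor and kept only if the predicted change matched the target; that model is outside the scope of this sketch.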
The authors validated their synthetic sequences through in vivo measurements, demonstrating that these engineered regulatory elements could surpass the expression levels of highly expressed natural controls. This finding highlights the potential of using deep learning and generative design in synthetic biology to create more efficient and effective gene expression systems.
The document also emphasizes the importance of computational resources and collaboration in this research, acknowledging support from various institutions and individuals. The findings contribute to the growing field of synthetic biology, offering new tools for manipulating gene expression in biotechnology and medicine.
Overall, this research represents a significant advancement in the design of synthetic regulatory DNA, showcasing the power of deep learning techniques in generating functional biological sequences that can be tailored for specific applications in gene expression control.


