Our main goal is to understand how genome regulation and genome evolution interact. Evolutionary changes in the genome can affect diverse cellular behaviors, including how genomic transactions (e.g. transcription, replication, DNA repair) are regulated. In turn, existing genomic regulatory systems can influence how and where genetic mutations occur, as well as the evolutionary fates of those mutations. Genome regulation and genome evolution have been, and continue to be, heavily investigated through thriving functional genomic and evolutionary genomic approaches, respectively. However, these two fundamental biological processes are usually studied independently of each other, and the intricate interplay between them remains poorly understood. Investigating this interplay is essential for understanding many biological phenomena that cannot be adequately explained by looking at only one of the two aspects. By harnessing various omics data generated by high-throughput methods, we aim to perform integrative analyses to address these questions, with a strong emphasis on computational and quantitative biology.

Research lines

We are currently developing three specific research lines (see below) that address distinct but interconnected topics, focusing on vertebrate species. The first line examines how genome evolution affects genome regulation. The second examines how genomic regulatory systems affect genome evolution. The third examines how these molecular processes impact higher-level biological traits.

1. How do new genomic regulatory elements emerge and evolve?

Regulatory innovation reshapes the genomic regulatory program and plays a key role in biological diversity and evolution, but our understanding of its evolutionary processes and functional implications remains incomplete. Alongside our own work, mounting evidence indicates that repetitive sequences (comprising ~50% of the whole genome in mammals), which were previously considered ‘junk’, contribute significantly to regulatory innovation and warrant detailed investigation. In addition to transcriptional regulation, we also study regulatory innovation and evolution in other processes such as replication and splicing, which are equally important but comparatively under-studied.

Selected work #1 We integrated multi-omic data to investigate the evolutionary trajectories of new transcription start sites (TSSs) in the human genome. We found that transposable elements (especially Long Terminal Repeats, LTRs) play major roles in the emergence of new TSSs, highlighting the importance of genomic repeats in regulatory innovation. See more in Li et al. Genome Res (2018).

Evolutionary trajectories of new TSSs

2. How do spontaneous germline mutations distribute across the genome and how do they occur?

Germline mutations are crucial for evolution and play important roles in many human diseases. Our understanding of the distribution and genesis of spontaneous mutations in many genomes remains limited, partly because de novo mutations are difficult to obtain in large numbers. Only in recent years has it become possible to sequence the whole genomes of many individuals and thereby collect a large number of de novo mutations. Most advances in the mechanistic understanding of DNA mutagenesis in past decades were derived from simple organisms or based on somatic mutations in cancer. Therefore, numerous details of germline mutation distribution and mutational processes remain to be uncovered.

Selected work #2 We developed a deep learning framework named MuRaL for generating fine-scale germline mutation rate maps of genomes. MuRaL has better predictive performance at different scales than current state-of-the-art methods and can be applied to many sequenced species with population polymorphism data. See more in Fang et al. Nat Mach Intell (2022).

Schematic of the MuRaL framework

Selected work #3 By analyzing >300,000 de novo mutations and other omic datasets, we systematically assessed the effects of nucleosomes on de novo mutation rate variation across the human genome. We discovered that nucleosome positioning stability is a significant modulator of mutation rates and is closely associated with the evolution of SINE/LINE repeats. See more in Li & Luscombe Nat Commun (2020).

Interplay between nucleosome positioning stability, local mutation rate, and SINE/LINE elements

3. How do the processes of genome regulation and evolution affect biodiversity and human diseases?

Uncovering the connections between molecular processes and high-level phenotypes is a major goal in biology. We pay special attention to genomic signatures that are linked to lineage-specific traits or human diseases. Doing so can also broaden the impact of the findings from the first two research lines.

Selected work #4 By analyzing genomes from 48 avian species and other vertebrate outgroups, we identified millions of avian-specific highly conserved elements (ASHCEs), >99% of which reside in non-coding regions. Further functional genomic analysis and validation experiments on ASHCEs led to the discovery of important regulatory elements associated with avian-specific traits such as wings and feathers. See more in Seki*, Li*, et al. Nat Commun (2017).

Functional investigation of avian-specific conserved regulatory elements in bird-specific traits

“Biologists must constantly keep in mind that what they see was not designed, but rather evolved. It might be thought, therefore, that evolutionary arguments would play a large part in guiding biological research, but this is far from the case. It is difficult enough to study what is happening now. To figure out exactly what happened in evolution is even more difficult. Thus evolutionary achievements can be used as hints to suggest possible lines of research, but it is highly dangerous to trust them too much. It is all too easy to make mistaken inferences unless the process involved is already very well understood.”

— Francis Crick, What Mad Pursuit: A Personal View of Scientific Discovery (1988)