Global genetic diversity of human gut microbiome species is related to geographic location and host health
The human gut harbors thousands of microbial species, each exhibiting significant inter-individual genetic variability. Although many studies have associated microbial relative abundances with human-health-related phenotypes, the substantial intraspecies genetic variability of gut microbes has not yet been comprehensively considered, limiting the potential of linking such genetic traits with host conditions.

Here, we analyzed 32,152 metagenomes from 94 microbiome studies across the globe to investigate the human microbiome intraspecies genetic diversity. We reconstructed 583 species-specific phylogenies and linked them to geographic information and species’ horizontal transmissibility.
We identified 484 microbial-strain-level associations with 241 host phenotypes, encompassing human anthropometric factors, biochemical measurements, diseases, and lifestyle. Among them, a Ruminococcus gnavus clade that was more prevalent in nonagenarians correlated with distinct plasma bile acid profiles, and a Collinsella clade was associated with melanoma and prostate cancer. Our large-scale intraspecies genetic analysis highlights the relevance of strain diversity as it relates to human health. Reference: Sergio Andreu-Sánchez et al, Cell, 2025
Genomic, transcriptomic, and immunogenomic landscape of over 1300 sarcomas of diverse histology subtypes
Given their rarity and diversity, a fundamental understanding of the genomic underpinnings for many sarcoma subtypes is still lacking. To better define the molecular landscape of this group of diseases, we perform matched whole exome sequencing and RNA sequencing on a cohort of 1340 sarcoma tumor specimens. We identify recurrent somatic mutations and observe an increased mutational burden in metastatic vs. primary samples (p < 0.001).

We observe frequent copy number alterations including whole genome doubling, with this feature being more common in metastatic tumors (p = 0.026). Estimation of immune cell abundances followed by hierarchical clustering identifies five immune subtypes ranging from low to high and we observe inferior overall survival in immune deplete clusters compared to immune enriched (p < 0.01). Interestingly, GIST predominantly form a distinct “immune intermediate” cluster that is marked by a specific enrichment for NK cells (FDR < 0.01).
Reference: Alex Soupir et al, Nature Communications, 2025
Ultrasensitive ctDNA detection for preoperative disease stratification in early-stage lung adenocarcinoma
Circulating tumor DNA (ctDNA) detection can predict clinical risk in early-stage tumors. However, clinical applications are constrained by the sensitivity of clinically validated ctDNA detection approaches.

NeXT Personal is a whole-genome-based, tumor-informed platform that has been analytically validated for ultrasensitive ctDNA detection at 1–3 ppm of ctDNA with 99.9% specificity. Through an analysis of 171 patients with early-stage lung cancer from the TRACERx study, we detected ctDNA pre-operatively in 81% of patients with lung adenocarcinoma (LUAD), including 53% of those with pathological TNM (pTNM) stage I disease.
ctDNA predicted worse clinical outcome, and patients with LUAD with <80 ppm preoperative ctDNA levels (the 95% limit of detection of a ctDNA detection approach previously published in TRACERx) experienced reduced overall survival compared with ctDNA-negative patients with LUAD. Although prospective studies are needed to confirm the clinical utility of the assay, these data show that our approach has the potential to improve disease stratification in early-stage LUADs.
Reference: James R. M. Black et al, Nature Medicine, 2025
The gut–brain axis underlying hepatic encephalopathy in liver cirrhosis
Up to 50–70% of patients with liver cirrhosis develop hepatic encephalopathy (HE), which is closely related to gut microbiota dysbiosis, with an unclear mechanism.
Here, by constructing gut–brain modules to assess bacterial neurotoxins from metagenomic datasets, we found that phenylalanine decarboxylase (PDC) genes, mainly from Ruminococcus gnavus, increased approximately tenfold in patients with cirrhosis and were higher still in patients with HE. Cirrhotic, but not healthy, mice colonized with R. gnavus showed brain phenylethylamine (PEA) accumulation, along with memory impairment, symmetrical tremors and cortex-specific neuron loss, all typically found in patients with HE.
This accumulation of PEA was primarily driven by decreased monoamine oxidase-B activity in both the liver and serum due to cirrhosis. Targeting PDC or PEA reversed the neurological symptoms induced by R. gnavus. Furthermore, fecal microbiota transplantation from patients with HE to germ-free cirrhotic mice replicated these symptoms and further corroborated the efficacy of targeting PDC or PEA. Clinically, high baseline PEA levels were linked to a sevenfold increased risk of HE after intrahepatic portosystemic shunt procedures. Our findings expand the understanding of the gut–liver–brain axis and identify a promising therapeutic and predictive target for HE.
Reference: Xiaolong He et al, Nature Medicine, 2025
vmrseq: probabilistic modeling of single-cell methylation heterogeneity
Single-cell DNA methylation measurements reveal genome-scale inter-cellular epigenetic heterogeneity, but extreme sparsity and noise challenge rigorous analysis.

Previous methods to detect variably methylated regions (VMRs) have relied on predefined regions or sliding windows, and the regions they report are insensitive to the level of heterogeneity present in the input. We present vmrseq, a statistical method that overcomes these challenges to detect VMRs with increased accuracy in synthetic benchmarks and improved feature selection in case studies.
vmrseq also highlights context-dependent correlations between methylation and gene expression, supporting previous findings and facilitating novel hypotheses on epigenetic regulation. vmrseq is available at https://github.com/nshen7/vmrseq.
Transcriptional profile of the rat cardiovascular system at single-cell resolution
We sought to characterize cellular composition across the cardiovascular system of the healthy Wistar rat, an important model in preclinical cardiovascular research.

We performed single-nucleus RNA sequencing (snRNA-seq) in 78 samples in 10 distinct regions, including the four chambers of the heart, ventricular septum, sinoatrial node, atrioventricular node, aorta, pulmonary artery, and pulmonary veins, which produced 505,835 nuclei. We identified 26 distinct cell types and additional subtypes, with different cellular composition across cardiac regions and tissue-specific transcription for each cell type. Several cell subtypes were region specific, including a subtype of vascular smooth muscle cells enriched in the large vasculature.
We observed tissue-enriched cellular communication networks, including heightened Nppa-Npr1/2/3 signaling in the sinoatrial node. The existence of tissue-restricted cell types suggests regional regulation of cardiovascular physiology. Our detailed transcriptional characterization of each cell type offers the potential to identify novel therapeutic targets and improve preclinical models of cardiovascular disease. Reference: Alessandro Arduini et al, Cell Reports, 2024
SiRCle (Signature Regulatory Clustering) model integration reveals mechanisms of phenotype regulation in renal cancer
Clear cell renal cell carcinoma (ccRCC) tumours develop and progress via complex remodelling of the kidney epigenome, transcriptome, proteome and metabolome. Given the subsequent tumour and inter-patient heterogeneity, drug-based treatments report limited success, calling for multi-omics studies to extract regulatory relationships, and ultimately, to develop targeted therapies.

Here, we present SiRCle (Signature Regulatory Clustering), a method to integrate DNA methylation, RNA-seq and proteomics data at the gene level by following the central dogma of biology, i.e. genetic information proceeds from DNA, to RNA, to protein.
To identify regulatory clusters across the different omics layers, we group genes based on the layer where the gene’s dysregulation first occurred. We combine the SiRCle clusters with a variational autoencoder (VAE) to reveal key features from omics data for each SiRCle cluster and compare patient subpopulations in a ccRCC and a PanCan cohort. Reference: Ariane Mora et al, Genome Medicine, 2024
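The grouping idea described above can be illustrated with a minimal Python sketch. It is not the SiRCle implementation: the layer names, the "unchanged" state label, the gene names, and both function names are assumptions made for illustration. Each gene is assigned to the earliest layer along DNA methylation, RNA, protein at which it shows a change.

```python
from typing import Dict, List, Optional

# Layers ordered along the central dogma: DNA methylation -> RNA -> protein.
LAYERS = ("methylation", "rna", "protein")


def first_dysregulated_layer(states: Dict[str, str]) -> Optional[str]:
    """Return the earliest layer whose state is not 'unchanged', or None."""
    for layer in LAYERS:
        if states.get(layer, "unchanged") != "unchanged":
            return layer
    return None


def group_genes(gene_states: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Group gene names by the layer at which a change is first seen."""
    groups: Dict[str, List[str]] = {}
    for gene, states in gene_states.items():
        key = first_dysregulated_layer(states) or "no_change"
        groups.setdefault(key, []).append(gene)
    return groups


# Hypothetical per-gene calls for each layer (tumour vs. normal).
example = {
    "GENE_A": {"methylation": "hyper", "rna": "down", "protein": "down"},
    "GENE_B": {"methylation": "unchanged", "rna": "up", "protein": "up"},
    "GENE_C": {"methylation": "unchanged", "rna": "unchanged", "protein": "down"},
}
groups = group_genes(example)
print(groups)
```

In this toy input, GENE_A is grouped under methylation, GENE_B under RNA, and GENE_C under protein; the published method defines its own rules and thresholds for what counts as a change in each layer.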
ProHap enables human proteomic database generation accounting for population diversity
Amid the advances in genomics, the availability of large reference panels of human haplotypes is key to account for human diversity within and across populations. However, mass spectrometry-based proteomics does not benefit from this information.
To address this gap, we introduce ProHap, a Python-based tool that constructs protein sequence databases from phased genotypes of reference panels. ProHap enables researchers to account for haplotype diversity in proteomic searches. Reference: Jakub Vasicek et al, Nature Methods, 2024
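To make the idea of deriving protein sequences from phased genotypes concrete, here is a minimal Python sketch. It does not use or reproduce ProHap's code or interface; the function names, the toy coding sequence, the variant tuples, and the sample names are all hypothetical. It handles only single-nucleotide variants on one coding sequence, whereas a real tool must also deal with indels, splicing, and many transcripts.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def haplotype_proteins(ref_cds, variants, phased_genotypes):
    """Map each distinct protein sequence to the haplotypes that encode it.

    variants: list of (zero_based_position, ref_base, alt_base) SNVs.
    phased_genotypes: {sample: ["0|1", ...]}, one phased call per variant.
    """
    proteins = {}
    for sample, calls in phased_genotypes.items():
        for hap in (0, 1):
            seq = list(ref_cds)
            for (pos, ref, alt), call in zip(variants, calls):
                if call.split("|")[hap] == "1":
                    if seq[pos] != ref:
                        raise ValueError(f"reference mismatch at {pos}")
                    seq[pos] = alt
            protein = translate("".join(seq))
            proteins.setdefault(protein, []).append(f"{sample}_hap{hap + 1}")
    return proteins


ref_cds = "ATGGCTGAAAAGTAA"                  # codons: ATG GCT GAA AAG TAA
variants = [(4, "C", "A"), (8, "A", "G")]    # GCT->GAT (A>D), GAA->GAG (synonymous)
genotypes = {"s1": ["0|1", "1|0"], "s2": ["1|1", "0|0"]}
proteins = haplotype_proteins(ref_cds, variants, genotypes)
print(proteins)
```

Keeping the alleles of each haplotype together matters because variants that co-occur on the same chromosome copy change the same protein molecule. In this toy example the four haplotypes collapse to two distinct protein sequences, because one of the variants is synonymous.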
Population genomics and transcriptomics of Plasmodium falciparum in Cambodia and Vietnam
The emergence of Plasmodium falciparum parasites resistant to artemisinins compromises the efficacy of Artemisinin Combination Therapies (ACTs), the global first-line malaria treatment. Artemisinin resistance is a complex genetic trait in which nonsynonymous SNPs in PfK13 cooperate with other genetic variations.

Here, we present population genomic/transcriptomic analyses of P. falciparum collected from patients with uncomplicated malaria in Cambodia and Vietnam between 2018 and 2020. Besides the PfK13 SNPs, several polymorphisms, including nonsynonymous SNPs (N1131I and N821K) in PfRad5 and an intronic SNP in PfWD11 (WD40 repeat-containing protein on chromosome 11), appear to be associated with artemisinin resistance, possibly as new markers. There is also a defined set of genes whose steady-state levels of mRNA and/or splice variants or antisense transcripts correlate with artemisinin resistance at the base level. In vivo transcriptional responses to artemisinins indicate the resistant parasite’s capacity to decelerate its intraerythrocytic developmental cycle (IDC), which can contribute to the resistant phenotype.
During this response, PfRAD5 and PfWD11 upregulate their respective alternatively/aberrantly spliced isoforms, suggesting their contribution to the protective response to artemisinins. PfRAD5 and PfWD11 appear to have been under selective pressure in the Greater Mekong Sub-region over the last decade, suggesting a role in the genetic background of artemisinin resistance. Reference: Sourav Nayak et al, Nature Communications, 2024
Single-cell RNA sequencing of peripheral blood links cell-type-specific regulation of splicing to autoimmune and inflammatory diseases
Alternative splicing contributes to complex traits, but whether this differs in trait-relevant cell types across diverse genetic ancestries is unclear.

Here we describe cell-type-specific, sex-biased and ancestry-biased alternative splicing in ~1 M peripheral blood mononuclear cells from 474 healthy donors from the Asian Immune Diversity Atlas. We identify widespread sex-biased and ancestry-biased differential splicing, most of which is cell-type-specific. We identify 11,577 independent cis-splicing quantitative trait loci (sQTLs), 607 trans-sGenes and 107 dynamic sQTLs. Colocalization between cis-eQTLs and trans-sQTLs revealed a cell-type-specific regulatory relationship between HNRNPLL and PTPRC.
We observed an enrichment of cis-sQTL effects in autoimmune and inflammatory disease heritability. Specifically, we functionally validated an Asian-specific sQTL disrupting the 5′ splice site of TCHP exon 4 that putatively modulates the risk of Graves’ disease in East Asian populations. Our work highlights the impact of ancestral diversity on splicing and provides a roadmap to dissect its role in complex diseases at single-cell resolution. Reference: Chi Tian et al, Nature Genetics, 2024
Genetic basis of early onset and progression of type 2 diabetes in South Asians
South Asians develop type 2 diabetes (T2D) early in life and often with normal body mass index (BMI). However, reasons for this are poorly understood because genetic research is largely focused on European ancestry groups.

We used recently derived multi-ancestry partitioned polygenic scores (pPSs) to elucidate underlying etiological pathways in British Pakistani and British Bangladeshi individuals with T2D (n = 11,678) and gestational diabetes mellitus (GDM) (n = 1,965) in the Genes & Health study (n = 50,556). Beta cell 2 (insulin deficiency) and Lipodystrophy 1 (unfavorable fat distribution) pPSs were most strongly associated with T2D, GDM and younger age at T2D diagnosis. Individuals at high genetic risk of both insulin deficiency and lipodystrophy were diagnosed with T2D 8.2 years earlier with BMI 3 kg m−2 lower compared to those at low genetic risk. The insulin deficiency pPS was associated with poorer HbA1c response to SGLT2 inhibitors.
Insulin deficiency and lipodystrophy pPSs were associated with faster progression to insulin dependence and microvascular complications. South Asians had a greater genetic burden from both of these pPSs than white Europeans in the UK Biobank. In conclusion, genetic predisposition to insulin deficiency and lipodystrophy in British Pakistani and British Bangladeshi individuals is associated with earlier onset of T2D, faster progression to complications, insulin dependence and poorer response to medication. Reference: Sam Hodgson et al, Nature Medicine, 2024
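For readers unfamiliar with partitioned polygenic scores, the following Python sketch shows the general arithmetic under simple assumptions: each score is a weighted sum of a person's risk-allele dosages, computed separately for each pathway cluster. The variant identifiers, weights, and cluster labels below are invented for illustration and are not taken from the study.

```python
from typing import Dict


def partitioned_scores(
    dosages: Dict[str, float],
    cluster_weights: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Compute one weighted dosage sum per cluster.

    dosages: risk-allele dosage (0, 1 or 2) per variant for one individual.
    cluster_weights: {cluster: {variant: weight}}; a variant can contribute
    to more than one cluster with different weights.
    """
    return {
        cluster: sum(w * dosages.get(variant, 0.0) for variant, w in weights.items())
        for cluster, weights in cluster_weights.items()
    }


# Hypothetical individual and hypothetical cluster weights.
dosages = {"var1": 2, "var2": 1, "var3": 0}
cluster_weights = {
    "insulin_deficiency": {"var1": 0.5, "var2": 0.25},
    "lipodystrophy": {"var2": 0.5, "var3": 1.0},
}
scores = partitioned_scores(dosages, cluster_weights)
print(scores)
```

Splitting one overall score into pathway-specific scores is what allows separate associations to be reported for insulin deficiency and for lipodystrophy, as in the abstract above.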
Genome-wide investigation of VNTR motif polymorphisms in 8,222 genomes
Variable number tandem repeats (VNTRs) are a pervasive and highly mutable class of genetic feature that varies in both length and repeat sequence. Although copy-number variation in VNTRs is well studied, the functional impacts of repeat motif polymorphisms remain unknown.

Here, we present the largest genome-wide VNTR polymorphism map to date, with over 2.5 million VNTR length polymorphisms (VNTR-LPs) and over 11 million VNTR motif polymorphisms (VNTR-MPs) detected in 8,222 high-coverage genomes. Leveraging the large-scale NyuWa cohort, we identified 2,982,456 (31.8%) NyuWa-specific VNTR-MPs, of which 95.3% were rare. Moreover, we found 1,937 out of 38,685 VNTRs that were associated with gene expression through VNTR-MPs in lymphoblastoid cell lines.
Specifically, we clarified that the expansion of a likely causal motif could upregulate gene expression by improving the binding concentration of PU.1. We also explored the potential impacts of VNTR polymorphisms on phenotypic differentiation and disease susceptibility. This study expands our knowledge of VNTR-MPs and their functional implications. Reference: Sijia Zhang et al, Cell Genomics, 2024
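The distinction between length polymorphisms and motif polymorphisms can be illustrated with a small Python sketch. It assumes repeat units of a fixed, known length and simply compares two alleles of one locus; real VNTR calling requires alignment and tolerates units of varying length, and the function names and sequences here are hypothetical.

```python
from typing import Dict, List


def split_units(tract: str, unit_len: int) -> List[str]:
    """Split a repeat tract into consecutive units of fixed length."""
    if len(tract) % unit_len:
        raise ValueError("tract length is not a multiple of the unit length")
    return [tract[i:i + unit_len] for i in range(0, len(tract), unit_len)]


def compare_alleles(allele_a: str, allele_b: str, unit_len: int) -> Dict[str, bool]:
    """Report whether two alleles differ in copy number and/or motif content."""
    units_a = split_units(allele_a, unit_len)
    units_b = split_units(allele_b, unit_len)
    return {
        # Different number of repeat units: a length polymorphism.
        "length_polymorphism": len(units_a) != len(units_b),
        # Different set of unit sequences: a motif polymorphism.
        "motif_polymorphism": set(units_a) != set(units_b),
    }


reference = "ACGT" * 3           # three identical units
motif_variant = "ACGTACGAACGT"   # same length, middle unit is ACGA
expanded = "ACGT" * 5            # five identical units

same_length = compare_alleles(reference, motif_variant, 4)
same_motifs = compare_alleles(reference, expanded, 4)
print(same_length, same_motifs)
```

The first comparison differs only in motif content and the second only in copy number, which shows why the two kinds of polymorphism are catalogued separately.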
Machine learning-enhanced immunopeptidomics applied to T-cell epitope discovery for COVID-19 vaccines
Next-generation T-cell-directed vaccines for COVID-19 focus on establishing lasting T-cell immunity against current and emerging SARS-CoV-2 variants. Precise identification of conserved T-cell epitopes is critical for designing effective vaccines.

Here we introduce a comprehensive computational framework incorporating a machine learning algorithm, MHCvalidator, to enhance the sensitivity of mass spectrometry-based immunopeptidomics. MHCvalidator identifies unique T-cell epitopes presented by the B7 supertype, including an epitope from a +1 frameshift in a truncated Spike antigen, supported by ribosome profiling. Analysis of 100,512 COVID-19 patient proteomes shows Spike antigen truncation in 0.85% of cases, revealing frameshifted viral antigens at the population level.
Our EpiTrack pipeline tracks global mutations of MHCvalidator-identified CD8+ T-cell epitopes from the BNT162b4 vaccine. While most vaccine epitopes remain globally conserved, an immunodominant A*01-associated epitope mutates in Delta and Omicron variants. This work highlights SARS-CoV-2 antigenic features and emphasizes the importance of continuous adaptation in T-cell vaccine development. Reference