
Amid advances in genomics, the availability of large reference panels of human haplotypes is key to accounting for human diversity within and across populations. However, mass spectrometry-based proteomics does not yet benefit from this information.

Reference: Jakub Vasicek et al, Nature Methods, 2024

To address this gap, we introduce ProHap, a Python-based tool that constructs protein sequence databases from phased genotypes of reference panels. ProHap enables researchers to account for haplotype diversity in proteomic searches.
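The idea of deriving protein sequences from phased genotypes can be illustrated with a minimal sketch. This is not ProHap's interface or algorithm: the function names, the input tuple layout, and the toy sequence are all assumptions made for illustration, and only single-nucleotide substitutions in a single coding sequence are handled.

```python
from itertools import product

# Standard genetic code in the conventional TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def haplotype_proteins(ref_cds, phased_snps):
    """Apply phased SNPs to a reference CDS and translate both haplotypes.

    phased_snps holds hypothetical tuples of
    (0-based CDS position, ref base, alt base, genotype such as "0|1").
    """
    haplotypes = [list(ref_cds), list(ref_cds)]
    for pos, ref, alt, genotype in phased_snps:
        if ref_cds[pos] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        for hap_index, allele in enumerate(genotype.split("|")):
            if allele == "1":
                haplotypes[hap_index][pos] = alt
    return [translate("".join(h)) for h in haplotypes]


# Toy, made-up example: one heterozygous and one homozygous SNP.
ref_cds = "ATGGCTGAAAAATGA"
snps = [(4, "C", "T", "0|1"), (6, "G", "A", "1|1")]
proteins = haplotype_proteins(ref_cds, snps)
print(proteins)
```

Because the genotypes are phased, the two alternative alleles are placed on the correct copy of the sequence, so the two haplotypes can yield different protein sequences. Unphased genotypes would not say which combinations of alleles occur together.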

The emergence of Plasmodium falciparum parasites resistant to artemisinins compromises the efficacy of Artemisinin Combination Therapies (ACTs), the global first-line malaria treatment. Artemisinin resistance is a complex genetic trait in which nonsynonymous SNPs in PfK13 cooperate with other genetic variations.

Reference: Sourav Nayak et al, Nature Communications, 2024

Here, we present population genomic/transcriptomic analyses of P. falciparum collected from patients with uncomplicated malaria in Cambodia and Vietnam between 2018 and 2020. Besides the PfK13 SNPs, several polymorphisms, including nonsynonymous SNPs (N1131I and N821K) in PfRad5 and an intronic SNP in PfWD11 (WD40 repeat-containing protein on chromosome 11), appear to be associated with artemisinin resistance, possibly as new markers. There is also a defined set of genes whose steady-state levels of mRNA and/or splice variants or antisense transcripts correlate with artemisinin resistance at the base level. In vivo transcriptional responses to artemisinins indicate the resistant parasite’s capacity to decelerate its intraerythrocytic developmental cycle (IDC), which can contribute to the resistant phenotype.

During this response, PfRAD5 and PfWD11 upregulate their respective alternatively/aberrantly spliced isoforms, suggesting their contribution to the protective response to artemisinins. PfRAD5 and PfWD11 appear to have been under selective pressure in the Greater Mekong Sub-region over the last decade, suggesting a role in the genetic background of artemisinin resistance.

Alternative splicing contributes to complex traits, but whether this differs in trait-relevant cell types across diverse genetic ancestries is unclear.

Reference: Chi Tian et al, Nature Genetics, 2024

Here we describe cell-type-specific, sex-biased and ancestry-biased alternative splicing in ~1 M peripheral blood mononuclear cells from 474 healthy donors from the Asian Immune Diversity Atlas. We identify widespread sex-biased and ancestry-biased differential splicing, most of which is cell-type-specific. We identify 11,577 independent cis-splicing quantitative trait loci (sQTLs), 607 trans-sGenes and 107 dynamic sQTLs. Colocalization between cis-eQTLs and trans-sQTLs revealed a cell-type-specific regulatory relationship between HNRNPLL and PTPRC.

We observed an enrichment of cis-sQTL effects in autoimmune and inflammatory disease heritability. Specifically, we functionally validated an Asian-specific sQTL disrupting the 5′ splice site of TCHP exon 4 that putatively modulates the risk of Graves’ disease in East Asian populations. Our work highlights the impact of ancestral diversity on splicing and provides a roadmap to dissect its role in complex diseases at single-cell resolution.


South Asians develop type 2 diabetes (T2D) early in life and often with normal body mass index (BMI). However, reasons for this are poorly understood because genetic research is largely focused on European ancestry groups.

Reference: Sam Hodgson et al, Nature Medicine, 2024

We used recently derived multi-ancestry partitioned polygenic scores (pPSs) to elucidate underlying etiological pathways in British Pakistani and British Bangladeshi individuals with T2D (n = 11,678) and gestational diabetes mellitus (GDM) (n = 1,965) in the Genes & Health study (n = 50,556). Beta cell 2 (insulin deficiency) and Lipodystrophy 1 (unfavorable fat distribution) pPSs were most strongly associated with T2D, GDM and younger age at T2D diagnosis. Individuals at high genetic risk of both insulin deficiency and lipodystrophy were diagnosed with T2D 8.2 years earlier and with a BMI 3 kg m−2 lower than those at low genetic risk. The insulin deficiency pPS was associated with poorer HbA1c response to SGLT2 inhibitors.

Insulin deficiency and lipodystrophy pPSs were associated with faster progression to insulin dependence and microvascular complications. South Asians had a greater genetic burden from both of these pPSs than white Europeans in the UK Biobank. In conclusion, genetic predisposition to insulin deficiency and lipodystrophy in British Pakistani and British Bangladeshi individuals is associated with earlier onset of T2D, faster progression to complications, insulin dependence and poorer response to medication.
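For readers unfamiliar with partitioned polygenic scores, the general form is a weighted sum of risk-allele dosages computed separately for each variant cluster. The sketch below shows only that general form. The variant IDs, weights, and cluster keys are invented for illustration and are not the scores used in the study.

```python
def partitioned_scores(dosages, cluster_weights):
    """Return one weighted dosage sum per cluster.

    dosages: variant ID -> risk-allele dosage (0, 1 or 2) for one individual.
    cluster_weights: cluster name -> {variant ID -> weight in that cluster}.
    Variants missing from `dosages` contribute zero.
    """
    return {
        cluster: sum(w * dosages.get(variant, 0.0) for variant, w in weights.items())
        for cluster, weights in cluster_weights.items()
    }


# Hypothetical weights and genotypes, for illustration only.
cluster_weights = {
    "beta_cell_2": {"rsA": 0.5, "rsB": 0.25},
    "lipodystrophy_1": {"rsB": 0.5, "rsC": 1.0},
}
dosages = {"rsA": 2, "rsB": 1, "rsC": 0}

scores = partitioned_scores(dosages, cluster_weights)
print(scores)
```

A variant can carry weight in more than one cluster, which is what lets a single individual score high on one pathway (for example insulin deficiency) and low on another.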


Variable number tandem repeat (VNTR) is a pervasive and highly mutable genetic feature that varies in both length and repeat sequence. Despite the well-studied copy-number variants, the functional impacts of repeat motif polymorphisms remain unknown.

Reference: Sijia Zhang et al, Cell Genomics, 2024

Here, we present the largest genome-wide VNTR polymorphism map to date, with over 2.5 million VNTR length polymorphisms (VNTR-LPs) and over 11 million VNTR motif polymorphisms (VNTR-MPs) detected in 8,222 high-coverage genomes. Leveraging the large-scale NyuWa cohort, we identified 2,982,456 (31.8%) NyuWa-specific VNTR-MPs, of which 95.3% were rare. Moreover, we found 1,937 out of 38,685 VNTRs that were associated with gene expression through VNTR-MPs in lymphoblastoid cell lines.

Specifically, we clarified that the expansion of a likely causal motif could upregulate gene expression by improving the binding concentration of PU.1. We also explored the potential impacts of VNTR polymorphisms on phenotypic differentiation and disease susceptibility. This study expands our knowledge of VNTR-MPs and their functional implications.


Next-generation T-cell-directed vaccines for COVID-19 focus on establishing lasting T-cell immunity against current and emerging SARS-CoV-2 variants. Precise identification of conserved T-cell epitopes is critical for designing effective vaccines.

Reference: Kevin A. Kovalchik et al, Nature Communications, 2024

Here we introduce a comprehensive computational framework incorporating a machine learning algorithm, MHCvalidator, to enhance mass spectrometry-based immunopeptidomics sensitivity. MHCvalidator identifies unique T-cell epitopes presented by the B7 supertype, including an epitope from a +1 frameshift in a truncated Spike antigen, supported by ribosome profiling. Analysis of 100,512 COVID-19 patient proteomes shows Spike antigen truncation in 0.85% of cases, revealing frameshifted viral antigens at the population level.

Our EpiTrack pipeline tracks global mutations of MHCvalidator-identified CD8+ T-cell epitopes from the BNT162b4 vaccine. While most vaccine epitopes remain globally conserved, an immunodominant A*01-associated epitope mutates in Delta and Omicron variants. This work highlights SARS-CoV-2 antigenic features and emphasizes the importance of continuous adaptation in T-cell vaccine development.
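Tracking whether an epitope stays conserved across variants can be reduced, in its simplest form, to checking whether the epitope peptide still occurs unchanged in each variant's protein sequence. The sketch below shows that check only. It is not the EpiTrack pipeline, and the peptide and protein sequences are made-up placeholders rather than real SARS-CoV-2 data.

```python
def epitope_conservation(epitopes, variant_proteins):
    """Report, per epitope and variant, whether the peptide is present unchanged.

    epitopes: epitope name -> peptide sequence.
    variant_proteins: variant name -> protein sequence.
    """
    return {
        name: {variant: peptide in seq for variant, seq in variant_proteins.items()}
        for name, peptide in epitopes.items()
    }


# Placeholder sequences, for illustration only.
epitopes = {"epi1": "ACDEFGHIK", "epi2": "LMNPQRSTV"}
variant_proteins = {
    "variant_a": "MMACDEFGHIKWWLMNPQRSTVYY",
    "variant_b": "MMACDEFGHIKWWLMNPQRATVYY",  # one substitution inside epi2
}

conservation = epitope_conservation(epitopes, variant_proteins)
print(conservation)
```

An exact-match test like this flags any substitution inside the peptide as a loss of conservation. It says nothing about whether the mutated peptide is still presented or recognized, which would need separate evidence.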


An extensive resource for Bioinformatics, Epigenomics, Genomics and Metagenomics