Update on: March 01, 2024
Epigenetic variation impacts individual differences in the transcriptional response to influenza infection
Humans display remarkable interindividual variation in their immune response to identical challenges. Yet, our understanding of the genetic and epigenetic factors contributing to such variation remains limited.
Here we performed in-depth genetic, epigenetic and transcriptional profiling on primary macrophages derived from individuals of European and African ancestry before and after infection with influenza A virus.
We show that baseline epigenetic profiles are strongly predictive of the transcriptional response to influenza A virus across individuals. Quantitative trait locus (QTL) mapping revealed highly coordinated genetic effects on gene regulation, with many cis-acting genetic variants concomitantly impacting gene expression and multiple epigenetic marks. These data reveal that ancestry-associated differences in the epigenetic landscape can be genetically controlled, even more than gene expression. Reference
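As a rough illustration of what QTL mapping tests, the sketch below regresses two molecular phenotypes on the allele dosage of a single variant. The function name `qtl_slope` and all values are hypothetical toy data, not taken from the study. Real analyses also adjust for covariates and correct for multiple testing.

```python
from statistics import mean

def qtl_slope(dosages, phenotype):
    """Least-squares slope of a molecular phenotype on allele dosage (0, 1 or 2)."""
    g_bar, y_bar = mean(dosages), mean(phenotype)
    sxy = sum((g - g_bar) * (y - y_bar) for g, y in zip(dosages, phenotype))
    sxx = sum((g - g_bar) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    return sxy / sxx

# Hypothetical toy data: one cis variant, six individuals.
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 1.9, 2.1, 3.0, 3.2]   # gene expression
chromatin = [0.5, 0.7, 1.1, 0.9, 1.6, 1.4]    # an epigenetic mark near the gene

# The same variant is tested against both phenotypes.
beta_expr = qtl_slope(dosages, expression)
beta_chrom = qtl_slope(dosages, chromatin)
print(beta_expr, beta_chrom)
```

A variant with a non-zero slope for both phenotypes is the kind of coordinated effect the abstract describes, although establishing it in practice requires formal significance testing.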
scGIST: gene panel design for spatial transcriptomics with prioritized gene sets
A critical challenge of single-cell spatial transcriptomics (sc-ST) technologies is their panel size.
Being based on fluorescence in situ hybridization, they are typically limited to panels of about a thousand genes. This constrains researchers to build panels from only the marker genes of different cell types and forgo other genes of interest, e.g., genes encoding ligand-receptor complexes or those in specific pathways.
We propose scGIST, a constrained feature selection tool that designs sc-ST panels prioritizing user-specified genes without compromising cell type detection accuracy. We demonstrate scGIST’s efficacy in diverse use cases, highlighting it as a valuable addition to sc-ST’s algorithmic toolbox. Reference
Metabolomic machine learning predictor for diagnosis and prognosis of gastric cancer
Gastric cancer (GC) represents a significant burden of cancer-related mortality worldwide, underscoring an urgent need for the development of early detection strategies and precise postoperative interventions.
However, the identification of non-invasive biomarkers for early diagnosis and patient risk stratification remains underexplored. Here, we conduct a targeted metabolomics analysis of 702 plasma samples from multi-center participants to elucidate the GC metabolic reprogramming. Our machine learning analysis reveals a 10-metabolite GC diagnostic model, which is validated in an external test set with a sensitivity of 0.905, outperforming conventional methods leveraging cancer protein markers (sensitivity < 0.40).
Additionally, our machine learning-derived prognostic model demonstrates superior performance to traditional models utilizing clinical parameters and effectively stratifies patients into different risk groups to guide precision interventions. Collectively, our findings reveal the metabolic landscape of GC and identify two distinct biomarker panels that enable early detection and prognosis prediction respectively, thus facilitating precision medicine in GC. Reference
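For readers less familiar with the metric, sensitivity is the fraction of true cases that a model flags as positive. The minimal sketch below computes it from labels and predictions. The data are invented for illustration and are not the study's test set.

```python
def sensitivity(y_true, y_pred):
    """Fraction of true positives (label 1) that are predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("no positive cases in y_true")
    return tp / (tp + fn)

# Hypothetical example: 10 patients with disease, 5 controls.
y_true = [1] * 10 + [0] * 5
y_pred = [1] * 9 + [0] + [0, 0, 1, 0, 0]   # one missed case, one false alarm

sens = sensitivity(y_true, y_pred)
print(sens)
```

Sensitivity ignores false alarms among controls, so it is usually reported alongside specificity.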
Functional dissection of human cardiac enhancers and noncoding de novo variants in congenital heart disease
Rare coding mutations cause ∼45% of congenital heart disease (CHD). Noncoding mutations that perturb cis-regulatory elements (CREs) likely contribute to the remaining cases, but their identification has been problematic.
Using a lentiviral massively parallel reporter assay (lentiMPRA) in human induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs), we functionally evaluated 6,590 noncoding de novo variants (ncDNVs) prioritized from the whole-genome sequencing of 750 CHD trios. A total of 403 ncDNVs substantially affected cardiac CRE activity. A majority increased enhancer activity, often at regions with undetectable reference sequence activity. Of ten DNVs tested by introduction into their native genomic context, four altered the expression of neighboring genes and iPSC-CM transcriptional state.
To prioritize future DNVs for functional testing, we used the MPRA data to develop a regression model, EpiCard. Analysis of an independent CHD cohort by EpiCard found enrichment of DNVs. Together, we developed a scalable system to measure the effect of ncDNVs on CRE activity and deployed it to systematically assess the contribution of ncDNVs to CHD. Reference
Simple but powerful interactive data analysis in R with R/LinkedCharts
In research involving data-rich assays, exploratory data analysis is a crucial step.
Typically, this involves jumping back and forth between visualizations that provide an overview of the whole data and others that dive into details. For example, it might be helpful to have one chart showing a summary statistic for all samples, while a second chart provides details for points selected in the first chart.
We present R/LinkedCharts, a framework that renders this task radically simple, requiring very few lines of code to obtain complex and general visualization, which later can be polished to provide interactive data access of publication quality. Reference
Single-cell multiomics decodes regulatory programs for mouse secondary palate development
Perturbations in gene regulation during palatogenesis can lead to cleft palate, which is among the most common congenital birth defects.
Here, we perform single-cell multiome sequencing and profile chromatin accessibility and gene expression simultaneously within the same cells (n = 36,154) isolated from mouse secondary palate across embryonic days (E) 12.5, E13.5, E14.0, and E14.5. We construct five trajectories representing continuous differentiation of cranial neural crest-derived multipotent cells into distinct lineages. By linking open chromatin signals to gene expression changes, we characterize the underlying lineage-determining transcription factors.
In silico perturbation analysis identifies transcription factors SHOX2 and MEOX2 as important regulators of the development of the anterior and posterior palate, respectively. In conclusion, our study charts epigenetic and transcriptional dynamics in palatogenesis, serving as a valuable resource for further cleft palate research. Reference
Benchmarking splice variant prediction algorithms using massively parallel splicing assays
Variants that disrupt mRNA splicing account for a sizable fraction of the pathogenic burden in many genetic disorders, but identifying splice-disruptive variants (SDVs) beyond the essential splice site dinucleotides remains difficult.
Computational predictors are often discordant, compounding the challenge of variant interpretation. Because they are primarily validated using clinical variant sets heavily biased to known canonical splice site mutations, it remains unclear how well their performance generalizes.
We benchmark eight widely used splicing effect prediction algorithms, leveraging massively parallel splicing assays (MPSAs) as a source of experimentally determined ground-truth. MPSAs simultaneously assay many variants to nominate candidate SDVs. We compare experimentally measured splicing outcomes with bioinformatic predictions for 3,616 variants in five genes. Algorithms’ concordance with MPSA measurements, and with each other, is lower for exonic than intronic variants, underscoring the difficulty of identifying missense or synonymous SDVs. Reference
Age-dependent topic modeling of comorbidities in UK Biobank identifies disease subtypes with differential genetic risk
The analysis of longitudinal data from electronic health records (EHRs) has the potential to improve clinical diagnoses and enable personalized medicine, motivating efforts to identify disease subtypes from patient comorbidity information.
Here we introduce an age-dependent topic modeling (ATM) method that provides a low-rank representation of longitudinal records of hundreds of distinct diseases in large EHR datasets. We applied ATM to 282,957 UK Biobank samples, identifying 52 diseases with heterogeneous comorbidity profiles; analyses of 211,908 All of Us samples produced concordant results.
We defined subtypes of the 52 heterogeneous diseases based on their comorbidity profiles and compared genetic risk across disease subtypes using polygenic risk scores (PRSs), identifying 18 disease subtypes whose PRS differed significantly from other subtypes of the same disease. We further identified specific genetic variants with subtype-dependent effects on disease risk. In conclusion, ATM identifies disease subtypes with differential genome-wide and locus-specific genetic risk profiles. Reference
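A polygenic risk score is, at its simplest, a weighted sum of risk-allele dosages. The sketch below computes such scores and compares their mean between two subtypes. The weights, dosages and subtype labels are hypothetical, and the study's actual PRS construction and significance testing are not reproduced here.

```python
from statistics import mean

def polygenic_risk_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1 or 2) using per-variant effect sizes."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))

# Hypothetical per-variant weights (e.g. log odds ratios from a GWAS).
weights = [0.2, -0.1, 0.05]

# Hypothetical dosages for individuals assigned to two subtypes of one disease.
subtype_a = [[2, 1, 0], [1, 0, 2]]
subtype_b = [[0, 2, 0], [0, 1, 1]]

mean_a = mean(polygenic_risk_score(d, weights) for d in subtype_a)
mean_b = mean(polygenic_risk_score(d, weights) for d in subtype_b)
print(mean_a, mean_b)
```

A difference in mean score between subtypes is only suggestive until it is tested formally on a sufficiently large sample.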
Rare variant associations with plasma protein levels in the UK Biobank
Integrating human genomics and proteomics can help elucidate disease mechanisms, identify clinical biomarkers and discover drug targets. Because previous proteogenomic studies have focused on common variation via genome-wide association studies, the contribution of rare variants to the plasma proteome remains largely unknown.
Here we identify associations between rare protein-coding variants and 2,923 plasma protein abundances measured in 49,736 UK Biobank individuals. Our variant-level exome-wide association study identified 5,433 rare genotype–protein associations, of which 81% were undetected in a previous genome-wide association study of the same cohort.
We then looked at aggregate signals using gene-level collapsing analysis, which revealed 1,962 gene–protein associations. Of the 691 gene-level signals from protein-truncating variants, 99.4% were associated with decreased protein levels. STAB1 and STAB2, encoding scavenger receptors involved in plasma protein clearance, emerged as pleiotropic loci, with 77 and 41 protein associations, respectively. We demonstrate the utility of our publicly accessible resource through several applications. Reference
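Gene-level collapsing can be sketched as follows: all qualifying rare variants in a gene are reduced to a single carrier indicator per individual, and protein levels are then compared between carriers and non-carriers. The names and data below are hypothetical. The study's qualifying-variant definitions and statistical model are not specified in this summary.

```python
from statistics import mean

def collapse_to_carriers(genotypes, qualifying_variants):
    """Map each sample to True if it carries at least one qualifying variant."""
    return {
        sample: any(calls.get(v, 0) > 0 for v in qualifying_variants)
        for sample, calls in genotypes.items()
    }

def carrier_effect(carriers, protein_level):
    """Mean protein level in carriers minus mean level in non-carriers."""
    carrier_vals = [protein_level[s] for s, c in carriers.items() if c]
    other_vals = [protein_level[s] for s, c in carriers.items() if not c]
    if not carrier_vals or not other_vals:
        raise ValueError("need at least one carrier and one non-carrier")
    return mean(carrier_vals) - mean(other_vals)

# Hypothetical alternate-allele counts per sample for variants in one gene.
genotypes = {
    "s1": {"var_a": 1},
    "s2": {},
    "s3": {"var_b": 1},
    "s4": {"var_c": 1},   # var_c is not in the qualifying set
    "s5": {},
}
qualifying = {"var_a", "var_b"}   # e.g. rare protein-truncating variants
protein = {"s1": -1.0, "s2": 0.2, "s3": -0.8, "s4": 0.1, "s5": 0.0}

carriers = collapse_to_carriers(genotypes, qualifying)
effect = carrier_effect(carriers, protein)
print(effect)
```

The negative effect in this toy example mirrors the direction reported for most protein-truncating signals, where carriers show decreased protein levels.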
Molecular classification of hormone receptor-positive HER2-negative breast cancer
Hormone receptor-positive (HR+)/human epidermal growth factor receptor 2-negative (HER2−) breast cancer is the most prevalent type of breast cancer, in which endocrine therapy resistance and distant relapse remain unmet challenges.
Accurate molecular classification is urgently required for guiding precision treatment. We established a large-scale multi-omics cohort of 579 patients with HR+/HER2− breast cancer and identified the following four molecular subtypes: canonical luminal, immunogenic, proliferative and receptor tyrosine kinase (RTK)-driven. Tumors of these four subtypes showed distinct biological and clinical features, suggesting subtype-specific therapeutic strategies. The RTK-driven subtype was characterized by the activation of the RTK pathways and associated with poor outcomes.
The immunogenic subtype had enriched immune cells and could benefit from immune checkpoint therapy. In addition, we developed convolutional neural network models to discriminate these subtypes based on digital pathology for potential clinical translation. The molecular classification provides insights into molecular heterogeneity and highlights the potential for precision treatment of HR+/HER2− breast cancer. Reference
Epigenomic dissection of Alzheimer’s disease pinpoints causal variants and reveals epigenome erosion
Recent work has identified dozens of non-coding loci for Alzheimer’s disease (AD) risk, but their mechanisms and AD transcriptional regulatory circuitry are poorly understood.
Here, we profile epigenomic and transcriptomic landscapes of 850,000 nuclei from prefrontal cortexes of 92 individuals with and without AD to build a map of the brain regulome, including epigenomic profiles, transcriptional regulators, co-accessibility modules, and peak-to-gene links in a cell-type-specific manner. We develop methods for multimodal integration and detecting regulatory modules using peak-to-gene linking.
We show AD risk loci are enriched in microglial enhancers and for specific TFs including SPI1, ELF2, and RUNX1. We detect 9,628 cell-type-specific ATAC-QTL loci, which we integrate alongside peak-to-gene links to prioritize AD variant regulatory circuits. We report differential accessibility of regulatory modules in late AD in glia and in early AD in neurons. Strikingly, late-stage AD brains show global epigenome dysregulation indicative of epigenome erosion and cell identity loss. Reference
Genome-wide enhancer-gene regulatory maps
Genome-wide association studies have identified numerous variants associated with human complex traits, most of which reside in the non-coding regions, but biological mechanisms remain unclear.
Assigning function to these non-coding elements is still challenging. Here we apply the Activity-by-Contact (ABC) model to evaluate enhancer-gene regulatory effects by integrating multi-omics data, and identify 544,849 connections across 20 cancer types. The ABC model outperforms previous approaches in linking regulatory variants to target genes. Furthermore, we identify over 30,000 enhancer-gene connections in colorectal cancer (CRC) tissues.
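In the ABC model as it is usually formulated, the score of an element for a gene is its activity multiplied by its contact frequency with the gene's promoter, divided by the sum of that product over all candidate elements near the gene. The sketch below is a minimal version under that assumption. The element names and numbers are hypothetical, and the specific data types and thresholds used in the study are not given in this summary.

```python
def abc_scores(elements):
    """Compute ABC scores for one gene.

    elements: list of (name, activity, contact) tuples, where activity is an
    enhancer activity estimate and contact is the contact frequency between
    the element and the gene's promoter.
    """
    total = sum(activity * contact for _, activity, contact in elements)
    if total == 0:
        raise ValueError("all activity-by-contact products are zero")
    return {
        name: activity * contact / total
        for name, activity, contact in elements
    }

# Hypothetical candidate elements around a single gene.
elements = [
    ("enhancer_1", 10.0, 0.5),
    ("enhancer_2", 5.0, 0.2),
    ("promoter_proximal", 4.0, 1.0),
]

scores = abc_scores(elements)
print(scores)
```

Because the scores for one gene sum to 1, each score can be read as the share of regulatory input attributed to that element, and a threshold on the score then defines an enhancer-gene connection.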
By integrating large-scale population cohorts (23,813 cases and 29,973 controls) and multipronged functional assays, we demonstrate that an ABC regulatory variant, rs4810856, is associated with CRC risk (odds ratio = 1.11, 95% CI = 1.05–1.16, P = 4.02 × 10^-5) and acts as an allele-specific enhancer to distally facilitate PREX1, CSE1L and STAU1 expression, which synergistically activate p-AKT signaling. Our study provides comprehensive regulation maps and illuminates a single variant regulating multiple genes, providing insights into cancer etiology. Reference
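The reported odds ratio, confidence interval and P value can be checked against each other with a standard Wald approximation on the log odds ratio scale. The sketch below assumes a symmetric 95% interval on the log scale and a two-sided test, which the summary does not state explicitly. Because the inputs are rounded, only approximate agreement should be expected.

```python
import math

def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover the standard error, z statistic and two-sided P value from an
    odds ratio and its confidence interval, assuming a Wald interval on the
    log scale."""
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p_two_sided

# Values quoted in the abstract for rs4810856.
se, z, p = wald_from_ci(1.11, 1.05, 1.16)
print(f"SE = {se:.4f}, z = {z:.2f}, P = {p:.2e}")
```

Under these assumptions the recovered P value should be on the order of 10^-5, in line with the reported figure.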
Accurate proteome-wide missense variant effect prediction with AlphaMissense
Genome sequencing has revealed extensive genetic variation in human populations. Missense variants are genetic variants that alter the amino acid sequence of proteins. Pathogenic missense variants disrupt protein function and reduce organismal fitness, while benign missense variants have limited effect.
Classifying these variants is an important ongoing challenge in human genetics. Of more than 4 million observed missense variants, only an estimated 2% have been clinically classified as pathogenic or benign, while the vast majority of them are of unknown clinical significance. This limits the diagnosis of rare diseases, as well as the development or application of clinical treatments that target the underlying genetic cause. Machine learning approaches could close the variant interpretation gap by exploiting patterns in biological data to predict the pathogenicity of unannotated variants.
We developed AlphaMissense to leverage advances on multiple fronts: (i) unsupervised protein language modeling to learn amino acid distributions conditioned on sequence context; (ii) incorporating structural context by using an AlphaFold-derived system; and (iii) fine-tuning on weak labels from population frequency data, thereby avoiding bias from human-curated annotations. AlphaMissense achieves state-of-the-art missense pathogenicity predictions in clinical annotation, de novo disease variants, and experimental assay benchmarks without explicitly training on such data. Reference
A robust deep learning workflow to predict CD8+ T-cell epitopes
T-cells play a crucial role in the adaptive immune system by triggering responses against cancer cells and pathogens, while maintaining tolerance against self-antigens, which has sparked interest in the development of various T-cell-focused immunotherapies.
However, the identification of antigens recognised by T-cells is low-throughput and laborious. To overcome some of these limitations, computational methods for predicting CD8+ T-cell epitopes have emerged. Despite recent developments, most immunogenicity algorithms struggle to learn features of peptide immunogenicity from small datasets, suffer from HLA bias and are unable to reliably predict pathology-specific CD8+ T-cell epitopes.
TRAP, the deep learning workflow presented here, was used to identify epitopes from glioblastoma patients as well as SARS-CoV-2 peptides, and it outperformed other algorithms in both cancer and pathogenic settings. TRAP was especially effective at extracting immunogenicity-associated properties from restricted data of emerging pathogens and translating them onto related species, as well as minimising the loss of likely epitopes in imbalanced datasets. Reference
An immune cell atlas reveals the dynamics of human macrophage specification during prenatal development
Macrophages are heterogeneous and play critical roles in development and disease, but their diversity, function, and specification remain inadequately understood during human development.
We generated a single-cell RNA sequencing map of the dynamics of human macrophage specification from PCW 4–26 across 19 tissues. We identified a microglia-like population and a proangiogenic population in 15 macrophage subtypes. Microglia-like cells, molecularly and morphologically similar to microglia in the CNS, are present in the fetal epidermis, testicle, and heart. They are the major immune population in the early epidermis, exhibit a polarized distribution along the dorsal-lateral-ventral axis, and interact with neural crest cells, modulating their differentiation along the melanocyte lineage.
Through spatial and differentiation trajectory analysis, we also showed that proangiogenic macrophages are perivascular across fetal organs and, like microglia, are likely yolk-sac-derived. Our study provides a comprehensive map of the heterogeneity and developmental dynamics of human macrophages and unravels their diverse functions during development. Reference
GWAS of random glucose in 476,326 individuals
Conventional measurements of fasting and postprandial blood glucose levels investigated in genome-wide association studies (GWAS) cannot capture the effects of DNA variability on ‘around the clock’ glucoregulatory processes.
Here we show that GWAS meta-analysis of glucose measurements under nonstandardized conditions (random glucose (RG)) in 476,326 individuals of diverse ancestries and without diabetes enables locus discovery and innovative pathophysiological observations.
We discovered 120 RG loci represented by 150 distinct signals, including 13 with sex-dimorphic effects, two cross-ancestry and seven rare frequency signals. Of these, 44 loci are new for glycemic traits. Regulatory, glycosylation and metagenomic annotations highlight ileum and colon tissues, indicating an underappreciated role of the gastrointestinal tract in controlling blood glucose. Reference
Spatial multimodal analysis of transcriptomes and metabolomes in tissues
We present a spatial omics approach that combines histology, mass spectrometry imaging and spatial transcriptomics to facilitate precise measurements of mRNA transcripts and low-molecular-weight metabolites across tissue regions.
The workflow is compatible with commercially available Visium glass slides. We demonstrate the potential of our method using mouse and human brain samples in the context of dopamine and Parkinson’s disease. Reference
The Oncology Biomarker Discovery framework reveals cetuximab and bevacizumab response patterns in metastatic colorectal cancer
Precision medicine has revolutionised cancer treatments; however, actionable biomarkers remain scarce. To address this, we develop the Oncology Biomarker Discovery (OncoBird) framework for analysing the molecular and biomarker landscape of randomised controlled clinical trials.
OncoBird identifies biomarkers based on single genes or mutually exclusive genetic alterations in isolation or in the context of tumour subtypes, and finally, assesses predictive components by their treatment interactions. Here, we utilise the open-label, randomised phase III trial (FIRE-3, AIO KRK-0306) in metastatic colorectal carcinoma patients, who received either cetuximab or bevacizumab in combination with 5-fluorouracil, folinic acid and irinotecan (FOLFIRI).
We systematically identify five biomarkers with predictive components, e.g., patients with tumours that carry chr20q amplifications or lack mutually exclusive ERK signalling mutations benefited from cetuximab compared to bevacizumab. In summary, OncoBird characterises the molecular landscape and outlines actionable biomarkers, which generalises to any molecularly characterised randomised controlled trial. Reference
A pan-cancer single-cell panorama of human natural killer cells
Natural killer (NK) cells play indispensable roles in innate immune responses against tumor progression. To depict their phenotypic and functional diversities in the tumor microenvironment, we perform integrative single-cell RNA sequencing analyses on NK cells from 716 patients with cancer, covering 24 cancer types.
We observed heterogeneity in NK cell composition in a tumor-type-specific manner. Notably, we have identified a group of tumor-associated NK cells that are enriched in tumors, show impaired anti-tumor functions, and are associated with unfavorable prognosis and resistance to immunotherapy.
Specific myeloid cell subpopulations, in particular LAMP3+ dendritic cells, appear to mediate the regulation of NK cell anti-tumor immunity. Our study provides insights into NK-cell-based cancer immunity and highlights potential clinical utilities of NK cell subsets as therapeutic targets. Reference