Updated on: May 29, 2023

Integrating genetics with single-cell multiomics identifies mechanisms of beta cell dysfunction in type 2 diabetes

Dysfunctional pancreatic islet beta cells are a hallmark of type 2 diabetes (T2D), but a comprehensive understanding of the underlying mechanisms, including gene dysregulation, is lacking.

Integrating genetics with single-cell multiomic measurements across disease states identifies mechanisms of beta cell dysfunction in type 2 diabetes
Reference: Gaowei Wang et al., Nature Genetics, 2023

Here we integrate information from measurements of chromatin accessibility, gene expression and function in single beta cells with genetic association data to nominate disease-causal gene regulatory changes in T2D. Using machine learning on chromatin accessibility data from 34 nondiabetic, pre-T2D and T2D donors, we identify two transcriptionally and functionally distinct beta cell subtypes that undergo an abundance shift during T2D progression. Subtype-defining accessible chromatin is enriched for T2D risk variants, suggesting a causal contribution of subtype identity to T2D.

Both beta cell subtypes exhibit activation of a stress-response transcriptional program and functional impairment in T2D, which is probably induced by the T2D-associated metabolic environment. Our findings demonstrate the power of multimodal single-cell measurements combined with machine learning for characterizing mechanisms of complex diseases.

A computational method for cell type-specific eQTL mapping using bulk RNA-seq data

Mapping cell type-specific gene expression quantitative trait loci (ct-eQTLs) is a powerful way to investigate the genetic basis of complex traits. A popular method for ct-eQTL mapping is to assess the interaction between the genotype of a genetic locus and the abundance of a specific cell type using a linear model.

cell type-specific gene expression quantitative trait loci (ct-eQTLs)
Reference: Paul Little et al., Nature Communications, 2023

However, this approach requires transforming RNA-seq count data, which distorts the relation between gene expression and cell type proportions and results in reduced power and/or inflated type I error. To address this issue, we have developed a statistical method called CSeQTL that allows for ct-eQTL mapping using bulk RNA-seq count data while taking advantage of allele-specific expression.

We validated the results of CSeQTL through simulations and real data analysis, comparing CSeQTL results to those obtained from purified bulk RNA-seq data or single cell RNA-seq data. Using our ct-eQTL findings, we were able to identify cell types relevant to 21 categories of human traits.
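The conventional approach mentioned above, a linear model with a genotype-by-cell-type-abundance interaction term, can be sketched as follows. This is a minimal illustration on simulated data and is not CSeQTL itself: the function name `fit_interaction_model`, the simulated effect sizes, and the use of ordinary least squares on an already-transformed expression value are all assumptions made for the example.

```python
import numpy as np

def fit_interaction_model(expr, genotype, cell_prop):
    """Ordinary least squares fit of
    expr ~ intercept + genotype + cell_prop + genotype * cell_prop.
    The interaction coefficient is the quantity a linear ct-eQTL test targets."""
    design = np.column_stack([
        np.ones(len(expr)),      # intercept
        genotype,                # main effect of genotype (0/1/2 allele count)
        cell_prop,               # main effect of cell type proportion
        genotype * cell_prop,    # genotype x cell type interaction
    ])
    coef, *_ = np.linalg.lstsq(design, expr, rcond=None)
    names = ["intercept", "genotype", "cell_prop", "interaction"]
    return dict(zip(names, coef))

# Simulated (hypothetical) data: the genetic effect exists only through
# the cell type of interest, so it scales with that cell type's proportion.
rng = np.random.default_rng(0)
n = 2000
genotype = rng.integers(0, 3, size=n).astype(float)
cell_prop = rng.uniform(0.1, 0.9, size=n)
true_interaction = 1.5
expr = (2.0 + 0.5 * cell_prop
        + true_interaction * genotype * cell_prop
        + rng.normal(0.0, 0.5, size=n))

fit = fit_interaction_model(expr, genotype, cell_prop)
print({name: round(float(value), 3) for name, value in fit.items()})
```

In this sketch the fitted interaction coefficient should land near the simulated value, while the genotype main effect stays near zero. The abstract's point is that real RNA-seq counts must be transformed before such a model can be applied, which is the step CSeQTL is designed to avoid.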

Genomic hallmarks and therapeutic implications of G0 cell cycle arrest in cancer

Therapy resistance in cancer is often driven by a subpopulation of cells that are temporarily arrested in a non-proliferative G0 state, which is difficult to capture and whose mutational drivers remain largely unknown.

Reference: Anna J. Wiecek et al., Genome Biology, 2023

We develop methodology to robustly identify this state from transcriptomic signals and characterise its prevalence and genomic constraints in solid primary tumours. We show that G0 arrest preferentially emerges in the context of more stable, less mutated genomes which maintain TP53 integrity and lack the hallmarks of DNA damage repair deficiency, while presenting increased APOBEC mutagenesis. We employ machine learning to uncover novel genomic dependencies of this process and validate the role of the centrosomal gene CEP89 as a modulator of proliferation and G0 arrest capacity.

Lastly, we demonstrate that G0 arrest underlies unfavourable responses to various therapies exploiting cell cycle, kinase signalling and epigenetic mechanisms in single-cell data. We propose a G0 arrest transcriptional signature that is linked with therapeutic resistance and can be used to further study and clinically track this state.

An integrated tumor, immune and microbiome atlas of colon cancer

The lack of multi-omics cancer datasets with extensive follow-up information hinders the identification of accurate biomarkers of clinical outcome.

Reference: Jessica Roelands et al., Nature Medicine, 2023

In this cohort study, we performed comprehensive genomic analyses on fresh-frozen samples from 348 patients affected by primary colon cancer, encompassing RNA, whole-exome, deep T cell receptor and 16S bacterial rRNA gene sequencing on tumor and matched healthy colon tissue, complemented with tumor whole-genome sequencing for further microbiome characterization.

A type 1 helper T cell, cytotoxic, gene expression signature, called Immunologic Constant of Rejection, captured the presence of clonally expanded, tumor-enriched T cell clones and outperformed conventional prognostic molecular biomarkers, such as the consensus molecular subtype and the microsatellite instability classifications. Quantification of genetic immunoediting, defined as a lower number of neoantigens than expected, further refined its prognostic value. We identified a microbiome signature, driven by Ruminococcus bromii, associated with a favorable outcome.

Single cell transcriptomics clarifies the basophil differentiation trajectory

Basophils are the rarest granulocytes and are recognized as critical cells for type 2 immune responses. However, their differentiation pathway remains to be fully elucidated.

Reference: Kensuke Miyake et al., Nature Communications, 2023

Here, we assess the ontogenetic trajectory of basophils by single-cell RNA sequence analysis. Combined with flow cytometric and functional analyses, we identify c-Kit⁻CLEC12A^hi pre-basophils located downstream of pre-basophil and mast cell progenitors (pre-BMPs) and upstream of CLEC12A^lo mature basophils. The transcriptomic analysis predicts that the pre-basophil population includes previously-defined basophil progenitor (BaP)-like cells in terms of gene expression profile.

Pre-basophils are highly proliferative and respond better to non-IgE stimuli but less to antigen plus IgE stimulation than do mature basophils. Although pre-basophils usually remain in the bone marrow, they emerge in helminth-infected tissues, probably through IL-3-mediated inhibition of their retention in the bone marrow. Thus, the present study identifies pre-basophils that bridge the gap between pre-BMPs and mature basophils during basophil ontogeny.

Phenome-wide analyses identify an association between the POE dependent methylome and the rate of aging in humans

The variation in the rate at which humans age may be rooted in early events acting through the genomic regions that are influenced by such events and subsequently are related to health phenotypes in later life.

Phenome-wide analysis
Reference: Chenhao Gao et al., Genome Biology, 2023

The parent-of-origin-effect (POE)-regulated methylome includes regions enriched for genetically controlled imprinting effects (the typical type of POE) and regions influenced by environmental effects associated with parents (the atypical POE). We perform a phenome-wide association analysis for the POE-influenced methylome using GS:SFHS (N_discovery = 5,087, N_replication = 4,450). We identify and replicate 92 POE-CpG-phenotype associations. Most of the associations are contributed by the POE-CpGs belonging to the atypical class where the most strongly enriched associations are with aging (DNAmTL acceleration), intelligence, and parental (maternal) smoking exposure phenotypes.

A proportion of the atypical POE-CpGs form co-methylation networks (modules) which are associated with these phenotypes, with one of the aging-associated modules displaying increased within-module methylation connectivity with age. The atypical POE-CpGs also display high levels of methylation heterogeneity, fast information loss with age, and a strong correlation with CpGs contained within epigenetic clocks.

A draft human pangenome reference

Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals.

Reference: Wen-Wei Liao et al., Nature, 2023

These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38.

Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.

A deep learning algorithm to predict risk of pancreatic cancer from disease trajectories

Pancreatic cancer is an aggressive disease that typically presents late with poor outcomes, indicating a pronounced need for early detection.

Reference: Davide Placido et al., Nature Medicine, 2023

In this study, we applied artificial intelligence methods to clinical data from 6 million patients (24,000 pancreatic cancer cases) in Denmark (Danish National Patient Registry (DNPR)) and from 3 million patients (3,900 cases) in the United States (US Veterans Affairs (US-VA)). We trained machine learning models on the sequence of disease codes in clinical histories and tested prediction of cancer occurrence within incremental time windows (CancerRiskNet).

For cancer occurrence within 36 months, the performance of the best DNPR model has area under the receiver operating characteristic (AUROC) curve = 0.88 and decreases to AUROC (3m) = 0.83 when disease events within 3 months before cancer diagnosis are excluded from training, with an estimated relative risk of 59 for 1,000 highest-risk patients older than age 50 years.
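To make the two reported metrics concrete, here is a minimal sketch of how an AUROC and a top-k relative risk could be computed from risk scores and outcomes. The data are a toy example, the function names are hypothetical, and the relative risk definition used here (incidence among the k highest-scoring patients divided by overall incidence) is an assumption for illustration; the study's exact definition may differ.

```python
def auroc(scores, labels):
    """AUROC as the probability that a randomly chosen case scores higher
    than a randomly chosen non-case (ties count as one half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))

def relative_risk_top_k(scores, labels, k):
    """Assumed definition: incidence in the k highest-risk patients
    divided by incidence in the whole population."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate

# Toy data: 8 patients, 3 of whom develop cancer (label 1).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0]

toy_auroc = auroc(toy_scores, toy_labels)
toy_rr = relative_risk_top_k(toy_scores, toy_labels, k=2)
print(f"AUROC = {toy_auroc:.3f}, relative risk (top 2) = {toy_rr:.2f}")
```

The pairwise loop is quadratic and only suitable for small examples; at registry scale a rank-based formulation would be the practical choice.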

High-throughput deep learning variant effect prediction with Sequence UNET

Understanding coding mutations is important for many applications in biology and medicine but the vast mutation space makes comprehensive experimental characterisation impossible.

Reference: Alistair S. Dunham et al., Genome Biology, 2023

Current predictors are often computationally intensive and difficult to scale, including recent deep learning models. We introduce Sequence UNET, a highly scalable deep learning architecture that classifies and predicts variant frequency from sequence alone using multi-scale representations from a fully convolutional compression/expansion architecture.

It achieves comparable pathogenicity prediction to recent methods. We demonstrate scalability by analysing 8.3B variants in 904,134 proteins detected through large-scale proteomics. Sequence UNET runs on modest hardware with a simple Python package.

An automated histological classification system for precision diagnostics of kidney allografts

For three decades, the international Banff classification has been the gold standard for kidney allograft rejection diagnosis, but this system has become complex over time with the integration of multimodal data and rules, leading to misclassifications that can have deleterious therapeutic consequences for patients.

Reference: Daniel Yoo et al., Nature Medicine, 2023

To improve diagnosis, we developed a decision-support system, based on an algorithm covering all classification rules and diagnostic scenarios, that automatically assigns kidney allograft diagnoses. We then tested its ability to reclassify rejection diagnoses for adult and pediatric kidney transplant recipients in three international multicentric cohorts and two large prospective clinical trials, including 4,409 biopsies from 3,054 patients (62.05% male and 37.95% female) followed in 20 transplant referral centers in Europe and North America.

In the adult kidney transplant population, the Banff Automation System reclassified 83 out of 279 (29.75%) antibody-mediated rejection cases and 57 out of 105 (54.29%) T cell-mediated rejection cases, whereas 237 out of 3,239 (7.32%) biopsies diagnosed as non-rejection by pathologists were reclassified as rejection.
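The reclassification percentages quoted above follow directly from the reported counts, as this short check shows. The dictionary keys and the helper name `percent` are labels chosen for this example only.

```python
def percent(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)

# (reclassified, total) counts as reported for the adult population.
reclassified_counts = {
    "antibody-mediated rejection": (83, 279),
    "T cell-mediated rejection": (57, 105),
    "non-rejection reclassified as rejection": (237, 3239),
}

reclassified_pct = {
    category: percent(part, whole)
    for category, (part, whole) in reclassified_counts.items()
}

for category, value in reclassified_pct.items():
    print(f"{category}: {value}%")
```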

Phylogenomic discovery of deleterious mutations facilitates hybrid potato breeding

Hybrid potato breeding will transform the crop from a clonally propagated tetraploid to a seed-reproducing diploid. Historical accumulation of deleterious mutations in potato genomes has hindered the development of elite inbred lines and hybrids. Utilizing a whole-genome phylogeny of 92 Solanaceae and its sister clade species, we employ an evolutionary strategy to identify deleterious mutations.

Reference: Yaoyao Wu et al., Cell, 2023

The deep phylogeny reveals the genome-wide landscape of highly constrained sites, comprising ∼2.4% of the genome. Based on a diploid potato diversity panel, we infer 367,499 deleterious variants, of which 50% occur at non-coding and 15% at synonymous sites.

Counterintuitively, diploid lines with relatively high homozygous deleterious burden can be better starting material for inbred-line development, despite showing less vigorous growth. Inclusion of inferred deleterious mutations increases genomic-prediction accuracy for yield by 24.7%. Our study generates insights into the genome-wide incidence and properties of deleterious mutations and their far-reaching consequences for breeding.
