Update on: May 29, 2023
Integrating genetics with single-cell multiomics identifies mechanisms of beta cell dysfunction in type 2 diabetes
Dysfunctional pancreatic islet beta cells are a hallmark of type 2 diabetes (T2D), but a comprehensive understanding of the underlying mechanisms, including gene dysregulation, is lacking.

Here we integrate information from measurements of chromatin accessibility, gene expression and function in single beta cells with genetic association data to nominate disease-causal gene regulatory changes in T2D. Using machine learning on chromatin accessibility data from 34 nondiabetic, pre-T2D and T2D donors, we identify two transcriptionally and functionally distinct beta cell subtypes that undergo an abundance shift during T2D progression. Subtype-defining accessible chromatin is enriched for T2D risk variants, suggesting a causal contribution of subtype identity to T2D.
Both beta cell subtypes exhibit activation of a stress-response transcriptional program and functional impairment in T2D, which is probably induced by the T2D-associated metabolic environment. Our findings demonstrate the power of multimodal single-cell measurements combined with machine learning for characterizing mechanisms of complex diseases.
A computational method for cell type-specific eQTL mapping using bulk RNA-seq data
Mapping cell type-specific gene expression quantitative trait loci (ct-eQTLs) is a powerful way to investigate the genetic basis of complex traits. A popular method for ct-eQTL mapping is to assess the interaction between the genotype of a genetic locus and the abundance of a specific cell type using a linear model.

However, this approach requires transforming RNA-seq count data, which distorts the relation between gene expression and cell type proportions and results in reduced power and/or inflated type I error. To address this issue, we have developed a statistical method called CSeQTL that allows for ct-eQTL mapping using bulk RNA-seq count data while taking advantage of allele-specific expression.
We validated the results of CSeQTL through simulations and real data analysis, comparing CSeQTL results to those obtained from purified bulk RNA-seq data or single cell RNA-seq data. Using our ct-eQTL findings, we were able to identify cell types relevant to 21 categories of human traits.
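The interaction-based linear model described above as the popular baseline can be sketched as follows. This is not CSeQTL itself, which works on counts and allele-specific expression; it is a minimal ordinary least squares illustration on simulated, already-transformed expression values. The function names, the simulated effect sizes and the sample size are all assumptions made for the example.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest pivot to keep the elimination stable.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_interaction_model(expression, genotype, cell_fraction):
    """Least squares fit of expression ~ genotype + fraction + genotype:fraction."""
    rows = [[1.0, g, f, g * f] for g, f in zip(genotype, cell_fraction)]
    k = 4
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, expression)) for i in range(k)]
    names = ["intercept", "genotype", "cell_fraction", "interaction"]
    return dict(zip(names, solve_linear_system(xtx, xty)))


# Simulated data (hypothetical): the variant acts only in the cell type of
# interest, so its effect on bulk expression scales with that cell type's share.
random.seed(0)
n_samples = 500
genotype = [float(random.choice([0, 1, 2])) for _ in range(n_samples)]
cell_fraction = [random.uniform(0.1, 0.9) for _ in range(n_samples)]
expression = [
    1.0 + 0.5 * f + 2.0 * g * f + random.gauss(0.0, 0.1)
    for g, f in zip(genotype, cell_fraction)
]

fit = fit_interaction_model(expression, genotype, cell_fraction)
print(fit)
```

In this setup the coefficient on the genotype-by-fraction term is the quantity read as the cell type-specific eQTL effect; a real analysis would also need covariates and a significance test.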
Genomic hallmarks and therapeutic implications of G0 cell cycle arrest in cancer
Therapy resistance in cancer is often driven by a subpopulation of cells that are temporarily arrested in a non-proliferative G0 state, which is difficult to capture and whose mutational drivers remain largely unknown.

We develop methodology to robustly identify this state from transcriptomic signals and characterise its prevalence and genomic constraints in solid primary tumours. We show that G0 arrest preferentially emerges in the context of more stable, less mutated genomes which maintain TP53 integrity and lack the hallmarks of DNA damage repair deficiency, while presenting increased APOBEC mutagenesis. We employ machine learning to uncover novel genomic dependencies of this process and validate the role of the centrosomal gene CEP89 as a modulator of proliferation and G0 arrest capacity.
Lastly, we demonstrate that G0 arrest underlies unfavourable responses to various therapies exploiting cell cycle, kinase signalling and epigenetic mechanisms in single-cell data. We propose a G0 arrest transcriptional signature that is linked with therapeutic resistance and can be used to further study and clinically track this state.
An integrated tumor, immune and microbiome atlas of colon cancer
The lack of multi-omics cancer datasets with extensive follow-up information hinders the identification of accurate biomarkers of clinical outcome.

In this cohort study, we performed comprehensive genomic analyses on fresh-frozen samples from 348 patients affected by primary colon cancer, encompassing RNA, whole-exome, deep T cell receptor and 16S bacterial rRNA gene sequencing on tumor and matched healthy colon tissue, complemented with tumor whole-genome sequencing for further microbiome characterization.
A type 1 helper T cell, cytotoxic, gene expression signature, called Immunologic Constant of Rejection, captured the presence of clonally expanded, tumor-enriched T cell clones and outperformed conventional prognostic molecular biomarkers, such as the consensus molecular subtype and the microsatellite instability classifications. Quantification of genetic immunoediting, defined as a lower number of neoantigens than expected, further refined its prognostic value. We identified a microbiome signature, driven by Ruminococcus bromii, associated with a favorable outcome.
Single cell transcriptomics clarifies the basophil differentiation trajectory
Basophils are the rarest granulocytes and are recognized as critical cells for type 2 immune responses. However, their differentiation pathway remains to be fully elucidated.

Here, we assess the ontogenetic trajectory of basophils by single-cell RNA sequencing analysis. Combined with flow cytometric and functional analyses, we identify c-Kit-negative, CLEC12A-high pre-basophils located downstream of pre-basophil and mast cell progenitors (pre-BMPs) and upstream of CLEC12A-low mature basophils. The transcriptomic analysis predicts that the pre-basophil population includes previously defined basophil progenitor (BaP)-like cells in terms of gene expression profile.
Pre-basophils are highly proliferative and respond better to non-IgE stimuli but less to antigen plus IgE stimulation than do mature basophils. Although pre-basophils usually remain in the bone marrow, they emerge in helminth-infected tissues, probably through IL-3-mediated inhibition of their retention in the bone marrow. Thus, the present study identifies pre-basophils that bridge the gap between pre-BMPs and mature basophils during basophil ontogeny.
Phenome-wide analyses identify an association between the POE dependent methylome and the rate of aging in humans
The variation in the rate at which humans age may be rooted in early events acting through the genomic regions that are influenced by such events and subsequently are related to health phenotypes in later life.

The parent-of-origin-effect (POE)-regulated methylome includes regions enriched for genetically controlled imprinting effects (the typical type of POE) and regions influenced by environmental effects associated with parents (the atypical POE). We perform a phenome-wide association analysis for the POE-influenced methylome using GS:SFHS (N_discovery = 5,087, N_replication = 4,450). We identify and replicate 92 POE-CpG-phenotype associations. Most of the associations are contributed by the POE-CpGs belonging to the atypical class where the most strongly enriched associations are with aging (DNAmTL acceleration), intelligence, and parental (maternal) smoking exposure phenotypes.
A proportion of the atypical POE-CpGs form co-methylation networks (modules) which are associated with these phenotypes, with one of the aging-associated modules displaying increased within-module methylation connectivity with age. The atypical POE-CpGs also display high levels of methylation heterogeneity, fast information loss with age, and a strong correlation with CpGs contained within epigenetic clocks.
A draft human pangenome reference
Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals.

These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38.
Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.
A deep learning algorithm to predict risk of pancreatic cancer from disease trajectories
Pancreatic cancer is an aggressive disease that typically presents late with poor outcomes, indicating a pronounced need for early detection.

In this study, we applied artificial intelligence methods to clinical data from 6 million patients (24,000 pancreatic cancer cases) in Denmark (Danish National Patient Registry (DNPR)) and from 3 million patients (3,900 cases) in the United States (US Veterans Affairs (US-VA)). We trained machine learning models on the sequence of disease codes in clinical histories and tested prediction of cancer occurrence within incremental time windows (CancerRiskNet).
For cancer occurrence within 36 months, the best DNPR model achieves an area under the receiver operating characteristic curve (AUROC) of 0.88. This decreases to AUROC (3m) = 0.83 when disease events within 3 months before cancer diagnosis are excluded from training, with an estimated relative risk of 59 for the 1,000 highest-risk patients older than 50 years.
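To make the AUROC figures above concrete, the metric can be read as the probability that a randomly chosen case receives a higher risk score than a randomly chosen non-case. The sketch below computes it by direct pairwise comparison on a tiny made-up set of scores; the function name and the toy data are assumptions for illustration and have no connection to the DNPR or US-VA cohorts or to CancerRiskNet.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (case, non-case) pairs ranked correctly.

    Ties between a case and a non-case count as half a correct pair.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    correct = sum(
        1.0 if c > d else 0.5 if c == d else 0.0
        for c in cases
        for d in controls
    )
    return correct / (len(cases) * len(controls))


# Hypothetical risk scores: three cases (label 1) and three non-cases (label 0).
toy_scores = [0.90, 0.80, 0.35, 0.60, 0.20, 0.10]
toy_labels = [1, 1, 1, 0, 0, 0]

toy_auroc = auroc(toy_scores, toy_labels)
print(f"AUROC = {toy_auroc:.3f}")
```

Here eight of the nine case/non-case pairs are ordered correctly, so the toy AUROC is 8/9. The pairwise form is quadratic in sample size; at registry scale a rank-based computation would be the practical choice.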
High-throughput deep learning variant effect prediction with Sequence UNET
Understanding coding mutations is important for many applications in biology and medicine but the vast mutation space makes comprehensive experimental characterisation impossible.

Current predictors are often computationally intensive and difficult to scale, including recent deep learning models. We introduce Sequence UNET, a highly scalable deep learning architecture that classifies and predicts variant frequency from sequence alone using multi-scale representations from a fully convolutional compression/expansion architecture.
It achieves comparable pathogenicity prediction to recent methods. We demonstrate scalability by analysing 8.3B variants in 904,134 proteins detected through large-scale proteomics. Sequence UNET runs on modest hardware with a simple Python package.
An automated histological classification system for precision diagnostics of kidney allografts
For three decades, the international Banff classification has been the gold standard for kidney allograft rejection diagnosis, but this system has become complex over time with the integration of multimodal data and rules, leading to misclassifications that can have deleterious therapeutic consequences for patients.

To improve diagnosis, we developed a decision-support system, based on an algorithm covering all classification rules and diagnostic scenarios, that automatically assigns kidney allograft diagnoses. We then tested its ability to reclassify rejection diagnoses for adult and pediatric kidney transplant recipients in three international multicentric cohorts and two large prospective clinical trials, including 4,409 biopsies from 3,054 patients (62.05% male and 37.95% female) followed in 20 transplant referral centers in Europe and North America.
In the adult kidney transplant population, the Banff Automation System reclassified 83 out of 279 (29.75%) antibody-mediated rejection cases and 57 out of 105 (54.29%) T cell-mediated rejection cases, whereas 237 out of 3,239 (7.32%) biopsies diagnosed as non-rejection by pathologists were reclassified as rejection.
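The reclassification percentages quoted above follow directly from the stated counts, as the short arithmetic check below shows. The dictionary keys and function name are labels chosen for this example only.

```python
# Counts as stated in the abstract: (reclassified, total) per original diagnosis.
reclassified_counts = {
    "antibody-mediated rejection": (83, 279),
    "T cell-mediated rejection": (57, 105),
    "non-rejection": (237, 3239),
}


def percent(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


reclassification_rates = {
    diagnosis: percent(part, whole)
    for diagnosis, (part, whole) in reclassified_counts.items()
}

for diagnosis, rate in reclassification_rates.items():
    print(f"{diagnosis}: {rate}%")
```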
Phylogenomic discovery of deleterious mutations facilitates hybrid potato breeding
