Update on: January 24, 2023
Base editing screens map mutations affecting interferon-γ signaling in cancer
Interferon-γ (IFN-γ) signaling mediates host responses to infection, inflammation and anti-tumor immunity. Mutations in the IFN-γ signaling pathway cause immunological disorders, hematological malignancies, and resistance to immune checkpoint blockade (ICB) in cancer; however, the function of most clinically observed variants remains unknown.

Here, we systematically investigate the genetic determinants of IFN-γ response in colorectal cancer cells using CRISPR-Cas9 screens and base editing mutagenesis. Deep mutagenesis of JAK1 with cytidine and adenine base editors, combined with pathway-wide screens, reveals loss-of-function and gain-of-function mutations, including causal variants in hematological malignancies and mutations detected in patients refractory to ICB.
We functionally validate variants of uncertain significance in primary tumor organoids, where engineering missense mutations in JAK1 enhanced or reduced sensitivity to autologous tumor-reactive T cells. We identify more than 300 predicted missense mutations altering IFN-γ pathway activity, generating a valuable resource for interpreting gene variant function. Reference
Genomic autopsy to identify underlying causes of pregnancy loss and perinatal death
Pregnancy loss and perinatal death are devastating events for families. We assessed ‘genomic autopsy’ as an adjunct to standard autopsy for 200 families who had experienced fetal or newborn death, providing a definitive or candidate genetic diagnosis in 105 families.

Our cohort provides evidence of severe atypical in utero presentations of known genetic disorders and identifies novel phenotypes and disease genes. Inheritance of 42% of definitive diagnoses was either autosomal recessive (30.8%), X-linked recessive (3.8%) or autosomal dominant (excluding de novo variants, 7.7%), with risk of recurrence in future pregnancies. We report that at least ten families (5%) used their diagnosis for preimplantation (5) or prenatal diagnosis (5) of 12 pregnancies.
We emphasize the clinical importance of genomic investigations of pregnancy loss and perinatal death, with short turnaround times for diagnostic reporting and followed by systematic research follow-up investigations. This approach has the potential to enable accurate counseling for future pregnancies. Reference
Impact of the Euro 2020 championship on the spread of COVID-19
Large-scale events like the UEFA Euro 2020 football (soccer) championship offer a unique opportunity to quantify the impact of gatherings on the spread of COVID-19, as the number and dates of matches played by participating countries resemble a randomized study.

Using Bayesian modeling and the gender imbalance in COVID-19 data, we attribute 840,000 (95% CI: [0.39M, 1.26M]) COVID-19 cases across 12 countries to the championship. The impact depends non-linearly on the initial incidence, the reproduction number R, and the number of matches played.
The strongest effects are seen in Scotland and England, where as many as 10,000 primary cases per million inhabitants occur from championship-related gatherings. The average match-induced increase in R was 0.46 [0.18, 0.75] on match days, but important matches caused an increase as large as +3. Altogether, our results provide quantitative insights that help judge and mitigate the impact of large-scale events on pandemic spread. Reference
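To make the reported increase in R on match days more concrete, the sketch below shows how a temporary rise in the reproduction number compounds into extra cases. It is a toy deterministic growth model, not the Bayesian model used in the study. The function name, the 4-day generation time, the starting incidence, and the match-day schedule are all assumptions chosen for illustration; only the +0.46 increase comes from the abstract.

```python
def simulate_daily_cases(initial_cases, base_r, match_days, delta_r,
                         n_days, generation_time=4.0):
    """Toy model: daily cases grow by R(t) ** (1 / generation_time).

    generation_time (in days) is an assumed value, not taken from the study.
    """
    cases = [float(initial_cases)]
    for day in range(1, n_days):
        # R is raised by delta_r only on match days (hypothetical schedule).
        r_today = base_r + (delta_r if day in match_days else 0.0)
        daily_growth = r_today ** (1.0 / generation_time)
        cases.append(cases[-1] * daily_growth)
    return cases


# Hypothetical scenario: flat epidemic (R = 1) with three match days.
without_matches = simulate_daily_cases(100, 1.0, set(), 0.46, 30)
with_matches = simulate_daily_cases(100, 1.0, {5, 10, 15}, 0.46, 30)

# Extra cases attributable to the match-day increase in this toy setting.
extra_cases = sum(with_matches) - sum(without_matches)
print(f"extra cases over 30 days: {extra_cases:.0f}")
```

Because each match day multiplies all later case counts, the extra cases depend on when the matches fall and how many there are, not only on the size of the increase in R.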
Single-cell transcriptomics reveals a mechanosensitive injury signaling pathway in early diabetic nephropathy
Diabetic nephropathy (DN) is the leading cause of end-stage renal disease, and histopathologic glomerular lesions are among the earliest structural alterations of DN. However, the signaling pathways that initiate these glomerular alterations are incompletely understood.

To delineate the cellular and molecular basis for DN initiation, we performed single-cell and bulk RNA sequencing of renal cells from type 2 diabetic mice (BTBR ob/ob) at the early stage of DN.
Analysis of differentially expressed genes revealed glucose-independent responses in glomerular cell types. The gene regulatory network upstream of glomerular cell programs suggested activation of the mechanosensitive transcriptional pathway MRTF-SRF, predominantly in mesangial cells. Importantly, activation of the MRTF-SRF transcriptional pathway was also identified in DN glomeruli in independent patient cohort datasets. Furthermore, ex vivo kidney perfusion suggested that the regulation of MRTF-SRF is a common mechanism in response to glomerular hyperfiltration. Reference
Development of a treatment selection algorithm for SGLT2 and DPP-4 inhibitor therapies in people with type 2 diabetes
Current treatment guidelines do not provide recommendations to support the selection of treatment for most people with type 2 diabetes. We aimed to develop and validate an algorithm to allow selection of optimal treatment based on glycaemic response, weight change, and tolerability outcomes when choosing between SGLT2 inhibitor or DPP-4 inhibitor therapies.

In this retrospective cohort study, we identified patients initiating SGLT2 and DPP-4 inhibitor therapies after Jan 1, 2013, from the UK Clinical Practice Research Datalink (CPRD). We excluded those who received SGLT2 or DPP-4 inhibitors as first-line treatment or insulin at the same time, had estimated glomerular filtration rate (eGFR) of less than 45 mL/min per 1·73 m², or did not have a valid baseline glycated haemoglobin (HbA1c) measure (<53 or ≥120 mmol/mol).
Among 10 253 patients initiating SGLT2 inhibitors and 16 624 patients initiating DPP-4 inhibitors in CPRD, baseline HbA1c, age, BMI, eGFR, and alanine aminotransferase were associated with differential HbA1c outcome with SGLT2 inhibitor and DPP-4 inhibitor therapies. The median age of participants was 62·0 years (IQR 55·0–70·0). 10 016 (37·3%) were women and 16 861 (62·7%) were men. An algorithm based on these five features identified a subgroup, representing around four in ten CPRD patients, with a 5 mmol/mol or greater observed benefit with SGLT2 inhibitors in all validation cohorts (CPRD 8·8 mmol/mol [95% CI 7·8–9·8]; CANTATA-D and CANTATA-D2 trials 5·8 mmol/mol [3·9–7·7]; BI1245.20 trial 6·6 mmol/mol [2·2–11·0]). In CPRD, predicted differential HbA1c response with SGLT2 inhibitor and DPP-4 inhibitor therapies was not associated with weight change. Reference
Spatially aware dimension reduction for spatial transcriptomics
Spatial transcriptomics are a collection of genomic technologies that have enabled transcriptomic profiling on tissues with spatial localization information.

Here, we develop a spatially aware dimension reduction method, SpatialPCA, that can extract a low-dimensional representation of the spatial transcriptomics data with biological signal and preserved spatial correlation structure, thus unlocking many existing computational tools previously developed in single-cell RNAseq studies for tailored analysis of spatial transcriptomics. We illustrate the benefits of SpatialPCA for spatial domain detection and explore its utility for trajectory inference on the tissue and for high-resolution spatial map construction.
In the real data applications, SpatialPCA identifies key molecular and immunological signatures in a detected tumor surrounding microenvironment, including a tertiary lymphoid structure that shapes the gradual transcriptomic transition during tumorigenesis and metastasis. Reference
Systematic single-cell pathway analysis to characterize early T cell activation

Circulating tumour DNA characterisation of invasive lobular carcinoma in patients with metastatic breast cancer
Limited data exist to characterise molecular differences in circulating tumour DNA (ctDNA) for patients with invasive lobular carcinoma (ILC). We analysed metastatic breast cancer patients with ctDNA testing to assess genomic differences among patients with ILC, invasive ductal carcinoma (IDC), and mixed histology.

We retrospectively analysed 980 clinically annotated patients (121 ILC, 792 IDC, and 67 mixed histology) from three academic centers with ctDNA evaluation by Guardant360™. Single nucleotide variations (SNVs), copy number variations (CNVs), and oncogenic pathways were compared across histologies.
ILC was significantly associated with HR+ HER2-negative and HER2-low status. SNVs were higher in patients with ILC compared to IDC or mixed histology (Mann-Whitney U test, P < 0.05). In multivariable analysis, HR+ HER2-negative ILC was significantly associated with mutations in CDH1 (odds ratio (OR) 9.4, [95% confidence interval (CI) 3.3–27.2]), ERBB2 (OR 3.6, [95% CI 1.6–8.2]), and PTEN (OR 2.5, [95% CI 1.05–5.8]) genes. CDH1 mutations were not present in the mixed histology cohort. Reference
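For readers less familiar with odds ratios and their confidence intervals, the following sketch computes an unadjusted odds ratio from a 2×2 table with a Wald (Woolf) 95% interval. This is a simpler technique than the multivariable analysis reported in the abstract, and the counts are made up for illustration; they are not data from the study. The function name is hypothetical.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: group 1 with mutation      b: group 1 without mutation
    c: group 2 with mutation      d: group 2 without mutation
    z=1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Made-up counts, purely illustrative (NOT taken from the study).
estimate, ci_low, ci_high = odds_ratio_with_ci(20, 10, 30, 60)
print(f"OR {estimate:.1f} [95% CI {ci_low:.1f}-{ci_high:.1f}]")
```

An interval whose lower bound stays above 1, as in all three associations quoted above, indicates that the mutation is more frequent in the first group at the chosen confidence level.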
Proteomic signatures for identification of impaired glucose tolerance
The implementation of recommendations for type 2 diabetes (T2D) screening and diagnosis focuses on the measurement of glycated hemoglobin (HbA1c) and fasting glucose.

This approach leaves a large number of individuals with isolated impaired glucose tolerance (iIGT), who are only detectable through oral glucose tolerance tests (OGTTs), at risk of diabetes and its severe complications.
We applied machine learning to the proteomic profiles of a single fasted sample from 11,546 participants of the Fenland study to test discrimination of iIGT defined using the gold-standard OGTTs. We observed significantly improved discriminative performance by adding only three proteins (RTN4R, CBPM and GHR) to the best clinical model (AUROC = 0.80 (95% confidence interval: 0.79–0.86), P = 0.004), which we validated in an external cohort.
Increased plasma levels of these candidate proteins were associated with an increased risk for future T2D in an independent cohort and were also increased in individuals genetically susceptible to impaired glucose homeostasis and T2D. Assessment of a limited number of proteins can identify individuals likely to be missed by current diagnostic strategies and at high risk of T2D and its complications. Reference
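The AUROC quoted above can be read as the probability that a randomly chosen individual with iIGT receives a higher model score than a randomly chosen individual without it. The sketch below computes it directly from that definition. The function name, labels, and scores are assumptions for illustration and have no connection to the Fenland data.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for positives (e.g. iIGT), 0 for negatives.
    scores: model outputs, higher meaning "more likely positive".
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example with four individuals (hypothetical scores).
example_labels = [0, 0, 1, 1]
example_scores = [0.1, 0.4, 0.35, 0.8]
example_auroc = auroc(example_labels, example_scores)
print(example_auroc)  # 0.75
```

The pairwise loop is quadratic in the number of samples, which is fine for a demonstration; a rank-based formulation would be preferable for a cohort of several thousand participants.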
Evaluation of cell-free DNA approaches for multi-cancer early detection
In the Circulating Cell-free Genome Atlas (NCT02889978) substudy 1, we evaluate several approaches for a circulating cell-free DNA (cfDNA)-based multi-cancer early detection (MCED) test by defining clinical limit of detection (LOD) based on circulating tumor allele fraction (cTAF), enabling performance comparisons.

Among 10 machine-learning classifiers trained on the same samples and independently validated, when evaluated at 98% specificity, those using whole-genome (WG) methylation, single nucleotide variants with paired white blood cell background removal, and combined scores from classifiers evaluated in this study show the highest cancer signal detection sensitivities.
Compared with clinical stage and tumor type, cTAF is a more significant predictor of classifier performance and may more closely reflect tumor biology. Clinical LODs mirror relative sensitivities for all approaches. The WG methylation feature best predicts cancer signal origin. WG methylation is the most promising technology for MCED and informs development of a targeted methylation MCED test. Reference
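Comparing classifiers "at 98% specificity" means fixing each classifier's decision threshold so that 98% of non-cancer samples fall at or below it, then measuring the fraction of cancer samples above it. A minimal sketch of that procedure follows. The function names and all scores are hypothetical, and the study's own thresholding details may differ.

```python
import math


def threshold_at_specificity(non_cancer_scores, specificity):
    """Smallest observed score with at least `specificity` of
    non-cancer samples at or below it (samples above it are called positive)."""
    if not 0.0 < specificity <= 1.0:
        raise ValueError("specificity must be in (0, 1]")
    ordered = sorted(non_cancer_scores)
    # Small tolerance guards against floating-point error in the product.
    n_at_or_below = math.ceil(specificity * len(ordered) - 1e-9)
    return ordered[max(n_at_or_below, 1) - 1]


def sensitivity_at_threshold(cancer_scores, threshold):
    """Fraction of cancer samples scoring strictly above the threshold."""
    detected = sum(1 for s in cancer_scores if s > threshold)
    return detected / len(cancer_scores)


# Hypothetical classifier scores: 100 non-cancer and 5 cancer samples.
non_cancer = [i / 100 for i in range(100)]
cancer = [0.5, 0.9, 0.98, 0.99, 0.995]

threshold = threshold_at_specificity(non_cancer, 0.98)
sensitivity = sensitivity_at_threshold(cancer, threshold)
print(threshold, sensitivity)
```

Fixing specificity first puts every classifier on the same false-positive budget, so differences in sensitivity reflect detection ability rather than a more permissive cutoff.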
MEDICC2: whole-genome doubling aware copy-number phylogenies for cancer evolution
Aneuploidy, chromosomal instability, somatic copy-number alterations, and whole-genome doubling (WGD) play key roles in cancer evolution and provide information for the complex task of phylogenetic inference.

We present MEDICC2, a method for inferring evolutionary trees and WGD using haplotype-specific somatic copy-number alterations from single-cell or bulk data. MEDICC2 eschews simplifications such as the infinite sites assumption, allowing multiple mutations and parallel evolution, and does not treat adjacent loci as independent, allowing overlapping copy-number events.
Using simulations and multiple data types from 2780 tumors, we use MEDICC2 to demonstrate accurate inference of phylogenies, clonal and subclonal WGD, and ancestral copy-number states. Reference
Histone H3 proline 16 hydroxylation regulates mammalian gene expression
Histone post-translational modifications (PTMs) are important for regulating various DNA-templated processes.

Here, we report the existence of a histone PTM in mammalian cells, namely histone H3 with hydroxylation of proline at residue 16 (H3P16oh), which is catalyzed by the proline hydroxylase EGLN2. We show that H3P16oh enhances direct binding of KDM5A to its substrate, histone H3 with trimethylation at the fourth lysine residue (H3K4me3), resulting in enhanced chromatin recruitment of KDM5A and a corresponding decrease of H3K4me3 at target genes. Genome- and transcriptome-wide analyses show that the EGLN2–KDM5A axis regulates target gene expression in mammalian cells.
Specifically, our data demonstrate repression of the WNT pathway negative regulator DKK1 through the EGLN2-H3P16oh-KDM5A pathway to promote WNT/β-catenin signaling in triple-negative breast cancer (TNBC). This study characterizes a regulatory mark in the histone code and reveals a role for H3P16oh in regulating mammalian gene expression. Reference
A comprehensive Bioconductor ecosystem for the design of CRISPR guide RNAs across nucleases and technologies
The success of CRISPR-mediated gene perturbation studies is highly dependent on the quality of gRNAs, and several tools have been developed to enable optimal gRNA design.

However, these tools are not all adaptable to the latest CRISPR modalities or nucleases, nor do they offer comprehensive annotation methods for advanced CRISPR applications. Here, we present a new ecosystem of R packages, called crisprVerse, that enables efficient gRNA design and annotation for a multitude of CRISPR technologies. This includes CRISPR knockout (CRISPRko), CRISPR activation (CRISPRa), CRISPR interference (CRISPRi), CRISPR base editing (CRISPRbe) and CRISPR knockdown (CRISPRkd).
The core package, crisprDesign, offers a user-friendly and unified interface to add off-target annotations, rich gene and SNP annotations, and on- and off-target activity scores. These functionalities are enabled for any RNA- or DNA-targeting nucleases, including Cas9, Cas12, and Cas13. The crisprVerse ecosystem is open-source and deployed through the Bioconductor project. Reference
scTAM-seq enables targeted high-confidence analysis of DNA methylation in single cells
Single-cell DNA methylation profiling currently suffers from excessive noise and/or limited cellular throughput.

We developed scTAM-seq, a targeted bisulfite-free method for profiling up to 650 CpGs in up to 10,000 cells per experiment, with a dropout rate as low as 7%. We demonstrate that scTAM-seq can resolve DNA methylation dynamics across B-cell differentiation in blood and bone marrow, identifying intermediate differentiation states that were previously masked.
scTAM-seq additionally queries surface-protein expression, thus enabling integration of single-cell DNA methylation information with cell atlas data. In summary, scTAM-seq is a high-throughput, high-confidence method for analyzing DNA methylation at single-CpG resolution across thousands of single cells. Reference
Deep transfer learning of cancer drug responses by integrating bulk and single-cell RNA-seq data
Drug screening data from massive bulk gene expression databases can be analyzed to determine the optimal clinical application of cancer drugs.

The growing amount of single-cell RNA sequencing (scRNA-seq) data also provides insights into improving therapeutic effectiveness by helping to study the heterogeneity of drug responses for cancer cell subpopulations.
Developing computational approaches to predict and interpret cancer drug response in single-cell data collected from clinical samples can be very useful. We propose scDEAL, a deep transfer learning framework for cancer drug response prediction at the single-cell level by integrating large-scale bulk cell-line data. The highlight in scDEAL involves harmonizing drug-related bulk RNA-seq data with scRNA-seq data and transferring the model trained on bulk RNA-seq data to predict drug responses in scRNA-seq.
Another feature of scDEAL is the integrated gradient feature interpretation to infer the signature genes of drug resistance mechanisms. We benchmark scDEAL on six scRNA-seq datasets and demonstrate its model interpretability via three case studies focusing on drug response label prediction, gene signature identification, and pseudotime analysis. Reference
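Integrated gradients attribute a prediction to input features by averaging the model's gradient along a straight path from a baseline input to the actual input, then scaling by the input difference. The sketch below applies the idea to a toy logistic function with finite-difference gradients. It is not scDEAL's implementation: the model, its weights, the expression values, and the all-zero baseline are assumptions for illustration.

```python
import math


def toy_model(x):
    """Stand-in for a trained predictor; weights are made up."""
    weights = [2.0, -1.0, 0.5]
    z = sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def numerical_gradient(f, x, eps=1e-5):
    """Central finite-difference gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def integrated_gradients(f, x, baseline, steps=50):
    """Midpoint Riemann approximation of integrated gradients."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (v - b) for v, b in zip(x, baseline)]
        grad = numerical_gradient(f, point)
        total = [t + g for t, g in zip(total, grad)]
    return [(v - b) * t / steps for v, b, t in zip(x, baseline, total)]


# Hypothetical expression values for three genes, zero baseline.
expression = [1.0, 2.0, 0.5]
zero_baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(toy_model, expression, zero_baseline)
print(attributions)
```

A useful sanity check is the completeness property: the attributions should sum approximately to the difference between the model output at the input and at the baseline. In a deep learning framework the finite-difference step would be replaced by automatic differentiation.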
Multi-omic analyses of changes in the tumor microenvironment of pancreatic adenocarcinoma
Successful pancreatic ductal adenocarcinoma (PDAC) immunotherapy necessitates optimization and maintenance of activated effector T cells (Teff).

We prospectively collected paired pre- and post-treatment PDAC specimens in a platform neoadjuvant study of granulocyte-macrophage colony-stimulating factor-secreting allogeneic PDAC vaccine (GVAX) ± nivolumab (anti-programmed cell death protein 1 [PD-1]) and applied multi-omic analyses to uncover sensitivity and resistance mechanisms. We show that GVAX-induced tertiary lymphoid aggregates become immune-regulatory sites in response to GVAX + nivolumab. Higher densities of tumor-associated neutrophils (TANs) following GVAX + nivolumab portend poorer overall survival (OS).
Increased T cells expressing CD137 associated with cytotoxic Teff signatures and correlated with increased OS. Bulk and single-cell RNA sequencing found that nivolumab alters CD4+ T cell chemotaxis signaling in association with CD11b+ neutrophil degranulation, and CD8+ T cell expression of CD137 was required for optimal T cell activation. These findings provide insights into PD-1-regulated immune pathways in PDAC that should inform more effective therapeutic combinations that include TAN regulators and T cell activators. Reference
Computational pharmacogenomic screen identifies drugs that potentiate the anti-breast cancer activity of statins
Statins, a family of FDA-approved cholesterol-lowering drugs that inhibit the rate-limiting enzyme of the mevalonate metabolic pathway, have demonstrated anticancer activity.

Evidence shows that dipyridamole potentiates statin-induced cancer cell death by blocking a restorative feedback loop triggered by statin treatment. Leveraging this knowledge, we develop an integrative pharmacogenomics pipeline to identify compounds similar to dipyridamole at the level of drug structure, cell sensitivity and molecular perturbation. To overcome the complex polypharmacology of dipyridamole, we focus our pharmacogenomics pipeline on mevalonate pathway genes, which we name mevalonate drug-network fusion (MVA-DNF).
We validate top-ranked compounds, nelfinavir and honokiol, and identify that low expression of the canonical epithelial cell marker, E-cadherin, is associated with statin-compound synergy. Analysis of remaining prioritized hits led to the validation of additional compounds, clotrimazole and vemurafenib. Thus, our computational pharmacogenomic approach identifies actionable compounds with pathway-specific activities. Reference
Multiomic analysis reveals conservation of cancer-associated fibroblast phenotypes across species and tissue of origin
