Science in this Week (January, 2019)

Updated on: January 18, 2019

iGUIDE: an improved pipeline for analyzing CRISPR cleavage specificity

Genome engineering methods have advanced greatly with the development of programmable nucleases, but methods for quantifying on- and off-target cleavage sites and associated deletions remain nascent.

Here, we report an improvement of the GUIDE-seq method, iGUIDE, which allows filtering of mispriming events to clarify the true cleavage signal. Using iGUIDE, we specify the locations of Cas9-guided cleavage for four guide RNAs, characterize associated deletions, and show that naturally occurring background DNA double-strand breaks are associated with open chromatin, gene dense regions, and chromosomal fragile sites. Reference

An automated Bayesian pipeline for rapid analysis of single-molecule binding data

Single-molecule binding assays enable the study of how molecular machines assemble and function. Current algorithms can identify and locate individual molecules, but require tedious manual validation of each spot.

Moreover, no solution for high-throughput analysis of single-molecule binding data exists. Here, we describe an automated pipeline to analyze single-molecule data over a wide range of experimental conditions. In addition, our method enables state estimation on multivariate Gaussian signals. We validate our approach using simulated data, and benchmark the pipeline by measuring the binding properties of the well-studied, DNA-guided DNA endonuclease, TtAgo, an Argonaute protein from the Eubacterium Thermus thermophilus. Reference

Tumor mutational load predicts survival after immunotherapy across multiple cancer types

Immune checkpoint inhibitor (ICI) treatments benefit some patients with metastatic cancers, but predictive biomarkers are needed. Findings in selected cancer types suggest that tumor mutational burden (TMB) may predict clinical response to ICI.

To examine this association more broadly, we analyzed the clinical and genomic data of 1,662 advanced cancer patients treated with ICI, and 5,371 non-ICI-treated patients, whose tumors underwent targeted next-generation sequencing (MSK-IMPACT). Among all patients, higher somatic TMB (highest 20% in each histology) was associated with better overall survival. For most cancer histologies, an association between higher TMB and improved survival was observed. The TMB cutpoints associated with improved survival varied markedly between cancer types. Reference
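
As a rough illustration of the "highest 20% in each histology" grouping, the sketch below flags patients whose TMB falls in the top fifth within their own cancer type. The record layout, the function name `flag_high_tmb`, and the nearest-rank cutoff rule are assumptions made for illustration, not the study's actual procedure.

```python
from collections import defaultdict
import math


def flag_high_tmb(patients, top_fraction=0.20):
    """Mark patients in the top `top_fraction` of TMB within each histology.

    `patients` is a list of (patient_id, histology, tmb) tuples.
    Hypothetical helper: the cutoff is the TMB of the last patient inside
    the top fraction, so ties at the cutoff are all flagged as high.
    """
    by_histology = defaultdict(list)
    for _pid, histology, tmb in patients:
        by_histology[histology].append(tmb)

    cutoffs = {}
    for histology, values in by_histology.items():
        ordered = sorted(values, reverse=True)
        # Number of patients forming the top fraction (at least one).
        n_top = max(1, math.floor(len(ordered) * top_fraction))
        cutoffs[histology] = ordered[n_top - 1]

    return {pid: tmb >= cutoffs[histology] for pid, histology, tmb in patients}


# Made-up values: the same TMB can be "high" in one histology and not in another.
example = [
    ("m1", "melanoma", 2), ("m2", "melanoma", 5), ("m3", "melanoma", 9),
    ("m4", "melanoma", 14), ("m5", "melanoma", 40),
    ("g1", "glioma", 1), ("g2", "glioma", 1), ("g3", "glioma", 2),
    ("g4", "glioma", 3), ("g5", "glioma", 6),
]
high = flag_high_tmb(example)
```

Because the cutoff is computed separately per histology, a TMB of 6 is flagged in the made-up glioma group while a TMB of 14 is not flagged in the made-up melanoma group, which mirrors the observation that cutpoints vary between cancer types.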

A macrophage-based screen identifies antibacterial compounds selective for intracellular Salmonella Typhimurium

Salmonella Typhimurium (S. Tm) establishes systemic infection in susceptible hosts by evading the innate immune response and replicating within host phagocytes.

Here, we sought to identify inhibitors of intracellular S. Tm replication by conducting parallel chemical screens against S. Tm growing in macrophage-mimicking media and within macrophages. We identify several compounds that inhibit Salmonella growth in the intracellular environment and in acidic, ion-limited media. We report on the antimicrobial activity of the psychoactive drug metergoline, which is specific against intracellular S. Tm. Screening an S. Tm deletion library in the presence of metergoline reveals hypersensitization of outer membrane mutants to metergoline activity. Reference

A gene expression map of shoot domains reveals regulatory mechanisms

Gene regulatory networks control development via domain-specific gene expression. In seed plants, self-renewing stem cells located in the shoot apical meristem (SAM) produce leaves from the SAM peripheral zone. After initiation, leaves develop polarity patterns to form a planar shape.

Here we compare translating RNAs among SAM and leaf domains. Using translating ribosome affinity purification and RNA sequencing to quantify gene expression in target domains, we generate a domain-specific translatome map covering representative vegetative stage SAM and leaf domains. We discuss the predicted cellular functions of these domains and provide evidence that some seemingly unrelated domains utilize common regulatory modules. Experimental follow-up shows that the RABBIT EARS and HANABA TARANU transcription factors have roles in axillary meristem initiation. Reference

High-performance medicine: the convergence of human and artificial intelligence

The use of artificial intelligence, and the deep-learning subtype in particular, has been enabled by the use of labeled big data, along with markedly enhanced computing power and cloud storage, across all sectors.

In medicine, this is beginning to have an impact at three levels: for clinicians, predominantly via rapid, accurate image interpretation; for health systems, by improving workflow and the potential for reducing medical errors; and for patients, by enabling them to process their own data to promote health. The current limitations, including bias, privacy and security, and lack of transparency, along with the future directions of these applications will be discussed in this article. Over time, marked improvements in accuracy, productivity, and workflow will likely be actualized, but whether that will be used to improve the patient–doctor relationship or facilitate its erosion remains to be seen. Reference

NG-TAS: an optimised protocol and computational pipeline for cost-effective profiling of circulating tumour DNA

Circulating tumour DNA (ctDNA) detection and monitoring have enormous potential clinical utility in oncology.

We describe here a fast, flexible and cost-effective method to profile multiple genes simultaneously in low input cell-free DNA (cfDNA): Next Generation-Targeted Amplicon Sequencing (NG-TAS). We designed a panel of 377 amplicons spanning 20 cancer genes and tested the NG-TAS pipeline using cell-free DNA from two HapMap lymphoblastoid cell lines. NG-TAS consistently detected mutations in cfDNA when mutation allele fraction was > 1%. We applied NG-TAS to a clinical cohort of metastatic breast cancer patients, demonstrating its potential in monitoring the disease. Reference
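
The 1% detection limit refers to the mutant allele fraction, that is, the share of reads at a position that carry the mutation. A minimal sketch of that arithmetic follows; the function names and read counts are hypothetical and this is not part of the NG-TAS pipeline itself.

```python
def mutant_allele_fraction(alt_reads, total_reads):
    """Fraction of reads at a position that support the mutant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def above_detection_threshold(alt_reads, total_reads, threshold=0.01):
    """True when the allele fraction exceeds the threshold (1% by default)."""
    return mutant_allele_fraction(alt_reads, total_reads) > threshold


# Made-up read counts at 2000x depth.
detectable = above_detection_threshold(alt_reads=30, total_reads=2000)  # 1.5%
too_low = above_detection_threshold(alt_reads=10, total_reads=2000)     # 0.5%
```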

Accurate prediction of cell type-specific transcription factor binding

Prediction of cell type-specific, in vivo transcription factor binding sites is one of the central challenges in regulatory genomics. Here, we present our approach that earned a shared first rank in the “ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge” in 2017.

In post-challenge analyses, we benchmark the influence of different feature sets and find that chromatin accessibility and binding motifs are sufficient to yield state-of-the-art performance. Finally, we provide 682 lists of predicted peaks for a total of 31 transcription factors in 22 primary cell types and tissues and a user-friendly version of our approach, Catchitt, for download. Reference

Genome-wide profiling of adenine base editor specificity by EndoV-seq

The adenine base editor (ABE), capable of catalyzing A•T to G•C conversions, is an important gene editing tool. Here, we systematically evaluate genome-wide off-target deamination by ABEs using the EndoV-seq platform we developed.

EndoV-seq utilizes Endonuclease V to nick the inosine-containing DNA strand of genomic DNA deaminated by ABE in vitro. The treated DNA is then whole-genome sequenced to identify off-target sites. Of the eight gRNAs we tested with ABE, 2–19 (with an average of 8.0) off-target sites are found, significantly fewer than those found for canonical Cas9 nuclease (7–320, 160.7 on average). In vivo off-target deamination is further validated through target site deep sequencing. Moreover, we demonstrated that six different ABE-gRNA complexes could be examined in a single EndoV-seq assay. Reference

An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar

How viruses evolve within hosts can dictate infection outcomes; however, reconstructing this process is challenging. We evaluate our multiplexed amplicon approach, PrimalSeq, to demonstrate how virus concentration, sequencing coverage, primer mismatches, and replicates influence the accuracy of measuring intrahost virus diversity.

We develop an experimental protocol and computational tool, iVar, for using PrimalSeq to measure virus diversity using Illumina and compare the results to Oxford Nanopore sequencing. We demonstrate the utility of PrimalSeq by measuring Zika and West Nile virus diversity from varied sample types and show that the accumulation of genetic diversity is influenced by experimental and biological systems. Reference

The Network of Cancer Genes (NCG): a comprehensive catalogue of known and candidate cancer genes from cancer sequencing screens

The Network of Cancer Genes (NCG) is a manually curated repository of 2372 genes whose somatic modifications have known or predicted cancer driver roles.

These genes were collected from 275 publications, including two sources of known cancer genes and 273 cancer sequencing screens of more than 100 cancer types from 34,905 cancer donors and multiple primary sites. This represents a more than 1.5-fold content increase compared to the previous version. NCG also annotates properties of cancer genes, such as duplicability, evolutionary origin, RNA and protein expression, miRNA and protein interactions, and protein function and essentiality. Reference

Cell-type-specific metabolic labeling, detection and identification of nascent proteomes in vivo

A big challenge in proteomics is the identification of cell-type-specific proteomes in vivo. This protocol describes how to label, purify and identify cell-type-specific proteomes in living mice.

To make this possible, we created a Cre-recombinase-inducible mouse line expressing a mutant methionyl-tRNA synthetase (L274G), which enables the labeling of nascent proteins with the non-canonical amino acid azidonorleucine (ANL). This amino acid can be conjugated to different affinity tags by click chemistry. After affinity purification (AP), the labeled proteins can be identified by tandem mass spectrometry (MS/MS). With this method, it is possible to identify cell-type-specific proteomes derived from living animals, which was not possible with any previously published method. Reference

Age-related remodelling of oesophageal epithelia by mutated cancer drivers

Clonal expansion in aged normal tissues has been implicated in the development of cancer. However, the chronology and risk dependence of the expansion are poorly understood.

Here we intensively sequence 682 micro-scale oesophageal samples and show, in physiologically normal oesophageal epithelia, the progressive age-related expansion of clones that carry mutations in driver genes (predominantly NOTCH1), which is substantially accelerated by alcohol consumption and by smoking. Driver-mutated clones emerge multifocally from early childhood and increase their number and size with ageing, and ultimately replace almost the entire oesophageal epithelium in the extremely elderly. Reference

Circadian oscillations of cytosine modification in humans contribute to epigenetic variability, aging, and complex disease

Maintenance of physiological circadian rhythm plays a crucial role in human health. Numerous studies have shown that disruption of circadian rhythm may increase risk for malignant, psychiatric, metabolic, and other diseases.

Extending our recent findings of oscillating cytosine modifications (osc-modCs) in mice, in this study, we show that osc-modCs are also prevalent in human neutrophils. Osc-modCs may play a role in gene regulation, can explain parts of intra- and inter-individual epigenetic variation, and are signatures of aging. Finally, we show that osc-modCs are linked to three complex diseases and provide a new interpretation of cross-sectional epigenome-wide association studies. Reference

Integration of DNA methylation patterns and genetic variation in human pediatric tissues

The widespread use of accessible peripheral tissues for epigenetic analyses has prompted increasing interest in the study of tissue-specific DNA methylation (DNAm) variation in human populations.

To date, characterizations of inter-individual DNAm variability and DNAm concordance across tissues have been largely performed in adult tissues and therefore are limited in their relevance to DNAm profiles from pediatric samples. Buccal epithelial cells (BECs) had greater inter-individual DNAm variability than peripheral blood mononuclear cells (PBMCs), and the highly variable CpGs were more likely to be positively correlated between the matched tissues than less variable CpGs. These sites were enriched for CpGs under genetic influence, suggesting that a substantial proportion of DNAm covariation between tissues can be attributed to genetic variation. Finally, we demonstrated the relevance of our findings to human epigenetic studies by categorizing CpGs from published DNAm association studies of pediatric BECs and peripheral blood. Reference

FKL-Spa-LapRLS: an accurate method for identifying human microRNA-disease association

In the process of post-transcription, microRNAs (miRNAs) are closely related to various complex human diseases. Traditional verification methods for miRNA-disease associations are time-consuming and expensive, so it is especially important to design computational methods for detecting potential associations.

Considering the restrictions of previous computational methods for predicting potential miRNA-disease associations, we develop the FKL-Spa-LapRLS model (Fast Kernel Learning Sparse kernel Laplacian Regularized Least Squares) to overcome these limitations. Reference

Genome-wide analyses supported by RNA-Seq reveal non-canonical splice sites in plant genomes

Most eukaryotic genes comprise exons and introns, thus requiring the precise removal of introns from pre-mRNAs to enable protein biosynthesis. U2 and U12 spliceosomes catalyze this step by recognizing motifs on the transcript in order to remove the introns. This process depends on the precise definition of exon-intron borders by splice sites, which are consequently highly conserved across species.

Only very few combinations of terminal dinucleotides are frequently observed at intron ends, dominated by the canonical GT-AG splice sites on the DNA level. Here we investigate the occurrence of diverse combinations of dinucleotides at predicted splice sites. Analyzing 121 plant genome sequences based on their annotation revealed strong splice site conservation across species, annotation errors, and true biological divergence from canonical splice sites. Reference
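
To make the idea of "terminal dinucleotides at intron ends" concrete, the sketch below tallies the first and last two bases of each annotated intron. The function name, the 0-based half-open coordinates, and the toy sequence are assumptions for illustration; the study's own tooling is not shown here.

```python
from collections import Counter


def splice_site_dinucleotides(sequence, introns):
    """Count donor-acceptor dinucleotide pairs (e.g. "GT-AG") at intron ends.

    `introns` holds (start, end) pairs, 0-based and half-open, on the
    forward strand of `sequence`. Introns shorter than 4 bases are skipped.
    """
    counts = Counter()
    for start, end in introns:
        intron = sequence[start:end].upper()
        if len(intron) < 4:
            continue
        counts[f"{intron[:2]}-{intron[-2:]}"] += 1
    return counts


# Toy gene: exon, canonical GT-AG intron, exon, non-canonical GC-AG intron, exon.
exon1, intron1, exon2, intron2, exon3 = "ATGG", "GTAAGTTTCAG", "CCG", "GCATGCAG", "TT"
gene = exon1 + intron1 + exon2 + intron2 + exon3
first_start = len(exon1)
second_start = first_start + len(intron1) + len(exon2)
annotated = [
    (first_start, first_start + len(intron1)),
    (second_start, second_start + len(intron2)),
]
pair_counts = splice_site_dinucleotides(gene, annotated)
```

Applied across a whole annotation, a tally like this makes it easy to see how strongly GT-AG dominates and which rarer combinations deserve a closer look as either annotation errors or genuine non-canonical sites.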

Time-resolved mapping of genetic interactions to model rewiring of signaling pathways

Context-dependent changes in genetic interactions are an important feature of cellular pathways and their varying responses under different environmental conditions. However, methodological frameworks to investigate the plasticity of genetic interaction networks over time or in response to external stresses are largely lacking.

To analyze the plasticity of genetic interactions, we performed a combinatorial RNAi screen in Drosophila cells at multiple time points and after pharmacological inhibition of Ras signaling activity. Using an image-based morphology assay to capture a broad range of phenotypes, we assessed the effect of 12768 pairwise RNAi perturbations in six different conditions. We found that genetic interactions form in different trajectories and developed an algorithm, termed MODIFI, to analyze how genetic interactions rewire over time. Using this framework, we identified more statistically significant interactions compared to end-point assays and further observed several examples of context-dependent crosstalk between signaling pathways such as an interaction between Ras and Rel which is dependent on MEK activity. Reference

Genome-wide quantification of the effects of DNA methylation on human gene regulation

Changes in DNA methylation are involved in development, disease, and the response to environmental conditions. However, not all regulatory elements are functionally methylation-dependent (MD).

Here, we report a method, mSTARR-seq, that assesses the causal effects of DNA methylation on regulatory activity at hundreds of thousands of fragments (millions of CpG sites) simultaneously. Using mSTARR-seq, we identify thousands of MD regulatory elements in the human genome. MD activity is partially predictable using sequence and chromatin state information, and distinct transcription factors are associated with higher activity in unmethylated versus methylated DNA. Further, pioneer TFs linked to higher activity in the methylated state appear to drive demethylation of experimentally methylated sites. Reference

RNA G-quadruplexes at upstream open reading frames cause DHX36- and DHX9-dependent translation of human mRNAs

RNA secondary structures in the 5′-untranslated regions (5′-UTR) of mRNAs are key to the post-transcriptional regulation of gene expression.

While it is evident that non-canonical Hoogsteen-paired G-quadruplex (rG4) structures somehow contribute to the regulation of translation initiation, the nature and extent of human mRNAs that are regulated by rG4s is not known. Here, we provide new insights into a mechanism by which rG4 formation modulates translation. Reference

A test metric for assessing single-cell RNA-seq batch correction

Single-cell transcriptomics is a versatile tool for exploring heterogeneous cell populations, but as with all genomics experiments, batch effects can hamper data integration and interpretation.

The success of batch-effect correction is often evaluated by visual inspection of low-dimensional embeddings, which are inherently imprecise. Here we present a user-friendly, robust and sensitive k-nearest-neighbor batch-effect test (kBET) for quantification of batch effects. We used kBET to assess commonly used batch-regression and normalization approaches, and to quantify the extent to which they remove batch effects while preserving biological variability. Reference
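
The core idea of a k-nearest-neighbor batch-effect test can be sketched as follows: for each cell, compare the batch labels among its k nearest neighbors with the global batch proportions using a chi-square test, then report the fraction of cells where the test rejects. This is a simplified sketch under stated assumptions (exactly two batches, Euclidean distance, every cell tested, the cell itself excluded from its neighborhood); the function name is hypothetical and this is not the kBET package's implementation.

```python
import math


def kbet_rejection_rate(points, batches, k, alpha=0.05):
    """Fraction of cells whose k-neighborhood batch mix differs from the
    global batch mix (chi-square test, two batches, so 1 degree of freedom)."""
    labels = sorted(set(batches))
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two batches")
    n = len(points)
    global_freq = {b: batches.count(b) / n for b in labels}

    rejected = 0
    for i, point in enumerate(points):
        # The k nearest other cells by Euclidean distance.
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: math.dist(point, points[j]))
        neighbours = others[:k]

        chi2 = 0.0
        for b in labels:
            observed = sum(1 for j in neighbours if batches[j] == b)
            expected = k * global_freq[b]
            chi2 += (observed - expected) ** 2 / expected

        # Survival function of the chi-square distribution with 1 d.o.f.
        p_value = math.erfc(math.sqrt(chi2 / 2.0))
        if p_value < alpha:
            rejected += 1
    return rejected / n


# Toy data: two fully separated batches versus two interleaved batches.
separated_points = [(float(x), 0.0) for x in range(10)] + [(float(x), 0.0) for x in range(100, 110)]
separated_batches = ["A"] * 10 + ["B"] * 10
mixed_points = [(float(x), 0.0) for x in range(20)]
mixed_batches = ["A" if x % 2 == 0 else "B" for x in range(20)]

separated_rate = kbet_rejection_rate(separated_points, separated_batches, k=5)
mixed_rate = kbet_rejection_rate(mixed_points, mixed_batches, k=4)
```

A rejection rate near 1 indicates that neighborhoods are dominated by a single batch (a strong batch effect), whereas a rate near 0 indicates well-mixed batches.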

Human genome-wide measurement of drug-responsive regulatory activity

Environmental stimuli commonly act via changes in gene regulation. Human-genome-scale assays to measure such responses are indirect or require knowledge of the transcription factors (TFs) involved.

Here, we present the use of human genome-wide high-throughput reporter assays to measure environmentally-responsive regulatory element activity. We focus on responses to glucocorticoids (GCs), an important class of pharmaceuticals and a paradigmatic genomic response model. We assay GC-responsive regulatory activity across >10^8 unique DNA fragments, covering the human genome at >50×. Those assays directly detected thousands of GC-responsive regulatory elements genome-wide. Reference

Genomic analysis identifies frequent deletions of Dystrophin in olfactory neuroblastoma

Olfactory neuroblastoma (ONB) is a rare malignant neoplasm arising in the upper portion of the sinonasal cavity.

To better understand the genetic bases for ONB, here we perform whole exome and whole genome sequencing as well as single nucleotide polymorphism array analyses in a series of ONB patient samples. Deletions involving the dystrophin (DMD) locus are found in 12 of 14 (86%) tumors. Interestingly, one of the remaining tumors has a deletion in LAMA2, bringing the number of ONBs with deletions of genes involved in the development of muscular dystrophies to 13 or 93%. Reference

Meta-analysis of Immunochip data of four autoimmune diseases

In recent years, research has consistently proven the occurrence of genetic overlap across autoimmune diseases, which supports the existence of common pathogenic mechanisms in autoimmunity. The objective of this study was to further investigate this shared genetic component.

For this purpose, we performed a cross-disease meta-analysis of Immunochip data from 37,159 patients diagnosed with a seropositive autoimmune disease (11,489 celiac disease (CeD), 15,523 rheumatoid arthritis (RA), 3,477 systemic sclerosis (SSc), and 6,670 type 1 diabetes (T1D)) and 22,308 healthy controls of European origin using the R package ASSET. We identified 38 risk variants shared by at least two of the conditions analyzed, five of which represent new pleiotropic loci in autoimmunity. Reference

An integrative approach for building personalized gene regulatory networks for precision medicine

Only a small fraction of patients respond to the drug prescribed to treat their disease, which means that most are at risk of unnecessary exposure to side effects through ineffective drugs.

This inter-individual variation in drug response is driven by differences in gene interactions caused by each patient’s genetic background, environmental exposures, and the proportions of specific cell types involved in disease. These gene interactions can now be captured by building gene regulatory networks, by taking advantage of RNA velocity (the time derivative of the gene expression state), the ability to study hundreds of thousands of cells simultaneously, and the falling price of single-cell sequencing. Here, we propose an integrative approach that leverages these recent advances in single-cell data with the sensitivity of bulk data to enable the reconstruction of personalized, cell-type- and context-specific gene regulatory networks. Reference
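
For readers unfamiliar with RNA velocity, a common formulation models the rate of change of spliced mRNA as production from unspliced mRNA minus degradation, ds/dt = beta * u - gamma * s. The sketch below evaluates that expression for a single gene in a single cell; the function name and the numeric values are made up for illustration, and estimating the rate constants from data is outside its scope.

```python
def rna_velocity(unspliced, spliced, beta=1.0, gamma=0.25):
    """Time derivative of spliced mRNA abundance, ds/dt = beta*u - gamma*s.

    beta is the splicing rate and gamma the degradation rate; both values
    here are illustrative defaults, not fitted parameters.
    """
    return beta * unspliced - gamma * spliced


# Positive velocity: expression is predicted to rise; negative: to fall.
rising = rna_velocity(unspliced=2.0, spliced=4.0)
falling = rna_velocity(unspliced=0.5, spliced=4.0)
```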

Predicting age from the transcriptome of human dermal fibroblasts

Biomarkers of aging can be used to assess the health of individuals and to study aging and age-related diseases. We generate a large dataset of genome-wide RNA-seq profiles of human dermal fibroblasts from 133 people aged 1 to 94 years old to test whether signatures of aging are encoded within the transcriptome.

We develop an ensemble machine learning method that predicts age to a median error of 4 years, outperforming previous methods used to predict age. The ensemble was further validated by testing it on ten progeria patients, and our method is the only one that predicts accelerated aging in these patients. Reference

Local mutational diversity drives intratumoral immune heterogeneity in non-small cell lung cancer

Combining whole exome sequencing, transcriptome profiling, and T cell repertoire analysis, we investigate the spatial features of surgically-removed biopsies from multiple loci in tumor masses of 15 patients with non-small cell lung cancer (NSCLC).

This revealed that the immune microenvironment has high spatial heterogeneity such that intratumoral regional variation is as large as inter-personal variation. While the local total mutational burden (TMB) is associated with local T-cell clonal expansion, local anti-tumor cytotoxicity does not directly correlate with neoantigen abundance. Reference

FORGe: prioritizing variants for graph genomes

There is growing interest in using genetic variants to augment the reference genome into a graph genome, with alternative sequences, to improve read alignment accuracy and reduce allelic bias.

While adding a variant has the positive effect of removing an undesirable alignment score penalty, it also increases both the ambiguity of the reference genome and the cost of storing and querying the genome index. We introduce methods and a software tool called FORGe for modeling these effects and prioritizing variants accordingly. We show that FORGe enables a range of advantageous and measurable trade-offs between accuracy and computational overhead. Reference

Pan-cancer analysis of transcriptional metabolic dysregulation using The Cancer Genome Atlas

Understanding metabolic dysregulation in different disease settings is vital for the safe and effective incorporation of metabolism-targeted therapeutics in the clinic.

Here, using transcriptomic data for 10,704 tumor and normal samples from The Cancer Genome Atlas, across 26 disease sites, we present a novel bioinformatics pipeline that distinguishes tumor from normal tissues, based on differential gene expression for 114 metabolic pathways. We confirm pathway dysregulation in separate patient populations, demonstrating the robustness of our approach. Bootstrapping simulations were then applied to assess the biological significance of these alterations. We provide distinct examples of the types of analysis that can be accomplished with this tool to understand cancer specific metabolic dysregulation, highlighting novel pathways of interest, and patterns of metabolic flux, in both common and rare disease sites. Reference

Drug and disease signature integration identifies synergistic combinations in glioblastoma

Glioblastoma (GBM) is the most common primary adult brain tumor. Despite extensive efforts, the median survival for GBM patients is approximately 14 months. GBM therapy could benefit greatly from patient-specific targeted therapies that maximize treatment efficacy.

Here we report a platform termed SynergySeq to identify drug combinations for the treatment of GBM by integrating information from The Cancer Genome Atlas (TCGA) and the Library of Integrated Network-Based Cellular Signatures (LINCS). We identify differentially expressed genes in GBM samples and devise a consensus gene expression signature for each compound using LINCS L1000 transcriptional profiling data. Reference

A comprehensive pipeline for translational top-down proteomics from a single blood draw

Top-down proteomics (TDP) by mass spectrometry (MS) is a technique by which intact proteins are analyzed. It has become increasingly popular in translational research because of the value of characterizing distinct proteoforms of intact proteins.

Compared to bottom-up proteomics (BUP) strategies, which measure digested peptide mixtures, TDP provides highly specific molecular information that avoids the bioinformatic challenge of protein inference. However, the technique has been difficult to implement widely because of inherent limitations of existing sample preparation methods and instrumentation. Reference