iGUIDE: an improved pipeline for analyzing CRISPR cleavage specificity
Genome engineering methods have advanced greatly with the development of programmable nucleases, but methods for quantifying on- and off-target cleavage sites and associated deletions remain nascent.
Here, we report an improvement of the GUIDE-seq method, iGUIDE, which allows filtering of mispriming events to clarify the true cleavage signal. Using iGUIDE, we specify the locations of Cas9-guided cleavage for four guide RNAs, characterize associated deletions, and show that naturally occurring background DNA double-strand breaks are associated with open chromatin, gene dense regions, and chromosomal fragile sites. Reference
An automated Bayesian pipeline for rapid analysis of single-molecule binding data
Single-molecule binding assays enable the study of how molecular machines assemble and function. Current algorithms can identify and locate individual molecules, but require tedious manual validation of each spot.
Moreover, no solution for high-throughput analysis of single-molecule binding data exists. Here, we describe an automated pipeline to analyze single-molecule data over a wide range of experimental conditions. In addition, our method enables state estimation on multivariate Gaussian signals. We validate our approach using simulated data, and benchmark the pipeline by measuring the binding properties of the well-studied, DNA-guided DNA endonuclease, TtAgo, an Argonaute protein from the Eubacterium Thermus thermophilus. Reference
Tumor mutational load predicts survival after immunotherapy across multiple cancer types
Immune checkpoint inhibitor (ICI) treatments benefit some patients with metastatic cancers, but predictive biomarkers are needed. Findings in selected cancer types suggest that tumor mutational burden (TMB) may predict clinical response to ICI.
To examine this association more broadly, we analyzed the clinical and genomic data of 1,662 advanced cancer patients treated with ICI, and 5,371 non-ICI-treated patients, whose tumors underwent targeted next-generation sequencing (MSK-IMPACT). Among all patients, higher somatic TMB (highest 20% in each histology) was associated with better overall survival. For most cancer histologies, an association between higher TMB and improved survival was observed. The TMB cutpoints associated with improved survival varied markedly between cancer types. Reference
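The "highest 20% in each histology" grouping can be illustrated with a minimal sketch. The function name, the tuple layout, and the rank-based tie handling are assumptions made for illustration; the abstract does not describe how the authors computed the cutpoints.

```python
import math
from collections import defaultdict


def flag_high_tmb(patients, top_fraction=0.20):
    """Return the ids of patients in the top `top_fraction` of TMB
    within their own histology.

    `patients` is a list of (patient_id, histology, tmb) tuples.
    Hypothetical helper: ties are broken by sort order, which a real
    analysis would need to handle explicitly.
    """
    by_histology = defaultdict(list)
    for patient_id, histology, tmb in patients:
        by_histology[histology].append((tmb, patient_id))

    high = set()
    for rows in by_histology.values():
        rows.sort(reverse=True)  # highest TMB first
        n_top = math.ceil(top_fraction * len(rows))
        high.update(patient_id for _, patient_id in rows[:n_top])
    return high


# Toy data: the cutpoint differs between the two histologies.
patients = [("m%d" % i, "melanoma", i) for i in range(1, 11)]
patients += [("c%d" % i, "colorectal", i * 10) for i in range(1, 6)]
high = flag_high_tmb(patients)
```

Because the ranking is done per histology, the same absolute TMB value can count as high in one cancer type and not in another, which is consistent with the observation that cutpoints varied markedly between cancer types.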
A macrophage-based screen identifies antibacterial compounds selective for intracellular Salmonella Typhimurium
Salmonella Typhimurium (S. Tm) establishes systemic infection in susceptible hosts by evading the innate immune response and replicating within host phagocytes.
Here, we sought to identify inhibitors of intracellular S. Tm replication by conducting parallel chemical screens against S. Tm growing in macrophage-mimicking media and within macrophages. We identify several compounds that inhibit Salmonella growth in the intracellular environment and in acidic, ion-limited media. We report on the antimicrobial activity of the psychoactive drug metergoline, which is specific against intracellular S. Tm. Screening an S. Tm deletion library in the presence of metergoline reveals hypersensitization of outer membrane mutants to metergoline activity. Reference
A gene expression map of shoot domains reveals regulatory mechanisms
Gene regulatory networks control development via domain-specific gene expression. In seed plants, self-renewing stem cells located in the shoot apical meristem (SAM) produce leaves from the SAM peripheral zone. After initiation, leaves develop polarity patterns to form a planar shape.
Here we compare translating RNAs among SAM and leaf domains. Using translating ribosome affinity purification and RNA sequencing to quantify gene expression in target domains, we generate a domain-specific translatome map covering representative vegetative stage SAM and leaf domains. We discuss the predicted cellular functions of these domains and provide evidence that some seemingly unrelated domains utilize common regulatory modules. Experimental follow-up shows that the RABBIT EARS and HANABA TARANU transcription factors have roles in axillary meristem initiation. Reference
High-performance medicine: the convergence of human and artificial intelligence
The use of artificial intelligence, and the deep-learning subtype in particular, has been enabled by the use of labeled big data, along with markedly enhanced computing power and cloud storage, across all sectors.
In medicine, this is beginning to have an impact at three levels: for clinicians, predominantly via rapid, accurate image interpretation; for health systems, by improving workflow and the potential for reducing medical errors; and for patients, by enabling them to process their own data to promote health. The current limitations, including bias, privacy and security, and lack of transparency, along with the future directions of these applications will be discussed in this article. Over time, marked improvements in accuracy, productivity, and workflow will likely be actualized, but whether that will be used to improve the patient–doctor relationship or facilitate its erosion remains to be seen. Reference
NG-TAS: an optimised protocol and computational pipeline for cost-effective profiling of circulating tumour DNA
Circulating tumour DNA (ctDNA) detection and monitoring have enormous potential clinical utility in oncology.
We describe here a fast, flexible and cost-effective method to profile multiple genes simultaneously in low input cell-free DNA (cfDNA): Next Generation-Targeted Amplicon Sequencing (NG-TAS). We designed a panel of 377 amplicons spanning 20 cancer genes and tested the NG-TAS pipeline using cell-free DNA from two HapMap lymphoblastoid cell lines. NG-TAS consistently detected mutations in cfDNA when mutation allele fraction was > 1%. We applied NG-TAS to a clinical cohort of metastatic breast cancer patients, demonstrating its potential in monitoring the disease. Reference
Accurate prediction of cell type-specific transcription factor binding
Prediction of cell type-specific, in vivo transcription factor binding sites is one of the central challenges in regulatory genomics. Here, we present our approach that earned a shared first rank in the “ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge” in 2017.
In post-challenge analyses, we benchmark the influence of different feature sets and find that chromatin accessibility and binding motifs are sufficient to yield state-of-the-art performance. Finally, we provide 682 lists of predicted peaks for a total of 31 transcription factors in 22 primary cell types and tissues and a user-friendly version of our approach, Catchitt, for download. Reference
Genome-wide profiling of adenine base editor specificity by EndoV-seq
The adenine base editor (ABE), capable of catalyzing A•T to G•C conversions, is an important gene editing tool. Here, we systematically evaluate genome-wide off-target deamination by ABEs using the EndoV-seq platform we developed.
EndoV-seq utilizes Endonuclease V to nick the inosine-containing DNA strand of genomic DNA deaminated by ABE in vitro. The treated DNA is then whole-genome sequenced to identify off-target sites. Of the eight gRNAs we tested with ABE, 2–19 (with an average of 8.0) off-target sites are found, significantly fewer than those found for canonical Cas9 nuclease (7–320, 160.7 on average). In vivo off-target deamination is further validated through target site deep sequencing. Moreover, we demonstrated that six different ABE-gRNA complexes could be examined in a single EndoV-seq assay. Reference
An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar
How viruses evolve within hosts can dictate infection outcomes; however, reconstructing this process is challenging. We evaluate our multiplexed amplicon approach, PrimalSeq, to demonstrate how virus concentration, sequencing coverage, primer mismatches, and replicates influence the accuracy of measuring intrahost virus diversity.
We develop an experimental protocol and computational tool, iVar, for using PrimalSeq to measure virus diversity using Illumina and compare the results to Oxford Nanopore sequencing. We demonstrate the utility of PrimalSeq by measuring Zika and West Nile virus diversity from varied sample types and show that the accumulation of genetic diversity is influenced by experimental and biological systems. Reference
The Network of Cancer Genes (NCG): a comprehensive catalogue of known and candidate cancer genes from cancer sequencing screens
The Network of Cancer Genes (NCG) is a manually curated repository of 2372 genes whose somatic modifications have known or predicted cancer driver roles.
These genes were collected from 275 publications, including two sources of known cancer genes and 273 cancer sequencing screens of more than 100 cancer types from 34,905 cancer donors and multiple primary sites. This represents a more than 1.5-fold content increase compared to the previous version. NCG also annotates properties of cancer genes, such as duplicability, evolutionary origin, RNA and protein expression, miRNA and protein interactions, and protein function and essentiality. Reference
Cell-type-specific metabolic labeling, detection and identification of nascent proteomes in vivo
A big challenge in proteomics is the identification of cell-type-specific proteomes in vivo. This protocol describes how to label, purify and identify cell-type-specific proteomes in living mice.
To make this possible, we created a Cre-recombinase-inducible mouse line expressing a mutant methionyl-tRNA synthetase (L274G), which enables the labeling of nascent proteins with the non-canonical amino acid azidonorleucine (ANL). This amino acid can be conjugated to different affinity tags by click chemistry. After affinity purification (AP), the labeled proteins can be identified by tandem mass spectrometry (MS/MS). With this method, it is possible to identify cell-type-specific proteomes derived from living animals, which was not possible with any previously published method. Reference
Age-related remodelling of oesophageal epithelia by mutated cancer drivers
Clonal expansion in aged normal tissues has been implicated in the development of cancer. However, the chronology and risk dependence of the expansion are poorly understood.
Here we intensively sequence 682 micro-scale oesophageal samples and show, in physiologically normal oesophageal epithelia, the progressive age-related expansion of clones that carry mutations in driver genes (predominantly NOTCH1), which is substantially accelerated by alcohol consumption and by smoking. Driver-mutated clones emerge multifocally from early childhood and increase their number and size with ageing, and ultimately replace almost the entire oesophageal epithelium in the extremely elderly. Reference
Circadian oscillations of cytosine modification in humans contribute to epigenetic variability, aging, and complex disease
Maintenance of physiological circadian rhythm plays a crucial role in human health. Numerous studies have shown that disruption of circadian rhythm may increase risk for malignant, psychiatric, metabolic, and other diseases.
Extending our recent findings of oscillating cytosine modifications (osc-modCs) in mice, in this study, we show that osc-modCs are also prevalent in human neutrophils. Osc-modCs may play a role in gene regulation, can explain parts of intra- and inter-individual epigenetic variation, and are signatures of aging. Finally, we show that osc-modCs are linked to three complex diseases and provide a new interpretation of cross-sectional epigenome-wide association studies. Reference
Integration of DNA methylation patterns and genetic variation in human pediatric tissues
The widespread use of accessible peripheral tissues for epigenetic analyses has prompted increasing interest in the study of tissue-specific DNA methylation (DNAm) variation in human populations.
To date, characterizations of inter-individual DNAm variability and DNAm concordance across tissues have been largely performed in adult tissues and therefore are limited in their relevance to DNAm profiles from pediatric samples. Buccal epithelial cells (BECs) had greater inter-individual DNAm variability than peripheral blood mononuclear cells (PBMCs), and highly variable CpGs were more likely to be positively correlated between the matched tissues than less variable CpGs. These sites were enriched for CpGs under genetic influence, suggesting that a substantial proportion of DNAm covariation between tissues can be attributed to genetic variation. Finally, we demonstrated the relevance of our findings to human epigenetic studies by categorizing CpGs from published DNAm association studies of pediatric BECs and peripheral blood. Reference
FKL-Spa-LapRLS: an accurate method for identifying human microRNA-disease association
In the process of post-transcription, microRNAs (miRNAs) are closely related to various complex human diseases. Traditional verification methods for miRNA-disease associations take a lot of time and expense, so it is especially important to design computational methods for detecting potential associations.
Considering the restrictions of previous computational methods for predicting potential miRNAs-disease associations, we develop the model of FKL-Spa-LapRLS (Fast Kernel Learning Sparse kernel Laplacian Regularized Least Squares) to break through the limitations. Reference
Genome-wide analyses supported by RNA-Seq reveal non-canonical splice sites in plant genomes
Most eukaryotic genes comprise exons and introns, thus requiring the precise removal of introns from pre-mRNAs to enable protein biosynthesis. U2 and U12 spliceosomes catalyze this step by recognizing motifs on the transcript in order to remove the introns. This process depends on the precise definition of exon-intron borders by splice sites, which are consequently highly conserved across species.
Only very few combinations of terminal dinucleotides are frequently observed at intron ends, dominated by the canonical GT-AG splice sites on the DNA level. Here we investigate the occurrence of diverse combinations of dinucleotides at predicted splice sites. Analyzing 121 plant genome sequences based on their annotation revealed strong splice site conservation across species, annotation errors, and true biological divergence from canonical splice sites. Reference
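A minimal sketch of how terminal dinucleotides at intron ends might be tallied. The function name and the three category labels are hypothetical. Grouping GC-AG and AT-AC as the common non-canonical combinations reflects general splicing knowledge, not a classification stated in this abstract.

```python
def classify_intron(intron_seq):
    """Classify an intron by its first and last two bases (DNA level).

    Returns (combination, category). The category labels are
    illustrative assumptions, not terminology from the study.
    """
    seq = intron_seq.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to have two terminal dinucleotides")
    combination = f"{seq[:2]}-{seq[-2:]}"
    if combination == "GT-AG":
        return combination, "canonical"
    if combination in {"GC-AG", "AT-AC"}:
        return combination, "major non-canonical"
    return combination, "minor non-canonical"


examples = [classify_intron(s) for s in ("GTAAGTTTTCAG", "gcaagtttccag", "CTAAGTTTTCAC")]
```

Applied to every annotated intron in a genome, counts of each category give a first view of conservation. Rare combinations still need RNA-Seq support to separate annotation errors from true biological divergence.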
Time-resolved mapping of genetic interactions to model rewiring of signaling pathways
Context-dependent changes in genetic interactions are an important feature of cellular pathways and their varying responses under different environmental conditions. However, methodological frameworks to investigate the plasticity of genetic interaction networks over time or in response to external stresses are largely lacking.
To analyze the plasticity of genetic interactions, we performed a combinatorial RNAi screen in Drosophila cells at multiple time points and after pharmacological inhibition of Ras signaling activity. Using an image-based morphology assay to capture a broad range of phenotypes, we assessed the effect of 12768 pairwise RNAi perturbations in six different conditions. We found that genetic interactions form in different trajectories and developed an algorithm, termed MODIFI, to analyze how genetic interactions rewire over time. Using this framework, we identified more statistically significant interactions compared to end-point assays and further observed several examples of context-dependent crosstalk between signaling pathways such as an interaction between Ras and Rel which is dependent on MEK activity. Reference
Genome-wide quantification of the effects of DNA methylation on human gene regulation
Changes in DNA methylation are involved in development, disease, and the response to environmental conditions. However, not all regulatory elements are functionally methylation-dependent (MD).
Here, we report a method, mSTARR-seq, that assesses the causal effects of DNA methylation on regulatory activity at hundreds of thousands of fragments (millions of CpG sites) simultaneously. Using mSTARR-seq, we identify thousands of MD regulatory elements in the human genome. MD activity is partially predictable using sequence and chromatin state information, and distinct transcription factors are associated with higher activity in unmethylated versus methylated DNA. Further, pioneer TFs linked to higher activity in the methylated state appear to drive demethylation of experimentally methylated sites. Reference
RNA G-quadruplexes at upstream open reading frames cause DHX36- and DHX9-dependent translation of human mRNAs
RNA secondary structures in the 5′-untranslated regions (5′-UTR) of mRNAs are key to the post-transcriptional regulation of gene expression.
While it is evident that non-canonical Hoogsteen-paired G-quadruplex (rG4) structures contribute to the regulation of translation initiation, the nature and extent of human mRNAs that are regulated by rG4s are not known. Here, we provide new insights into a mechanism by which rG4 formation modulates translation. Reference
A test metric for assessing single-cell RNA-seq batch correction
Single-cell transcriptomics is a versatile tool for exploring heterogeneous cell populations, but as with all genomics experiments, batch effects can hamper data integration and interpretation.
The success of batch-effect correction is often evaluated by visual inspection of low-dimensional embeddings, which are inherently imprecise. Here we present a user-friendly, robust and sensitive k-nearest-neighbor batch-effect test (kBET) for quantification of batch effects. We used kBET to assess commonly used batch-regression and normalization approaches, and to quantify the extent to which they remove batch effects while preserving biological variability. Reference
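The idea behind a k-nearest-neighbor batch test can be sketched as follows. This is not the kBET implementation. It assumes the test compares the batch mix in each cell's neighborhood against the global batch mix with a Pearson chi-square statistic, and it is restricted to exactly two batches so the p-value has a closed form. The function name and all parameters are hypothetical.

```python
import math
import random


def knn_batch_rejection_rate(points, batches, k=10, alpha=0.05,
                             n_samples=None, seed=0):
    """Fraction of tested cells whose k-neighbourhood batch composition
    differs significantly from the global composition.

    Sketch only: brute-force neighbour search, exactly two batch labels
    (chi-square with 1 degree of freedom).
    """
    labels = sorted(set(batches))
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two batches")
    n = len(points)
    global_frac = {b: batches.count(b) / n for b in labels}

    rng = random.Random(seed)
    tested = list(range(n)) if n_samples is None else rng.sample(range(n), n_samples)

    rejected = 0
    for i in tested:
        # Squared Euclidean distance; the cell itself is its own nearest neighbour.
        order = sorted(
            range(n),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(points[i], points[j])),
        )
        neighbours = order[:k]
        chi2 = 0.0
        for b in labels:
            observed = sum(1 for j in neighbours if batches[j] == b)
            expected = k * global_frac[b]
            chi2 += (observed - expected) ** 2 / expected
        # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
        p_value = math.erfc(math.sqrt(chi2 / 2))
        if p_value < alpha:
            rejected += 1
    return rejected / len(tested)


# Well-mixed batches: cells alternate along one axis.
mixed_points = [(float(i),) for i in range(40)]
mixed_batches = ["a" if i % 2 == 0 else "b" for i in range(40)]
mixed_rate = knn_batch_rejection_rate(mixed_points, mixed_batches)

# Fully separated batches: two distant clusters.
split_points = [(float(i),) for i in range(20)] + [(100.0 + i,) for i in range(20)]
split_batches = ["a"] * 20 + ["b"] * 20
split_rate = knn_batch_rejection_rate(split_points, split_batches)
```

A rejection rate near 0 suggests the batches are well mixed locally, and a rate near 1 suggests a strong batch effect. This gives a number to report in place of a visual judgment of an embedding.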
Human genome-wide measurement of drug-responsive regulatory activity
Environmental stimuli commonly act via changes in gene regulation. Human-genome-scale assays to measure such responses are indirect or require knowledge of the transcription factors (TFs) involved.
Here, we present the use of human genome-wide high-throughput reporter assays to measure environmentally responsive regulatory element activity. We focus on responses to glucocorticoids (GCs), an important class of pharmaceuticals and a paradigmatic genomic response model. We assay GC-responsive regulatory activity across >10^8 unique DNA fragments, covering the human genome at >50×. Those assays directly detected thousands of GC-responsive regulatory elements genome-wide. Reference
Genomic analysis identifies frequent deletions of Dystrophin in olfactory neuroblastoma
Olfactory neuroblastoma (ONB) is a rare malignant neoplasm arising in the upper portion of the sinonasal cavity.
To better understand the genetic bases for ONB, here we perform whole exome and whole genome sequencing as well as single nucleotide polymorphism array analyses in a series of ONB patient samples. Deletions involving the dystrophin (DMD) locus are found in 12 of 14 (86%) tumors. Interestingly, one of the remaining tumors has a deletion in LAMA2, bringing the number of ONBs with deletions of genes involved in the development of muscular dystrophies to 13 of 14 (93%). Reference
Meta-analysis of Immunochip data of four autoimmune diseases
In recent years, research has consistently proven the occurrence of genetic overlap across autoimmune diseases, which supports the existence of common pathogenic mechanisms in autoimmunity. The objective of this study was to further investigate this shared genetic component.
For this purpose, we performed a cross-disease meta-analysis of Immunochip data from 37,159 patients diagnosed with a seropositive autoimmune disease (11,489 celiac disease (CeD), 15,523 rheumatoid arthritis (RA), 3477 systemic sclerosis (SSc), and 6670 type 1 diabetes (T1D)) and 22,308 healthy controls of European origin using the R package ASSET. We identified 38 risk variants shared by at least two of the conditions analyzed, five of which represent new pleiotropic loci in autoimmunity. Reference
An integrative approach for building personalized gene regulatory networks for precision medicine
Only a small fraction of patients respond to the drug prescribed to treat their disease, which means that most are at risk of unnecessary exposure to side effects through ineffective drugs.
This inter-individual variation in drug response is driven by differences in gene interactions caused by each patient’s genetic background, environmental exposures, and the proportions of specific cell types involved in disease. These gene interactions can now be captured by building gene regulatory networks, by taking advantage of RNA velocity (the time derivative of the gene expression state), the ability to study hundreds of thousands of cells simultaneously, and the falling price of single-cell sequencing. Here, we propose an integrative approach that leverages these recent advances in single-cell data with the sensitivity of bulk data to enable the reconstruction of personalized, cell-type- and context-specific gene regulatory networks. Reference
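The parenthetical definition of RNA velocity can be written out explicitly. The symbols below are not in the abstract: they assume the commonly used splicing-kinetics formulation, with unspliced abundance u, spliced abundance s, transcription rate α, splicing rate β, and degradation rate γ for a single gene.

```latex
\frac{du}{dt} = \alpha - \beta\,u(t), \qquad
v(t) \equiv \frac{ds}{dt} = \beta\,u(t) - \gamma\,s(t)
```

Under these assumptions, a positive v indicates that a gene's spliced mRNA level is rising and a negative v that it is falling. This direction is the information that helps infer regulatory relationships from a single snapshot of cells.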
Predicting age from the transcriptome of human dermal fibroblasts
Biomarkers of aging can be used to assess the health of individuals and to study aging and age-related diseases. We generate a large dataset of genome-wide RNA-seq profiles of human dermal fibroblasts from 133 people aged 1 to 94 years old to test whether signatures of aging are encoded within the transcriptome.
We develop an ensemble machine learning method that predicts age to a median error of 4 years, outperforming previous methods used to predict age. The ensemble was further validated by testing it on ten progeria patients, and our method is the only one that predicts accelerated aging in these patients. Reference
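For concreteness, a median prediction error can be computed as below. The sketch assumes "median error" means the median of absolute differences between predicted and chronological age. The function name and the toy numbers are illustrative and are not data from the study.

```python
from statistics import median


def median_absolute_error(true_ages, predicted_ages):
    """Median of |true - predicted| across individuals, in years."""
    if len(true_ages) != len(predicted_ages):
        raise ValueError("inputs must have the same length")
    return median(abs(t - p) for t, p in zip(true_ages, predicted_ages))


# Toy example: absolute errors are 2, 3, 5, 0 and 10 years.
error = median_absolute_error([10, 30, 50, 70, 90], [12, 27, 55, 70, 80])
```

Unlike a mean, the median is not dominated by a few badly mispredicted individuals, so it describes the typical error.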
Local mutational diversity drives intratumoral immune heterogeneity in non-small cell lung cancer
Combining whole exome sequencing, transcriptome profiling, and T cell repertoire analysis, we investigate the spatial features of surgically-removed biopsies from multiple loci in tumor masses of 15 patients with non-small cell lung cancer (NSCLC).
This revealed that the immune microenvironment has high spatial heterogeneity such that intratumoral regional variation is as large as inter-personal variation. While the local total mutational burden (TMB) is associated with local T-cell clonal expansion, local anti-tumor cytotoxicity does not directly correlate with neoantigen abundance. Reference
FORGe: prioritizing variants for graph genomes
There is growing interest in using genetic variants to augment the reference genome into a graph genome, with alternative sequences, to improve read alignment accuracy and reduce allelic bias.
While adding a variant has the positive effect of removing an undesirable alignment score penalty, it also increases both the ambiguity of the reference genome and the cost of storing and querying the genome index. We introduce methods and a software tool called FORGe for modeling these effects and prioritizing variants accordingly. We show that FORGe enables a range of advantageous and measurable trade-offs between accuracy and computational overhead. Reference
Pan-cancer analysis of transcriptional metabolic dysregulation using The Cancer Genome Atlas
Understanding metabolic dysregulation in different disease settings is vital for the safe and effective incorporation of metabolism-targeted therapeutics in the clinic.
Here, using transcriptomic data for 10,704 tumor and normal samples from The Cancer Genome Atlas, across 26 disease sites, we present a novel bioinformatics pipeline that distinguishes tumor from normal tissues, based on differential gene expression for 114 metabolic pathways. We confirm pathway dysregulation in separate patient populations, demonstrating the robustness of our approach. Bootstrapping simulations were then applied to assess the biological significance of these alterations. We provide distinct examples of the types of analysis that can be accomplished with this tool to understand cancer specific metabolic dysregulation, highlighting novel pathways of interest, and patterns of metabolic flux, in both common and rare disease sites. Reference
Drug and disease signature integration identifies synergistic combinations in glioblastoma
Glioblastoma (GBM) is the most common primary adult brain tumor. Despite extensive efforts, the median survival for GBM patients is approximately 14 months. GBM therapy could benefit greatly from patient-specific targeted therapies that maximize treatment efficacy.
Here we report a platform termed SynergySeq to identify drug combinations for the treatment of GBM by integrating information from The Cancer Genome Atlas (TCGA) and the Library of Integrated Network-Based Cellular Signatures (LINCS). We identify differentially expressed genes in GBM samples and devise a consensus gene expression signature for each compound using LINCS L1000 transcriptional profiling data. Reference
A comprehensive pipeline for translational top-down proteomics from a single blood draw
Top-down proteomics (TDP) by mass spectrometry (MS) is a technique by which intact proteins are analyzed. It has become increasingly popular in translational research because of the value of characterizing distinct proteoforms of intact proteins.
Compared to bottom-up proteomics (BUP) strategies, which measure digested peptide mixtures, TDP provides highly specific molecular information that avoids the bioinformatic challenge of protein inference. However, the technique has been difficult to implement widely because of inherent limitations of existing sample preparation methods and instrumentation. Reference