
 Science in this Week (May, 2018)

Updated on: May 25, 2018


Integrated time course omics analysis distinguishes immediate therapeutic response from acquired resistance

Targeted therapies specifically act by blocking the activity of proteins that are encoded by genes critical for tumorigenesis. However, most cancers acquire resistance and long-term disease remission is rarely observed.

Understanding the time course of molecular changes responsible for the development of acquired resistance could enable optimization of patients’ treatment options. Clinically, acquired therapeutic resistance can only be studied at a single time point in resistant tumors. To determine the dynamics of these molecular changes, we obtained high throughput omics data (RNA-sequencing and DNA methylation) weekly during the development of cetuximab resistance in a head and neck cancer in vitro model. The CoGAPS unsupervised algorithm was used to determine the dynamics of the molecular changes associated with resistance during the time course of resistance development. Reference


Mapping the physical network of cellular interactions

A cell’s function is influenced by the environment, or niche, in which it resides. Studies of niches usually require assumptions about the cell types present, which impedes the discovery of new cell types or interactions.

Here we describe ProximID, an approach for building a cellular network based on physical cell interaction and single-cell mRNA sequencing, and show that it can be used to discover new preferential cellular interactions without prior knowledge of component cell types. ProximID found specific interactions between megakaryocytes and mature neutrophils and between plasma cells and myeloblasts and/or promyelocytes (precursors of neutrophils) in mouse bone marrow, and it identified a Tac1+ enteroendocrine cell–Lgr5+ stem cell interaction in small intestine crypts. Reference


Single-cell transcriptomics of the mouse kidney reveals potential cellular targets of kidney disease

Our understanding of kidney disease pathogenesis is limited by an incomplete molecular characterization of the cell types responsible for the organ’s multiple homeostatic functions. To help fill this knowledge gap, we characterized 57,979 cells from healthy mouse kidneys using unbiased single-cell RNA sequencing.

Based on gene expression patterns, we infer that inherited kidney diseases that arise from distinct genetic mutations but share the same phenotypic manifestation originate from the same differentiated cell type. We also found that the kidney collecting duct in adult mice generates a spectrum of cell types via a newly identified transitional cell. Computational cell trajectory analysis and in vivo lineage tracing revealed that intercalated cells and principal cells undergo transitions mediated by the Notch signaling pathway. Reference


A protein activity assay to measure global transcription factor activity reveals determinants of chromatin accessibility

No existing method to characterize transcription factor (TF) binding to DNA allows genome-wide measurement of all TF-binding activity in cells.

Here we present a massively parallel protein activity assay, active TF identification (ATI), that measures the DNA-binding activity of all TFs in cell or tissue extracts. ATI is based on electrophoretic separation of protein-bound DNA sequences from a highly complex DNA library and subsequent mass-spectrometric identification of the DNA-bound proteins. We applied ATI to four mouse tissues and mouse embryonic stem cells and found that, in a given tissue or cell type, a small set of TFs, which bound to only ∼10 distinct motifs, displayed strong DNA-binding activity. Reference


Phenotypic diversification by enhanced genome restructuring

DNA double-strand break (DSB)-mediated genome rearrangements are assumed to provide diverse raw genetic materials enabling accelerated adaptive evolution; however, the consequences of massive simultaneous DSB formation in cells and the resulting phenotypic impact remain unclear.

Here, we establish an artificial genome-restructuring technology by conditionally introducing multiple genomic DSBs in vivo using a temperature-dependent endonuclease TaqI. Application in yeast and Arabidopsis thaliana generates strains with phenotypes, including improved ethanol production from xylose at higher temperature and increased plant biomass, that are stably inherited by offspring after multiple passages. Reference


CoNVaQ: a web tool for copy number variation-based association studies

Copy number variations (CNVs) are large segments of the genome that are duplicated or deleted. Structural variations in the genome have been linked to many complex diseases.

Similar to how genome-wide association studies (GWAS) have helped discover single-nucleotide polymorphisms linked to disease phenotypes, the extension of GWAS to CNVs has aided the discovery of structural variants associated with human traits and diseases. We present CoNVaQ, an easy-to-use web-based tool for CNV-based association studies. The web service allows users to upload two sets of CNV segments and search for genomic regions where the occurrence of CNVs is significantly associated with the phenotype. Reference
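
As a rough illustration of how a region-level association between CNV occurrence and phenotype might be scored, the sketch below applies a two-sided Fisher's exact test to a 2×2 table of samples with and without a CNV in one region, split by group. The function name, table layout, and counts are assumptions made for illustration; the abstract does not describe CoNVaQ's statistical model at this level of detail.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout: rows = phenotype groups, columns = CNV present / absent.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def weight(x):
        # Unnormalised hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probability of every table at least as unlikely as the observed one.
    extreme = sum(weight(x) for x in range(lowest, highest + 1)
                  if weight(x) <= observed)
    return extreme / comb(n, col1)

# Made-up counts: 8 of 10 samples in group 1 and 1 of 6 in group 2 carry a CNV.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(f"p = {p_value:.4f}")
```

In a genome-wide scan the same test would be repeated per region, so the resulting p-values would need a multiple-testing correction before being interpreted.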


Distinct epigenetic landscapes underlie the pathobiology of pancreatic cancer subtypes

Recent studies have offered ample insight into genome-wide expression patterns to define pancreatic ductal adenocarcinoma (PDAC) subtypes, although there remains a lack of knowledge regarding the underlying epigenomics of PDAC.

Here we perform multi-parametric integrative analyses of chromatin immunoprecipitation-sequencing (ChIP-seq) on multiple histone modifications, RNA-sequencing (RNA-seq), and DNA methylation to define epigenomic landscapes for PDAC subtypes, which can predict their relative aggressiveness and survival. Moreover, we describe the state of promoters, enhancers, super-enhancers, euchromatic, and heterochromatic regions for each subtype. Reference


Visualizing histopathologic deep learning classification

There is growing interest in utilizing artificial intelligence, and particularly deep learning, for computer vision in histopathology.

While accumulating studies highlight expert-level performance of convolutional neural networks (CNNs) on focused classification tasks, most studies rely on probability distribution scores with empirically defined cutoff values based on post-hoc analysis. More generalizable tools that allow humans to visualize histology-based deep learning inferences and decision making are scarce. Reference


Assessment of established techniques to determine developmental and malignant potential of human pluripotent stem cells

The International Stem Cell Initiative compared several commonly used approaches to assess human pluripotent stem cells (PSC). PluriTest predicts pluripotency through bioinformatic analysis of the transcriptomes of undifferentiated cells, whereas embryoid body (EB) formation in vitro and teratoma formation in vivo provide direct tests of differentiation.

Here we report that EB assays, analyzed after differentiation under neutral conditions and under conditions promoting differentiation to ectoderm, mesoderm, or endoderm lineages, are sufficient to assess the differentiation potential of PSCs. However, teratoma analysis by histologic examination and by TeratoScore, which estimates differential gene expression in each tumor, not only measures differentiation but also allows insight into a PSC’s malignant potential. Reference


Porcupine: A visual pipeline tool for neuroimaging analysis

The field of neuroimaging is rapidly adopting a more reproducible approach to data acquisition and analysis. Data structures and formats are being standardised and data analyses are getting more automated.

However, as data analysis becomes more complicated, researchers often have to write longer analysis scripts, spanning different tools across multiple programming languages. This makes it more difficult to share or recreate code, reducing the reproducibility of the analysis. We present a tool, Porcupine, that constructs one’s analysis visually and automatically produces analysis code. The graphical representation improves understanding of the performed analysis, while retaining the flexibility of modifying the produced code manually to custom needs. Reference


GiniClust2: a cluster-aware, weighted ensemble clustering method for cell-type detection

Single-cell analysis is a powerful tool for dissecting the cellular composition within a tissue or organ. However, it remains difficult to detect rare and common cell types at the same time.

Here, we present a new computational method, GiniClust2, to overcome this challenge. GiniClust2 combines the strengths of two complementary approaches, using the Gini index and Fano factor, respectively, through a cluster-aware, weighted ensemble clustering technique. GiniClust2 successfully identifies both common and rare cell types in diverse datasets, outperforming existing methods. GiniClust2 is scalable to large datasets. Reference
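
To make the two statistics named above concrete, here is a minimal sketch that computes a Gini index and a Fano factor for a single gene's expression across cells. It is not GiniClust2's implementation (which the abstract does not detail); the function names and toy expression vectors are assumptions for illustration only.

```python
from statistics import mean, pvariance

def gini_index(values):
    """Gini index of non-negative values: 0 means evenly spread,
    values near 1 mean expression is concentrated in a few cells."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula on sorted data: sum_i (2i - n - 1) * x_i / (n * sum(x)).
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)

def fano_factor(values):
    """Variance-to-mean ratio, a common measure of over-dispersion."""
    m = mean(values)
    return pvariance(values) / m if m > 0 else 0.0

# Toy data: one gene switched on in 5 of 100 cells, one expressed broadly.
rare_marker = [0] * 95 + [20] * 5
broad_gene = [4, 5, 6, 5] * 25

for name, expr in [("rare_marker", rare_marker), ("broad_gene", broad_gene)]:
    print(name, round(gini_index(expr), 3), round(fano_factor(expr), 3))
```

On the toy data the rare marker scores a much higher Gini index than the broadly expressed gene, which is the intuition behind using the Gini index to flag genes that mark rare cell types.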


Regulatory protein SrpA controls phage infection and core cellular processes in Pseudomonas aeruginosa

Our understanding of the molecular mechanisms behind bacteria-phage interactions remains limited. Here we report that a small protein, SrpA, controls core cellular processes in response to phage infection and environmental signals in Pseudomonas aeruginosa.

We show that SrpA is essential for efficient genome replication of phage K5, and controls transcription by binding to a palindromic sequence upstream of the phage RNA polymerase gene. We identify potential SrpA-binding sites in 66 promoter regions across the P. aeruginosa genome, and experimentally validate direct binding of SrpA to some of these sites. Using transcriptomics and further experiments, we show that SrpA, directly or indirectly, regulates many cellular processes including cell motility, chemotaxis, biofilm formation, pyocyanin synthesis and protein secretion, as well as virulence in a Caenorhabditis elegans model of infection. Reference


Genome sequence of the progenitor of wheat A subgenome Triticum urartu

Triticum urartu (diploid, AA) is the progenitor of the A subgenome of tetraploid (Triticum turgidum, AABB) and hexaploid (Triticum aestivum, AABBDD) wheat. Genomic studies of T. urartu have been useful for investigating the structure, function and evolution of polyploid wheat genomes.

Here we report the generation of a high-quality genome sequence of T. urartu by combining bacterial artificial chromosome (BAC)-by-BAC sequencing, single molecule real-time whole-genome shotgun sequencing, linked reads and optical mapping. We assembled seven chromosome-scale pseudomolecules and identified protein-coding genes, and we suggest a model for the evolution of T. urartu chromosomes. Reference


Genome-wide analysis of differentially expressed profiles of mRNAs, lncRNAs and circRNAs during Cryptosporidium baileyi infection

Cryptosporidium baileyi is the most common Cryptosporidium species in birds. However, effective prevention measures and treatment for C. baileyi infection are still not available.

Long non-coding RNAs (lncRNAs) and circular RNAs (circRNAs) play important roles in regulating occurrence and progression of many diseases and are identified as effective biomarkers for diagnosis and prognosis of several diseases. In the present study, the expression profiles of host mRNAs, lncRNAs and circRNAs associated with C. baileyi infection were investigated for the first time. Reference


Networks of genetic similarity reveal non-neutral processes shape strain structure in Plasmodium falciparum

Pathogens compete for hosts through patterns of cross-protection conferred by immune responses to antigens. In Plasmodium falciparum malaria, the var multigene family encoding the major blood-stage antigen PfEMP1 has evolved enormous genetic diversity through ectopic recombination and mutation.

With 50–60 var genes per genome, it is unclear whether immune selection can act as a dominant force in structuring var repertoires of local populations. The combinatorial complexity of the var system remains beyond the reach of existing strain theory and previous evidence for non-random structure cannot demonstrate immune selection without comparison with neutral models. We develop two neutral models that encompass malaria epidemiology but exclude competitive interactions between parasites. Reference


Chemistry-First Approach for Nomination of Personalized Treatment in Lung Cancer

Diversity in the genetic lesions that cause cancer is extreme. In consequence, a pressing challenge is the development of drugs that target patient-specific disease mechanisms. To address this challenge, we employed a chemistry-first discovery paradigm for de novo identification of druggable targets linked to robust patient selection hypotheses.

In particular, a 200,000 compound diversity-oriented chemical library was profiled across a heavily annotated test-bed of >100 cellular models representative of the diverse and characteristic somatic lesions for lung cancer. This approach led to the delineation of 171 chemical-genetic associations, shedding light on the targetability of mechanistic vulnerabilities corresponding to a range of oncogenotypes present in patient populations lacking effective therapy. Chemically addressable addictions to ciliogenesis in TTC21B mutants and GLUT8-dependent serine biosynthesis in KRAS/KEAP1 double mutants are prominent examples. These observations indicate a wealth of actionable opportunities within the complex molecular etiology of cancer. Reference


Integrated molecular subtyping defines a curable oligometastatic state in colorectal liver metastasis

The oligometastasis hypothesis suggests a spectrum of metastatic virulence where some metastases are limited in extent and curable with focal therapies. A subset of patients with metastatic colorectal cancer achieves prolonged survival after resection of liver metastases consistent with oligometastasis.

Here we define three robust subtypes of de novo colorectal liver metastasis through integrative molecular analysis. Patients with metastases exhibiting MSI-independent immune activation experience the most favorable survival. Subtypes with adverse outcomes demonstrate VEGFA amplification in concert with (i) stromal, mesenchymal, and angiogenic signatures, or (ii) exclusive NOTCH1 and PIK3C2B mutations with E2F/MYC activation. Molecular subtypes complement clinical risk stratification to distinguish low-risk, intermediate-risk, and high-risk patients with 10-year overall survivals of 94%, 45%, and 19%, respectively. Reference


Evaluation of commercially available small RNASeq library preparation kits using low input RNA

Evolving interest in comprehensively profiling the full range of small RNAs present in small tissue biopsies and in circulating biofluids, and how the profile differs with disease, has launched small RNA sequencing (RNASeq) into more frequent use.

However, known biases associated with small RNASeq, compounded by low RNA inputs, have been both a significant concern and a hurdle to widespread adoption. In this paper, we further tested the performance of low RNA input in three commonly used and commercially available RNASeq library preparation kits: NEB Next, NEXTFlex, and TruSeq small RNA library preparation. We evaluated the performance of the kits at two different sites, using three different tissues (brain, liver, and placenta) with high (1 μg) and low (10 ng) RNA input from tissue samples, or 5.0, 3.0, 2.0, 1.0, 0.5, and 0.2 ml starting volumes of plasma. Reference


FOCS: a novel method for analyzing enhancer and gene activity patterns infers an extensive enhancer–promoter map

Recent sequencing technologies enable joint quantification of promoters and their enhancer regions, allowing inference of enhancer–promoter links.

We show that current enhancer–promoter inference methods produce a high rate of false positive links. We introduce FOCS, a new inference method, and by benchmarking against ChIA-PET, HiChIP, and eQTL data show that it results in lower false discovery rates and at the same time higher inference power. By applying FOCS to 2630 samples taken from ENCODE, Roadmap Epigenomics, FANTOM5, and a new compendium of GRO-seq samples, we provide extensive enhancer–promoter maps. Reference


Gut microbiomes of wild great apes fluctuate seasonally in response to diet

The microbiome is essential for extraction of energy and nutrition from plant-based diets and may have facilitated primate adaptation to new dietary niches in response to rapid environmental shifts.

Here we use 16S rRNA sequencing to characterize the microbiota of wild western lowland gorillas and sympatric central chimpanzees and demonstrate compositional divergence between the microbiotas of gorillas, chimpanzees, Old World monkeys, and modern humans. We show that gorilla and chimpanzee microbiomes fluctuate with seasonal rainfall patterns and frugivory. Metagenomic sequencing of gorilla microbiomes demonstrates distinctions in functional metabolic pathways, archaea, and dietary plants among enterotypes, suggesting that dietary seasonality dictates shifts in the microbiome and its capacity for microbial plant fiber digestion versus growth on mucus glycans. Reference


The Rosa genome provides new insights into the domestication of modern roses

Roses have high cultural and economic importance as ornamental plants and in the perfume industry. We report the rose whole-genome sequencing and assembly and resequencing of major genotypes that contributed to rose domestication.

We generated a homozygous genotype from a heterozygous diploid modern rose progenitor, Rosa chinensis ‘Old Blush’. Using single-molecule real-time sequencing and a meta-assembly approach, we obtained one of the most comprehensive plant genomes to date. Diversity analyses highlighted the mosaic origin of ‘La France’, one of the first hybrids combining the growth vigor of European species and the recurrent blooming of Chinese species. Genomic segments of Chinese ancestry identified new candidate genes for recurrent blooming. Reference


Accurate detection of complex structural variations using single-molecule sequencing

Structural variations are the greatest source of genetic variation, but they remain poorly understood because of technological limitations.

Single-molecule long-read sequencing has the potential to dramatically advance the field, although high error rates are a challenge with existing methods. Addressing this need, we introduce open-source methods for long-read alignment and structural variant identification that provide unprecedented sensitivity and precision for variant detection, even in repeat-rich regions and for complex nested events that can have substantial effects on human health. Reference


The genetic architecture of floral traits in the woody plant Prunus mume

Mei (Prunus mume) is an ornamental woody plant that has been domesticated in East Asia for thousands of years. High diversity in floral traits, along with its recent genome sequence, makes mei an ideal model system for studying the evolution of woody plants.

Here, we investigate the genetic architecture of floral traits in mei and its domestication history by sampling and resequencing a total of 351 samples including 348 mei accessions and three other Prunus species at an average sequencing depth of 19.3×. Highly admixed population structure and introgression from Prunus species are identified in mei accessions. Reference


Histone H4 acetylation regulates behavioral inter-individual variability in zebrafish

Animals can show very different behaviors even in isogenic populations, but the underlying mechanisms to generate this variability remain elusive. We use the zebrafish (Danio rerio) as a model to test the influence of histone modifications on behavior.

We find that laboratory and isogenic zebrafish larvae show consistent individual behaviors when swimming freely in identical wells or in reaction to stimuli. This behavioral inter-individual variability is reduced when we impair the histone deacetylation pathway. Individuals with high levels of histone H4 acetylation, and specifically H4K12, behave similarly to the average of the population, but those with low levels deviate from it. Reference


OCEAN-C: mapping hubs of open chromatin interactions across the genome reveals gene regulatory networks

We develop a method called open chromatin enrichment and network Hi-C (OCEAN-C) for antibody-independent mapping of global open chromatin interactions. By integrating FAIRE-seq and Hi-C, OCEAN-C detects open chromatin interactions enriched by active cis-regulatory elements.

We identify more than 10,000 hubs of open chromatin interactions (HOCIs) in human cells, which are mainly active promoters and enhancers bound by many DNA-binding proteins and form interaction networks crucial for gene transcription. In addition to identifying large-scale topological structures, including topologically associated domains and A/B compartments, OCEAN-C can detect HOCI-mediated chromatin interactions that are strongly associated with gene expression, super-enhancers, and broad H3K4me3 domains. Reference


MutationalPatterns: comprehensive genome-wide analysis of mutational processes

Base substitution catalogues represent historical records of mutational processes that have been active in a cell. Such processes can be distinguished by various characteristics, like mutation type, sequence context, transcriptional and replicative strand bias, genomic distribution and association with (epi)-genomic features.

We have created MutationalPatterns, an R/Bioconductor package that allows researchers to characterize a broad range of patterns in base substitution catalogues to dissect the underlying molecular mechanisms. Furthermore, it offers an efficient method to quantify the contribution of known mutational signatures within single samples. Reference
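
Quantifying the contribution of known signatures to a single sample can be framed as a non-negative least-squares problem: find non-negative weights so that the weighted sum of signatures approximates the sample's mutation counts. The sketch below solves that problem with simple multiplicative updates in plain Python. It is an illustration of the general idea, not the package's own solver or API; the function name and the four-channel toy signatures are assumptions.

```python
def fit_signatures(counts, signatures, iterations=2000):
    """Estimate non-negative contributions of known signatures to one sample.

    counts:      mutation counts per channel (length C)
    signatures:  K signatures, each a list of C non-negative weights
    Returns K contributions h that approximately minimise
    ||counts - sum_k h[k] * signatures[k]||^2 subject to h >= 0.
    """
    k = len(signatures)
    channels = range(len(counts))
    h = [sum(counts) / k] * k  # strictly positive starting point
    # Projection of the data onto each signature; fixed across iterations.
    wtv = [sum(s[c] * counts[c] for c in channels) for s in signatures]
    for _ in range(iterations):
        recon = [sum(h[j] * signatures[j][c] for j in range(k)) for c in channels]
        wtwh = [sum(s[c] * recon[c] for c in channels) for s in signatures]
        # Multiplicative update keeps every contribution non-negative.
        h = [h[j] * wtv[j] / wtwh[j] if wtwh[j] > 0 else 0.0 for j in range(k)]
    return h

# Toy example with 4 mutation channels and 2 made-up signatures.
sig_a = [0.7, 0.1, 0.1, 0.1]
sig_b = [0.1, 0.1, 0.1, 0.7]
sample = [22, 4, 4, 10]  # constructed as 30 * sig_a + 10 * sig_b

contributions = fit_signatures(sample, [sig_a, sig_b])
print([round(x, 2) for x in contributions])
```

Real catalogues use 96 trinucleotide channels and dozens of signatures; in that setting a dedicated non-negative least-squares solver would normally be preferred over this simple iteration.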


Discordant inheritance of chromosomal and extrachromosomal DNA elements contributes to dynamic disease evolution in glioblastoma

To understand how genomic heterogeneity of glioblastoma (GBM) contributes to poor therapy response, we performed DNA and RNA sequencing on GBM samples and the neurospheres and orthotopic xenograft models derived from them.

We used the resulting dataset to show that somatic driver alterations including single-nucleotide variants, focal DNA alterations and oncogene amplification on extrachromosomal DNA (ecDNA) elements were mostly propagated from tumor to model systems. In several instances, ecDNAs and chromosomal alterations demonstrated divergent inheritance patterns and clonal selection dynamics during cell culture and xenografting. We infer that ecDNA was unevenly inherited by offspring cells, a characteristic that affects the oncogenic potential of cells with more or fewer ecDNAs. Reference


Chromatin conformation regulates the coordination between DNA replication and transcription

Chromatin is the template for the basic processes of replication and transcription, making the maintenance of chromosomal integrity critical for cell viability. To elucidate how dividing cells respond to alterations in chromatin structure, here we analyse the replication programme of primary cells with altered chromatin configuration caused by the genetic ablation of the HMGB1 gene, or three histone H1 genes.

We find that loss of chromatin compaction in H1-depleted cells triggers the accumulation of stalled forks and DNA damage as a consequence of transcription–replication conflicts. In contrast, reductions in nucleosome occupancy due to the lack of HMGB1 cause faster fork progression without impacting the initiation landscape or fork stability. Thus, perturbations in chromatin integrity elicit a range of responses in the dynamics of DNA replication and transcription, with different consequences on replicative stress. Reference


Quantitative diffusion measurements using the open-source software PyFRAP

Fluorescence Recovery After Photobleaching (FRAP) and inverse FRAP (iFRAP) assays can be used to assess the mobility of fluorescent molecules.

These assays measure diffusion by monitoring the return of fluorescence in bleached regions (FRAP), or the dissipation of fluorescence from photoconverted regions (iFRAP). However, current FRAP/iFRAP analysis methods suffer from simplified assumptions about sample geometry, bleaching/photoconversion inhomogeneities, and the underlying reaction-diffusion kinetics. To address these shortcomings, we developed the software PyFRAP, which fits numerical simulations of three-dimensional models to FRAP/iFRAP data and accounts for bleaching/photoconversion inhomogeneities. Using PyFRAP we determined the diffusivities of fluorescent molecules spanning two orders of magnitude in molecular weight. Reference


Enabling multiplexed testing of pooled donor cells through whole-genome sequencing

We describe a method that enables the multiplex screening of a pool of many different donor cell lines.

Our method accurately predicts each donor proportion from the pool without requiring the use of unique DNA barcodes as markers of donor identity. Instead, we take advantage of common single nucleotide polymorphisms, whole-genome sequencing, and an algorithm to calculate the proportions from the sequencing data. By testing using simulated and real data, we showed that our method robustly predicts the individual proportions from a mixed-pool of numerous donors, thus enabling the multiplexed testing of diverse donor cells en masse. Reference
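
One way to picture the estimation step is as a mixture problem: if each donor's genotype at common SNPs is known, the reference and alternate read counts in the pool constrain how much each donor contributes. The sketch below uses a small expectation-maximisation loop for this. It is an assumed formulation for illustration, not the algorithm from the paper; the function name, the `error_rate` parameter, and the toy genotypes and read counts are all hypothetical.

```python
def estimate_donor_proportions(genotypes, alt_counts, ref_counts,
                               iterations=500, error_rate=0.001):
    """Estimate donor proportions in a pool from SNP read counts.

    genotypes:  one list per donor with alternate-allele dosage (0, 1 or 2) per SNP
    alt_counts: alternate-allele read counts per SNP in the pooled sample
    ref_counts: reference-allele read counts per SNP in the pooled sample
    """
    n_donors = len(genotypes)
    n_snps = len(alt_counts)
    # Chance that a read from donor d shows the alternate allele at SNP j,
    # kept away from 0 and 1 to tolerate sequencing errors.
    alt_prob = [[min(max(g / 2.0, error_rate), 1.0 - error_rate) for g in row]
                for row in genotypes]
    props = [1.0 / n_donors] * n_donors
    total_reads = sum(alt_counts) + sum(ref_counts)
    for _ in range(iterations):
        expected = [0.0] * n_donors
        for j in range(n_snps):
            alt_w = [props[d] * alt_prob[d][j] for d in range(n_donors)]
            ref_w = [props[d] * (1.0 - alt_prob[d][j]) for d in range(n_donors)]
            alt_total, ref_total = sum(alt_w), sum(ref_w)
            for d in range(n_donors):
                # E-step: share each read count among donors by responsibility.
                expected[d] += alt_counts[j] * alt_w[d] / alt_total
                expected[d] += ref_counts[j] * ref_w[d] / ref_total
        # M-step: proportions are the expected share of reads per donor.
        props = [e / total_reads for e in expected]
    return props

# Toy pool of three donors mixed at 50% / 30% / 20%, six SNPs, depth 1000.
toy_genotypes = [
    [2, 0, 0, 1, 2, 0],
    [0, 2, 0, 1, 0, 2],
    [0, 0, 2, 0, 1, 1],
]
toy_alt = [500, 300, 200, 400, 600, 400]
toy_ref = [500, 700, 800, 600, 400, 600]

print([round(p, 3) for p in estimate_donor_proportions(toy_genotypes, toy_alt, toy_ref)])
```

With real data the number of SNPs would be far larger than the number of donors, which is what makes the proportions identifiable without barcodes.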


Sequencing of prostate cancers identifies new cancer genes, routes of progression and drug targets

Prostate cancer represents a substantial clinical challenge because it is difficult to predict outcome and advanced disease is often fatal.

We sequenced the whole genomes of 112 primary and metastatic prostate cancer samples. From joint analysis of these cancers with those from previous studies (930 cancers in total), we found evidence for 22 previously unidentified putative driver genes harboring coding mutations, as well as evidence for NEAT1 and FOXA1 acting as drivers through noncoding mutations. Through the temporal dissection of aberrations, we identified driver mutations specifically associated with steps in the progression of prostate cancer, establishing, for example, loss of CHD1 and BRCA2 as early events in cancer development of ETS fusion-negative cancers. Computational chemogenomic (canSAR) analysis of prostate cancer mutations identified 11 targets of approved drugs, 7 targets of investigational drugs, and 62 targets of compounds that may be active and should be considered candidates for future clinical trials. Reference


FateID infers cell fate bias in multipotent progenitors from single-cell RNA-seq data

To understand stem cell differentiation along multiple lineages, it is necessary to resolve heterogeneous cellular states and the ancestral relationships between them.

We developed a robotic miniaturized CEL-Seq2 implementation to carry out deep single-cell RNA-seq of ∼2,000 mouse hematopoietic progenitors enriched for lymphoid lineages, and used an improved clustering algorithm, RaceID3, to identify cell types. To resolve subtle transcriptome differences indicative of lineage biases, we developed FateID, an iterative supervised learning algorithm for the probabilistic quantification of cell fate bias in progenitor populations. Here we used FateID to delineate domains of fate bias and enable the derivation of high-resolution differentiation trajectories, thereby revealing a common progenitor population of B cells and plasmacytoid dendritic cells, which we validated by in vitro differentiation assays. Reference


Quantitative assessment of protein activity in orphan tissues and single cells using the metaVIPER algorithm

We and others have shown that transition and maintenance of biological states is controlled by master regulator proteins, which can be inferred by interrogating tissue-specific regulatory models (interactomes) with transcriptional signatures, using the VIPER algorithm.

Yet, some tissues may lack molecular profiles necessary for interactome inference (orphan tissues), or, as for single cells isolated from heterogeneous samples, their tissue context may be undetermined. To address this problem, we introduce metaVIPER, an algorithm designed to assess protein activity in a tissue-independent fashion by integrative analysis of multiple, non-tissue-matched interactomes. This assumes that transcriptional targets of each protein will be recapitulated by one or more available interactomes. Reference


A geometric approach to characterize the functional identity of single cells

Single-cell transcriptomic data has the potential to radically redefine our view of cell-type identity. Cells that were previously believed to be homogeneous are now clearly distinguishable in terms of their expression phenotype.

Methods for automatically characterizing the functional identity of cells, and their associated properties, can be used to uncover processes involved in lineage differentiation as well as sub-typing cancer cells. They can also be used to suggest personalized therapies based on molecular signatures associated with pathology. We develop a new method, called ACTION, to infer the functional identity of cells from their transcriptional profile, classify them based on their dominant function, and reconstruct regulatory networks that are responsible for mediating their identity. Reference


SQUID: transcriptomic structural variation detection from RNA-seq

Transcripts are frequently modified by structural variations, which lead to fused transcripts of either multiple genes, known as a fusion gene, or a gene and a previously non-transcribed sequence.

Detecting these modifications, called transcriptomic structural variations (TSVs), especially in cancer tumor sequencing, is an important and challenging computational problem. We introduce SQUID, a novel algorithm to predict both fusion-gene and non-fusion-gene TSVs accurately from RNA-seq alignments. SQUID unifies both concordant and discordant read alignments into one model and doubles the precision on simulation data compared to other approaches. Reference


Cell-free prediction of protein expression costs for growing cells

Translating heterologous proteins places a significant burden on host cells, consuming expression resources and leading to slower cell growth and reduced productivity. Yet predicting the cost of protein production for any given gene is a major challenge, as multiple processes and factors combine to determine translation efficiency.

To enable prediction of the cost of gene expression in bacteria, we describe here a standard cell-free lysate assay that provides a relative measure of resource consumption when a protein coding sequence is expressed. These lysate measurements can then be used with a computational model of translation to predict the in vivo burden placed on growing E. coli cells for a variety of proteins of different functions and lengths. Reference


Genome evolution across 1,011 Saccharomyces cerevisiae isolates

Large-scale population genomic surveys are essential to explore the phenotypic diversity of natural populations.

Here we report the whole-genome sequencing and phenotyping of 1,011 Saccharomyces cerevisiae isolates, which together provide an accurate evolutionary picture of the genomic variants that shape the species-wide phenotypic landscape of this yeast. Genomic analyses support a single ‘out-of-China’ origin for this species, followed by several independent domestication events. Although domesticated isolates exhibit high variation in ploidy, aneuploidy and genome content, genome evolution in wild isolates is mainly driven by the accumulation of single nucleotide polymorphisms. A common feature is the extensive loss of heterozygosity, which represents an essential source of inter-individual variation in this mainly asexual species. Reference


The evolutionary history of vertebrate RNA viruses

Our understanding of the diversity and evolution of vertebrate RNA viruses is largely limited to those found in mammalian and avian hosts and associated with overt disease.

Here, using a large-scale meta-transcriptomic approach, we discover 214 vertebrate-associated viruses in reptiles, amphibians, lungfish, ray-finned fish, cartilaginous fish and jawless fish. The newly discovered viruses appear in every family or genus of RNA virus associated with vertebrate infection, including those containing human pathogens such as influenza virus, the Arenaviridae and Filoviridae families, and have branching orders that broadly reflect the phylogenetic history of their hosts. Reference


Transcriptome-wide association study of schizophrenia

Genome-wide association studies (GWAS) have identified over 100 risk loci for schizophrenia, but the causal mechanisms remain largely unknown.

We performed a transcriptome-wide association study (TWAS) integrating a schizophrenia GWAS of 79,845 individuals from the Psychiatric Genomics Consortium with expression data from brain, blood, and adipose tissues across 3,693 primarily control individuals. We identified 157 TWAS-significant genes, of which 35 did not overlap a known GWAS locus. Of these 157 genes, 42 were associated with specific chromatin features measured in independent samples, thus highlighting potential regulatory targets for follow-up. Suppression of one identified susceptibility gene, mapk3, in zebrafish showed a significant effect on neurodevelopmental phenotypes. Expression and splicing from the brain captured most of the TWAS effect across all genes. Reference
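The abstract does not state the test statistic used. As a minimal sketch, one common form of a summary-statistic TWAS combines per-SNP GWAS z-scores with expression prediction weights and normalizes by the linkage disequilibrium (LD) among the SNPs: Z = (w · z) / sqrt(wᵀ Σ w). The function name `twas_z` and the toy inputs are assumptions for illustration, not values from the study.

```python
import math


def twas_z(weights, gwas_z, ld):
    """Gene-level z-score from SNP-level summary statistics (illustrative).

    weights: expression prediction weight for each SNP (w)
    gwas_z:  GWAS z-score for each SNP (z)
    ld:      SNP-by-SNP correlation (LD) matrix as a list of lists (Sigma)
    """
    n = len(weights)
    # Numerator: weighted sum of the GWAS z-scores.
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    # Denominator: standard deviation of that weighted sum under the null,
    # which depends on the correlation between SNPs.
    variance = sum(
        weights[i] * ld[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    return numerator / math.sqrt(variance)
```

For example, with two uncorrelated SNPs of equal weight and GWAS z-scores of 2 each, `twas_z([1, 1], [2, 2], [[1, 0], [0, 1]])` gives 4 / sqrt(2); if the two SNPs are in perfect LD, the same inputs give 4 / sqrt(4) = 2, since the second SNP adds no independent evidence.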


Single-cell transcriptomics of the mouse kidney reveals potential cellular targets of kidney disease

Our understanding of kidney disease pathogenesis is limited by an incomplete molecular characterization of the cell types responsible for the organ’s multiple homeostatic functions.

To help fill this knowledge gap, we characterized 57,979 cells from healthy mouse kidneys using unbiased single-cell RNA sequencing. Based on gene expression patterns, we infer that inherited kidney diseases that arise from distinct genetic mutations but share the same phenotypic manifestation originate from the same differentiated cell type. We also found that the kidney collecting duct in adult mice generates a spectrum of cell types via a newly identified transitional cell. Computational cell trajectory analysis and in vivo lineage tracing revealed that intercalated cells and principal cells undergo transitions mediated by the Notch signaling pathway. Reference


Mapping human pluripotent stem cell differentiation pathways using high throughput single-cell RNA-sequencing

Human pluripotent stem cells (hPSCs) provide powerful models for studying cellular differentiation and an unlimited source of cells for regenerative medicine. However, a comprehensive single-cell level differentiation roadmap for hPSCs has not been achieved.

We use high throughput single-cell RNA-sequencing (scRNA-seq), based on optimized microfluidic circuits, to profile early differentiation lineages in the human embryoid body system. We present a cellular-state landscape for hPSC early differentiation that covers multiple cellular lineages, including neural, muscle, endothelial, stromal, liver, and epithelial cells. Through pseudotime analysis, we construct the developmental trajectories of these progenitor cells and reveal the gene expression dynamics in the process of cell differentiation. Reference


De novo reconstruction of human adipose transcriptome reveals conserved lncRNAs as regulators of brown adipogenesis

Obesity has emerged as an alarming health crisis due to its association with metabolic risk factors such as diabetes, dyslipidemia, and hypertension.

Recent work has demonstrated the multifaceted roles of lncRNAs in regulating mouse adipose development, but their implication in human adipocytes remains largely unknown. Here, by performing deep RNA-seq on adult subcutaneous and omental white adipose tissue and on fetal brown adipose tissue (BAT), we present a catalog of 3,149 adipose-active lncRNAs, of which 909 are specifically detected in BAT. A total of 169 conserved human lncRNAs show positive correlation with their nearby mRNAs, and knockdown assays support a role of lncRNAs in regulating their nearby mRNAs. Reference
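The abstract does not say which correlation measure was used. As a minimal sketch, assuming a Pearson correlation across samples, the expression of a lncRNA and that of its nearby mRNA could be compared as follows; the function name `pearson` and the expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Root sums of squared deviations (unnormalized standard deviations).
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical expression values across four samples.
lncrna_expr = [1.0, 2.0, 3.0, 4.0]
mrna_expr = [2.1, 3.9, 6.2, 8.0]
r = pearson(lncrna_expr, mrna_expr)  # close to +1 for these toy values
```

A value of `r` near +1 indicates that the lncRNA and its neighboring mRNA rise and fall together across samples, which is the pattern described for the 169 conserved lncRNAs.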


Multi-omics analysis reveals neoantigen-independent immune cell infiltration in copy-number driven cancers

To realize the full potential of immunotherapy, it is critical to understand the drivers of tumor infiltration by immune cells. Previous studies have linked immune infiltration with tumor neoantigen levels, but the broad applicability of this concept remains unknown.

Here, we find that while this observation is true across cancers characterized by recurrent mutations, it does not hold for cancers driven by recurrent copy number alterations, such as breast and pancreatic tumors. To understand immune invasion in these cancers, we developed an integrative multi-omics framework, identifying the DNA damage response protein ATM as a driver of cytokine production leading to increased immune infiltration. This prediction was validated in numerous orthogonal datasets, as well as experimentally in vitro and in vivo by cytokine release and immune cell migration. Reference


Integrated biology approach reveals molecular and pathological interactions among Alzheimer’s Aβ42, Tau, TREM2, and TYROBP in Drosophila models

Cerebral amyloidosis, neuroinflammation, and tauopathy are key features of Alzheimer’s disease (AD), but interactions among these features remain poorly understood.

Our previous multiscale molecular network models of AD revealed TYROBP as a key driver of an immune- and microglia-specific network that was robustly associated with AD pathophysiology. Recent genetic studies of AD further identified pathogenic mutations in both TREM2 and TYROBP. In this study, we systematically examined molecular and pathological interactions among Aβ, tau, TREM2, and TYROBP by integrating signatures from transgenic Drosophila models of AD and transcriptome-wide gene co-expression networks from two human AD cohorts. Reference