Science in this Week (August, 2018)

Updated on: August 17, 2018

Decoding a cancer-relevant splicing decision in the RON proto-oncogene

Mutations causing aberrant splicing are frequently implicated in human diseases including cancer.

Here, we establish a high-throughput screen of randomly mutated minigenes to decode the cis-regulatory landscape that determines alternative splicing of exon 11 in the proto-oncogene MST1R (RON). Mathematical modelling of splicing kinetics enables us to identify more than 1000 mutations affecting RON exon 11 skipping, which corresponds to the pathological isoform RON∆165. Importantly, the effects correlate with RON alternative splicing in cancer patients bearing the same mutations. Reference

Linking the International Wheat Genome Sequencing Consortium bread wheat reference genome sequence to wheat genetic and phenomic data

The Wheat@URGI portal has been developed to provide the international community of researchers and breeders with access to the bread wheat reference genome sequence produced by the International Wheat Genome Sequencing Consortium.

Genome browsers, BLAST, and InterMine tools have been established for in-depth exploration of the genome sequence together with additional linked datasets including physical maps, sequence variations, gene expression, and genetic and phenomic data from other international collaborative projects already stored in the GnpIS information system. Reference

Discovery of cationic nonribosomal peptides as Gram-negative antibiotics through global genome mining

The worldwide prevalence of infections caused by antibiotic-resistant Gram-negative bacteria poses a serious threat to public health due to the limited therapeutic alternatives.

Cationic peptides represent a large family of antibiotics and have attracted interest due to their diverse chemical structures and potential for combating drug-resistant Gram-negative pathogens. Here, we analyze 7395 bacterial genomes to investigate their capacity for biosynthesis of cationic nonribosomal peptides with activity against Gram-negative bacteria. Reference

Population genomics and morphometric assignment of western honey bees

Apis mellifera scutellata and A.m. capensis (the Cape honey bee) are western honey bee subspecies indigenous to the Republic of South Africa (RSA). Both bees are important for biological and economic reasons. First, A.m. scutellata is the invasive “African honey bee” of the Americas and exhibits a number of traits that beekeepers consider undesirable.

They swarm excessively, are prone to absconding (vacating the nest entirely), usurp other honey bee colonies, and exhibit heightened defensiveness. Second, Cape honey bees are socially parasitic bees; the workers can reproduce thelytokously. Both bees are indistinguishable visually. Therefore, we employed Genotyping-by-Sequencing (GBS), wing geometry and standard morphometric approaches to assess the genetic diversity and population structure of these bees to search for diagnostic markers that can be employed to distinguish between the two subspecies. Reference

Copy number signatures and mutational processes in ovarian carcinoma

The genomic complexity of profound copy number aberrations has prevented effective molecular stratification of ovarian cancers.

Here, to decode this complexity, we derived copy number signatures from shallow whole-genome sequencing of 117 high-grade serous ovarian cancer (HGSOC) cases, which were validated on 527 independent cases. We show that HGSOC comprises a continuum of genomes shaped by multiple mutational processes that result in known patterns of genomic aberration. Copy number signature exposures at diagnosis predict both overall survival and the probability of platinum-resistant relapse. Measurement of signature exposures provides a rational framework to choose combination treatments that target multiple mutational processes. Reference

QsRNA-seq: a method for high-throughput profiling and quantifying small RNAs

The ability to profile and quantify small non-coding RNAs (sRNAs), specifically microRNAs (miRNAs), using high-throughput sequencing is challenging because of their small size.

We developed QsRNA-seq, a method for preparation of sRNA libraries for high-throughput sequencing that overcomes this difficulty by enabling a gel-free separation of fragments shorter than 100 nt that differ only by 20 nt in length. The method allows the use of unique molecular identifiers for quantification and is more amenable to automation than gel-based methods. We show that QsRNA-seq gives very accurate, comprehensive, and reproducible results by looking at miRNAs in Caenorhabditis elegans embryos and larvae. Reference
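The quantification idea mentioned above, counting unique molecular identifiers (UMIs) rather than raw reads, can be sketched as follows. This is only an illustration of the general UMI principle, not the QsRNA-seq pipeline; the function name `count_molecules` and the `(mirna_name, umi)` input layout are assumptions made for the example.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct UMIs per miRNA.

    reads: iterable of (mirna_name, umi) pairs, one pair per sequenced read.
    Reads sharing both miRNA and UMI are treated as PCR duplicates of one
    original molecule (a simplification: sequencing errors in the UMI are
    not corrected here).
    """
    umis_seen = defaultdict(set)
    for mirna, umi in reads:
        umis_seen[mirna].add(umi)
    return {mirna: len(umis) for mirna, umis in umis_seen.items()}


# Hypothetical reads: the first two are duplicates of the same molecule.
example_reads = [
    ("miR-1", "AAAC"),
    ("miR-1", "AAAC"),
    ("miR-1", "GGTT"),
    ("let-7", "AAAC"),
]
molecule_counts = count_molecules(example_reads)
```

Counting sets of UMIs instead of reads keeps amplification bias from inflating the estimate for any one miRNA.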

Genetic and transcriptional evolution alters cancer cell line drug response

Human cancer cell lines are the workhorse of cancer research. Although cell lines are known to evolve in culture, the extent of the resultant genetic and transcriptional heterogeneity and its functional consequences remain understudied.

Here we use genomic analyses of 106 human cell lines grown in two laboratories to show extensive clonal diversity. Further comprehensive genomic characterization of 27 strains of the common breast cancer cell line MCF7 uncovered rapid genetic diversification. Similar results were obtained with multiple strains of 13 additional cell lines. Notably, genetic changes were associated with differential activation of gene expression programs and marked differences in cell morphology and proliferation. Reference

ChromTime: modeling spatio-temporal dynamics of chromatin marks

To model spatial changes of chromatin mark peaks over time we develop and apply ChromTime, a computational method that predicts peaks to be either expanding, contracting, or holding steady between time points.

Predicted expanding and contracting peaks can mark regulatory regions associated with transcription factor binding and gene expression changes. Spatial dynamics of peaks provide information about gene expression changes beyond localized signal density changes. ChromTime detects asymmetric expansions and contractions, which for some marks associate with the direction of transcription. ChromTime facilitates the analysis of time course chromatin data in a range of biological systems. Reference

GWAS identifies multiple new loci associated with Ewing sarcoma susceptibility

Ewing sarcoma (EWS) is a pediatric cancer characterized by the EWSR1-FLI1 fusion. We performed a genome-wide association study of 733 EWS cases and 1346 unaffected individuals of European ancestry.

Our study replicates previously reported susceptibility loci at 1p36.22, 10q21.3 and 15q15.1, and identifies new loci at 6p25.1, 20p11.22 and 20p11.23. Effect estimates exhibit odds ratios in excess of 1.7, which is high for cancer GWAS, and striking in light of the rarity of EWS cases in familial cancer syndromes. Expression quantitative trait locus (eQTL) analyses identify candidate genes at 6p25.1 (RREB1) and 20p11.23 (KIZ). The 20p11.22 locus is near NKX2-2, a highly overexpressed gene in EWS. Reference
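For readers unfamiliar with the effect size quoted above, an odds ratio can be computed from a 2×2 table of allele counts in cases and controls. The sketch below uses made-up counts purely for illustration; they are not data from the study, and the function name `odds_ratio` is an assumption.

```python
def odds_ratio(case_risk, case_other, control_risk, control_other):
    """Odds ratio from allele counts in a 2x2 case/control table.

    OR = (case_risk / case_other) / (control_risk / control_other)
    """
    if case_other == 0 or control_risk == 0:
        raise ValueError("odds ratio is undefined when a denominator count is zero")
    return (case_risk * control_other) / (case_other * control_risk)


# Hypothetical counts: risk allele seen 30 times in cases, 20 times in controls.
example_or = odds_ratio(case_risk=30, case_other=70, control_risk=20, control_other=80)
```

With these invented counts the ratio is 12/7, roughly 1.71, which is the order of magnitude the summary describes as high for a cancer GWAS.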

CTCF maintains regulatory homeostasis of cancer pathways

CTCF binding to DNA helps partition the mammalian genome into discrete structural and regulatory domains. Complete removal of CTCF from mammalian cells causes catastrophic genome dysregulation, likely due to widespread collapse of 3D chromatin looping and alterations to inter- and intra-TAD interactions within the nucleus.

In contrast, Ctcf hemizygous mice with lifelong reduction of CTCF expression are viable, albeit with increased cancer incidence. Here, we exploit chronic Ctcf hemizygosity to reveal its homeostatic roles in maintaining genome function and integrity. Reference

FusionPathway: Prediction of pathways and therapeutic targets associated with gene fusions in cancer

Numerous gene fusions have been uncovered across multiple cancer types. Although the ability to target several of these fusions has led to the development of some successful anti-cancer drugs, most of them are not druggable.

Understanding the molecular pathways of a fusion is important in determining its function in oncogenesis and in developing therapeutic strategies for patients harboring the fusion. However, the molecular pathways have been elucidated for only a few fusions, in part because of the labor-intensive nature of the required functional assays. Therefore, we developed a domain-based network approach to infer the pathways of a fusion. Molecular interactions of a fusion are first predicted by using its protein domain composition, and its associated pathways are then inferred from these molecular interactions. We demonstrated the capabilities of this approach by primarily applying it to the well-studied BCR-ABL1 fusion. Reference

A genome-wide siRNA screen identifies a druggable host pathway essential for the Ebola virus life cycle

The 2014–2016 Ebola virus (EBOV) outbreak in West Africa highlighted the need for improved therapeutic options against this virus.

Approaches targeting host factors/pathways essential for the virus are advantageous because they can potentially target a wide range of viruses, including newly emerging ones and because the development of resistance is less likely than when targeting the virus directly. In order to identify host factors involved in the EBOV life cycle, we performed a genome-wide siRNA screen comprising 64,755 individual siRNAs against 21,566 human genes to assess their activity in EBOV genome replication and transcription. Reference

Exploring the OncoGenomic Landscape of cancer

The widespread incorporation of next-generation sequencing into clinical oncology has yielded an unprecedented amount of molecular data from thousands of patients.

We present OncoGenomic Landscapes, a framework to analyze and display thousands of cancer genomic profiles in a 2D space. Our tool allows users to rapidly assess the heterogeneity of large cohorts, enabling the comparison to other groups of patients, and using driver genes as landmarks to aid in the interpretation of the landscapes.  Reference

Selection-driven cost-efficiency optimization of transcripts modulates gene evolutionary rate in bacteria

Most amino acids are encoded by multiple synonymous codons. However, synonymous codons are not used equally, and this biased codon use varies between different organisms. Through analysis of 1320 bacterial genomes, we show that bacterial genes are subject to multi-objective selection-driven optimization of codon use.

Here, selection acts to simultaneously decrease transcript biosynthetic cost and increase transcript translational efficiency, with highly expressed genes under the greatest selection. This optimization is not simply a consequence of the more translationally efficient codons being less expensive to synthesize.  Reference
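Biased codon use can be made concrete by measuring, for each amino acid, the fraction of its occurrences encoded by each synonymous codon. The sketch below assumes the standard genetic code and a coding sequence already in frame; it illustrates codon-usage counting only, not the cost and efficiency model of the paper, and the function name is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_codon_fractions(cds):
    """Return {codon: fraction of its amino acid's occurrences using that codon}."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)
    totals = defaultdict(int)
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n
    return {codon: n / totals[CODON_TABLE[codon]] for codon, n in counts.items()}


# Toy sequence: leucine is encoded three times by CTG and once by CTA.
fractions = synonymous_codon_fractions("CTGCTGCTGCTAAAA")
```

In the toy sequence, CTG accounts for 0.75 of the leucine codons and CTA for 0.25, which is the kind of unequal use the summary refers to.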

De novo human genome assemblies reveal spectrum of alternative haplotypes in diverse populations

The human reference genome is used extensively in modern biological research. However, a single consensus representation is inadequate to provide a universal reference structure because it is a haplotype among many in the human population.

Using 10× Genomics (10×G) “Linked-Read” technology, we perform whole genome sequencing (WGS) and de novo assembly on 17 individuals across five populations. We identify 1842 breakpoint-resolved non-reference unique insertions (NUIs) that, in aggregate, add up to 2.1 Mb of so far undescribed genomic content. Among these, 64% are considered ancestral to humans since they are found in non-human primate genomes. Reference

Human genetic variants and age are the strongest predictors of humoral immune responses to common pathogens and vaccines

Humoral immune responses to infectious agents or vaccination vary substantially among individuals, and many of the factors responsible for this variability remain to be defined. Current evidence suggests that human genetic variation influences (i) serum immunoglobulin levels, (ii) seroconversion rates, and (iii) intensity of antigen-specific immune responses.

Here, we evaluated the impact of intrinsic (age and sex), environmental, and genetic factors on the variability of humoral response to common pathogens and vaccines.  We characterized the serological response to 15 antigens from common human pathogens or vaccines, in an age- and sex-stratified cohort of 1000 healthy individuals (Milieu Intérieur cohort).  Reference

Extensive intraspecific gene order and gene structural variations between Mo17 and other maize genomes

Maize is an important crop with a high level of genome diversity and heterosis. The genome sequence of a typical female line, B73, was previously released.

Here, we report a de novo genome assembly of a corresponding male representative line, Mo17. More than 96.4% of the 2,183 Mb assembled genome can be accounted for by 362 scaffolds in ten pseudochromosomes with 38,620 annotated protein-coding genes. Comparative analysis revealed large gene-order and gene structural variations: approximately 10% of the annotated genes were mutually nonsyntenic, and more than 20% of the predicted genes had either large-effect mutations or large structural variations, which might cause considerable protein divergence between the two inbred lines. Reference

Chromatin loop anchors are associated with genome instability in cancer

Chromatin loops form a basic unit of interphase nuclear organization, with chromatin loop anchor points providing contacts between regulatory regions and promoters. However, the mutational landscape at these anchor points remains under-studied.

Here, we describe the unusual patterns of somatic mutations and germline variation associated with loop anchor points and explore the underlying features influencing these patterns.  Analyses of whole genome sequencing datasets reveal that anchor points are strongly depleted for single nucleotide variants (SNVs) in tumours. Reference

Integrative omics analyses broaden treatment targets in human cancer

Although large-scale, next-generation sequencing (NGS) studies of cancers hold promise for enabling precision oncology, challenges remain in integrating NGS with clinically validated biomarkers.

To overcome such challenges, we utilized the Database of Evidence for Precision Oncology (DEPO) to link druggability to genomic, transcriptomic, and proteomic biomarkers. Using a pan-cancer cohort of 6570 tumors, we identified tumors with potentially druggable biomarkers consisting of drug-associated mutations, mRNA expression outliers, and protein/phosphoprotein expression outliers identified by DEPO. Reference

Functional characterization of enhancer evolution in the primate lineage

Enhancers play an important role in morphological evolution and speciation by controlling the spatiotemporal expression of genes. Previous efforts to understand the evolution of enhancers in primates have typically studied many enhancers at low resolution, or single enhancers at high resolution.

We identified candidate hominoid-specific liver enhancers from H3K27ac ChIP-seq data. After locating orthologs in 11 primates spanning around 40 million years, we synthesized all orthologs as well as computational reconstructions of 9 ancestral sequences for 348 active tiles of 233 putative enhancers. We concurrently tested all sequences for regulatory activity with STARR-seq in HepG2 cells.  Reference

Epigenome-wide DNA methylation profiling in Progressive Supranuclear Palsy reveals major changes at DLX1

Genetic, epigenetic, and environmental factors contribute to the multifactorial disorder progressive supranuclear palsy (PSP).

Here, we study epigenetic changes by genome-wide analysis of DNA from postmortem tissue of forebrains of patients and controls and detect significant (P < 0.05) methylation differences at 717 CpG sites in PSP vs. controls. Four hundred fifty-one of these sites are associated with protein-coding genes. While differential methylation only affects a few sites in most genes, DLX1 is hypermethylated at multiple sites. Expression of an antisense transcript of DLX1, DLX1AS, is reduced in PSP brains. The amount of DLX1 protein is increased in gray matter of PSP forebrains. Pathway analysis suggests that DLX1 influences MAPT-encoded Tau protein. Reference

Wnt evolution and function shuffling in liberal and conservative chordate genomes

What impact gene loss has on the evolution of developmental processes, and how function shuffling has affected retained genes driving essential biological processes, remain open questions in the fields of genome evolution and EvoDevo.

We conduct an exhaustive survey of Wnt genes in genomic databases, identifying 156 Wnt genes in 13 non-vertebrate chordates. This represents the most complete Wnt gene catalog of the chordate subphyla and has allowed us to resolve previous ambiguities about the orthology of many Wnt genes, including the identification of WntA for the first time in chordates. Moreover, we create the first complete expression atlas for the Wnt family during amphioxus development, providing a useful resource to investigate the evolution of Wnt expression throughout the radiation of chordates. Reference

Predicting the clinical impact of human mutation with deep neural networks

Millions of human genomes and exomes have been sequenced, but their clinical applications remain limited due to the difficulty of distinguishing disease-causing mutations from benign genetic variation.

Here we demonstrate that common missense variants in other primate species are largely clinically benign in human, enabling pathogenic mutations to be systematically identified by the process of elimination. Using hundreds of thousands of common variants from population sequencing of six non-human primate species, we train a deep neural network that identifies pathogenic mutations in rare disease patients with 88% accuracy and enables the discovery of 14 new candidate genes in intellectual disability at genome-wide significance. Reference

Gene discovery and polygenic prediction from a GWAS of educational attainment in 1.1 million individuals

Here we conducted a large-scale genetic association analysis of educational attainment in a sample of approximately 1.1 million individuals and identify 1,271 independent genome-wide-significant SNPs.

For the SNPs taken together, we found evidence of heterogeneous effects across environments. The SNPs implicate genes involved in brain-development processes and neuron-to-neuron communication. In a separate analysis of the X chromosome, we identify 10 independent genome-wide-significant SNPs and estimate a SNP heritability of around 0.3% in both men and women, consistent with partial dosage compensation. Reference

Metaproteomics reveals associations between microbiome and intestinal extracellular vesicle proteins

Alterations in gut microbiota have been implicated in the pathogenesis of inflammatory bowel disease (IBD); however, the factors that mediate host–microbiota interactions remain largely unknown.

Here we collected mucosal-luminal interface samples from a pediatric IBD inception cohort and characterized both the human and microbiota proteins using metaproteomics. We show that microbial proteins related to oxidative stress responses are upregulated in IBD cases compared to controls. In particular, we demonstrate that the expression of human proteins related to oxidative antimicrobial activities is increased in IBD cases and correlates with the alteration of microbial functions. Reference

Functional and genomic analyses reveal therapeutic potential of targeting β-catenin

Head and neck squamous cell carcinoma (HNSCC) is an aggressive malignancy characterized by tumor heterogeneity, locoregional metastases, and resistance to existing treatments. Although a number of genomic and molecular alterations associated with HNSCC have been identified, they have had limited impact on the clinical management of this disease.

We utilized a combination of computational and experimental profiling approaches to examine the effects of blocking the interaction between β-catenin and cAMP-responsive element binding (CREB)-binding protein (CBP) using the small molecule inhibitor ICG-001. We generated and annotated in vitro treatment gene expression signatures of HNSCC cells, derived from human oral squamous cell carcinomas (OSCCs), using microarrays.  Reference

GIVE: portable genome browsers for personal websites

Growing popularity and diversity of genomic data demand portable and versatile genome browsers.

Here, we present an open-source programming library called GIVE that facilitates the creation of personalized genome browsers without requiring a system administrator. By inserting HTML tags, one can add to a personal webpage interactive visualizations of multiple types of genomics data, including genome annotation, “linear” quantitative data, and genome interaction data. GIVE includes a graphical interface called HUG (HTML Universal Generator) that automatically generates HTML code for displaying user-chosen data, which can be copy-pasted into the user’s personal website or saved and shared with collaborators. Reference

Genomic inference of the metabolism and evolution of the archaeal phylum Aigarchaeota

Microbes of the phylum Aigarchaeota are widely distributed in geothermal environments, but their physiological and ecological roles are poorly understood.

Here we analyze six Aigarchaeota metagenomic bins from two circumneutral hot springs in Tengchong, China, to reveal that they are either strict or facultative anaerobes, and most are chemolithotrophs that can perform sulfide oxidation. Applying comparative genomics to the Thaumarchaeota and Aigarchaeota, we find that they both originated from thermal habitats, sharing 1154 genes with their common ancestor. Reference

Identification of RNA-binding protein targets with HyperTRIBE

RNA-binding proteins (RBPs) accompany RNA from birth to death, affecting RNA biogenesis and functions. Identifying RBP–RNA interactions is essential to understanding their complex roles in different cellular processes.

However, detecting in vivo RNA targets of RBPs, especially in a small number of discrete cells, has been a technically challenging task. We previously developed a novel technique called TRIBE (targets of RNA-binding proteins identified by editing) to overcome this problem. TRIBE expresses a fusion protein consisting of a queried RBP and the catalytic domain of the RNA-editing enzyme ADAR (adenosine deaminase acting on RNA) (ADARcd), which marks target RNA transcripts by converting adenosine to inosine near the RBP binding sites. These marks can be subsequently identified via high-throughput sequencing. Reference
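Because inosine is read as guanosine during sequencing, an edited site shows up as A-to-G mismatches at a reference adenosine. The sketch below shows one simple way such sites might be called from per-site read counts; it is not the published TRIBE analysis, and the thresholds `min_depth` and `min_fraction`, the function name, and the input layout are all assumptions for illustration.

```python
def call_editing_sites(site_counts, min_depth=20, min_fraction=0.1):
    """Call candidate A-to-I editing sites.

    site_counts: {(chromosome, position): (a_reads, g_reads)} for positions
    whose reference base is A. Returns {site: editing fraction} for sites
    that pass both (hypothetical) thresholds.
    """
    called = {}
    for site, (a_reads, g_reads) in site_counts.items():
        depth = a_reads + g_reads
        if depth < min_depth:
            continue  # too few reads to estimate an editing fraction
        fraction = g_reads / depth
        if fraction >= min_fraction:
            called[site] = fraction
    return called


# Hypothetical counts: only the first site has both enough depth and enough G reads.
example_counts = {
    ("chr2L", 100): (80, 20),
    ("chr2L", 200): (99, 1),
    ("chr3R", 50): (5, 5),
}
edited_sites = call_editing_sites(example_counts)
```

A real analysis would also compare against a control without the fusion protein to exclude endogenous editing and genomic variants.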

Integrated genetic and epigenetic analysis of myxofibrosarcoma

Myxofibrosarcoma (MFS) is a common adult soft tissue sarcoma characterized by an infiltrative growth pattern and a high local recurrence rate.


Here we report the genetic and epigenetic landscape of MFS based on the results of whole-exome sequencing (N = 41), RNA sequencing (N = 29), and methylation analysis (N = 41), using 41 MFSs as a discovery set, and subsequent targeted sequencing of 140 genes in the entire cohort of 99 MFSs, together with data on 17 MFSs from TCGA. Fourteen driver genes are identified, including potentially actionable therapeutic targets seen in 37% of cases. Reference

Population genomics of hypervirulent Klebsiella pneumoniae clonal-group 23 reveals early emergence and rapid global dissemination

Severe liver abscess infections caused by hypervirulent clonal-group CG23 Klebsiella pneumoniae have been increasingly reported since the mid-1980s. Strains typically possess several virulence factors including an integrative, conjugative element ICEKp encoding the siderophore yersiniabactin and genotoxin colibactin.

Here we investigate CG23’s evolutionary history, showing several deep-branching sublineages associated with distinct ICEKp acquisitions. Over 80% of liver abscess isolates belong to sublineage CG23-I, which emerged in ~1928 following acquisition of ICEKp10 (encoding yersiniabactin and colibactin), and then disseminated globally within the human population. CG23-I’s distinguishing feature is the colibactin synthesis locus, which reportedly promotes gut colonisation and metastatic infection in murine models. Reference

Modelling how responsiveness to interferon improves interferon-free treatment of hepatitis C virus infection

Direct-acting antiviral agents (DAAs) for hepatitis C treatment tend to fare better in individuals who are also likely to respond well to interferon-alpha (IFN), a surprising correlation given that DAAs target specific viral proteins whereas IFN triggers a generic antiviral immune response. Here, we posit a causal relationship between IFN-responsiveness and DAA treatment outcome. IFN-responsiveness restricts viral replication, which would prevent the growth of viral variants resistant to DAAs and improve treatment outcome.

To test this hypothesis, we developed a multiscale mathematical model integrating IFN-responsiveness at the cellular level, viral kinetics and evolution leading to drug resistance at the individual level, and treatment outcome at the population level. Model predictions quantitatively captured data from over 50 clinical trials demonstrating poorer response to DAAs in previous non-responders to IFN than treatment-naïve individuals, presenting strong evidence supporting the hypothesis.  Reference

Large-scale gene losses underlie the genome evolution of parasitic plant Cuscuta australis

Dodders (Cuscuta spp., Convolvulaceae) are root- and leafless parasitic plants. The physiology, ecology, and evolution of these obligate parasites are poorly understood. A high-quality reference genome of Cuscuta australis was assembled.

Our analyses reveal that Cuscuta experienced accelerated molecular evolution, and that Cuscuta and the convolvulaceous morning glory (Ipomoea) shared a common whole-genome triplication event before their divergence. The C. australis genome harbors 19,671 protein-coding genes, and importantly, 11.7% of the conserved orthologs in autotrophic plants are lost in C. australis. Many of these gene loss events likely result from its parasitic lifestyle and the massive changes of its body plan. Reference

Prioritization and functional assessment of noncoding variants associated with complex diseases

Unraveling functional noncoding variants associated with complex diseases is still a great challenge.

We present a novel algorithm, Prioritization And Functional Assessment (PAFA), that prioritizes and assesses the functionality of genetic variants by introducing population differentiation measures and recalibrating training variants. Comprehensive evaluations demonstrate that PAFA exhibits much higher sensitivity and specificity in prioritizing noncoding risk variants than existing methods. PAFA achieves improved performance in distinguishing both common and rare recurrent variants from non-recurrent variants by integrating multiple annotations and metrics. Reference

Integrative genomic analysis of adult mixed phenotype acute leukemia delineates lineage associated molecular subtypes

Mixed phenotype acute leukemia (MPAL) is a rare subtype of acute leukemia characterized by leukemic blasts presenting myeloid and lymphoid markers.

Here we report data from integrated genomic analysis on 31 MPAL samples and compare molecular profiling with that from acute myeloid leukemia (AML), B cell acute lymphoblastic leukemia (B-ALL), and T cell acute lymphoblastic leukemia (T-ALL). Consistent with the mixed immunophenotype, both AML-type and ALL-type mutations are detected in MPAL. Myeloid-B and myeloid-T MPAL show distinct mutation and methylation signatures that are associated with differences in the expression of lineage-commitment genes. Reference

Horizontal transfer of BovB and L1 retrotransposons in eukaryotes

Transposable elements (TEs) are mobile DNA sequences, colloquially known as jumping genes because of their ability to replicate to new genomic locations. TEs can jump between organisms or species when given a vector of transfer, such as a tick or virus, in a process known as horizontal transfer.

Here, we propose that LINE-1 (L1) and Bovine-B (BovB), the two most abundant TE families in mammals, were initially introduced as foreign DNA via ancient horizontal transfer events.  Using analyses of 759 plant, fungal and animal genomes, we identify multiple possible L1 horizontal transfer events in eukaryotic species, primarily involving Tx-like L1s in marine eukaryotes.  Reference

COBRAme: A computational framework for genome-scale models of metabolism and gene expression

Genome-scale models of metabolism and macromolecular expression (ME-models) explicitly compute the optimal proteome composition of a growing cell. ME-models expand upon the well-established genome-scale models of metabolism (M-models), and they enable a new fundamental understanding of cellular growth.

ME-models have increased predictive capabilities and accuracy due to their inclusion of the biosynthetic costs for the machinery of life, but they come with a significant increase in model size and complexity. This challenge results in models which are both difficult to compute and challenging to understand conceptually. As a result, ME-models exist for only two organisms (Escherichia coli and Thermotoga maritima) and are still used by relatively few researchers. To address these challenges, we have developed a new software framework called COBRAme for building and simulating ME-models. Reference

A large-scale WGS analysis reveals highly specific genome editing by both Cas9 and Cpf1 (Cas12a) nucleases in rice

Targeting specificity has been a barrier to applying genome editing systems in functional genomics, precision medicine and plant breeding.

We conduct a WGS analysis of 34 plants edited by Cas9 and 15 plants edited by Cpf1 in T0 and T1 generations along with 20 diverse control plants in rice. The sequencing depths range from 45× to 105× with read mapping rates above 96%. Our results clearly show that most mutations in edited plants are created by the tissue culture process, which causes approximately 102 to 148 single nucleotide variations (SNVs) and approximately 32 to 83 insertions/deletions (indels) per plant. Reference

Deep coverage whole genome sequences and plasma lipoprotein(a) in individuals of European and African ancestries

Lipoprotein(a), Lp(a), is a modified low-density lipoprotein particle that contains apolipoprotein(a), encoded by LPA, and is a highly heritable, causal risk factor for cardiovascular diseases that varies in concentrations across ancestries.

Here, we use deep-coverage whole genome sequencing in 8392 individuals of European and African ancestry to discover and interpret both single-nucleotide variants and copy number (CN) variation associated with Lp(a). We observe that Europeans and Africans have several unique genetic determinants of Lp(a). The common variant rs12740374 associated with Lp(a) cholesterol is an eQTL for SORT1 and independent of LDL cholesterol. Reference

Adaptation and conservation insights from the koala genome

The koala, the only extant species of the marsupial family Phascolarctidae, is classified as ‘vulnerable’ due to habitat loss and widespread disease. We sequenced the koala genome, producing a complete and contiguous marsupial reference genome, including centromeres.

We reveal that the koala’s ability to detoxify eucalypt foliage may be due to expansions within a cytochrome P450 gene family, and its ability to smell, taste and moderate ingestion of plant secondary metabolites may be due to expansions in the vomeronasal and taste receptors. We characterized novel lactation proteins that protect young in the pouch and annotated immune genes important for response to chlamydial disease. Reference

Modelling genotypes in their physical microenvironment to predict single- and multi-cellular behaviour

A cell’s phenotype is the set of observable characteristics resulting from the interaction of the genotype with the surrounding environment, determining cell behaviour.

Deciphering genotype-phenotype relationships has been crucial to understanding normal and disease biology. Analysis of molecular pathways has provided an invaluable tool for such understanding; however, it has typically lacked a component describing the physical context, which is a key determinant of phenotype. In this study, we present a novel modelling framework that enables the study of the link between genotype, signalling networks and cell behaviour in a 3D physical environment. To achieve this, we bring together Agent Based Modelling, a powerful computational modelling technique, and gene networks. Reference

INSaFLU: an automated open web-based bioinformatics suite “from-reads” for influenza

A new era of flu surveillance has already started based on the genetic characterization and exploration of influenza virus evolution at whole-genome scale.

We developed and implemented INSaFLU (“INSide the FLU”), the first free, influenza-oriented, web-based bioinformatics suite that takes primary NGS data (reads) and automatically generates the outputs that are the core first-line “genetic requests” for effective and timely influenza laboratory surveillance (e.g., type and sub-type, gene and whole-genome consensus sequences, variants’ annotation, alignments and phylogenetic trees). Reference

Transcriptional synergy as an emergent property defining cell subpopulation identity enables population shift

Single-cell RNA sequencing allows defining molecularly distinct cell subpopulations. However, the identification of specific sets of transcription factors (TFs) that define the identity of these subpopulations remains a challenge.

Here we propose that subpopulation identity emerges from the synergistic activity of multiple TFs. Based on this concept, we develop a computational platform (TransSyn) for identifying synergistic transcriptional cores that determine cell subpopulation identities. TransSyn leverages single-cell RNA-seq data, and performs a dynamic search for an optimal synergistic transcriptional core using an information theoretic measure of synergy. Reference
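
The idea of an information-theoretic synergy score can be illustrated with a minimal sketch. The digest does not state which measure TransSyn uses, so the code below assumes a simple, commonly used one: the information two TFs carry jointly about a target, minus what each carries alone. The function names and the binarized toy expression values are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Shannon entropy (bits) of a list of hashable observations."""
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def synergy(tf_a, tf_b, target):
    """Assumed synergy score: I(A,B;T) - I(A;T) - I(B;T).

    Positive values mean the pair is more informative together
    than the sum of its parts; negative values indicate redundancy.
    """
    joint = list(zip(tf_a, tf_b))
    return (mutual_information(joint, target)
            - mutual_information(tf_a, target)
            - mutual_information(tf_b, target))


# Toy binarized expression across four cells (hypothetical data):
# the target is "on" only when exactly one of the two TFs is on.
tf_a = [0, 0, 1, 1]
tf_b = [0, 1, 0, 1]
target = [0, 1, 1, 0]
score = synergy(tf_a, tf_b, target)
```

In this toy case neither TF alone says anything about the target, while the pair determines it completely, so the score is positive. Real single-cell data would need discretization and far more cells for the entropy estimates to be reliable.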

HmmUFOtu: An HMM and phylogenetic placement based ultra-fast taxonomic assignment and OTU picking tool for microbiome amplicon sequencing studies

Culture-independent analysis of microbial communities frequently relies on amplification and sequencing of the prokaryotic 16S ribosomal RNA gene. Typical analysis pipelines group sequences into operational taxonomic units (OTUs) to infer taxonomic and phylogenetic relationships.

Here, we present HmmUFOtu, a novel tool for processing microbiome amplicon sequencing data, which performs rapid per-read phylogenetic placement, followed by phylogenetically informed clustering into OTUs and taxonomy assignment. Compared to standard pipelines, HmmUFOtu more accurately and reliably recapitulates microbial community diversity and composition in simulated and real datasets without relying on heuristics or sacrificing speed or accuracy. Reference

Using an atlas of gene regulation across 44 human tissues to inform complex disease- and trait-associated variation

We apply integrative approaches to expression quantitative trait loci (eQTLs) from 44 tissues from the Genotype-Tissue Expression project and genome-wide association study data. About 60% of known trait-associated loci are in linkage disequilibrium with a cis-eQTL, over half of which were not found in previous large-scale whole blood studies.

Applying polygenic analyses to metabolic, cardiovascular, anthropometric, autoimmune, and neurodegenerative traits, we find that eQTLs are significantly enriched for trait associations in relevant pathogenic tissues and explain a substantial proportion of the heritability (40–80%). For most traits, tissue-shared eQTLs underlie a greater proportion of trait associations, although tissue-specific eQTLs have a greater contribution to some traits, such as blood pressure. Reference

Molecular phenomics and metagenomics of hepatic steatosis in non-diabetic obese women

Hepatic steatosis is a multifactorial condition that is often observed in obese patients and is a prelude to non-alcoholic fatty liver disease.

Here, we combine shotgun sequencing of fecal metagenomes with molecular phenomics (hepatic transcriptome and plasma and urine metabolomes) in two well-characterized cohorts of morbidly obese women recruited to the FLORINASH study. We reveal molecular networks linking the gut microbiome and the host phenome to hepatic steatosis. Patients with steatosis have low microbial gene richness and increased genetic potential for the processing of dietary lipids and endotoxin biosynthesis (notably from Proteobacteria), hepatic inflammation and dysregulation of aromatic and branched-chain amino acid metabolism. Reference

Therapy-induced stress response is associated with downregulation of pre-mRNA splicing in cancer cells

Abnormal pre-mRNA splicing regulation is common in cancer, but the effects of chemotherapy on this process remain unclear.

To evaluate the effect of chemotherapy on splicing regulation, we performed meta-analyses of previously published transcriptomic, proteomic, phosphoproteomic, and secretome datasets. Our findings were verified by LC-MS/MS, western blotting, immunofluorescence, and FACS analyses of multiple cancer cell lines treated with cisplatin and pladienolide B. Our results revealed that different types of chemotherapy lead to similar changes in alternative splicing by inducing intron retention in multiple genes. Reference

Meta-analysis of GWAS for neuroticism in 449,484 individuals identifies novel genetic loci and pathways

Neuroticism is an important risk factor for psychiatric traits, including depression, anxiety, and schizophrenia. At the time of analysis, previous genome-wide association studies (GWAS) had reported 16 genomic loci associated with neuroticism.

Here we conducted a large GWAS meta-analysis (n = 449,484) of neuroticism and identified 136 independent genome-wide significant loci (124 new at the time of analysis), which implicate 599 genes. Functional follow-up analyses showed enrichment in several brain regions and involvement of specific cell types, including dopaminergic neuroblasts (P = 3.49 × 10⁻⁸), medium spiny neurons (P = 4.23 × 10⁻⁸), and serotonergic neurons (P = 1.37 × 10⁻⁷). Gene set analyses implicated three specific pathways: neurogenesis (P = 4.43 × 10⁻⁹), behavioral response to cocaine processes (P = 1.84 × 10⁻⁷), and axon part (P = 5.26 × 10⁻⁸). Reference

DeepCRISPR: optimized CRISPR guide RNA design by deep learning

A major challenge for effective application of CRISPR systems is to accurately predict the single guide RNA (sgRNA) on-target knockout efficacy and off-target profile, which would facilitate the optimized design of sgRNAs with high sensitivity and specificity.

Here we present DeepCRISPR, a comprehensive computational platform to unify sgRNA on-target and off-target site prediction into one framework with deep learning, surpassing available state-of-the-art in silico tools. Reference

dropEst: pipeline for accurate estimation of molecular counts in droplet-based single-cell RNA-seq experiments

Recent single-cell RNA-seq protocols based on droplet microfluidics use massively multiplexed barcoding to enable simultaneous measurements of transcriptomes for thousands of individual cells. The increasing complexity of such data creates challenges for subsequent computational processing and troubleshooting of these experiments, with few software options currently available.

Here, we describe a flexible pipeline for processing droplet-based transcriptome data that implements barcode corrections, classification of cell quality, and diagnostic information about the droplet libraries. We introduce advanced methods for correcting composition bias and sequencing errors affecting cellular and molecular barcodes to provide more accurate estimates of molecular counts in individual cells. Reference
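
To make the idea of barcode correction concrete, here is a minimal sketch of one generic approach: folding a rare barcode into a much more abundant barcode that differs by a single base. This is not dropEst's actual algorithm, which the digest does not describe. The function names, the `max_dist` and `min_ratio` parameters, and the toy counts are all assumptions made for illustration.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def merge_barcodes(counts, max_dist=1, min_ratio=2.0):
    """Fold likely sequencing-error barcodes into abundant neighbours.

    counts: dict mapping barcode -> observed read count.
    A barcode is merged into an already kept barcode when the two
    differ in at most `max_dist` bases and the kept barcode was
    originally at least `min_ratio` times more abundant.
    """
    merged = {}
    # Visit barcodes from most to least abundant (ties broken by name).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for barcode, n in ordered:
        parent = next(
            (kept for kept in merged
             if len(kept) == len(barcode)
             and hamming(kept, barcode) <= max_dist
             and counts[kept] >= min_ratio * n),
            None,
        )
        if parent is None:
            merged[barcode] = n      # keep as its own barcode
        else:
            merged[parent] += n      # treat as an error copy of parent
    return merged


# Hypothetical read counts per cell barcode.
raw = {"ACGT": 100, "ACGA": 3, "TTTT": 50, "TTTA": 40}
corrected = merge_barcodes(raw)
```

Here "ACGA" is absorbed into "ACGT", while "TTTA" is kept separate from "TTTT" because the two have similar abundance and are more plausibly distinct barcodes. The abundance-ratio threshold is a design choice that trades missed corrections against wrongly merged real cells.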