Science in this Week (September, 2018)

Updated on: September 21, 2018

Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning

Visual inspection of histopathology slides is one of the main methods used by pathologists to assess the stage, type and subtype of lung tumors.

Adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC) are the most prevalent subtypes of lung cancer, and their distinction requires visual inspection by an experienced pathologist. In this study, we trained a deep convolutional neural network (inception v3) on whole-slide images obtained from The Cancer Genome Atlas to accurately and automatically classify them into LUAD, LUSC or normal lung tissue. The performance of our method is comparable to that of pathologists, with an average area under the curve (AUC) of 0.97. Our model was validated on independent datasets of frozen tissues, formalin-fixed paraffin-embedded tissues and biopsies. Reference
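The AUC figure above can be made concrete with a small sketch. It assumes the "average AUC" is a one-vs-rest average over the three classes (LUAD, LUSC, normal); the abstract does not say how the average was taken, and the function names and data layout here are illustrative, not the authors' code.

```python
def auc_from_scores(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen positive
    scores higher than a randomly chosen negative (ties count as 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auc(true_classes, class_scores, classes):
    """One-vs-rest AUC per class, then an unweighted mean (an assumed
    reading of 'average AUC'). class_scores is a list of dicts mapping
    class name -> predicted score for each slide."""
    aucs = []
    for c in classes:
        labels = [1 if t == c else 0 for t in true_classes]
        scores = [row[c] for row in class_scores]
        aucs.append(auc_from_scores(labels, scores))
    return sum(aucs) / len(aucs)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value near 0.97 means a tumor slide is almost always ranked above a non-matching slide for its class.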

Robust single-cell DNA methylome profiling with snmC-seq2

Single-cell DNA methylome profiling has enabled the study of epigenomic heterogeneity in complex tissues and during cellular reprogramming.

However, broader applications of the method have been impeded by the modest quality of sequencing libraries. Here we report snmC-seq2, which provides improved read mapping, reduced artifactual reads, enhanced throughput, as well as increased library complexity and coverage uniformity compared to snmC-seq. snmC-seq2 is an efficient strategy suited for large-scale single-cell epigenomic studies. Reference

Genomic history of the Sardinian population

The population of the Mediterranean island of Sardinia has made important contributions to genome-wide association studies of complex disease traits and, based on ancient DNA studies of mainland Europe, Sardinia is hypothesized to be a unique refuge for early Neolithic ancestry.

To provide new insights on the genetic history of this flagship population, we analyzed 3,514 whole-genome sequenced individuals from Sardinia. Sardinian samples show elevated levels of shared ancestry with Basque individuals, especially samples from the more historically isolated regions of Sardinia. Our analysis also uniquely illuminates how levels of genetic similarity with mainland ancient DNA samples vary subtly across the island. Together, our results indicate that within-island substructure and sex-biased processes have substantially impacted the genetic history of Sardinia. Reference

Origin of exon skipping-rich transcriptomes in animals driven by evolution of gene architecture

Alternative splicing, particularly through intron retention and exon skipping, is a major layer of pre-translational regulation in eukaryotes.

While intron retention is believed to be the most prevalent mode across non-animal eukaryotes, animals have unusually high rates of exon skipping.  We used RNA-seq data to quantify exon skipping and intron retention frequencies across 65 eukaryotic species, with particular focus on early branching animals and unicellular holozoans. We found that only bilaterians have significantly increased their exon skipping frequencies compared to all other eukaryotic groups.  Reference

Exome-wide analysis of bi-allelic alterations identifies a Lynch phenotype in The Cancer Genome Atlas

Cancer susceptibility germline variants generally require somatic alteration of the remaining allele to drive oncogenesis and, in some cases, tumor mutational profiles.

Whether combined germline and somatic bi-allelic alterations are universally required for germline variation to influence tumor mutational profile is unclear. Here, we performed an exome-wide analysis of the frequency and functional effect of bi-allelic alterations in The Cancer Genome Atlas (TCGA). We integrated germline variant, somatic mutation, somatic methylation, and somatic copy number loss data from 7790 individuals from TCGA to identify germline and somatic bi-allelic alterations in all coding genes.  Reference

The gut microbiota promotes hepatic fatty acid desaturation and elongation in mice

Interactions between the gut microbial ecosystem and host lipid homeostasis are highly relevant to host physiology and metabolic diseases.

We present a comprehensive multi-omics view of the effect of intestinal microbial colonization on hepatic lipid metabolism, integrating transcriptomic, proteomic, phosphoproteomic, and lipidomic analyses of liver and plasma samples from germfree and specific pathogen-free mice. Microbes induce monounsaturated fatty acid generation by stearoyl-CoA desaturase 1 and polyunsaturated fatty acid elongation by fatty acid elongase 5, leading to significant alterations in glycerophospholipid acyl-chain profiles. Reference

Accurate classification of BRCA1 variants with saturation genome editing

Variants of uncertain significance fundamentally limit the clinical utility of genetic information. The challenge they pose is epitomized by BRCA1, a tumour suppressor gene in which germline loss-of-function variants predispose women to breast and ovarian cancer.

Although BRCA1 has been sequenced in millions of women, the risk associated with most newly observed variants cannot be definitively assigned. Here we use saturation genome editing to assay 96.5% of all possible single-nucleotide variants (SNVs) in 13 exons that encode functionally critical domains of BRCA1. Functional effects for nearly 4,000 SNVs are bimodally distributed and almost perfectly concordant with established assessments of pathogenicity. Reference

DNA methylation footprints during soybean domestication and improvement

In addition to genetic variation, epigenetic variation plays an important role in determining various biological processes.

To understand the impact of epigenetics on crop domestication, we investigate the variation of DNA methylation during soybean domestication and improvement by whole-genome bisulfite sequencing of 45 soybean accessions, including wild soybeans, landraces, and cultivars. Through methylomic analysis, we identify 5412 differentially methylated regions (DMRs). These DMRs exhibit characters distinct from those of genetically selected regions. In particular, they have significantly higher genetic diversity.  Reference

Integrative detection and analysis of structural variation in cancer genomes

Structural variants (SVs) can contribute to oncogenesis through a variety of mechanisms. Despite their importance, the identification of SVs in cancer genomes remains challenging.

Here, we present a framework that integrates optical mapping, high-throughput chromosome conformation capture (Hi-C), and whole-genome sequencing to systematically detect SVs in a variety of normal or cancer samples and cell lines. We identify the unique strengths of each method and demonstrate that only integrative approaches can comprehensively identify SVs in the genome. By combining Hi-C and optical mapping, we resolve complex SVs and phase multiple SV events to a single haplotype. Reference

Exome-wide analysis identifies three low-frequency missense variants associated with pancreatic cancer risk in Chinese populations

Germline coding variants have not been systematically investigated for pancreatic ductal adenocarcinoma (PDAC).

Here we report an exome-wide investigation using the Illumina Human Exome Beadchip with 943 PDAC cases and 3908 controls in the Chinese population, followed by two independent replicate samples including 2142 cases and 4697 controls. We identify three low-frequency missense variants associated with the PDAC risk: rs34309238 in PKN1 (OR = 1.77, 95% CI: 1.48–2.12, P = 5.35 × 10^-10), rs2242241 in DOK2 (OR = 1.85, 95% CI: 1.50–2.27, P = 4.34 × 10^-9), and rs183117027 in APOB (OR = 2.34, 95% CI: 1.72–3.16, P = 4.21 × 10^-8). Functional analyses show that the PKN1 rs34309238 variant significantly increases the level of phosphorylated PKN1 and thus enhances PDAC cells’ proliferation by phosphorylating and activating the FAK/PI3K/AKT pathway. Reference
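For readers unfamiliar with the OR and 95% CI notation, the following is a minimal sketch of an unadjusted odds ratio with a Woolf (log-scale) confidence interval from a 2×2 table. The abstract does not describe the association model the authors used, so this is only the textbook calculation, and the counts in the usage note are hypothetical, not data from the study.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Woolf confidence interval.

    a: carriers among cases      b: non-carriers among cases
    c: carriers among controls   d: non-carriers among controls
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this formula")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

With hypothetical counts such as `odds_ratio_ci(20, 80, 10, 90)`, the point estimate is 2.25; an interval that excludes 1 indicates an association at roughly the 5% level before any multiple-testing correction.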

Increased DNA methylation variability in rheumatoid arthritis-discordant monozygotic twins

Rheumatoid arthritis is a common autoimmune disorder influenced by both genetic and environmental factors.

Epigenome-wide association studies can identify environmentally mediated epigenetic changes such as altered DNA methylation, which may also be influenced by genetic factors. To investigate possible contributions of DNA methylation to the aetiology of rheumatoid arthritis with minimum confounding genetic heterogeneity, we investigated genome-wide DNA methylation in disease-discordant monozygotic twin pairs. Reference

Route of immunization defines multiple mechanisms of vaccine-mediated protection against SIV

Antibodies are the primary correlate of protection for most licensed vaccines; however, their mechanisms of protection may vary, ranging from physical blockade to clearance via the recruitment of innate immunity.

Here, we uncover striking functional diversity in vaccine-induced antibodies that is driven by immunization site and is associated with reduced risk of SIV infection in nonhuman primates. While equivalent levels of protection were observed following intramuscular (IM) and aerosol (AE) immunization with an otherwise identical DNA prime–Ad5 boost regimen, reduced risk of infection was associated with IgG-driven antibody-dependent monocyte-mediated phagocytosis in the IM vaccinees, but with vaccine-elicited IgA-driven neutrophil-mediated phagocytosis in AE-immunized animals. Thus, although route-independent correlates indicate a critical role for phagocytic Fc-effector activity in protection from SIV, the site of immunization may drive this Fc activity via distinct innate effector cells and antibody isotypes. Reference

Decreasing miRNA sequencing bias using a single adapter and circularization approach

The ability to accurately quantify all the microRNAs (miRNAs) in a sample is important for understanding miRNA biology and for development of new biomarkers and therapeutic targets.

We develop a new method for preparing miRNA sequencing libraries, RealSeq®-AC, that involves ligating the miRNAs with a single adapter and circularizing the ligation products. When compared to other methods, RealSeq®-AC provides greatly reduced miRNA sequencing bias and allows the identification of the largest variety of miRNAs in biological samples. This reduced bias also allows robust quantification of miRNAs present in samples across a wide range of RNA input levels. Reference

Co-activation of super-enhancer-driven CCAT1 by TP63 and SOX2 promotes squamous cancer progression

Squamous cell carcinomas (SCCs) are aggressive malignancies. A previous report demonstrated that the master transcription factors (TFs) TP63 and SOX2 exhibit overlapping genomic occupancy in SCCs.

However, the functional consequence of their frequent co-localization at super-enhancers remains incompletely understood. Here, epigenomic profiling of different types of SCCs reveals that TP63 and SOX2 cooperatively and lineage-specifically regulate long non-coding RNA (lncRNA) CCAT1 expression, through activation of its super-enhancers and promoter. Reference

An orthogonal proteomic survey uncovers novel Zika virus host factors

Zika virus (ZIKV) has recently emerged as a global health concern owing to its widespread diffusion and its association with severe neurological symptoms and microcephaly in newborns.

However, the molecular mechanisms that are responsible for the pathogenicity of ZIKV remain largely unknown. Here we use human neural progenitor cells and the neuronal cell line SK-N-BE2 in an integrated proteomics approach to characterize the cellular responses to viral infection at the proteome and phosphoproteome level, and use affinity proteomics to identify cellular targets of ZIKV proteins. Using this approach, we identify 386 ZIKV-interacting proteins, ZIKV-specific and pan-flaviviral activities as well as host factors with known functions in neuronal development, retinal defects and infertility. Reference

Comparative transcriptomic analysis of hematopoietic system between human and mouse by Microwell-seq

The classical model of hematopoiesis is a branched tree, rooted in the long-term hematopoietic stem cell (LT-HSC) and followed by multipotent, oligopotent, and unipotent progenitor stages.

However, very few studies have used systemic methods to investigate the heterogeneity of this population. A cross-species comparison of the hematopoietic hierarchy is also lacking. Here, through Microwell-seq, a high-throughput and low-cost scRNA-seq platform, and a canonical correlation analysis computational strategy, we conducted a comparative transcriptomic analysis of the hematopoietic hierarchy in human and mouse. Reference

Detecting repeated cancer evolution from multi-region tumor sequencing data

Recurrent successions of genomic changes, both within and between patients, reflect repeated evolutionary processes that are valuable for the anticipation of cancer progression.

Multi-region sequencing allows the temporal order of some genomic changes in a tumor to be inferred, but the robust identification of repeated evolution across patients remains a challenge. We developed a machine-learning method based on transfer learning that allowed us to overcome the stochastic effects of cancer evolution and noise in data and identified hidden evolutionary patterns in cancer cohorts. When applied to multi-region sequencing datasets from lung, breast, renal, and colorectal cancer (768 samples from 178 patients), our method detected repeated evolutionary trajectories in subgroups of patients, which were reproduced in single-sample cohorts (n = 2,935). Reference

Comprehensive antibiotic-linked mutation assessment by resistance mutation sequencing (RM-seq)

Mutation acquisition is a major mechanism of bacterial antibiotic resistance that remains insufficiently characterised.

Here we present RM-seq, a new amplicon-based deep sequencing workflow based on a molecular barcoding technique adapted from Low Error Amplicon sequencing (LEA-seq). RM-seq allows detection and functional assessment of mutational resistance at high throughput from mixed bacterial populations. The sensitive detection of very low-frequency resistant sub-populations permits characterisation of antibiotic-linked mutational repertoires in vitro and detection of rare resistant populations during infections.  RM-seq will facilitate comprehensive detection, characterisation and surveillance of resistant bacterial populations.  Reference

A study paradigm integrating prospective epidemiologic cohorts and electronic health records to identify disease biomarkers

Defining the full spectrum of human disease associated with a biomarker is necessary to advance the biomarker into clinical practice.

We hypothesize that associating biomarker measurements with electronic health record (EHR) populations based on shared genetic architectures would establish the clinical epidemiology of the biomarker. We use Bayesian sparse linear mixed modeling to calculate SNP weightings for 53 biomarkers from the Atherosclerosis Risk in Communities study. We use the SNP weightings to compute predicted biomarker values in an EHR population and test associations with 1139 diagnoses. Here we report 116 associations meeting a Bonferroni level of significance. Reference
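To give a sense of how strict a Bonferroni level is at this scale, here is a small worked sketch. It assumes every one of the 53 biomarkers is tested against every one of the 1139 diagnoses and a family-wise alpha of 0.05; neither assumption is stated in the abstract, so the resulting threshold is illustrative only.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value cutoff that keeps the family-wise error rate
    at or below alpha under the Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumed test count: a full biomarker-by-diagnosis grid.
n_biomarkers = 53
n_diagnoses = 1139
n_tests = n_biomarkers * n_diagnoses          # 60,367 tests under this assumption
threshold = bonferroni_threshold(n_tests)     # roughly 8.3e-07
```

Under these assumptions an association would need a p-value below about 8.3 × 10^-7 to count as significant.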

MetaCyto: A Tool for Automated Meta-analysis of Mass and Flow Cytometry Data

While meta-analysis has demonstrated increased statistical power and more robust estimations in studies, the application of this commonly accepted methodology to cytometry data has been challenging. Different cytometry studies often involve diverse sets of markers.

Moreover, the detected values of the same marker are inconsistent between studies due to different experimental designs and cytometer configurations. As a result, the cell subsets identified by existing auto-gating methods cannot be directly compared across studies. We developed MetaCyto for automated meta-analysis of both flow and mass cytometry (CyTOF) data. By combining clustering methods with a silhouette scanning method, MetaCyto is able to identify commonly labeled cell subsets across studies, thus enabling meta-analysis. Applying MetaCyto across a set of ten heterogeneous cytometry studies totaling 2,926 samples enabled us to identify multiple cell populations exhibiting differences in abundance between demographic groups. Reference
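The idea of scanning for a cutoff with the silhouette criterion can be illustrated for a single marker. This is a simplified sketch, not MetaCyto's implementation: it assumes the goal is to split one marker's values into a "low" and a "high" group, tries each midpoint between adjacent distinct values as a candidate cutoff, and keeps the one with the highest mean silhouette. Function names are illustrative.

```python
def mean_silhouette(values, cutoff):
    """Mean silhouette of the two groups formed by splitting at cutoff."""
    low = [v for v in values if v < cutoff]
    high = [v for v in values if v >= cutoff]
    if not low or not high:
        return float("-inf")  # not a real split
    total = 0.0
    for own, other in ((low, high), (high, low)):
        for i, v in enumerate(own):
            if len(own) == 1:
                continue  # a singleton's silhouette is defined as 0
            # a: mean distance to the other members of the same group
            a = sum(abs(v - w) for j, w in enumerate(own) if j != i) / (len(own) - 1)
            # b: mean distance to the members of the other group
            b = sum(abs(v - w) for w in other) / len(other)
            denom = max(a, b)
            if denom > 0:
                total += (b - a) / denom
    return total / len(values)


def scan_cutoff(values):
    """Return the candidate cutoff with the highest mean silhouette."""
    ordered = sorted(set(values))
    if len(ordered) < 2:
        raise ValueError("need at least two distinct values")
    candidates = [(x + y) / 2 for x, y in zip(ordered, ordered[1:])]
    return max(candidates, key=lambda c: mean_silhouette(values, c))
```

Because the cutoff is chosen from each study's own distribution rather than fixed in advance, a "marker-positive" group can be defined comparably even when raw intensities differ between cytometers. The pairwise distance computation here is quadratic in the number of cells, so this version is only suitable for small illustrative inputs.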

XCMS-MRM and METLIN-MRM: a cloud library and public resource for targeted analysis of small molecules

We report XCMS-MRM and METLIN-MRM (http://xcmsonline-mrm.scripps.edu/ and http://metlin.scripps.edu/), a cloud-based data-analysis platform and a public multiple-reaction monitoring (MRM) transition repository for small-molecule quantitative tandem mass spectrometry.

This platform provides MRM transitions for more than 15,500 molecules and facilitates data sharing across different instruments and laboratories. Reference

A multi-cohort study of the immune factors associated with M. tuberculosis infection outcomes

Most infections with Mycobacterium tuberculosis (Mtb) manifest as a clinically asymptomatic, contained state, known as latent tuberculosis infection, that affects approximately one-quarter of the global population. Although fewer than one in ten individuals eventually progress to active disease, tuberculosis is a leading cause of death from infectious disease worldwide.

Despite intense efforts, immune factors that influence the infection outcomes remain poorly defined. Here we used integrated analyses of multiple cohorts to identify stage-specific host responses to Mtb infection. First, using high-dimensional mass cytometry analyses and functional assays of a cohort of South African adolescents, we show that latent tuberculosis is associated with enhanced cytotoxic responses, which are mostly mediated by CD16 (also known as FcγRIIIa) and natural killer cells, and continuous inflammation coupled with immune deviations in both T and B cell compartments. Reference

Interaction between the microbiome and TP53 in human lung cancer

Lung cancer is the leading cancer diagnosis worldwide and the number one cause of cancer deaths. Exposure to cigarette smoke, the primary risk factor in lung cancer, reduces epithelial barrier integrity and increases susceptibility to infections.

Herein, we hypothesize that somatic mutations together with cigarette smoke generate a dysbiotic microbiota that is associated with lung carcinogenesis. Using lung tissue from 33 controls and 143 cancer cases, we conduct 16S ribosomal RNA (rRNA) bacterial gene sequencing, with RNA-sequencing data from lung cancer cases in The Cancer Genome Atlas serving as the validation cohort. Reference

HiGlass: web-based visual exploration and analysis of genome interaction maps

We present HiGlass, an open source visualization tool built on web technologies that provides a rich interface for rapid, multiplex, and multiscale navigation of 2D genomic maps alongside 1D genomic tracks, allowing users to combine various data types, synchronize multiple visualization modalities, and share fully customizable views with others.

We demonstrate its utility in exploring different experimental conditions, comparing the results of analyses, and creating interactive snapshots to share with collaborators and the broader public. Reference

Insights from the annotated wheat genome

Wheat is one of the major sources of food for much of the world. However, because bread wheat’s genome is a large hybrid mix of three separate subgenomes, it has been difficult to produce a high-quality reference sequence.

Using recent advances in sequencing, the International Wheat Genome Sequencing Consortium presents an annotated reference genome with a detailed analysis of gene content among subgenomes and the structural organization for all the chromosomes.  An annotated reference sequence representing the hexaploid bread wheat genome in the form of 21 chromosome-like sequence assemblies has now been delivered, giving access to 107,891 high-confidence genes, including their genomic context of regulatory sequences. This assembly enabled the discovery of tissue- and developmental stage–related gene coexpression networks using a transcriptome atlas representing all stages of wheat development.  Reference

Robust prediction of response to immune checkpoint blockade therapy in metastatic melanoma

Immune checkpoint blockade (ICB) therapy provides remarkable clinical gains and has been very successful in treatment of melanoma.

However, only a subset of patients with advanced tumors currently benefit from ICB therapies, which at times incur considerable side effects and costs. Constructing predictors of patient response has remained a serious challenge because of the complexity of the immune response and the shortage of large cohorts of ICB-treated patients that include both ‘omics’ and response data. Here we build the immuno-predictive score (IMPRES), a predictor of ICB response in melanoma that encompasses 15 pairwise transcriptomic relations between immune checkpoint genes. Reference
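A score built from pairwise expression relations can be sketched as a simple count. The abstract does not list the 15 gene pairs or the direction of each comparison, so the gene names below are placeholders and the "first gene higher than second" rule is an assumption made for illustration.

```python
def pairwise_relation_score(expression, gene_pairs):
    """Count how many (gene_a, gene_b) relations hold in one sample,
    where a relation holds if gene_a is expressed above gene_b.

    expression: dict mapping gene name -> expression value for a sample
    gene_pairs: list of (gene_a, gene_b) tuples
    """
    score = 0
    for gene_a, gene_b in gene_pairs:
        if expression[gene_a] > expression[gene_b]:
            score += 1
    return score


# Hypothetical pairs and values, not the published IMPRES features.
example_pairs = [("GENE_A", "GENE_B"), ("GENE_C", "GENE_A"), ("GENE_B", "GENE_C")]
example_sample = {"GENE_A": 5.0, "GENE_B": 2.0, "GENE_C": 3.5}
example_score = pairwise_relation_score(example_sample, example_pairs)
```

One appeal of this kind of design is that each feature depends only on the relative ordering of two genes within a sample, which makes the score less sensitive to platform-specific scaling than absolute expression values would be. With 15 pairs the score would range from 0 to 15.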

Selective gene dependencies in MYCN-amplified neuroblastoma

Childhood high-risk neuroblastomas with MYCN gene amplification are difficult to treat effectively. This has focused attention on tumor-specific gene dependencies that underlie tumorigenesis and thus provide valuable targets for the development of novel therapeutics.

Using unbiased genome-scale CRISPR–Cas9 approaches to detect genes involved in tumor cell growth and survival, we identified 147 candidate gene dependencies selective for MYCN-amplified neuroblastoma cell lines, compared to over 300 other human cancer cell lines. We then used genome-wide chromatin-immunoprecipitation coupled to high-throughput sequencing analysis to demonstrate that a small number of essential transcription factors—MYCN, HAND2, ISL1, PHOX2B, GATA3, and TBX2—are members of the transcriptional core regulatory circuitry (CRC) that maintains cell state in MYCN-amplified neuroblastoma. Reference

A simple genetic basis of adaptation to a novel thermal environment results in complex metabolic rewiring in Drosophila

Population genetic theory predicts that rapid adaptation is largely driven by complex traits encoded by many loci of small effect. Because large-effect loci are quickly fixed in natural populations, they should not contribute much to rapid adaptation.

To investigate the genetic architecture of thermal adaptation — a highly complex trait — we performed experimental evolution on a natural Drosophila simulans population. Transcriptome and respiration measurements reveal extensive metabolic rewiring after only approximately 60 generations in a hot environment. Reference

Decoding a cancer-relevant splicing decision in the RON proto-oncogene

Mutations causing aberrant splicing are frequently implicated in human diseases including cancer.

Here, we establish a high-throughput screen of randomly mutated minigenes to decode the cis-regulatory landscape that determines alternative splicing of exon 11 in the proto-oncogene MST1R (RON). Mathematical modelling of splicing kinetics enables us to identify more than 1000 mutations affecting RON exon 11 skipping, which corresponds to the pathological isoform RON∆165. Importantly, the effects correlate with RON alternative splicing in cancer patients bearing the same mutations. Reference

Linking the International Wheat Genome Sequencing Consortium bread wheat reference genome sequence to wheat genetic and phenomic data

The Wheat@URGI portal has been developed to provide the international community of researchers and breeders with access to the bread wheat reference genome sequence produced by the International Wheat Genome Sequencing Consortium.

Genome browsers, BLAST, and InterMine tools have been established for in-depth exploration of the genome sequence together with additional linked datasets including physical maps, sequence variations, gene expression, and genetic and phenomic data from other international collaborative projects already stored in the GnpIS information system. Reference

Discovery of cationic nonribosomal peptides as Gram-negative antibiotics through global genome mining

The worldwide prevalence of infections caused by antibiotic-resistant Gram-negative bacteria poses a serious threat to public health due to the limited therapeutic alternatives.

Cationic peptides represent a large family of antibiotics and have attracted interest due to their diverse chemical structures and potential for combating drug-resistant Gram-negative pathogens. Here, we analyze 7395 bacterial genomes to investigate their capacity for biosynthesis of cationic nonribosomal peptides with activity against Gram-negative bacteria. Reference

Population genomics and morphometric assignment of western honey bees

Apis mellifera scutellata and A.m. capensis (the Cape honey bee) are western honey bee subspecies indigenous to the Republic of South Africa (RSA). Both bees are important for biological and economic reasons. First, A.m. scutellata is the invasive “African honey bee” of the Americas and exhibits a number of traits that beekeepers consider undesirable.

They swarm excessively, are prone to absconding (vacating the nest entirely), usurp other honey bee colonies, and exhibit heightened defensiveness. Second, Cape honey bees are socially parasitic bees; the workers can reproduce thelytokously. The two subspecies are visually indistinguishable. Therefore, we employed Genotyping-by-Sequencing (GBS), wing geometry and standard morphometric approaches to assess the genetic diversity and population structure of these bees and to search for diagnostic markers that can be used to distinguish between the two subspecies. Reference