Identifying disease-critical cell types and cellular processes by integrating single-cell RNA-sequencing and human genetics
Genome-wide association studies provide a powerful means of identifying loci and genes contributing to disease, but in many cases, the related cell types/states through which genes confer disease risk remain unknown.
Deciphering such relationships is important for identifying pathogenic processes and developing therapeutics. In the present study, we introduce sc-linker, a framework for integrating single-cell RNA-sequencing, epigenomic SNP-to-gene maps and genome-wide association study summary statistics to infer the underlying cell types and processes by which genetic variants influence disease.
The inferred disease enrichments recapitulated known biology and highlighted notable cell–disease relationships, including γ-aminobutyric acid-ergic neurons in major depressive disorder, a disease-dependent M-cell program in ulcerative colitis and a disease-specific complement cascade process in multiple sclerosis. In autoimmune disease, both healthy and disease-dependent immune cell-type programs were associated, whereas only disease-dependent epithelial cell programs were prominent, suggesting a role in disease response rather than initiation. Our framework provides a powerful approach for identifying the cell types and cellular processes by which genetic variants influence disease. Reference
H3K18 lactylation marks tissue-specific active enhancers
Histone lactylation has been recently described as a novel histone post-translational modification linking cellular metabolism to epigenetic regulation.
Given the expected relevance of this modification and current limited knowledge of its function, we generate genome-wide datasets of H3K18la distribution in various in vitro and in vivo samples, including mouse embryonic stem cells, macrophages, adipocytes, and mouse and human skeletal muscle. We compare them to profiles of well-established histone modifications and gene expression patterns.
Supervised and unsupervised bioinformatics analysis shows that global H3K18la distribution resembles H3K27ac, although we also find notable differences. H3K18la marks active CpG island-containing promoters of highly expressed genes across most tissues assessed, including many housekeeping genes, and positively correlates with H3K27ac and H3K4me3 as well as with gene expression. In addition, H3K18la is enriched at active enhancers that lie in proximity to genes that are functionally important for the respective tissue. Reference
A transcriptional metastatic signature predicts survival in clear cell renal cell carcinoma
Clear cell renal cell carcinoma (ccRCC) is the most common type of kidney cancer in adults. When ccRCC is localized to the kidney, surgical resection or ablation of the tumor is often curative. However, in the metastatic setting, ccRCC remains a highly lethal disease.
Here we use fresh patient samples that include treatment-naive primary tumor tissue, matched adjacent normal kidney tissue, as well as tumor samples collected from patients with bone metastases. Single-cell transcriptomic analysis of tumor cells from the primary tumors reveals a distinct transcriptional signature that is predictive of metastatic potential and patient survival. Analysis of supporting stromal cells within the tumor environment demonstrates vascular remodeling within the endothelial cells.
An in silico cell-to-cell interaction analysis highlights the CXCL9/CXCL10-CXCR3 axis and the CD70-CD27 axis as potential therapeutic targets. Our findings provide biological insights into the interplay between tumor cells and the ccRCC microenvironment. Reference
A pan-cancer mycobiome analysis reveals fungal involvement in gastrointestinal and lung tumors
Fungal microorganisms (mycobiota) comprise a small but immunoreactive component of the human microbiome, yet little is known about their role in human cancers. Pan-cancer analysis of multiple body sites revealed tumor-associated mycobiomes at up to 1 fungal cell per 10^4 tumor cells.
In lung cancer, Blastomyces was associated with tumor tissues. In stomach cancers, high rates of Candida were linked to the expression of pro-inflammatory immune pathways, while in colon cancers Candida was predictive of metastatic disease and attenuated cellular adhesions. Across multiple GI sites, several Candida species were enriched in tumor samples and tumor-associated Candida DNA was predictive of decreased survival.
The presence of Candida in human GI tumors was confirmed by external ITS sequencing of tumor samples and by culture-dependent analysis in an independent cohort. These data implicate the mycobiota in the pathogenesis of GI cancers and suggest that tumor-associated fungal DNA may serve as diagnostic or prognostic biomarkers. Reference
Transcriptional signatures of the BCL2 family for individualized acute myeloid leukaemia treatment
Although anti-apoptotic proteins of the B-cell lymphoma-2 (BCL2) family have been utilized as therapeutic targets in acute myeloid leukaemia (AML), their complicated regulatory networks make individualized therapy difficult.
This study aimed to discover the transcriptional signatures of BCL2 family genes that reflect regulatory dynamics, which can guide individualized therapeutic strategies. From three AML RNA-seq cohorts (BeatAML, LeuceGene, and TCGA; n = 451, 437, and 179, respectively), we constructed the BCL2 family signatures (BFSigs) by applying an innovative gene-set selection method reflecting biological knowledge followed by non-negative matrix factorization (NMF).
To demonstrate the significance of the BFSigs, we conducted modelling to predict response to BCL2 family inhibitors, clustering, and functional enrichment analysis. Cross-platform validity of BFSigs was also confirmed using NanoString technology in a separate cohort of 47 patients. We established BFSigs labeled as the BCL2, MCL1/BCL2, and BFL1/MCL1 signatures that identify key anti-apoptotic proteins. Reference
Rejuvenation of the aged brain immune cell landscape in mice through p16-positive senescent cell clearance
Cellular senescence is a plausible mediator of inflammation-related tissue dysfunction. In the aged brain, senescent cell identities and the mechanisms by which they exert adverse influence are unclear.
Here we used high-dimensional molecular profiling, coupled with mechanistic experiments, to study the properties of senescent cells in the aged mouse brain. We show that senescence and inflammatory expression profiles increase with age and are brain region- and sex-specific. p16-positive myeloid cells exhibiting senescent and disease-associated activation signatures, including upregulation of chemoattractant factors, accumulate in the aged mouse brain. Senescent brain myeloid cells promote peripheral immune cell chemotaxis in vitro.
Activated resident and infiltrating immune cells increase in the aged brain and are partially restored to youthful levels through p16-positive senescent cell clearance in female p16-InkAttac mice, which is associated with preservation of cognitive function. Our study reveals dynamic remodeling of the brain immune cell landscape in aging and suggests senescent cell targeting as a strategy to counter inflammatory changes and cognitive decline. Reference
Risk stratification is critical for the early identification of high-risk individuals and disease prevention.
Here we explored the potential of nuclear magnetic resonance (NMR) spectroscopy-derived metabolomic profiles to inform on multidisease risk beyond conventional clinical predictors for the onset of 24 common conditions, including metabolic, vascular, respiratory, musculoskeletal and neurological diseases and cancers. Specifically, we trained a neural network to learn disease-specific metabolomic states from 168 circulating metabolic markers measured in 117,981 participants with ~1.4 million person-years of follow-up from the UK Biobank and validated the model in four independent cohorts. We found metabolomic states to be associated with incident event rates in all the investigated conditions, except breast cancer.
For 10-year outcome prediction for 15 endpoints, with and without established metabolic contribution, a combination of age and sex and the metabolomic state equaled or outperformed established predictors. Moreover, metabolomic state added predictive information over comprehensive clinical variables for eight common diseases, including type 2 diabetes, dementia and heart failure. Reference
Repeated turnovers keep sex chromosomes young in willows
Salicaceae species have diverse sex determination systems and frequent sex chromosome turnovers. However, compared with poplars, the diversity of sex determination in willows is poorly understood, and little is known about the evolutionary forces driving their turnover.
Here, we characterized the sex determination in two Salix species, S. chaenomeloides and S. arbutifolia, which have an XY system on chromosomes 7 and 15, respectively. Based on the assemblies of their sex determination regions, we found that the sex determination mechanism of willows may have underlying similarities with poplars, both involving intact and/or partial homologs of a type A cytokinin response regulator (RR) gene.
Comparative analyses suggested that at least two sex turnover events have occurred in Salix, one preserving the ancestral pattern of male heterogamety, and the other changing heterogametic sex from XY to ZW, which could be partly explained by the “deleterious mutation load” and “sexually antagonistic selection” theoretical models. Reference
A cellular hierarchy in melanoma uncouples growth and metastasis
Although melanoma is notorious for its high degree of heterogeneity and plasticity, the origin and magnitude of cell-state diversity remain poorly understood. Equally, it is unclear whether growth and metastatic dissemination are supported by overlapping or distinct melanoma subpopulations.
Here, by combining mouse genetics, single-cell and spatial transcriptomics, lineage tracing and quantitative modelling, we provide evidence of a hierarchical model of tumour growth that mirrors the cellular and molecular logic underlying the cell-fate specification and differentiation of the embryonic neural crest.
We show that tumorigenic competence is associated with a spatially localized perivascular niche, a phenotype acquired through an intercellular communication pathway established by endothelial cells. Reference
Robust deep learning–based protein sequence design using ProteinMPNN
While deep learning has revolutionized protein structure prediction, almost all experimentally characterized de novo protein designs have been generated using physically based approaches such as Rosetta.
Here we describe a deep learning–based protein sequence design method, ProteinMPNN, with outstanding performance in both in silico and experimental tests. On native protein backbones, ProteinMPNN has a sequence recovery of 52.4%, compared to 32.9% for Rosetta. The amino acid sequence at different positions can be coupled between single or multiple chains, enabling application to a wide range of current protein design challenges.
We demonstrate the broad utility and high accuracy of ProteinMPNN using X-ray crystallography, cryoEM and functional studies by rescuing previously failed designs, made using Rosetta or AlphaFold, of protein monomers, cyclic homo-oligomers, tetrahedral nanoparticles, and target binding proteins. Reference
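The sequence recovery figures quoted above (52.4% versus 32.9%) measure how often a designed sequence reproduces the native residue at each position. A minimal sketch of that metric, assuming equal-length, pre-aligned sequences; the function name and the toy sequences are illustrative and are not taken from the ProteinMPNN code or paper:

```python
def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of positions where the designed residue equals the native one.

    Assumes both sequences describe the same backbone and are already aligned.
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for n, d in zip(native, designed) if n == d)
    return matches / len(native)


# Toy example with made-up sequences (hypothetical, for illustration only).
native = "MKTAYIAKQR"
designed = "MKSAYLAKQR"
print(f"recovery = {sequence_recovery(native, designed):.1%}")
```

In practice the metric is averaged over many backbones, so a single pair like this only shows the per-sequence calculation.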
Evolutionary inference across eukaryotes identifies universal features shaping organelle gene retention
Mitochondria and plastids power complex life. Why some genes and not others are retained in their organelle DNA (oDNA) genomes remains a debated question. Here, we attempt to identify the properties of genes and associated underlying mechanisms that determine oDNA retention.
We harness over 15k oDNA sequences and over 300 whole genome sequences across eukaryotes with tools from structural biology, bioinformatics, machine learning, and Bayesian model selection. Previously hypothesized features, including the hydrophobicity of a protein product, and less well-known features, including binding energy centrality within a protein complex, predict oDNA retention across eukaryotes, with additional influences of nucleic acid and amino acid biochemistry.
Notably, the same features predict retention in both organelles, and retention models learned from one organelle type quantitatively predict retention in the other, supporting the universality of these features—which also distinguish gene profiles in more recent, independent endosymbiotic relationships. Reference
Analyses of rare predisposing variants of lung cancer in 6,004 whole genomes in Chinese
A new study published in Cancer Cell presents the largest whole-genome sequencing (WGS) study of non-small cell lung cancer (NSCLC) to date among 6,004 individuals of Chinese ancestry, coupled with 23,049 individuals genotyped by SNP array.
They construct a high-quality haplotype reference panel for imputation and identify 20 common and low-frequency loci (minor allele frequency [MAF] ≥ 0.5%), including five loci that have never been reported before. For rare loss-of-function (LoF) variants (MAF < 0.5%), they identify BRCA2 and 18 other cancer predisposition genes that affect 5.29% of individuals with NSCLC, and 98.91% (181 of 183) of LoF variants have not been linked previously to NSCLC risk.
Promoter variants of BRCA2 also have a substantial effect on NSCLC risk, and their prevalence is comparable with that of BRCA2 LoF variants. The associations are validated in an independent case-control study including 4,410 individuals and a prospective cohort study including 23,826 individuals. Their findings not only provide a high-quality reference panel for future array-based association studies but also depict the whole picture of rare pathogenic variants for NSCLC. Reference
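The study splits variants at a minor allele frequency (MAF) of 0.5%. As a rough illustration of that cut-off, the sketch below computes MAF from genotype counts at a biallelic site and applies the threshold. The function names and the example counts are hypothetical; only the 0.5% boundary comes from the text.

```python
def minor_allele_frequency(n_ref_hom: int, n_het: int, n_alt_hom: int) -> float:
    """MAF at a biallelic site from diploid genotype counts."""
    total_alleles = 2 * (n_ref_hom + n_het + n_alt_hom)
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    alt_freq = (n_het + 2 * n_alt_hom) / total_alleles
    # The minor allele is whichever of the two alleles is less frequent.
    return min(alt_freq, 1.0 - alt_freq)


def frequency_class(maf: float, threshold: float = 0.005) -> str:
    """Label a variant using the 0.5% MAF boundary described in the text."""
    return "rare" if maf < threshold else "common or low-frequency"


# Hypothetical counts: 995 reference homozygotes, 5 heterozygotes, 0 alternate homozygotes.
maf = minor_allele_frequency(995, 5, 0)
print(maf, frequency_class(maf))
```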
Pangenomic analysis of Chinese gastric cancer
Pangenomic studies might improve the completeness of the human reference genome (GRCh38) and promote precision medicine.
Here, we use an automated pipeline of human pangenomic analysis to build a gastric cancer pan-genome from 185 paired deep-sequencing datasets (370 samples) and characterize the gene presence-absence variations (PAVs) at the whole-genome level. ACOT1, GSTM1, SIGLEC14 and UGT2B17 are identified as highly absent genes in the gastric cancer population. A set of genes is predicted from sequences that do not align to GRCh38.
We successfully locate one of the predicted genes, GC0643, on chromosome 9q34.2. Overexpression of GC0643 significantly inhibits cell growth, cell migration and invasion, and cell cycle progression, and induces apoptosis in cancer cells. These tumor suppressor functions can be reversed by shGC0643 knockdown. GC0643 is approved by the NCBI database (GenBank: MW194843.1). Collectively, the robust pan-genome strategy provides a deeper understanding of the gene PAVs in the human cancer genome. Reference
A novel molecular signature identifies mixed subtypes in renal cell carcinoma
Renal cell carcinoma (RCC) is a heterogeneous disease comprising histologically defined subtypes. For therapy selection, precise subtype identification and individualized prognosis are mandatory, but currently limited.
The aim was to refine subtyping and outcome prediction across the main subtypes, assuming that a tumor is composed of molecular features present in distinct pathological subtypes. A novel classification approach into unambiguous and intermediate subtypes opens a new avenue for patient stratification and treatment selection for innovative immunotherapies.
Individual RCC samples were modeled as a linear combination of the main subtypes (clear cell (ccRCC), papillary (pRCC), chromophobe (chRCC)) using computational gene expression deconvolution. The new molecular subtyping was compared with histological classification of RCC using the Cancer Genome Atlas (TCGA) cohort (n = 864; ccRCC: 512; pRCC: 287; chRCC: 65) as well as 92 independent histopathologically well-characterized RCCs. Reference
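To make the "linear combination of the main subtypes" idea concrete, here is a minimal sketch that estimates non-negative subtype weights for one sample by projected gradient descent on a least-squares objective. The abstract does not name the deconvolution algorithm, so this solver is a stand-in rather than the authors' method, and the signature values, gene count and function name are invented for illustration.

```python
def deconvolve(sample, signatures, n_iter=2000):
    """Estimate non-negative weights w so that sum_k w[k] * signatures[k] ~ sample.

    Projected gradient descent on 0.5 * ||S^T w - sample||^2 with w >= 0.
    Returns weights normalised to sum to 1 (subtype proportions).
    """
    k = len(signatures)
    n_genes = len(sample)
    # Step size 1 / ||S||_F^2 is a conservative choice that keeps the iteration stable.
    step = 1.0 / sum(v * v for row in signatures for v in row)
    w = [1.0 / k] * k
    for _ in range(n_iter):
        residual = [
            sum(w[j] * signatures[j][g] for j in range(k)) - sample[g]
            for g in range(n_genes)
        ]
        grads = [sum(r * s for r, s in zip(residual, signatures[j])) for j in range(k)]
        # Gradient step followed by projection onto the non-negative orthant.
        w = [max(0.0, w[j] - step * grads[j]) for j in range(k)]
    total = sum(w)
    return [x / total for x in w] if total > 0 else w


# Hypothetical reference profiles over four genes (rows: ccRCC, pRCC, chRCC).
signatures = [
    [8.0, 1.0, 1.0, 2.0],
    [1.0, 7.0, 1.0, 2.0],
    [1.0, 1.0, 9.0, 2.0],
]
# A synthetic sample built as 70% ccRCC-like and 30% pRCC-like.
sample = [0.7 * c + 0.3 * p for c, p in zip(signatures[0], signatures[1])]
print([round(x, 3) for x in deconvolve(sample, signatures)])
```

A sample whose weights are spread across two subtypes, as in this synthetic case, corresponds to what the text calls an intermediate subtype; a sample dominated by one weight would be unambiguous.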
Linking transcriptomes with morphological and functional phenotypes in mammalian retinal ganglion cells
Retinal ganglion cells (RGCs) are the brain’s gateway to the visual world. They can be classified into different types on the basis of their electrophysiological, transcriptomic, or morphological characteristics.
Here, we characterize the transcriptomic, morphological, and functional features of 472 high-quality RGCs using Patch sequencing (Patch-seq), providing functional and morphological annotation of many transcriptomic-defined cell types of a previously established RGC atlas. We show a convergence of different modalities in defining the RGC identity and reveal the degree of correspondence for well-characterized cell types across multimodal data.
Moreover, we complement some RGC types with detailed morphological and functional properties. We also identify differentially expressed genes among ON, OFF, and ON-OFF RGCs such as Vat1l, Slitrk6, and Lmo7, providing candidate marker genes for functional studies. Our research suggests that the molecularly distinct clusters may also differ in their roles of encoding visual information. Reference
Genetics of physical activity and sedentary behavior in disease prevention
Although physical activity and sedentary behavior are moderately heritable, little is known about the mechanisms that influence these traits. Combining data for up to 703,901 individuals from 51 studies in a multi-ancestry meta-analysis of genome-wide association studies yields 99 loci that associate with self-reported moderate-to-vigorous intensity physical activity during leisure time (MVPA), leisure screen time (LST) and/or sedentary behavior at work.
Loci associated with LST are enriched for genes whose expression in skeletal muscle is altered by resistance training. A missense variant in ACTN3 makes the alpha-actinin-3 filaments more flexible, resulting in lower maximal force in isolated type IIA muscle fibers, and possibly protection from exercise-induced muscle damage.
Finally, Mendelian randomization analyses show that beneficial effects of lower LST and higher MVPA on several risk factors and diseases are mediated or confounded by body mass index (BMI). Our results provide insights into physical activity mechanisms and its role in disease prevention. Reference
Design, construction, and in vivo augmentation of a complex gut microbiome
Efforts to model the human gut microbiome in mice have led to important insights into the mechanisms of host-microbe interactions. However, the model communities studied to date have been defined or complex, but not both, limiting their utility.
Here, we construct and characterize in vitro a defined community of 104 bacterial species composed of the most common taxa from the human gut microbiota (hCom1). We then used an iterative experimental process to fill open niches: germ-free mice were colonized with hCom1 and then challenged with a human fecal sample.
We identified new species that engrafted following fecal challenge and added them to hCom1, yielding hCom2. In gnotobiotic mice, hCom2 exhibited increased stability to fecal challenge and robust colonization resistance against pathogenic Escherichia coli. Mice colonized by either hCom2 or a human fecal community are phenotypically similar, suggesting that this consortium will enable a mechanistic interrogation of species and genes on microbiome-associated phenotypes. Reference
Regulation associated modules reflect 3D genome modularity associated with chromatin activity
The 3D genome has been shown to be organized into modules including topologically associating domains (TADs) and compartments that are primarily defined by spatial contacts from Hi-C.
It remains unclear whether and how the spatial modularity of chromatin is related to the functional modularity that results from chromatin activity. Although histone modifications reflect chromatin activity, inferring the spatial modularity of the genome directly from histone modification patterns has not been well explored.
Here, we report that histone modifications show a modular pattern (referred to as regulation associated modules, RAMs) that reflects spatial chromatin modularity. Enhancer-promoter interactions, loop anchors, super-enhancer clusters and extrachromosomal DNAs (ecDNAs) are found to occur more often within the same RAMs than within the same TADs. Reference
Identification of trypsin-degrading commensals in the large intestine
Increased levels of proteases, such as trypsin, in the distal intestine have been implicated in intestinal pathological conditions. However, the players and mechanisms that underlie protease regulation in the intestinal lumen have remained unclear.
Here we show that Paraprevotella strains isolated from the faecal microbiome of healthy human donors are potent trypsin-degrading commensals. Mechanistically, Paraprevotella recruit trypsin to the bacterial surface through type IX secretion system-dependent polysaccharide-anchoring proteins to promote trypsin autolysis. Paraprevotella colonization protects IgA from trypsin degradation and enhances the effectiveness of oral vaccines against Citrobacter rodentium. Reference
Abnormal molecular signatures of inflammation and energy metabolism in Huntington disease
A major challenge in neurodegenerative diseases concerns identifying biological disease signatures that track with disease progression or respond to an intervention. Several clinical trials in Huntington disease (HD), an inherited, progressive neurodegenerative disease, are currently ongoing.
Therefore, we examine whether peripheral tissues can serve as a source of readily accessible biological signatures at the RNA and protein level in HD patients. We generate large, high-quality human datasets from skeletal muscle, skin and adipose tissue to probe molecular changes in human premanifest and early manifest HD patients—those most likely involved in clinical trials. The analysis of the transcriptomics and proteomics data shows robust, stage-dependent dysregulation. Gene ontology analysis confirms the involvement of inflammation and energy metabolism in peripheral HD pathogenesis.
Our ‘omics data document the involvement of inflammation, energy metabolism, and extracellular vesicle homeostasis. This demonstrates the potential to identify biological signatures from peripheral tissues in HD suitable as biomarkers in clinical trials. Reference
Deep Mutational Learning Predicts ACE2 Binding and Antibody Escape
The continual evolution of SARS-CoV-2 and the emergence of variants that show resistance to vaccines and neutralizing antibodies threaten to prolong the COVID-19 pandemic.
Selection and emergence of SARS-CoV-2 variants are driven in part by mutations within the viral spike protein and in particular the ACE2 receptor-binding domain (RBD), a primary target site for neutralizing antibodies. Here, we develop deep mutational learning (DML), a machine learning-guided protein engineering technology, which is used to interrogate a massive sequence space of combinatorial mutations, representing billions of RBD variants, by accurately predicting their impact on ACE2 binding and antibody escape.
A highly diverse landscape of possible SARS-CoV-2 variants is identified that could emerge from a multitude of evolutionary trajectories. DML may be used for predictive profiling on current and prospective variants, including highly mutated variants such as Omicron, thus guiding the development of therapeutic antibody treatments and vaccines for COVID-19. Reference
Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data
Single-cell RNA sequencing (scRNA-seq) provides unique insights into the pathology and cellular origin of disease. We introduce single-cell disease relevance score (scDRS), an approach that links scRNA-seq with polygenic disease risk at single-cell resolution, independent of annotated cell types. scDRS identifies cells exhibiting excess expression across disease-associated genes implicated by genome-wide association studies (GWASs).
We applied scDRS to 74 diseases/traits and 1.3 million single-cell gene-expression profiles across 31 tissues/organs. Cell-type-level results broadly recapitulated known cell-type–disease associations. Individual-cell-level results identified subpopulations of disease-associated cells not captured by existing cell-type labels, including T cell subpopulations associated with inflammatory bowel disease, partially characterized by their effector-like states; neuron subpopulations associated with schizophrenia, partially characterized by their spatial locations; and hepatocyte subpopulations associated with triglyceride levels, partially characterized by their higher ploidy levels.
Genes whose expression was correlated with the scDRS score across cells (reflecting coexpression with GWAS disease-associated genes) were strongly enriched for gold-standard drug target and Mendelian disease genes. Reference
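The core idea, scoring a single cell by its excess expression across disease-associated genes, can be illustrated with a much-simplified sketch: compare the mean expression of the disease gene set in one cell against randomly drawn control gene sets of the same size. This is not the scDRS implementation (which, among other things, weights genes and matches control genes more carefully); the function names, gene labels and expression values below are all hypothetical.

```python
import random


def mean_expression(cell_expr: dict, genes: list) -> float:
    """Average expression of the given genes in one cell."""
    return sum(cell_expr[g] for g in genes) / len(genes)


def excess_expression_pvalue(cell_expr, disease_genes, n_controls=1000, seed=0):
    """Empirical p-value that the disease gene set is more highly expressed
    in this cell than random gene sets of the same size."""
    rng = random.Random(seed)
    all_genes = sorted(cell_expr)
    observed = mean_expression(cell_expr, disease_genes)
    hits = 0
    for _ in range(n_controls):
        control = rng.sample(all_genes, len(disease_genes))
        if mean_expression(cell_expr, control) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_controls + 1)


# Hypothetical cell: three putative disease genes are highly expressed, the rest are low.
cell = {f"g{i}": 1.0 for i in range(20)}
for g in ("g0", "g1", "g2"):
    cell[g] = 10.0
print(excess_expression_pvalue(cell, ["g0", "g1", "g2"]))
```

Because the score is computed per cell, it can be examined independently of any cell-type annotation, which is the property the abstract emphasises.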
African-specific molecular taxonomy of prostate cancer
Prostate cancer is characterized by considerable geo-ethnic disparity. African ancestry is a significant risk factor, with mortality rates across sub-Saharan Africa 2.7-fold higher than global averages. The contributing genetic and non-genetic factors, and associated mutational processes, are unknown.
Here, through whole-genome sequencing of treatment-naive prostate cancer samples from 183 ancestrally (African versus European) and globally distinct patients, we generate a large cancer genomics resource for sub-Saharan Africa, identifying around 2 million somatic variants. Significant African-ancestry-specific findings include an elevated tumour mutational burden, increased percentage of genome alteration, a greater number of predicted damaging mutations and a higher total of mutational signatures, and the driver genes NCOA2, STK19, DDX11L1, PCAT1 and SETBP1.
In addition to the clinical benefit of including individuals of African ancestry, our global mutational subtypes (GMS) reveal different evolutionary trajectories and mutational processes, suggesting that both common genetic and environmental factors contribute to the disparity between ethnicities. Analogous to gene–environment interaction (defined here as a different effect of an environmental surrounding in people with different ancestries or vice versa), we anticipate that GMS subtypes act as a proxy for intrinsic and extrinsic mutational processes in cancers, promoting global inclusion in landmark studies. Reference
How do human CD8+ T cell heterogeneity and transcriptomes change over nine decades of life?
The decline of CD8+ T cell functions contributes to deteriorating health with aging, but the mechanisms that underlie this phenomenon are not well understood.
We use single-cell RNA sequencing with both cross-sectional and longitudinal samples to assess how human CD8+ T cell heterogeneity and transcriptomes change over nine decades of life. Eleven subpopulations of CD8+ T cells and their dynamic changes with age are identified. Age-related changes in gene expression result from changes in the percentage of cells expressing a given transcript, quantitative changes in the transcript level, or a combination of these two.
We develop a machine learning model capable of predicting the age of individual cells based on their transcriptomic features, which are closely associated with their differentiation and mutation burden. Finally, we validate this model in two separate contexts of CD8+ T cell aging: HIV infection and CAR T cell expansion in vivo. Reference