Bioinformatics and Genomics related notes, practical tips and tricks:
July, 2022
A Brief Guide to Genomics
Genomics is the study of all of a person's genes (the genome), including interactions of those genes with each other and with the person's environment.
What is DNA?
What is a genome?
What is DNA sequencing?
What is the Human Genome Project?
What are the implications for medical science?
bioboxes: a standard for creating interchangeable bioinformatics software containers
Bioboxes simplify getting and using bioinformatics software. This short guide illustrates this with an example scenario in which you would like to assemble some Illumina reads into contigs, a common situation for anyone who works in genomics. The purpose of the guide is to show how bioboxes work; the same approach applies to any task for which a biobox exists, not only genome assembly.
RPKM, FPKM and TPM, clearly explained
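As a rough illustration of how these units differ, here is a minimal Python sketch using the standard definitions. The gene counts and lengths are made-up toy values, and the function names are only illustrative. FPKM uses the same formula as RPKM, counting fragments instead of reads for paired-end data.

```python
def rpkm(counts, lengths_bp):
    """Reads Per Kilobase of transcript per Million mapped reads (one sample)."""
    total_reads = sum(counts)
    return [c * 1e9 / (length * total_reads)
            for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts Per Million (one sample)."""
    # Step 1: normalise each count by gene length in kilobases.
    rates = [c / (length / 1000) for c, length in zip(counts, lengths_bp)]
    # Step 2: rescale so that the values of the sample sum to one million.
    scale = sum(rates)
    return [r / scale * 1e6 for r in rates]


# Hypothetical toy data: three genes in one sample.
counts = [100, 200, 700]
lengths_bp = [1000, 2000, 7000]

rpkm_values = rpkm(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)
print("RPKM:", rpkm_values)
print("TPM: ", tpm_values)
```

TPM values always sum to one million within a sample, which makes proportions comparable across samples; RPKM/FPKM totals can differ from sample to sample.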
ENCODE: Tutorials and Presentations
ENCODE 2016: Research Applications and Users Meeting
ENCODE 2015: Research Applications and Users Meeting
Merging Gene Expression Data: inSilicoMerging package
Computational tools for DNA methylation
Baseline gene expression datasets
Bgee: Gene Expression Evolution
Visualizing ChIP-Seq Data Using UCSC
Complete Listing of All Pathguide Resources
Analysis of High-Throughput Sequencing Data
GSEA in R
How to perform Kolmogorov-Smirnov statistic in GSEA in R?
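The linked question is about R; as a language-neutral illustration, the sketch below computes the unweighted, Kolmogorov-Smirnov-like running-sum statistic in plain Python. It assumes a ranked gene list and a gene set given as plain identifiers (the names here are hypothetical), and it uses no correlation weighting, so it is only the simplest form of the statistic and not a reimplementation of the GSEA software.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted KS-like running-sum enrichment score.

    Walk down the ranked list, stepping up at genes in the set and
    down at genes outside it; return the largest deviation from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list (most up-regulated first) and two toy gene sets.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
es_top = enrichment_score(ranked, {"g1", "g2"})     # set sits at the top
es_bottom = enrichment_score(ranked, {"g5", "g6"})  # set sits at the bottom
print(es_top, es_bottom)
```

A positive score indicates that the set is concentrated near the top of the ranking and a negative score near the bottom. Significance would still have to be assessed separately, for example by permutation.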
Cloud Genomics
List Of Cloud Genomics Companies
Teaching as a skill and a career
Developing teaching skills and experience
Introduction to Gene Set Enrichment Analysis (GSEA)
Gene Set Enrichment Analysis (GSEA)
GSEA software and source code and the Molecular Signatures Database (MSigDB)
Important research papers related to Oncogenomics
Oncogenomics and the development of new cancer therapies
Databases and Web Tools for Cancer Genomics Study
Cancer genomics: from discovery science to personalized medicine
Informatics for RNA-seq: A web resource for analysis on the cloud
Informatics for RNA Sequencing: A Web Resource for Analysis on the Cloud
Sequences, Genomes, and Genes in R / Bioconductor
Introduction to Survival Analysis in Genomics
Useful links:
Survival analysis
Survival analysis of TCGA patients integrating gene expression
Survival Analysis with Plotly: R vs Python
Review of Survival Analysis Techniques
Descriptive Methods for Survival Data
Introduction to Survival Analysis
Background for Survival Analysis
Freely available packages for Infinium 450k methylation data analysis:
ChAMP: Comprehensive suite of functions; automated pipeline
COHCAP: CpG island analysis and gene expression data integration
Epigenetic clock: Predictor of sample age
EWasher: Reference-free cell composition correction
FastDMA: Quantile normalisation and DMP/DMR calling
IMA: Preprocessing including normalisation methods; Pipeline option
Lumi: Background correction, general normalisation
Marmal-aid: 450k database for data integration
MethylAid: Interface for interactive sample QC
methylumi: Comprehensive suite of functions
Minfi: Comprehensive suite of functions
NIMBL: Matlab code for QC and DMP calling
RefFreeEWAS: Reference-free cell composition correction
RnBeads: Comprehensive suite of functions
shinyMethyl: Interface for interactive sample QC
wateRmelon: Preprocessing including performance metrics and numerous normalisation methods
Useful links from Nature publications:
Article series on Single-cell omics
Web Collection on Clinical applications of next-generation sequencing
Article series on Computational tools
ArrayExpress--a public repository for microarray gene expression data at the EBI
ArrayExpress Archive of Functional Genomics Data stores data from high-throughput functional genomics experiments, and provides these data for reuse to the research community.
ArrayExpress: Access the ArrayExpress Microarray Database at EBI and build Bioconductor data structures
Essential elements of personalized medicine
Personalized medicine and education: the challenge
Pharmacogenetics, Pharmacogenomics and Ayurgenomics for Personalized Medicine: A Paradigm Shift
Role of genomics on the path to personalized medicine.
Personalized medicine: new genomics, old lessons
Genomics And Personalized Medicine: Is It Really Different This Time?
Cancer genomics just got personal
Cancer genomics: from discovery science to personalized medicine
Personalized cancer medicine and the future of pathology.
Genomic and Personalized Medicine
Cancer genomics and personalized medicine
How Personalized Medicine is Changing: Lung Cancer
The future of personalized medicine
Wikipedia: Personalized medicine
Wikipedia: Education in personalized medicine
The term radiogenomics is used in two contexts: either to refer to the study of genetic variation associated with response to radiation (Radiation Genomics) or to refer to the correlation between cancer imaging features and gene expression (Imaging Genomics).
Wikipedia link
Radiogenomics Consortium (RGC)
Establishment of a Radiogenomics Consortium
Related PubMed articles for basic introduction:
The future has begun in radiogenomics!
Radiogenomic imaging-linking diagnostic imaging and molecular diagnostics
Perspectives in Implementing Radiogenomics to Radiotherapy
Radiogenomics: Radiobiology Enters the Era of Big Data and Team Science
The current progress and future prospects of personalized radiogenomic cancer study
Radiogenomics: the search for genetic predictors of radiotherapy response.
Related YouTube videos:
YOUTUBE: Radiogenomic evaluation of tumour response to targeted agents
YOUTUBE: Radiogenomics consortium
YOUTUBE: Decoding Breast Cancer with Quantitative Radiomics & Radiogenomics - Maryellen Giger
YOUTUBE: Radiogenomic Analysis of TCGA/TCIA Diffuse Lower Grade Gliomas.. - Laila Poisson

The BioMart project provides free software and data services to the international scientific community in order to foster scientific collaboration and facilitate the scientific discovery process. The project adheres to the open source philosophy that promotes collaboration and code reuse.
BioMart
Quantitative data: learning to share
BiomaRt or how to access the Ensembl data from R
Interface to BioMart databases
Some problem solving: BioStars links:
problems updating biomaRt in R version 3.0
Is The Biomart Registry Accessed Via The Bioconductor Package Out Of Date?
Missing ensembl_ids in biomaRt uniprot query
Missing gene symbols in biomart
How To Ignore Species In Ensembl Biomart
How to distinguish protein isoforms using biomaRt?
biomaRt code giving me trouble
Annotation of exon array on probeset id and transcriptclusterids using biomaRT
Genome-wide association (GWA) studies have typically focused on the analysis of single markers, which often lacks the power to uncover the relatively small effect sizes conferred by most genetic variants (Wang et al., Nat Rev Genet 2010). Further reading: Mol Genet Metab. 2010; PLoS Comput Biol. 2012;8(2); Molecular Genetics and Metabolism (2010): 134-40; Trends Genet. 2012 Jul;28(7):323-32; etc.
The shell provides you with an interface to the UNIX system. It gathers input from you and executes programs based on that input. When a program finishes executing, the shell displays that program's output. A shell script is, at its simplest, a list of commands written in the order in which they should be executed.
What is Shells?
Linux Shell Scripting Tutorial
Writing a Shell Script From Scratch
UNIX & Linux Shell Scripting Tutorial
Understand Linux Shell and Basic Shell Script
Biostars reference for NGS data:
bash loop for alignment RNA-seq data
Other links:
How can I use a pipe or redirect in a qsub command?
Sam to Bam using bowtie and using the shell script
Separating list of input files
Codon usage bias refers to differences in the frequency of occurrence of synonymous codons in coding DNA. A codon is a series of three nucleotides (a triplet) that encodes a specific amino acid residue in a polypeptide chain or signals the termination of translation (stop codons).
From Wikipedia
Codon usage: Nature's roadmap to expression and folding of proteins
Codon Usage Bias Database (CUB-DB) and Explorer
GCUA: General Codon Usage Analysis
GenScript Codon Usage Frequency Table Tool
Codon Optimization: other info
ResearchGate: Codon Optimization
ResearchGate: How to measure codon usage bias? What is the widely-used method?
ResearchGate: How do I analyze codon usage between yeast and bacteria?
ResearchGate: How to optimize codon usage?
ResearchGate: Measuring codon usage bias
ResearchGate: How gene codon optimization works?
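To make the definition of codon usage bias concrete, the following Python sketch counts codons in a coding sequence and reports, for each codon, its share among the synonymous codons for the same amino acid. It assumes the standard genetic code and an in-frame sequence; the example sequence and function name are made up for illustration, and dedicated tools such as those linked above compute more refined indices.

```python
from collections import Counter, defaultdict

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_usage(cds):
    """Return each codon's fraction among its synonymous codons."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Total number of codons observed for each amino acid (or stop, "*").
    per_amino_acid = defaultdict(int)
    for codon, n in counts.items():
        per_amino_acid[CODON_TABLE[codon]] += n

    return {codon: n / per_amino_acid[CODON_TABLE[codon]]
            for codon, n in counts.items()}


# Hypothetical in-frame sequence: Met, Leu, Leu, Leu, stop.
usage = codon_usage("ATGCTGCTGCTATAA")
for codon, fraction in sorted(usage.items()):
    print(codon, CODON_TABLE[codon], round(fraction, 3))
```

In this toy sequence leucine is encoded twice by CTG and once by CTA, so CTG accounts for two thirds of the leucine codons; an unequal split of this kind across a whole genome is what codon usage bias describes.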
Bioinformaticsweb.net tutorial
My Bio: Tutorials in bioinformatics
Martin Vingron's superb online bioinformatics tutorial
A New Online Computational Biology Curriculum
Data intensive biology for everyone
An Online Bioinformatics Curriculum
CRISPRs (clustered regularly interspaced short palindromic repeats) are segments of prokaryotic DNA containing short repetitions of base sequences. Each repetition is followed by short segments of spacer DNA from previous exposures to a bacterial virus or plasmid.
From Wikipedia
CRISPR Genome Engineering Resources
CRISPR: A game-changing genetic engineering technique
CRISPI: a CRISPR Interactive database
A CRISPR Way To Fix Faulty Genes
Circos is a software package for visualizing data and information. It visualizes data in a circular layout, which makes Circos ideal for exploring relationships between objects or positions. There are other reasons why a circular layout is advantageous, not the least being the fact that it is attractive.
Circular Visualization in R
Package source: circlize
MongoDB (from humongous) is a cross-platform document-oriented database. Classified as a NoSQL database, MongoDB eschews the traditional table-based relational database structure in favor of JSON-like documents with dynamic schemas (MongoDB calls the format BSON), making the integration of data in certain types of applications easier and faster. Released under a combination of the GNU Affero General Public License and the Apache License, MongoDB is free and open-source software.
From Wikipedia
MongoDB For Beginners: Introduction And Installation (Part 1/3)
Beginners' guide to using MongoDB
All out beginner's guide to MongoDB
Specific plot library
R / ggplot2
R blogger, ggplot2 tag
R blogger, Lattice tag
Useful links
----------------------------------------------------------------------------------------
Metagenomics is defined as the study of the metagenome, which is the total genomic DNA from environmental samples.
Software:
MetaSim (simulator, used to compare predictions)
Gene calling: MetaGeneMark
Binning: sequence similarity based binning
Functional annotation: RAMMCAP (Rapid Analysis of Multiple Metagenomes with Clustering and Annotation Pipeline)
Comparative metagenomics
Mapping to reference genome: SOAPZ
----------------------------------------------------------------------------------------
Epigenomics is the study of the complete set of epigenetic modifications on the genetic material of a cell, known as the epigenome. The field is analogous to genomics and proteomics, which are the study of the genome and proteome of a cell (Russell 2010 p. 217 & 230). Epigenetic modifications are reversible modifications on a cell's DNA or histones that affect gene expression without altering the DNA sequence (Russell 2010 p. 475).
Roadmap Epigenomics
Roadmap Epigenomics Project: Publications
What is the epigenome?
Cancer Epigenetics
----------------------------------------------------------------------------------------
Learning RStudio for R Statistical Computing
----------------------------------------------------------------------------------------
Survival Analysis with Plotly: R vs. Python
----------------------------------------------------------------------------------------
TCGAs Methylation Data Annotation
R is coming to SQL Server. SQL Server 2016 (which will be in public preview this summer) will include new real-time analytics, automatic data encryption, and the ability to run R within the database itself.
Useful R tutorial
R basics tutorials, data visualization, plots, charts etc.
----------------------------------------------------------------------------------------
10 Amazing and Mysterious Uses of (!) Symbol or Operator in Linux Commands
The (!) symbol or operator in Linux can be used as a logical negation operator, as well as to fetch commands from history with tweaks or to re-run a previous command with modifications.
GEO dataset processing
GEOquery to access GEO datasets
Get an idea of a gene expression value across samples by GEOquery
How to analyze the gene upregulation and downregulation using microarray GEO data?
Microarray processed/normalized data from GEO
Useful list of R packages
Data import/access: readr (text data files), readxl (Excel spreadsheets) and RMySQL (MySQL databases)
9 popular ways to perform Data Visualization in Python
There are multiple tools for performing visualization in data science.
Monday, 18 May, 2015
Integration of transcriptome and binding data (ChIP-seq)
The combination of ChIP-seq and transcriptome analysis is a compelling approach to unravel the regulation of gene expression. Some tools:
1. Target analysis by integration of transcriptome and ChIP-seq data with BETA.
2. ChIP-Array: webserver
3. EMBER: Discovering transcription factor regulatory targets using gene expression and binding data.
Unix & Perl Primer for Biologists
Using Awk to join two files based on several columns
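The linked post covers the Awk approach. For comparison, here is a minimal Python sketch of the same idea: index one table by a tuple of key columns, then look up each row of the other table. The column positions, the toy data and the function name `join_on_columns` are assumptions made for illustration only.

```python
import csv
import io


def join_on_columns(left_rows, right_rows, left_keys, right_keys):
    """Inner-join two tables (lists of rows) on several key columns."""
    # Index the right table by its composite key, much like an
    # associative array keyed on several fields in Awk.
    index = {}
    for row in right_rows:
        key = tuple(row[i] for i in right_keys)
        index.setdefault(key, []).append(row)

    joined = []
    for row in left_rows:
        key = tuple(row[i] for i in left_keys)
        for match in index.get(key, []):
            # Append only the non-key columns of the matching right row.
            extra = [v for i, v in enumerate(match) if i not in right_keys]
            joined.append(row + extra)
    return joined


# Hypothetical tab-delimited inputs keyed on chromosome and position.
left_text = "chr1\t100\tgeneA\nchr1\t200\tgeneB\nchr2\t100\tgeneC\n"
right_text = "chr1\t100\t0.5\nchr2\t100\t0.9\nchr3\t50\t0.1\n"

left = list(csv.reader(io.StringIO(left_text), delimiter="\t"))
right = list(csv.reader(io.StringIO(right_text), delimiter="\t"))

result = join_on_columns(left, right, left_keys=(0, 1), right_keys=(0, 1))
for row in result:
    print("\t".join(row))
```

With real files, the two `io.StringIO` objects would be replaced by opened file handles; rows without a partner in the other table are dropped, as in an inner join.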
12 Best Free Ebooks for Machine Learning
Run Linux from USB: Porteus, Puppy Linux, Crunchbang, Tails, Arch, Ubuntu etc.
How to download genomics data using aspera
ascp is a command-line fasp transfer program.