Bioinformatics and Genomics related notes, practical tips and tricks:

April 2023

How to merge STAR output reads Counts “ReadsPerGene.out.tab”?

The ReadsPerGene.out.tab output files of STAR (from option –quantMode GeneCounts) contain 4 columns that correspond to different counts per gene calculated according to the protocol’s strandedness

column 1: gene ID

column 2: counts for unstranded RNA-seq.

column 3: counts for the 1st read strand aligned with RNA

column 4: counts for the 2nd read strand aligned with RNA

Run following script to merge all ReadsPerGene.out.tab files:

files = list.files(paste0(‘/AllRNAseqdata/test/’), “*ReadsPerGene.out.tab$”, full.names = T)

countData = data.frame(fread(files[1]))[c(1,4)]

for(i in 2:length(files)) {
countData = cbind(countData, data.frame(fread(files[i]))[4])
###Skip first 4 lines, count data starts on the 5th line

countData = countData[c(5:nrow(countData)),]

colnames(countData) = c(“GeneID”, gsub(paste0(‘/AllRNAseqdata/test//’), “”, files))

colnames(countData) = gsub(“ReadsPerGene.out.tab”, “”, colnames(countData))
rownames(countData) = countData$GeneID
countData = countData[,c(2:ncol(countData))]
write.table(countData, file=”counts.txt”, quote=F,sep=’\t’,row.names=T, col.names=T)

For details click here

How to merge /average gene expression of RNA-Seq replicates ?

There are two functions in edgeR to average expression of biological replicates “cpmByGroup” or “rpkmByGroup” and use “sumTechReps” for technical replicates.

How to convert microarray probe id to gene ids?

rmap = revmap(hgu133aSYMBOL)
syms = c(“ATF4”, “BCL2”, “CDK2”,)
mget(syms, rmap)
If you have Illumina probe expression id then use illuminaHumanv4
probeID=c(“ILMN_1343291”, “ILMN_1651209”)
data.frame(Gene=unlist(mget(x = probeID,envir = illuminaHumanv4SYMBOL)))

An extensive resource for Bioinformatics, Epigenomics, Genomics and Metagenomics