STAR ReadsPerGene | BeGenomics

Merging STAR read count output files (ReadsPerGene.out.tab)

How do you merge the STAR read count output files (ReadsPerGene.out.tab)?

The ReadsPerGene.out.tab output files of STAR (produced with the option --quantMode GeneCounts) contain 4 columns that correspond to different counts per gene, calculated according to the protocol’s strandedness:

column 1: gene ID

column 2: counts for unstranded RNA-seq

column 3: counts for the 1st read strand aligned with RNA (equivalent to htseq-count option -s yes)

column 4: counts for the 2nd read strand aligned with RNA (equivalent to htseq-count option -s reverse)
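As a rough way to check which column fits a given library, one can compare the total gene-level counts in columns 2 to 4. The sketch below is only illustrative: the file contents are made-up numbers written to a temporary file, `strandTotals` is a hypothetical helper name, and base R's `read.table` is used so that the example needs no extra packages.

```r
# Made-up example file in the ReadsPerGene.out.tab layout
# (4 tab-separated columns, no header, 4 summary rows first).
exampleLines = c(
  "N_unmapped\t100\t100\t100",
  "N_multimapping\t50\t50\t50",
  "N_noFeature\t20\t900\t30",
  "N_ambiguous\t5\t1\t4",
  "geneA\t500\t10\t490",
  "geneB\t300\t5\t295",
  "geneC\t200\t2\t198"
)
exampleFile = tempfile(fileext = "ReadsPerGene.out.tab")
writeLines(exampleLines, exampleFile)

# Hypothetical helper: sum the gene-level counts of columns 2-4,
# ignoring the four summary rows at the top of the file.
strandTotals = function(path) {
  tab = read.table(path, sep = "\t", header = FALSE, stringsAsFactors = FALSE)
  genes = tab[-(1:4), ]
  totals = colSums(genes[, 2:4])
  names(totals) = c("unstranded", "forward", "reverse")
  totals
}

totals = strandTotals(exampleFile)
print(totals)

# Pick the stranded column (3 = forward, 4 = reverse) with the larger total.
likelyColumn = names(which.max(totals[c("forward", "reverse")]))
print(likelyColumn)
```

If one of the two stranded columns holds nearly all of the counts, the library is probably stranded in that direction; similar totals in columns 3 and 4 suggest an unstranded library, for which column 2 is the usual choice.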

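Run the following R script to merge all ReadsPerGene.out.tab files into a single count table:

library(data.table)  # provides fread()

# Directory that holds the STAR output files
dataDir = "/AllRNAseqdata/test/"
files = list.files(dataDir, pattern = "ReadsPerGene\\.out\\.tab$", full.names = TRUE)

# Keep the gene ID (column 1) and column 4 of the first file.
# Column 4 suits reverse-stranded libraries; use column 2 or 3 if that matches your protocol.
countData = data.frame(fread(files[1]))[c(1, 4)]

# Append column 4 of every remaining file (assumes all files list the genes in the same order)
for (i in seq_along(files)[-1]) {
  countData = cbind(countData, data.frame(fread(files[i]))[4])
}

# Skip the first 4 lines (N_unmapped, N_multimapping, N_noFeature, N_ambiguous); count data starts on the 5th line
countData = countData[5:nrow(countData), ]

# Name the columns after the samples by stripping the directory and the file suffix
colnames(countData) = c("GeneID", gsub("ReadsPerGene.out.tab", "", basename(files), fixed = TRUE))
colnames(countData)

# Use the gene IDs as row names and drop the GeneID column
rownames(countData) = countData$GeneID
countData = countData[, 2:ncol(countData), drop = FALSE]
head(countData, 10)  # preview the first 10 genes
write.table(countData, file = "counts.txt", quote = FALSE, sep = "\t", row.names = TRUE, col.names = TRUE)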
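The script above binds columns by position, so it relies on every file listing the genes in the same order, which should hold when all samples were aligned against the same genome index. A more defensive variant could check the gene IDs before merging. The following is a minimal sketch under that assumption: `mergeReadsPerGene` is a hypothetical helper name, the two input files are made-up demo data in a temporary directory, and base R's `read.table` stands in for `fread` so that the example runs without extra packages.

```r
# Hypothetical helper: merge one count column from several
# ReadsPerGene.out.tab files, stopping if the gene IDs differ between files.
mergeReadsPerGene = function(files, column = 4) {
  readOne = function(path) {
    tab = read.table(path, sep = "\t", header = FALSE, stringsAsFactors = FALSE)
    tab[-(1:4), c(1, column)]  # drop the four summary rows
  }
  tabs = lapply(files, readOne)
  geneIDs = tabs[[1]][[1]]
  for (tab in tabs) {
    if (!identical(tab[[1]], geneIDs)) stop("Gene IDs differ between files")
  }
  # One column of counts per file, genes as rows
  counts = do.call(cbind, lapply(tabs, function(tab) tab[[2]]))
  rownames(counts) = geneIDs
  colnames(counts) = sub("ReadsPerGene.out.tab", "", basename(files), fixed = TRUE)
  counts
}

# Made-up demo data: two tiny files in a temporary directory.
demoDir = tempfile("star_demo")
dir.create(demoDir)
summaryRows = c("N_unmapped\t1\t1\t1", "N_multimapping\t1\t1\t1",
                "N_noFeature\t1\t1\t1", "N_ambiguous\t1\t1\t1")
writeLines(c(summaryRows, "geneA\t10\t1\t9", "geneB\t20\t2\t18"),
           file.path(demoDir, "sample1ReadsPerGene.out.tab"))
writeLines(c(summaryRows, "geneA\t30\t3\t27", "geneB\t40\t4\t36"),
           file.path(demoDir, "sample2ReadsPerGene.out.tab"))

demoFiles = list.files(demoDir, pattern = "ReadsPerGene\\.out\\.tab$", full.names = TRUE)
merged = mergeReadsPerGene(demoFiles, column = 4)
print(merged)
```

The `column` argument makes the strandedness choice explicit, so the same helper can be reused for unstranded (2) or forward-stranded (3) libraries.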
