Tag Archives: STAR ReadsPerGene

How to merge STAR output reads Counts ReadsPerGene.out.tab?

Merging STAR output reads Counts “ReadsPerGene.out.tab”

How to merge STAR output reads Counts “ReadsPerGene.out.tab”?

The ReadsPerGene.out.tab output files of STAR (from option –quantMode GeneCounts) contain 4 columns that correspond to different counts per gene calculated according to the protocol’s strandedness

column 1: gene ID

column 2: counts for unstranded RNA-seq.

column 3: counts for the 1st read strand aligned with RNA

column 4: counts for the 2nd read strand aligned with RNA

Run following script to merge all ReadsPerGene.out.tab files:

files = list.files(paste0(‘/AllRNAseqdata/test/’), “*ReadsPerGene.out.tab$”, full.names = T)

countData = data.frame(fread(files[1]))[c(1,4)]

for(i in 2:length(files)) {
countData = cbind(countData, data.frame(fread(files[i]))[4])
###Skip first 4 lines, count data starts on the 5th line

countData = countData[c(5:nrow(countData)),]

colnames(countData) = c(“GeneID”, gsub(paste0(‘/AllRNAseqdata/test//’), “”, files))

colnames(countData) = gsub(“ReadsPerGene.out.tab”, “”, colnames(countData))
rownames(countData) = countData$GeneID
countData = countData[,c(2:ncol(countData))]
write.table(countData, file=”counts.txt”, quote=F,sep=’\t’,row.names=T, col.names=T)