Merging STAR read counts (“ReadsPerGene.out.tab”)
How do I merge the STAR read-count files “ReadsPerGene.out.tab”?
The ReadsPerGene.out.tab files that STAR writes (with the option --quantMode GeneCounts) contain 4 columns, which hold per-gene counts computed for different library strandedness protocols:
column 1: gene ID
column 2: counts for unstranded RNA-seq
column 3: counts for the 1st read strand aligned with RNA (htseq-count option -s yes)
column 4: counts for the 2nd read strand aligned with RNA (htseq-count option -s reverse)
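Which count column to keep depends on how the library was prepared. A common heuristic is to compare the column totals: if one stranded column holds nearly all of the unstranded total, the library is probably stranded in that direction. The sketch below illustrates this on a made-up toy file; the numbers, the column names (gene, unstranded, forward, reverse) and the use of base R read.delim instead of fread are assumptions for illustration only.

```r
# Toy ReadsPerGene.out.tab content (made-up numbers, for illustration only)
toy <- c(
  "N_unmapped\t100\t100\t100",
  "N_multimapping\t50\t50\t50",
  "N_noFeature\t20\t400\t30",
  "N_ambiguous\t5\t1\t2",
  "geneA\t120\t3\t117",
  "geneB\t80\t2\t78",
  "geneC\t0\t0\t0"
)
path <- tempfile(fileext = ".tab")
writeLines(toy, path)

# The file has no header line, so name the four columns ourselves
counts <- read.delim(path, header = FALSE,
                     col.names = c("gene", "unstranded", "forward", "reverse"),
                     stringsAsFactors = FALSE)

# Drop the four summary rows that STAR writes before the gene rows
summary_rows <- c("N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous")
genes <- counts[!counts$gene %in% summary_rows, ]

# Total reads assigned to genes under each counting mode
totals <- colSums(genes[, c("unstranded", "forward", "reverse")])
print(totals)

# The stranded column with the larger total is the likelier match
best_stranded <- names(which.max(totals[c("forward", "reverse")]))
print(best_stranded)
```

If the forward and reverse totals are similar to each other, the library is more likely unstranded and column 2 is the one to keep.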
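Run the following R script to merge all ReadsPerGene.out.tab files. It needs the data.table package for fread and keeps column 4, so use column 2 or 3 instead if your library is unstranded or forward-stranded: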
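library(data.table)  # provides fread()
# Collect every STAR count file in the directory (the pattern is a regular expression)
files = list.files('/AllRNAseqdata/test/', "ReadsPerGene\\.out\\.tab$", full.names = TRUE)
# Column 1 is the gene ID; column 4 holds the counts for the 2nd read strand
countData = data.frame(fread(files[1], header = FALSE))[c(1,4)]
# cbind assumes every file lists the genes in the same order
for(i in seq_along(files)[-1]) {
countData = cbind(countData, data.frame(fread(files[i], header = FALSE))[4])
}
### Skip the first 4 lines (N_unmapped, N_multimapping, N_noFeature, N_ambiguous); count data starts on the 5th line
countData = countData[c(5:nrow(countData)),]
# Name the columns after the files, without the directory or the file suffix
colnames(countData) = c("GeneID", gsub("ReadsPerGene.out.tab", "", basename(files), fixed = TRUE))
colnames(countData)
rownames(countData) = countData$GeneID
countData = countData[,c(2:ncol(countData)), drop = FALSE]
# Preview the first 10 genes
head(countData, 10)
write.table(countData, file="counts.txt", quote=FALSE, sep='\t', row.names=TRUE, col.names=TRUE)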
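The script above binds columns by position, so it relies on every file listing the same genes in the same order. As a hedged alternative, the sketch below wraps the merge in a function that checks this explicitly and removes the summary rows by name rather than by line number. The function name merge_star_counts, the helper write_toy, the sample names and the toy counts are all hypothetical, and base R read.delim stands in for fread so the example runs without extra packages.

```r
# Hypothetical helper: merge one count column from several STAR files
merge_star_counts <- function(files, column = 4) {
  summary_rows <- c("N_unmapped", "N_multimapping", "N_noFeature", "N_ambiguous")
  read_one <- function(f) {
    x <- read.delim(f, header = FALSE, stringsAsFactors = FALSE)
    x <- x[!x[[1]] %in% summary_rows, ]   # keep gene rows only
    setNames(x[[column]], x[[1]])          # counts named by gene ID
  }
  per_sample <- lapply(files, read_one)
  gene_ids <- names(per_sample[[1]])
  # Stop if any file has different genes or a different gene order
  same_genes <- vapply(per_sample,
                       function(v) identical(names(v), gene_ids),
                       logical(1))
  if (!all(same_genes)) stop("gene IDs differ between files")
  merged <- do.call(cbind, per_sample)
  rownames(merged) <- gene_ids
  colnames(merged) <- sub("ReadsPerGene.out.tab", "", basename(files), fixed = TRUE)
  merged
}

# Toy input files with made-up counts, written to a temporary directory
write_toy <- function(name, rev_counts) {
  path <- file.path(tempdir(), paste0(name, "ReadsPerGene.out.tab"))
  lines <- c("N_unmapped\t1\t1\t1", "N_multimapping\t1\t1\t1",
             "N_noFeature\t1\t1\t1", "N_ambiguous\t1\t1\t1",
             paste("geneA", 10, 1, rev_counts[1], sep = "\t"),
             paste("geneB", 20, 2, rev_counts[2], sep = "\t"))
  writeLines(lines, path)
  path
}

files <- c(write_toy("sampleA", c(9, 18)), write_toy("sampleB", c(7, 15)))
merged <- merge_star_counts(files, column = 4)
print(merged)
```

The result is a matrix with one row per gene and one column per sample, which can be passed to write.table in the same way as countData.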