且构网

Loading a FASTA file in R faster than with read.fasta() from seqinr

Updated: 2022-04-04 09:00:12

You can use readDNAStringSet() from the Bioconductor package Biostrings.
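Here is a minimal sketch of what readDNAStringSet() returns. It assumes Biostrings is installed from Bioconductor. The file contents and the record names (chrA, chrB) are made up for illustration. The result is a DNAStringSet: its names come from the FASTA header lines, and its widths are the sequence lengths.

```r
library(Biostrings)

# Write a tiny gzip-compressed FASTA file with two made-up records
fa_path <- tempfile(fileext = ".fa.gz")
con <- gzfile(fa_path, "w")
writeLines(c(">chrA test record", "ACGTACGTAC", "GGTT",
             ">chrB", "NNNNACGT"), con)
close(con)

# readDNAStringSet() reads plain or gzipped FASTA into a DNAStringSet;
# the sequence lines of one record are concatenated into a single sequence
seqs <- readDNAStringSet(fa_path)

names(seqs)                # "chrA test record" "chrB"
width(seqs)                # 14 8
as.character(seqs[[1]])    # "ACGTACGTACGGTT"
```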

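Get the human genome (hg38, gzip-compressed FASTA) to use as a test file:

options(timeout = 3600)  # the file is large; R's default 60-second download timeout may be too short
download.file("https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz","../Downloads/test.fa.gz")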

Benchmark readDNAStringSet against read.fasta:

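library(Biostrings)  # provides readDNAStringSet()
library(seqinr)      # provides read.fasta()

# f1: read the gzipped hg38 FASTA with Biostrings
f1 = function(){readDNAStringSet("../Downloads/test.fa.gz")}
# f2: read the same file with seqinr
f2 = function(){read.fasta("../Downloads/test.fa.gz")}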

microbenchmark::microbenchmark(f1(),times=5)
Unit: seconds
 expr      min       lq     mean   median       uq      max neval
 f1() 42.82203 43.57036 45.10369 45.64206 46.37412 47.10987     5

microbenchmark::microbenchmark(f2(),times=5)
### did not finish running,
### so read.fasta() is definitely not the option for large FASTA files
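The hg38 comparison requires a very large download, and read.fasta() never finishes on it. A scaled-down version of the same comparison can be run on a synthetic file instead. This is a sketch with assumed parameters: the 20 records of 100,000 random bases each and the 60-character line wrapping are arbitrary choices. Timings depend on the machine, so no numbers are claimed here. Both readers accept gzipped input.

```r
library(Biostrings)
library(seqinr)

# Build a synthetic gzipped FASTA: 20 records of 100,000 random bases each
set.seed(1)
fa_path <- tempfile(fileext = ".fa.gz")
con <- gzfile(fa_path, "w")
for (i in seq_len(20)) {
  s <- paste(sample(c("A", "C", "G", "T"), 1e5, replace = TRUE), collapse = "")
  # wrap each sequence at 60 characters per line
  starts <- seq(1, nchar(s), by = 60)
  writeLines(c(paste0(">seq", i),
               substring(s, starts, pmin(starts + 59, nchar(s)))), con)
}
close(con)

# Time both readers on the same file
t_bio <- system.time(x1 <- readDNAStringSet(fa_path))
t_seqinr <- system.time(x2 <- read.fasta(fa_path))

t_bio["elapsed"]
t_seqinr["elapsed"]

# Both readers should recover the same number of records
length(x1) == length(x2)
```

Scaling the record count or length upward gives a rough feel for how each reader grows with file size, without downloading hg38.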