且构网

What is the fastest way, and the fastest format, for loading large datasets into R?

Updated: 2023-11-28 14:01:28

It depends on what you plan to do with the data. If you need the entire dataset in memory for some operation, then I guess your best bet is fread (from the data.table package) or readRDS (a dataset saved as RDS also takes up much less space on disk, if that matters to you).
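As a rough illustration of the in-memory route, the sketch below reads a delimited file with fread and then saves and reloads it as RDS. The file paths and the small generated table are hypothetical stand-ins for a real large file, and the data.table package is assumed to be installed.

```r
library(data.table)  # provides fread() and fwrite()

# Hypothetical stand-in for a large file: a small table written to a temp CSV.
csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")
dt <- data.table(
  id    = 1:1000,
  grp   = rep(c("a", "b"), 500),
  value = rnorm(1000)
)
fwrite(dt, csv_path)

# Read the whole delimited file into memory as a data.table.
from_csv <- fread(csv_path)

# One-time conversion to R's serialized format (saveRDS compresses by default).
saveRDS(from_csv, rds_path)

# Later sessions can reload the object directly, without re-parsing text.
from_rds <- readRDS(rds_path)

# Compare on-disk sizes of the two formats (in bytes).
file.size(csv_path)
file.size(rds_path)
```

How much smaller the RDS file is, and how the load times compare, depends on the data, so it is worth timing both with system.time() on a real file.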

If you will be doing summary operations on the data, I have found a one-time conversion to a database (using sqldf) to be a much better option, because subsequent operations are much faster when run as SQL queries against the data. That said, this is partly because I don't have enough RAM to load 13 GB files into memory.
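A minimal sketch of that one-time conversion, loosely following the persistent-database pattern shown in the sqldf README, might look like the following. The file paths, the table name measurements, and the column names are all hypothetical, and the sqldf package (with its default SQLite backend) is assumed to be installed.

```r
library(sqldf)  # uses SQLite (via RSQLite) by default

# Hypothetical stand-in for a large CSV; written unquoted for read.csv.sql().
csv_path <- tempfile(fileext = ".csv")
db_path  <- tempfile(fileext = ".sqlite")
df <- data.frame(
  id    = 1:1000,
  grp   = rep(c("a", "b"), 500),
  value = runif(1000)
)
write.csv(df, csv_path, row.names = FALSE, quote = FALSE)

# Create an empty on-disk SQLite database.
sqldf(sprintf("attach '%s' as new", db_path))

# One-time import: the file is loaded by SQLite into a permanent table,
# so the full dataset is never held as an R data frame.
invisible(read.csv.sql(
  csv_path,
  sql    = "create table main.measurements as select * from file",
  dbname = db_path
))

# Later summary queries run inside the database and return only the result.
summary_by_grp <- sqldf(
  "select grp, count(*) as n, avg(value) as mean_value
     from main.measurements
    group by grp",
  dbname = db_path
)
summary_by_grp
```

Only the aggregated rows come back into R, which is what makes this approach workable when the raw file is larger than available RAM.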