且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

如何使用 Apache POI 加载大型 xlsx 文件?

更新时间:2022-01-10 17:41:11

我在 Web 服务器环境中遇到了类似的情况.上传的典型大小约为 150k 行,从单个请求中消耗大量内存不会很好.Apache POI Streaming API 对此非常有效,但它需要对读取逻辑进行全面重新设计.我已经有一堆使用标准 API 的读取逻辑,我不想重做,所以我写了这个:https://github.com/monitorjbl/excel-streaming-reader

I was in a similar situation with a webserver environment. The typical size of the uploads were ~150k rows and it wouldn't have been good to consume a ton of memory from a single request. The Apache POI Streaming API works well for this, but it requires a total redesign of your read logic. I already had a bunch of read logic using the standard API that I didn't want to have to redo, so I wrote this instead: https://github.com/monitorjbl/excel-streaming-reader

它并不完全是标准 XSSFWorkbook 类的替代品,但如果您只是遍历行,它的行为类似:

It's not entirely a drop-in replacement for the standard XSSFWorkbook class, but if you're just iterating through rows it behaves similarly:

import com.monitorjbl.xlsx.StreamingReader;

InputStream is = new FileInputStream(new File("/path/to/workbook.xlsx"));
StreamingReader reader = StreamingReader.builder()
        .rowCacheSize(100)    // number of rows to keep in memory (defaults to 10)
        .bufferSize(4096)     // buffer size to use when reading InputStream to file (defaults to 1024)
        .sheetIndex(0)        // index of sheet to use (defaults to 0)
        .read(is);            // InputStream or File for XLSX file (required)

for (Row r : reader) {
  for (Cell c : r) {
    System.out.println(c.getStringCellValue());
  }
}     

使用它有一些注意事项;由于 XLSX 表的结构方式,在流的当前窗口中并非所有数据都可用.但是,如果您只是想从单元格中读取简单的数据,那么它非常适用.

There are some caveats to using it; due to the way XLSX sheets are structured, not all data is available in the current window of the stream. However, if you're just trying to read simple data out from the cells, it works pretty well for that.