How to efficiently create a large file on a VFAT partition in embedded Linux


    I'm trying to create a large empty file on a VFAT partition by using the `dd' command in an embedded Linux box:

    dd if=/dev/zero of=/mnt/flash/file bs=1M count=1 seek=1023
    

    The intention was to skip the first 1023 blocks and write only 1 block at the end of the file, which should be very quick on a native EXT3 partition, and it indeed is. However, this operation turned out to be quite slow on a VFAT partition, along with the following message:

    lowmem_shrink:: nr_to_scan=128, gfp_mask=d0, other_free=6971, min_adj=16
    // ... more `lowmem_shrink' messages
    

    Another attempt was to fopen() a file on the VFAT partition and then fseek() to the end to write the data, which has also proved slow, along with the same messages from the kernel.
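
    For reference, roughly what that second attempt looks like (a minimal sketch; the path and size mirror the dd example above, error handling trimmed):

    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/mnt/flash/file", "wb");
        if (!f)
            return 1;
        /* seek to 1GB minus one byte, then commit a single byte;
         * on VFAT this still forces allocation of all prior clusters */
        if (fseek(f, 1024L * 1024 * 1024 - 1, SEEK_SET) != 0) {
            fclose(f);
            return 1;
        }
        fputc(0, f);
        fclose(f);
        return 0;
    }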

    So basically, is there a quick way to create the file on the VFAT partition (without traversing the first 1023 blocks)?

    Thanks.

    Why are VFAT "skipping" writes so slow?

    Unless the VFAT filesystem driver were made to "cheat" in this respect, creating large files on FAT-type filesystems will always take a long time. The driver, to comply with FAT specification, will have to allocate all data blocks and zero-initialize them, even if you "skip" the writes. That's because of the "cluster chaining" FAT does.

    The reason for that behaviour is FAT's inability to support either:

    • UN*X-style "holes" in files (aka "sparse files")
      that's what you're creating on ext3 with your testcase - a file with no data blocks allocated to the first 1GB-1MB of it, and a single 1MB chunk of actually committed, zero-initialized blocks at the end.
    • NTFS-style "valid data length" information.
      On NTFS, a file can have uninitialized blocks allocated to it, but the file's metadata will keep two size fields - one for the total size of the file, another for the number of bytes actually written to it (from the beginning of the file).

    Without a specification supporting either technique, the filesystem would always have to allocate and zerofill all "intermediate" data blocks if you skip a range.

    Also remember that on ext3, the technique you used does not actually allocate blocks to the file (apart from the last 1MB). If you require the blocks preallocated (not just the size of the file set large), you'll have to perform a full write there as well.
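
    One way to see the difference for yourself is to compare the file's apparent size with the blocks actually allocated to it; a small sketch (the path is the one from the question; st_blocks counts 512-byte units):

    #include <stdio.h>
    #include <sys/stat.h>

    int main(void)
    {
        struct stat st;
        if (stat("/mnt/flash/file", &st) != 0)
            return 1;
        printf("apparent size: %lld bytes\n", (long long)st.st_size);
        printf("allocated:     %lld bytes\n", (long long)st.st_blocks * 512);
        /* after the dd-with-seek, ext3 reports ~1MB allocated (sparse),
         * while VFAT reports the full 1GB, since every cluster was written */
        return 0;
    }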

    How could the VFAT driver be modified to deal with this?

    At the moment, the driver uses the Linux kernel function cont_write_begin() to start any write to a file, even an asynchronous one; this function looks like:

    /*
     * For moronic filesystems that do not allow holes in file.
     * We may have to extend the file.
     */
    int cont_write_begin(struct file *file, struct address_space *mapping,
                        loff_t pos, unsigned len, unsigned flags,
                        struct page **pagep, void **fsdata,
                        get_block_t *get_block, loff_t *bytes)
    {
        struct inode *inode = mapping->host;
        unsigned blocksize = 1 << inode->i_blkbits;
        unsigned zerofrom;
        int err;
    
        err = cont_expand_zero(file, mapping, pos, bytes);
        if (err)
                return err;
    
        zerofrom = *bytes & ~PAGE_CACHE_MASK;
        if (pos+len > *bytes && zerofrom & (blocksize-1)) {
                *bytes |= (blocksize-1);
                (*bytes)++;
        }
    
        return block_write_begin(mapping, pos, len, flags, pagep, get_block);
    }
    

    That is a simple strategy but also a pagecache trasher (your log messages are a consequence of the call to cont_expand_zero() which does all the work, and is not asynchronous). If the filesystem were to split the two operations - one task to do the "real" write, and another one to do the zero filling, it'd appear snappier.

    The way this could be achieved while still using the default Linux filesystem utility interfaces would be to internally create two "virtual" files - one for the to-be-zerofilled area, and another for the actually-to-be-written data. The real file's directory entry and FAT cluster chain would only be updated once the background task is actually complete, by linking its last cluster with the first one of the "zerofill file" and the last cluster of that one with the first one of the "actual write file". One would also want to go for a directio write to do the zerofilling, in order to avoid trashing the pagecache.
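
    To illustrate just the directio part from userspace (a hedged sketch, not the proposed in-kernel change; the 4096-byte alignment is a common O_DIRECT requirement but is device/filesystem dependent):

    #define _GNU_SOURCE             /* for O_DIRECT */
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    /* Zerofill `length` bytes (assumed a multiple of 1MB here) through
     * O_DIRECT so the zero pages bypass - and don't trash - the pagecache. */
    static int zerofill_direct(const char *path, off_t length)
    {
        void *buf;
        int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644);
        if (fd < 0)
            return -1;
        if (posix_memalign(&buf, 4096, 1 << 20) != 0) {
            close(fd);
            return -1;
        }
        memset(buf, 0, 1 << 20);    /* one zeroed, aligned 1MB buffer */
        for (off_t done = 0; done < length; done += 1 << 20) {
            if (write(fd, buf, 1 << 20) != 1 << 20)
                break;              /* bail out on error / short write */
        }
        free(buf);
        close(fd);
        return 0;
    }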

    Note: While all this is technically possible for sure, the question is how worthwhile such a change would be. Who needs this operation all the time? What would the side effects be?
    The existing (simple) code is perfectly acceptable for smaller skipping writes; you won't really notice its presence if you create a 1MB file and write a single byte at the end. It'll bite you only if you go for filesizes on the order of the limits of what the FAT filesystem allows you to do.

    Other options ...

    In some situations, the task at hand involves two (or more) steps:

    1. freshly format (e.g.) a SD card with FAT
    2. put one or more big files onto it to "pre-fill" the card
    3. (app-dependent, optional)
      pre-populate the files, or
      put a loopback filesystem image into them

    In one of the cases I've worked on, we folded the first two - i.e. modified mkdosfs to pre-allocate/pre-create files when making the (FAT32) filesystem. That's pretty simple: when writing the FAT tables, just create allocated cluster chains instead of clusters filled with the "free" marker. It also has the advantage that the data blocks are guaranteed to be contiguous, in case your app benefits from this. And you can decide to make mkdosfs not clear the previous contents of the data blocks. If you know, for example, that one of your preparation steps involves writing the entire data anyway, or doing ext3-in-file-on-FAT (a pretty common thing - Linux appliance, SD card for data exchange with a Windows app/GUI), then there's no need to zero out anything / double-write (once with zeroes, once with whatever else). If your usecase fits this (i.e. formatting the card is a useful / normal step of the "initialize it for use" process anyway), then try it out; a suitably-modified mkdosfs is part of TomTom's dosfsutils sources - see mkdosfs.c and search for the -N command line option handling.
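
    To make the cluster chain idea concrete, here's a hedged sketch of the FAT32 table entries such a modified mkdosfs would emit for one preallocated file (entry values per the FAT32 spec; the function and its arguments are illustrative, not TomTom's actual code):

    #include <stdint.h>

    #define FAT32_EOC 0x0FFFFFFFu   /* end-of-chain marker */

    /* Instead of leaving clusters [first, first+count) set to the "free"
     * marker (0), link each entry to the next one and terminate the chain.
     * Consecutive cluster numbers are what make the file contiguous. */
    static void prealloc_chain(uint32_t *fat, uint32_t first, uint32_t count)
    {
        for (uint32_t c = first; c + 1 < first + count; c++)
            fat[c] = c + 1;         /* cluster c points at c+1 */
        fat[first + count - 1] = FAT32_EOC;
    }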

    When talking about preallocation, as mentioned, there's also posix_fallocate(). Currently on Linux when using FAT, this will do essentially the same as a manual dd ..., i.e. wait for the zerofill. But the specification of the function doesn't mandate it being synchronous. The block allocation (FAT cluster chain generation) would have to be done synchronously, but the VFAT on-disk dirent size update and the data block zerofills could be backgrounded / delayed (i.e. either done at low prio in the background, or only done if explicitly requested via fdatasync() / sync(), so that the app can e.g. allocate blocks, then write the contents with non-zeroes itself ...). That's just technique / design, though; I'm not aware of anyone having done that kernel modification yet, even just for experimenting.
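
    For completeness, a minimal usage sketch of posix_fallocate() as it behaves on FAT today (the path is an example; note the function returns an errno value directly rather than -1 with errno set):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/mnt/flash/file", O_WRONLY | O_CREAT, 0644);
        if (fd < 0)
            return 1;
        /* on VFAT this currently blocks until the whole 1GB is zerofilled */
        int err = posix_fallocate(fd, 0, 1024L * 1024 * 1024);
        if (err != 0)
            fprintf(stderr, "posix_fallocate failed: %d\n", err);
        close(fd);
        return err ? 1 : 0;
    }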