且构网

Sharing all things about programming and development...

Truncating a file at the front

Updated: 2023-02-26 10:28:17

A problem I was working on recently got me wishing that I could lop off the front of a file: a "truncate at front," if you will. Truncating a file at the back end is a common operation, something we do without even thinking much about it. But lopping off the front of a file? It sounds ridiculous at first, but only because we’ve been trained to think that it’s impossible. Yet a lop operation could be useful in some situations.

A simple example (certainly not the only or necessarily the best example) is a FIFO queue. You’re adding new items to the end of the file and pulling items out of the file from the front. The file grows over time and there’s a huge empty space at the front. With current file systems, there are several ways around this problem:

• As each item is removed, copy the remaining items up to replace it, and truncate the file. Although it works, this solution is very expensive time-wise.
• Monitor the size of the empty space at the front, and when it reaches a particular size or percentage of the entire file size, move everything up and truncate the file. This is much more efficient than the previous solution, but still costs time when items are moved in the file.
• Implement a circular queue in the file, adding new items to the hole at the front of the file as items are removed. This can be quite efficient, especially if you don’t mind the possibility of things getting out of order in the queue. If you do care about order, there’s the potential of having to move items around. But in general, a circular queue is pretty easy to implement and manages disk space well.
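
The second workaround above, compacting once the dead space at the front grows large enough, can be sketched in Python as follows. This is a minimal illustration built on our own assumptions, not code from the post. Items are byte strings stored with a 4-byte little-endian length prefix. The read position (`head`) is kept only in memory. The class name `CompactingFileQueue` and the 50% threshold are hypothetical choices.

```python
import os
import struct
import tempfile

LEN = struct.Struct("<I")  # assumed layout: 4-byte little-endian length prefix per item


class CompactingFileQueue:
    """Hypothetical FIFO queue in one file; compacts when front dead space gets large.

    The head offset lives only in memory in this sketch (it is not persisted).
    """

    def __init__(self, path, threshold=0.5):
        self.threshold = threshold  # fraction of the file allowed to be dead space
        self.head = 0               # byte offset of the oldest live item
        open(path, "ab").close()    # create the file if it does not exist
        self.f = open(path, "r+b")

    def push(self, item: bytes):
        self.f.seek(0, os.SEEK_END)
        self.f.write(LEN.pack(len(item)) + item)

    def pop(self) -> bytes:
        self.f.seek(self.head)
        header = self.f.read(LEN.size)
        if len(header) < LEN.size:
            raise IndexError("pop from empty queue")
        (n,) = LEN.unpack(header)
        item = self.f.read(n)
        self.head += LEN.size + n
        self._maybe_compact()
        return item

    def _maybe_compact(self):
        size = self.f.seek(0, os.SEEK_END)
        if size and self.head / size >= self.threshold:
            # Move the live tail up to offset 0, then truncate the leftover bytes.
            self.f.seek(self.head)
            live = self.f.read()
            self.f.seek(0)
            self.f.write(live)
            self.f.truncate(len(live))
            self.head = 0

    def close(self):
        self.f.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "queue.bin")
    q = CompactingFileQueue(path)
    for word in (b"alpha", b"beta", b"gamma"):
        q.push(word)
    print(q.pop())   # b'alpha'  (head = 9 of 26 bytes: below threshold)
    print(q.pop())   # b'beta'   (head = 17 of 26 bytes: compacts)
    q.close()
    print(os.path.getsize(path))  # 9: only the "gamma" record remains
```

Every compaction still copies all live bytes to the front of the file. That is exactly the cost a native lop operation would avoid.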

But if there were a lop operation, removing an item from the queue would be as easy as updating the beginning-of-file marker: as easy, in fact, as truncating a file. Why, then, is there no such operation?

I understand a bit about file system implementation, and I don't see any particular reason this would be difficult. It looks to me like all it would require is another word (a dword, perhaps?) per allocation entry to say where the file starts within the block. With 1-terabyte drives selling for under US$100, that seems like a pretty small price to pay for such functionality.
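
To make that proposal concrete, here is a toy in-memory model in Python. It does not reproduce any real file system's on-disk format. A file is a list of allocation entries, and each entry carries an extra `start` field saying where the file's data begins inside that block. The block size and the names `Extent` and `LoppableFile` are hypothetical. Under this model, lopping bytes off the front frees whole blocks and adjusts a single `start` value, and no data is copied.

```python
from dataclasses import dataclass

BLOCK_SIZE = 8  # tiny hypothetical block size so the example is easy to follow


@dataclass
class Extent:
    block: bytearray   # stands in for one on-disk block
    start: int = 0     # the proposed extra word: where file data begins in this block


class LoppableFile:
    """Toy model: a file is a list of allocation entries, each with a start offset."""

    def __init__(self):
        self.extents = []
        self.length = 0    # logical file size in bytes

    def append(self, data: bytes):
        for byte in data:
            last = self.extents[-1] if self.extents else None
            if last is None or len(last.block) == BLOCK_SIZE:
                last = Extent(bytearray())
                self.extents.append(last)
            last.block.append(byte)
        self.length += len(data)

    def lop(self, n: int):
        """Remove the first n bytes: free whole blocks, then bump one start offset."""
        if n > self.length:
            raise ValueError("cannot lop more than the file length")
        self.length -= n
        while n:
            first = self.extents[0]
            live = len(first.block) - first.start
            if n >= live:
                self.extents.pop(0)   # whole block freed, nothing copied
                n -= live
            else:
                first.start += n      # partial block: only the marker changes
                n = 0

    def read(self, offset: int, size: int) -> bytes:
        """Read using logical offsets measured from the (new) start of the file."""
        out = bytearray()
        for ext in self.extents:
            chunk = ext.block[ext.start:]
            if offset >= len(chunk):
                offset -= len(chunk)
                continue
            out += chunk[offset:offset + size - len(out)]
            offset = 0
            if len(out) == size:
                break
        return bytes(out)


if __name__ == "__main__":
    f = LoppableFile()
    f.append(b"0123456789abcdefXYZ")           # 19 bytes across three 8-byte blocks
    f.lop(10)                                  # frees block 0, sets start=2 in block 1
    print(len(f.extents), f.extents[0].start)  # 2 2
    print(f.read(0, 9))                        # b'abcdefXYZ'
```

The only extra state is the per-entry `start` field, which matches the cost estimated in the paragraph above.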

What other tasks would be made easier if you could lop off the front of a file as efficiently as you can truncate at the end?

Can you think of any technical reason this function couldn't be added to a modern file system? Other, non-technical reasons?

Truncating a file at the front does not seem too hard to implement at the system level.

But there are issues.

• The first one is at the programming level. When a file is opened for random access, the current paradigm is to use offsets from the beginning of the file to designate different places in the file. If we truncate at the beginning of the file (or insert into or remove from the middle of the file), an offset is no longer a stable property. (Appending or truncating at the end is not a problem.)

In other words, truncating the beginning would move the only reference point, so every offset a program has already stored would silently point at different data, and that is bad.
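
Here is a small Python illustration of this first issue, built on our own assumptions. A program writes three records and keeps an index of their byte offsets from the start of the file. There is no portable front-truncate call, so the lop is simulated by rewriting the file without its first record. The helper names `lop_front`, `build_file` and `read_at` are hypothetical. After the lop, the stored offset for a record points at the next record instead. The program can only recover by tracking how many bytes have been removed and translating through that base.

```python
import os
import tempfile


def lop_front(path, n):
    """Hypothetical stand-in for a front-truncate call: drop the first n bytes.

    Implemented by copying, which is exactly the cost a real lop would avoid.
    """
    with open(path, "r+b") as f:
        f.seek(n)
        rest = f.read()
        f.seek(0)
        f.write(rest)
        f.truncate()  # cut the file at the current position, len(rest)


def build_file(path, records):
    """Write records back to back and return {name: offset from start of file}."""
    index = {}
    with open(path, "wb") as f:
        for name, data in records:
            index[name] = f.tell()
            f.write(data)
    return index


def read_at(path, offset, size):
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "records.bin")
    index = build_file(path, [("a", b"AAAA"), ("b", b"BBBB"), ("c", b"CCCC")])
    print(read_at(path, index["b"], 4))         # b'BBBB'

    lop_front(path, 4)                          # remove record "a" from the front
    print(read_at(path, index["b"], 4))         # b'CCCC': the stored offset is stale

    base = 4                                    # bytes removed so far, tracked by the program
    print(read_at(path, index["b"] - base, 4))  # b'BBBB' again
```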

• At the system level, uses exist, as you pointed out, but they are quite rare. I believe most file use is of the write-once, read-many kind, so even truncation is not a critical feature and we could probably do without it (well, some things would become more difficult, but nothing would become impossible).

When we want more complex access (and there is indeed a need for it), we open files in random-access mode and add some internal structural information. This information can also be shared between several files. This leads us to the last issue I see, probably the most important one.

• In a sense, when we use random-access files with some internal structure, we still use files, but we are no longer using the file paradigm. The typical case is the database, where we want to insert or remove records without caring at all about their physical location. Databases can use files as a low-level implementation, but for optimization purposes some database vendors choose to bypass the file system completely (think of Oracle partitions).

I see no technical reason why we couldn't do everything that is currently done with files in an operating system by using a database as the data storage layer. I have even heard that NTFS has many points in common with databases in its internals. An operating system could (and probably will, in the not-so-distant future) use a paradigm other than files.
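
As a rough sketch of the database-style access described in this last issue, the hypothetical Python class below stores fixed-size records in slots. It is not modeled on any particular database engine. Each slot begins with a one-byte "in use" flag, which is a minimal form of the internal structure information mentioned earlier. Callers address records by slot ID rather than by byte offset. Deleting a record only clears its flag, and a later insert reuses the first free slot, so nothing ever shifts inside the file. The record size, layout and class name are assumptions.

```python
import os
import struct
import tempfile

RECORD_SIZE = 16                            # payload bytes per slot (hypothetical)
SLOT = struct.Struct(f"<B{RECORD_SIZE}s")   # 1-byte "in use" flag + fixed-size payload


class SlotFile:
    """Fixed-size record slots; callers hold slot IDs, never raw byte offsets."""

    def __init__(self, path):
        open(path, "ab").close()            # create the file if it does not exist
        self.f = open(path, "r+b")

    def _slot_count(self):
        return self.f.seek(0, os.SEEK_END) // SLOT.size

    def _check(self, slot_id):
        if not 0 <= slot_id < self._slot_count():
            raise KeyError(slot_id)

    def _in_use(self, slot_id):
        self.f.seek(slot_id * SLOT.size)
        return self.f.read(1) == b"\x01"

    def insert(self, payload: bytes) -> int:
        if len(payload) > RECORD_SIZE:
            raise ValueError("payload too large for one slot")
        count = self._slot_count()
        # Reuse the first free slot, or append a new one if none is free.
        slot_id = next((i for i in range(count) if not self._in_use(i)), count)
        self.f.seek(slot_id * SLOT.size)
        self.f.write(SLOT.pack(1, payload))   # "s" format pads short payloads with NULs
        return slot_id

    def delete(self, slot_id: int):
        self._check(slot_id)
        self.f.seek(slot_id * SLOT.size)
        self.f.write(SLOT.pack(0, b""))       # tombstone: nothing else in the file moves

    def get(self, slot_id: int) -> bytes:
        self._check(slot_id)
        self.f.seek(slot_id * SLOT.size)
        used, payload = SLOT.unpack(self.f.read(SLOT.size))
        if not used:
            raise KeyError(slot_id)
        return payload.rstrip(b"\0")          # assumes payloads have no trailing NULs

    def close(self):
        self.f.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "slots.bin")
    db = SlotFile(path)
    ids = [db.insert(name) for name in (b"alice", b"bob", b"carol")]
    db.delete(ids[0])
    print(db.get(ids[2]))      # b'carol': its ID survived the deletion
    print(db.insert(b"dave"))  # 0: the freed slot is reused
    db.close()
```

Because IDs rather than offsets are the stable handles, the program no longer cares where a record physically sits.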

In summary, I believe there is no technical problem at all, just a change of paradigm: removing the beginning is definitely not part of the current file paradigm, and it is not a big or useful enough change to compel changing anything at all.