且构网

Data structure for optimized storage, fast lookup, and persistence

Updated: 2023-01-15 09:20:31

After much testing, I ended up using memory-mapped files marked with the NTFS sparse flag, using code from "NTFS Sparse Files with C#".

Wikipedia has an explanation of what a sparse file is.

The benefit of using a sparse file is that I don't have to care what range my IDs are in. If I only write IDs between 2006000000 and 2010999999, the file allocates only about 625,000 bytes, starting at offset 250,750,000. All space before that offset is left unallocated by the file system. Each ID is stored as a set bit in the file, so the file is treated as a bit array. If the ID sequence suddenly changes, the new IDs are simply allocated in another part of the file.
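The offsets quoted above follow from storing one bit per ID, so an ID lands in byte `id / 8`. A minimal Java sketch of that arithmetic follows; the class and method names are hypothetical, and the bit order within a byte (lowest ID in the least significant bit) is an assumption, since the answer does not specify it.

```java
/** Maps an ID to its position in a file-backed bit array (one bit per ID). */
final class BitPosition {
    private BitPosition() {}

    /** Byte offset in the file that holds the bit for this ID. */
    static long byteOffset(long id) {
        return id >>> 3; // same as id / 8 for non-negative IDs
    }

    /** Mask selecting this ID's bit within its byte (assumed: bit 0 = lowest ID). */
    static int bitMask(long id) {
        return 1 << (int) (id & 7); // id % 8 picks the bit
    }

    /** Number of bytes touched when every ID in [firstId, lastId] is written. */
    static long bytesSpanned(long firstId, long lastId) {
        return byteOffset(lastId) - byteOffset(firstId) + 1;
    }
}
```

With these definitions, `BitPosition.byteOffset(2006000000L)` is 250,750,000 and `BitPosition.bytesSpanned(2006000000L, 2010999999L)` is 625,000, matching the figures in the answer.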

To retrieve which IDs are set, I can make an OS call to get the allocated ranges of the sparse file and then check each bit in those ranges. Checking whether a particular ID is set is also very fast: if it falls outside the allocated blocks, it is not there; if it falls within one, it takes just one byte read and a bit-mask check to see whether the correct bit is set.
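The lookup and enumeration logic can be sketched in Java as follows. This is an illustration under assumptions, not the author's code: the allocated ranges are passed in as in-memory chunks (the hypothetical `AllocatedChunk` type) rather than obtained from the operating system, and the bit order within a byte is assumed to put the lowest ID in the least significant bit.

```java
import java.util.ArrayList;
import java.util.List;

/** One allocated region of the sparse file: its starting byte offset and its contents. */
final class AllocatedChunk {
    final long startOffset;
    final byte[] data;

    AllocatedChunk(long startOffset, byte[] data) {
        this.startOffset = startOffset;
        this.data = data;
    }
}

final class SparseBitLookup {
    private SparseBitLookup() {}

    /** True if the ID's bit is set; IDs outside every allocated chunk are absent. */
    static boolean isSet(List<AllocatedChunk> chunks, long id) {
        long byteOffset = id >>> 3;
        int mask = 1 << (int) (id & 7);
        for (AllocatedChunk c : chunks) {
            long rel = byteOffset - c.startOffset;
            if (rel >= 0 && rel < c.data.length) {
                return (c.data[(int) rel] & mask) != 0; // one byte read, one mask test
            }
        }
        return false; // the ID falls in an unallocated hole
    }

    /** Lists every set ID by scanning only the allocated chunks. */
    static List<Long> setIds(List<AllocatedChunk> chunks) {
        List<Long> ids = new ArrayList<>();
        for (AllocatedChunk c : chunks) {
            for (int i = 0; i < c.data.length; i++) {
                int b = c.data[i] & 0xFF; // treat the byte as unsigned
                for (int bit = 0; b != 0; bit++, b >>>= 1) {
                    if ((b & 1) != 0) {
                        ids.add(((c.startOffset + i) << 3) + bit);
                    }
                }
            }
        }
        return ids;
    }
}
```

The unallocated holes never get scanned, so the cost of enumeration depends on how much of the file is allocated rather than on the size of the ID range.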

So for the particular scenario where you have many IDs and want to check them as quickly as possible, this is the most efficient approach I've found so far.

The good part is that the memory-mapped files can be shared with Java as well (which turned out to be needed). Java also supports memory-mapped files on Windows, and implementing the read/write logic is fairly trivial.
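As a rough idea of what the Java side might look like, the sketch below maps a window of the file with `FileChannel.map` and sets or tests single bits. The class name `MappedIdWindow` and its methods are hypothetical, the bit order is an assumption, and marking the file as sparse is not shown here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** A window of a file-backed bit array, mapped into memory. */
final class MappedIdWindow {
    private final MappedByteBuffer buffer;
    private final long firstByte; // file offset of the first mapped byte

    private MappedIdWindow(MappedByteBuffer buffer, long firstByte) {
        this.buffer = buffer;
        this.firstByte = firstByte;
    }

    /** Maps the bytes covering IDs firstId..lastId (inclusive) for reading and writing. */
    static MappedIdWindow open(Path file, long firstId, long lastId) {
        long firstByte = firstId >>> 3;
        long length = (lastId >>> 3) - firstByte + 1; // must fit in a single mapping
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapping stays valid after the channel is closed.
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, firstByte, length);
            return new MappedIdWindow(map, firstByte);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Sets the bit for an ID; IDs outside the window throw IndexOutOfBoundsException. */
    void set(long id) {
        int index = (int) ((id >>> 3) - firstByte);
        buffer.put(index, (byte) (buffer.get(index) | (1 << (int) (id & 7))));
    }

    /** Tests the bit for an ID inside the window. */
    boolean isSet(long id) {
        int index = (int) ((id >>> 3) - firstByte);
        return (buffer.get(index) & (1 << (int) (id & 7))) != 0;
    }

    /** Writes pending changes back to the file. */
    void flush() {
        buffer.force();
    }
}
```

A single mapping in Java is limited to `Integer.MAX_VALUE` bytes, so a very wide ID range would need several windows. Whether another process sees the same bits also depends on both sides agreeing on the bit order, which is only assumed here.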