且构网

Using a Filesystem (Not a Database!) for Schemaless Data - *** Practices

Updated: 2023-01-22 12:23:40

After reading over my other question, Using a Relational Database for Schema-Less Data, I began to wonder if a filesystem is more appropriate than a relational database for storing and querying schemaless data.

Rather than just building a file system on top of MySQL, why not just save the data directly to the filesystem? Indexing needs to be figured out, but modern filesystems are very stable, have great features like replication, snapshot and backup facilities, and are flexible at storing schema-less data.
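To make "saving the data directly to the filesystem" concrete, here is a minimal sketch in Python. It assumes one JSON file per document in a single directory; the class name `FileDocumentStore` and its methods are hypothetical, invented for illustration. The `find` method shows why indexing "needs to be figured out": without an index, every query is a full scan.

```python
import json
import os
import tempfile
from pathlib import Path


class FileDocumentStore:
    """Hypothetical minimal document store: one JSON file per document."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, doc_id):
        # Assumption: document ids are plain file names without separators.
        if not doc_id or "/" in doc_id or "\\" in doc_id or doc_id in (".", ".."):
            raise ValueError(f"unsafe document id: {doc_id!r}")
        return self.root / f"{doc_id}.json"

    def put(self, doc_id, document):
        target = self._path(doc_id)
        # Write to a temporary file in the same directory, then rename it
        # over the target so readers never see a half-written document.
        fd, tmp_name = tempfile.mkstemp(dir=self.root, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as tmp:
                json.dump(document, tmp)
                tmp.flush()
                os.fsync(tmp.fileno())
            os.replace(tmp_name, target)
        except BaseException:
            if os.path.exists(tmp_name):
                os.unlink(tmp_name)
            raise

    def get(self, doc_id):
        with open(self._path(doc_id), encoding="utf-8") as f:
            return json.load(f)

    def find(self, field, value):
        """Naive query: scans every document because there is no index."""
        matches = []
        for path in sorted(self.root.glob("*.json")):
            with open(path, encoding="utf-8") as f:
                doc = json.load(f)
            if isinstance(doc, dict) and doc.get(field) == value:
                matches.append(path.stem)
        return matches
```

Documents with different fields can sit side by side, for example `store.put("u1", {"name": "Ann", "tags": ["a"]})` next to `store.put("u2", {"name": "Bob"})`. A real layer would likely need a separate index structure to avoid the scan in `find`.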

However, I can't find any examples of someone using a filesystem instead of a database.

Where can I find more resources on how to implement a schemaless (or "document-oriented") database as a layer on top of a filesystem? Is anyone using a modern filesystem as a schemaless database?

Yes, a filesystem can be treated as a special case of a NoSQL-like database system. It has some limitations that should be considered in any design decision:

pros:

  • simple, intuitive
  • takes advantage of years of tuning and caching algorithms
  • easy backup, potentially easy clustering

things to think about:

  • richness of metadata - what types of data does it store, how does it let you query them, can you have hierarchical or multivalued attributes

  • speed of querying metadata - not all filesystems are well optimized for querying anything other than size and dates

  • inability to join queries (though that's pretty much common to NoSQL)

  • inefficient storage usage (unless the file system performs block suballocation, you'll typically consume 4-16 KB per item stored, regardless of its size)

  • may not have the kind of caching algorithm you want for its directory structure
  • tends to be less tunable, etc.
  • backup solutions may have trouble depending on how you store things (trees that are too deep, too many items per node, etc.), which might negate an obvious advantage of such a structure
  • locking on a LOCAL filesystem works pretty well, of course, if you call the right routines, but not necessarily on a network-based filesystem (those problems have been solved in various ways, but it's certainly a design issue)
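Three of the points above (too many items per node, per-item storage overhead, and local locking) can be illustrated with a short Python sketch. The helper names `sharded_path`, `allocation_overhead`, and `locked` are hypothetical. The sketch assumes a POSIX system: `fcntl` and `st_blocks` are not available on Windows, and `flock` is advisory, so it only protects against other processes that also take the lock.

```python
import fcntl
import hashlib
import os
from contextlib import contextmanager
from pathlib import Path


def sharded_path(root, key, levels=2, width=2):
    """Map a key to root/ab/cd/<digest> so no single directory grows huge."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return Path(root, *parts, digest)


def allocation_overhead(path):
    """Return (logical_bytes, allocated_bytes) for one file.

    Python documents st_blocks as a count of 512-byte blocks.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


@contextmanager
def locked(path):
    """Hold an exclusive advisory lock on a local file while appending to it."""
    with open(path, "a+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            yield f
        finally:
            # Flush buffered writes before releasing the lock.
            f.flush()
            fcntl.flock(f, fcntl.LOCK_UN)
```

Comparing the two numbers returned by `allocation_overhead` for a tiny file shows how much space a small item really occupies on a given filesystem; the result varies by filesystem, so no specific figure is assumed here. Whether `flock` behaves correctly on a network filesystem depends on the filesystem and its configuration, which is the design issue the last bullet refers to.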