
Confused about Hadoop NameNode memory usage




I have a silly doubt about the Hadoop namenode memory calculation. It is mentioned in the Hadoop book (The Definitive Guide) as:

"Since the namenode holds filesystem metadata in memory, the limit to the number of files in a filesystem is governed by the amount of memory on the namenode. As a rule of thumb, each file, directory, and block takes about 150 bytes. So, for example, if you had one million files, each taking one block, you would need at least 300 MB of memory. While storing millions of files is feasible, billions is beyond the capability of current hardware."

Since each file takes one block, the namenode minimum memory should be 150 MB and not 300 MB. Please help me understand why it is 300 MB.

I guess you read the second edition of Tom White's book. I have the third edition, and it references the post Scalability of the Hadoop Distributed File System. In that post, I read the following sentence:

Estimates show that the name-node uses less than 200 bytes to store a single metadata object (a file inode or a block).

A file in the HDFS NameNode consists of a file inode + a block, and each of those metadata objects takes about 150 bytes. So 1,000,000 files = 1,000,000 inodes + 1,000,000 block references (in this example, each file occupies 1 block).

2,000,000 * 150 bytes ~= 300 MB
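If it helps, here is a minimal back-of-the-envelope sketch in Python (a hypothetical helper, not anything from Hadoop itself) that applies the 150-bytes-per-object rule of thumb: one inode per file plus one metadata object per block, so a million single-block files come out to roughly 300 MB:

# Rough estimate of NameNode heap usage from the ~150 bytes/object rule of thumb.
# All names and numbers here are illustrative assumptions, not Hadoop internals.
BYTES_PER_METADATA_OBJECT = 150  # file inode, directory, or block reference

def namenode_memory_bytes(num_files, blocks_per_file=1, num_directories=0):
    """One inode per file/directory plus one object per block."""
    inodes = num_files + num_directories
    blocks = num_files * blocks_per_file
    return (inodes + blocks) * BYTES_PER_METADATA_OBJECT

estimate = namenode_memory_bytes(num_files=1_000_000, blocks_per_file=1)
print(f"{estimate:,} bytes ~= {estimate / 1024**2:.0f} MB")
# 300,000,000 bytes ~= 286 MB, i.e. roughly the 300 MB quoted in the book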

I included the link above so you can verify whether I made a mistake in my argument.