Does a user need to exist on all nodes to be recognized by the Hadoop cluster/HDFS?

Updated: 2023-01-15 21:57:21

Found this answer on the Hortonworks community site:

The user should not have an account on all the nodes of the cluster. They should only have an account on the edge node.

For a new user, there are two types of directories we need to create before the user accesses the cluster:

1- User home directory [directory created on the Linux filesystem, i.e. /home/]

2- User HDFS directory [directory created on the HDFS filesystem, i.e. /user/]

...you only need to create the HDFS home directory [i.e. /user/] on the edge node [not sure of the meaning here, since HDFS does not seem to have anything to do with any particular edge node]. You can still run jobs as the new user on the cluster, even if you haven't created their home directory in Linux.
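As a rough illustration of the HDFS home directory step, here is a minimal Python sketch that only builds the `hdfs dfs` commands an administrator would typically run; it does not execute them. The username `alice`, the helper name `hdfs_home_commands`, and the choice of a group named after the user are assumptions for the example, not part of the quoted answer. On most clusters these commands would have to be run as the HDFS superuser.

```python
import shlex


def hdfs_home_commands(username, group=None, base="/user"):
    """Build the commands that create an HDFS home directory and hand it to the user.

    Hypothetical helper: it returns argument lists and never runs anything.
    """
    if not username or "/" in username:
        raise ValueError("username must be a non-empty name without '/'")
    group = group or username  # assumption: default group is named after the user
    home = f"{base}/{username}"
    return [
        # Create the directory (and any missing parents) in HDFS.
        ["hdfs", "dfs", "-mkdir", "-p", home],
        # Make the new user the owner so jobs can write there.
        ["hdfs", "dfs", "-chown", f"{username}:{group}", home],
    ]


if __name__ == "__main__":
    for cmd in hdfs_home_commands("alice"):
        print(shlex.join(cmd))
```

Neither command refers to a particular host, which fits the bracketed doubt above: the directory lives in the HDFS namespace, and the edge node is simply a convenient place to run the client.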

** Update: Based on comments by user @cricket_007, it appears that the user must also exist on the NameNode server. The closest I could find to documentation explicitly stating this says:

Each file or directory operation passes the full path name to the NameNode, and the permissions checks are applied along the path for each operation. The client framework will implicitly associate the user identity with the connection to the NameNode, reducing the need for changes to the existing client API. [...] For instance, when the client first begins reading a file, it makes a first request to the NameNode to discover the location of the first blocks of the file.
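To make "permissions checks are applied along the path" more concrete, here is a toy Python model of such a check. It is a sketch under stated assumptions, not Hadoop's implementation: the `Inode` class, the `can_read` function, and the sample namespace are all hypothetical, and the model only covers POSIX-style owner/group/other bits (no ACLs, no superuser). The point it illustrates is that the check runs against the user name and group list that the NameNode associates with the connection; with the default shell-based group mapping, those groups are resolved on the NameNode host, which is presumably why the user needs to be known there.

```python
from dataclasses import dataclass

READ, EXECUTE = 4, 1


@dataclass(frozen=True)
class Inode:
    owner: str
    group: str
    mode: int  # POSIX-style permission bits, e.g. 0o750
    is_dir: bool = True


def allowed(inode, user, groups, want):
    """Pick the owner, group, or other bits for this user and test them."""
    if user == inode.owner:
        bits = (inode.mode >> 6) & 7
    elif inode.group in groups:
        bits = (inode.mode >> 3) & 7
    else:
        bits = inode.mode & 7
    return (bits & want) == want


def can_read(namespace, path, user, groups):
    """Require execute on every ancestor directory, then read on the target."""
    current = "/"
    for part in [p for p in path.split("/") if p]:
        node = namespace.get(current)
        if node is None or not node.is_dir or not allowed(node, user, groups, EXECUTE):
            return False
        current = current.rstrip("/") + "/" + part
    target = namespace.get(current)
    return target is not None and allowed(target, user, groups, READ)


# Hypothetical namespace used only for this example.
NAMESPACE = {
    "/": Inode("hdfs", "supergroup", 0o755),
    "/user": Inode("hdfs", "supergroup", 0o755),
    "/user/alice": Inode("alice", "analysts", 0o750),
    "/user/alice/data.csv": Inode("alice", "analysts", 0o640, is_dir=False),
}
```

In this model, `can_read(NAMESPACE, "/user/alice/data.csv", "bob", {"analysts"})` succeeds through the group bits, while the same call with an empty group set fails at `/user/alice`. An empty group set is what you would expect if the host doing the group lookup does not know the user.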