且构网


How do I name the file when using saveAsTextFile in Spark?

Updated: 2022-03-25 07:41:57

As I said in my comment above, the documentation with examples can be found here. Quoting its description of the saveAsTextFile method:

Save this RDD as a text file, using string representations of elements.

In the following example, I save a simple RDD to a file, then load it and print its contents.

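# Assumes `sc` is an existing SparkContext (e.g. the one the pyspark shell provides).
samples = sc.parallelize([
    ("abonsanto@fakemail.com", "Alberto", "Bonsanto"),
    ("mbonsanto@fakemail.com", "Miguel", "Bonsanto"),
    ("stranger@fakemail.com", "Stranger", "Weirdo"),
    ("dbonsanto@fakemail.com", "Dakota", "Bonsanto")
])

# Print each element on its own line.
for sample in samples.collect():
    print(sample)

# The path names an output directory, not a single file:
# Spark writes one part-NNNNN file per partition inside it.
samples.saveAsTextFile("folder/here.txt")

# Reading the directory back loads every part file it contains.
read_rdd = sc.textFile("folder/here.txt")

read_rdd.collect()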

The output will be:

('abonsanto@fakemail.com', 'Alberto', 'Bonsanto')
('mbonsanto@fakemail.com', 'Miguel', 'Bonsanto')
('stranger@fakemail.com', 'Stranger', 'Weirdo')
('dbonsanto@fakemail.com', 'Dakota', 'Bonsanto')

[u"('abonsanto@fakemail.com', 'Alberto', 'Bonsanto')",
 u"('mbonsanto@fakemail.com', 'Miguel', 'Bonsanto')",
 u"('stranger@fakemail.com', 'Stranger', 'Weirdo')",
 u"('dbonsanto@fakemail.com', 'Dakota', 'Bonsanto')"]
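Note that the lines read back are plain strings, not tuples, because saveAsTextFile stores only the string representation of each element. As a minimal sketch, assuming every line is a valid Python literal (as it is in this example), the original tuples could be recovered with the standard library's ast.literal_eval. The helper name parse_line is hypothetical and not part of Spark.

```python
import ast


def parse_line(line):
    """Turn one saved line, e.g. "('a', 'b', 'c')", back into a tuple.

    Hypothetical helper: it only works when the line is a valid Python literal.
    """
    return ast.literal_eval(line)


# Two of the lines from the example above, as they come back from the text file.
lines = [
    "('abonsanto@fakemail.com', 'Alberto', 'Bonsanto')",
    "('mbonsanto@fakemail.com', 'Miguel', 'Bonsanto')",
]

records = [parse_line(line) for line in lines]
print(records[0][1])
```

Inside a Spark job, the same function could presumably be applied to every line with read_rdd.map(parse_line).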

Let's take a look using a Unix-based terminal.

usr@host:~/folder/here.txt$ cat *
('abonsanto@fakemail.com', 'Alberto', 'Bonsanto')
('mbonsanto@fakemail.com', 'Miguel', 'Bonsanto')
('stranger@fakemail.com', 'Stranger', 'Weirdo')
('dbonsanto@fakemail.com', 'Dakota', 'Bonsanto')
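As the listing shows, here.txt is a directory whose part files together hold the data, so the name passed to saveAsTextFile does not become the name of a single file. If a single file with a chosen name is needed, one option is to concatenate the part files afterwards. The following is only a sketch under stated assumptions: the output sits on the local filesystem, the part files follow the usual part-NNNNN naming, and merge_part_files is a hypothetical helper, not a Spark API. The demo uses a stand-in directory instead of real Spark output.

```python
import glob
import os
import shutil
import tempfile


def merge_part_files(output_dir, target_path):
    """Concatenate the part-* files in output_dir into one file at target_path.

    Hypothetical helper for a local filesystem. Marker files such as _SUCCESS
    are skipped because they do not match the part-* pattern.
    """
    part_paths = sorted(glob.glob(os.path.join(output_dir, "part-*")))
    with open(target_path, "wb") as target:
        for part_path in part_paths:
            with open(part_path, "rb") as part:
                shutil.copyfileobj(part, target)
    return len(part_paths)


# Stand-in for a directory written by saveAsTextFile (not real Spark output).
demo_dir = tempfile.mkdtemp()
for name, text in [("part-00000", "a\nb\n"), ("part-00001", "c\n"), ("_SUCCESS", "")]:
    with open(os.path.join(demo_dir, name), "w") as handle:
        handle.write(text)

merged_path = os.path.join(tempfile.mkdtemp(), "here.txt")
merged_count = merge_part_files(demo_dir, merged_path)
with open(merged_path) as handle:
    merged_text = handle.read()
```

Within Spark itself, calling coalesce(1) on the RDD before saving reduces the output to a single part file, at the cost of writing from one partition only. That file is still named part-00000 and still lives inside the output directory.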