且构网


Chaining multiple MapReduce jobs in Hadoop Streaming

Updated: 2023-11-17 10:16:46

Here is a great blog post on how to use Cascading with Hadoop Streaming: http://www.xcombinator.com/2009/11/18/how-to-use-cascading-with-hadoop-streaming/

The value here is that you can mix Java (Cascading query flows) with your custom Streaming operations in the same application. I find this much less brittle than other approaches.

Note that the Cascade object in Cascading lets you chain multiple Flows; following the blog post above, your Streaming job would become a MapReduceFlow.
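
Roughly speaking, a Cascade works out the run order from the data each Flow reads and writes: a Flow whose source is another Flow's sink has to wait for that Flow to finish. The sketch below is a stdlib-only toy model of that idea (Java 16+), not Cascading's API. The ToyFlow record, the runOrder method, and the paths are hypothetical. In real code the Streaming job would be wrapped as a MapReduceFlow and connected with your other Flows into a Cascade (for example via CascadeConnector).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model of Cascade-style ordering. This is NOT Cascading's API.
// Each hypothetical ToyFlow names the paths it reads (sources) and writes (sinks).
public class CascadeOrderSketch {

    record ToyFlow(String name, Set<String> sources, Set<String> sinks) {}

    // Returns flow names so that every flow comes after the flows that
    // write the paths it reads (a topological sort by data dependency).
    static List<String> runOrder(List<ToyFlow> flows) {
        int n = flows.size();
        boolean[] done = new boolean[n];
        List<String> order = new ArrayList<>();
        while (order.size() < n) {
            boolean progressed = false;
            for (int i = 0; i < n; i++) {
                if (!done[i] && ready(i, flows, done)) {
                    done[i] = true;
                    order.add(flows.get(i).name());
                    progressed = true;
                }
            }
            if (!progressed) {
                throw new IllegalStateException("cyclic dependency between flows");
            }
        }
        return order;
    }

    // A flow is ready when no other unfinished flow writes one of its sources.
    static boolean ready(int i, List<ToyFlow> flows, boolean[] done) {
        for (int j = 0; j < flows.size(); j++) {
            if (j == i || done[j]) continue;
            for (String src : flows.get(i).sources()) {
                if (flows.get(j).sinks().contains(src)) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical pipeline: a Streaming job feeding a Java flow.
        // The downstream flow is listed first on purpose.
        List<ToyFlow> flows = List.of(
            new ToyFlow("java-flow", Set.of("/tmp/counts"), Set.of("/tmp/report")),
            new ToyFlow("streaming-job", Set.of("/input/logs"), Set.of("/tmp/counts")));
        System.out.println(runOrder(flows)); // prints [streaming-job, java-flow]
    }
}
```

The repeated-pass scan is used instead of a textbook Kahn's algorithm only for brevity; it is not efficient for large graphs, but it is fine for a handful of flows.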

Disclaimer: I'm the author of Cascading.