且构网

MapReduce job output sort order

Updated: 2023-11-17 11:34:58

You can get a globally sorted output file (which is essentially what you want) in any of these ways:

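  1. Use just one reducer in the MapReduce job (a bad idea: it puts all of the sorting work on one machine).
  2. Write a custom partitioner. The Partitioner is the class that divides the key space among reducers in MapReduce. The default partitioner (HashPartitioner) assigns keys to reducers by hash code, which spreads them roughly evenly but does not keep key ranges together. A custom partitioner that sends each key range to its own reducer makes the reducer outputs, read in reducer order, globally sorted. Check out this example for writing a custom partitioner.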

  3. Use Pig or Hive on Hadoop to do the sort.
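
The custom-partitioner option can be sketched roughly as follows. This is a minimal, Hadoop-free illustration of the range-partitioning logic only: the class name `RangePartitionSketch`, the `SPLIT_POINTS` values, and the sample keys are all assumptions made up for the example. In a real job the same logic would live in a subclass of `org.apache.hadoop.mapreduce.Partitioner` that overrides `getPartition(key, value, numPartitions)`, and the split points would have to be chosen from the actual key distribution.

```java
import java.util.Arrays;

public class RangePartitionSketch {
    // Hypothetical split points (must be sorted):
    // key < "h" goes to reducer 0, "h" <= key < "p" to reducer 1, the rest to reducer 2.
    static final String[] SPLIT_POINTS = {"h", "p"};

    // Returns the reducer index for a key. Because smaller keys always map to
    // smaller reducer indexes, concatenating the reducer outputs in index order
    // gives a globally sorted result.
    static int getPartition(String key, String[] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // When the key is not a split point, binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        for (String key : new String[] {"apple", "hadoop", "pig", "zebra"}) {
            System.out.println(key + " -> reducer " + getPartition(key, SPLIT_POINTS));
        }
    }
}
```

With N split points this sketch expects N + 1 reducers. Badly chosen split points leave some reducers with far more data than others, which is why split points are usually derived from a sample of the input. Hadoop also ships `TotalOrderPartitioner`, which follows the same range-based idea.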