且构网

MapReduce job output sort order

Updated: 2023-11-17 11:18:04

You can produce a globally sorted output (which is basically what you want) in one of these ways:

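  1. Use just one reducer in the MapReduce job (a bad idea: it puts all of the reduce work on one machine).
  2. Write a custom partitioner. The partitioner is the class that divides the key space among the reducers. The default partitioner (HashPartitioner) assigns each key by its hash code modulo the number of reducers, so each reducer's output is sorted on its own, but the output files are not ordered relative to one another. A custom partitioner that sends contiguous key ranges to successive reducers fixes that. Check out this example for writing a custom partitioner.
  3. Use Pig or Hive on Hadoop to do the sort.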
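
The custom-partitioner option can be illustrated with a minimal sketch in plain Java, with no Hadoop dependency. The class name `RangePartitionSketch`, the `simulate` helper, and the split points `"h"` and `"p"` are all hypothetical names and values chosen for this illustration. In a real job the same logic would go into a subclass of `org.apache.hadoop.mapreduce.Partitioner`, overriding `getPartition(key, value, numPartitions)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: simulates range partitioning without Hadoop on the classpath.
final class RangePartitionSketch {

    // Same role as Partitioner.getPartition, minus the value argument.
    // splitPoints must be sorted ascending and hold numPartitions - 1 entries.
    // Keys below splitPoints[0] go to partition 0, keys in
    // [splitPoints[0], splitPoints[1]) go to partition 1, and so on.
    static int getPartition(String key, String[] splitPoints, int numPartitions) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // binarySearch returns -(insertionPoint) - 1 when the key is absent.
        int partition = pos >= 0 ? pos + 1 : -(pos + 1);
        return Math.min(partition, numPartitions - 1);
    }

    // Routes keys to simulated reducers, then concatenates the reducer
    // outputs in partition order, the way part-r-00000, part-r-00001, ...
    // would be read back.
    static List<String> simulate(List<String> keys, String[] splitPoints) {
        int numPartitions = splitPoints.length + 1;
        List<TreeSet<String>> reducers = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            reducers.add(new TreeSet<>()); // each reducer sees its distinct keys in sorted order
        }
        for (String key : keys) {
            reducers.get(getPartition(key, splitPoints, numPartitions)).add(key);
        }
        List<String> output = new ArrayList<>();
        for (TreeSet<String> reducer : reducers) {
            output.addAll(reducer);
        }
        return output;
    }

    public static void main(String[] args) {
        String[] splitPoints = {"h", "p"}; // assumed split points, 3 reducers
        List<String> keys = Arrays.asList("pear", "apple", "zebra", "kiwi", "fig", "mango");
        System.out.println(simulate(keys, splitPoints));
    }
}
```

Because every key in partition i sorts before every key in partition i + 1, concatenating the reducer outputs in partition order gives a globally sorted result. Badly chosen split points skew the load across reducers, so they are usually derived from a sample of the input. Hadoop also ships a `TotalOrderPartitioner` that works along these lines.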