
Reducers for Hive data

Updated: 2023-01-31 19:40:48


I'm a novice, and I'm curious to know how the number of reducers is determined for different Hive data sets. Is it based on the size of the data processed, or is a default number of reducers used for all of them?

For example, how many reducers would 5GB of data require? Would the same number of reducers be used for a smaller data set?

Thanks in advance!! Cheers!

In open source Hive (and likely EMR as well):

# reducers = (# bytes of input to mappers)
             / (hive.exec.reducers.bytes.per.reducer)

The default hive.exec.reducers.bytes.per.reducer is 1GB.
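Plugging numbers into the formula above answers the 5GB question directly: with the 1GB default, 5GB of mapper input gives 5 reducers, while a smaller data set gets proportionally fewer (but at least one). A minimal arithmetic sketch (plain Python, not Hive code; it ignores Hive's upper cap on reducers, hive.exec.reducers.max):

```python
import math

# Default hive.exec.reducers.bytes.per.reducer: 1GB
BYTES_PER_REDUCER = 1 << 30

def estimated_reducers(input_bytes: int) -> int:
    """Estimate reducers as ceil(input bytes / bytes per reducer), minimum 1."""
    return max(1, math.ceil(input_bytes / BYTES_PER_REDUCER))

print(estimated_reducers(5 * (1 << 30)))    # 5GB of input  -> 5
print(estimated_reducers(200 * (1 << 20)))  # 200MB of input -> 1
```

Lowering hive.exec.reducers.bytes.per.reducer therefore increases the number of reducers for the same input size.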

The number of reducers thus also depends on the size of the input. You can change it by setting the property hive.exec.reducers.bytes.per.reducer:

either by changing hive-site.xml:

<property>
  <name>hive.exec.reducers.bytes.per.reducer</name>
  <value>1000000</value>
</property>

or using set

hive -e "set hive.exec.reducers.bytes.per.reducer=100000;"