python spark: computing the max, min, and mean

Updated: 2022-09-14 10:30:18
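A SparkContext and some sample data are needed before any statistics can be computed. The sketch below uses Spark's Java API, which is where parallelizeDoubles lives; the names sc and testData, and the sample values, are placeholder assumptions chosen to match the snippet that follows.

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaDoubleRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Local Spark context for experimenting; any existing JavaSparkContext works as well.
SparkConf conf = new SparkConf().setAppName("rdd-basic-stats").setMaster("local[*]");
JavaSparkContext sc = new JavaSparkContext(conf);

// Hypothetical sample values; testData can be any List<Double>.
List<Double> testData = Arrays.asList(10.5, 20.0, 3.3, 44.1, 7.8);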

// Distribute the sample values across the cluster as a JavaDoubleRDD
JavaDoubleRDD rdd = sc.parallelizeDoubles(testData);

Now we’ll calculate the mean of our dataset.
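With the JavaDoubleRDD created above, this is a single method call (a minimal sketch; rdd is the variable from the previous snippet):

// mean() triggers a job over the whole RDD and returns the average of its elements
Double mean = rdd.mean();
System.out.println("Mean: " + mean);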

There are similar methods for the other statistics operations, such as max, min, and standard deviation.
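For example, still against the same hypothetical rdd:

Double max = rdd.max();       // largest value in the RDD
Double min = rdd.min();       // smallest value in the RDD
Double stdev = rdd.stdev();   // population standard deviation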

Every time one of these methods is invoked, Spark performs the computation over the entire RDD. If more than one statistic is needed, the data is processed again and again, which is very inefficient. To solve this, Spark provides the StatCounter class, which runs over the data once and provides the results of all the basic statistics operations at the same time.
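On a JavaDoubleRDD, the StatCounter is obtained through stats(); a short sketch, reusing the rdd from above:

import org.apache.spark.util.StatCounter;

// A single pass over the RDD computes count, mean, sum, max, min, variance and stdev together
StatCounter statCounter = rdd.stats();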

The individual results can then be accessed as follows:
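For instance, printing each statistic from the statCounter variable created in the previous sketch:

System.out.println("Count:    " + statCounter.count());
System.out.println("Mean:     " + statCounter.mean());
System.out.println("Sum:      " + statCounter.sum());
System.out.println("Max:      " + statCounter.max());
System.out.println("Min:      " + statCounter.min());
System.out.println("Variance: " + statCounter.variance());
System.out.println("Stdev:    " + statCounter.stdev());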