python spark: computing the max, min, and mean

Updated: 2022-09-14 10:30:18
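A SparkContext and some sample data are needed before any statistics can be computed. The sketch below uses Spark's Java API, which is where parallelizeDoubles lives; the names sc and testData, and the sample values, are placeholder assumptions chosen to match the snippet that follows.

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaDoubleRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Local Spark context for experimenting; any existing JavaSparkContext works as well.
SparkConf conf = new SparkConf().setAppName("rdd-basic-stats").setMaster("local[*]");
JavaSparkContext sc = new JavaSparkContext(conf);

// Hypothetical sample values; testData can be any List<Double>.
List<Double> testData = Arrays.asList(10.5, 20.0, 3.3, 44.1, 7.8);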

// Distribute the sample values across the cluster as a JavaDoubleRDD
JavaDoubleRDD rdd = sc.parallelizeDoubles(testData);

Now we’ll calculate the mean of our dataset.
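With the JavaDoubleRDD created above, this is a single method call (a minimal sketch; rdd is the variable from the previous snippet):

// mean() triggers a job over the whole RDD and returns the average of its elements
Double mean = rdd.mean();
System.out.println("Mean: " + mean);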

There are similar methods for the other statistics operations, such as max, min, and standard deviation.
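For example, still against the same hypothetical rdd:

Double max = rdd.max();       // largest value in the RDD
Double min = rdd.min();       // smallest value in the RDD
Double stdev = rdd.stdev();   // population standard deviation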

Every time one of these methods is invoked, Spark performs the computation over the entire RDD. If more than one statistic is needed, the data is processed again and again, which is very inefficient. To solve this, Spark provides the StatCounter class, which runs over the data once and provides the results of all the basic statistics operations at the same time.
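On a JavaDoubleRDD, the StatCounter is obtained through stats(); a short sketch, reusing the rdd from above:

import org.apache.spark.util.StatCounter;

// A single pass over the RDD computes count, mean, sum, max, min, variance and stdev together
StatCounter statCounter = rdd.stats();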

The individual results can then be accessed as follows:
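For instance, printing each statistic from the statCounter variable created in the previous sketch:

System.out.println("Count:    " + statCounter.count());
System.out.println("Mean:     " + statCounter.mean());
System.out.println("Sum:      " + statCounter.sum());
System.out.println("Max:      " + statCounter.max());
System.out.println("Min:      " + statCounter.min());
System.out.println("Variance: " + statCounter.variance());
System.out.println("Stdev:    " + statCounter.stdev());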