
Error while filtering and counting words in Spark

Updated: 2023-11-28 18:56:28

There are several errors in your code. The array creation looks like PySpark, but the rest of the code is Scala. Also, there is no `text_file` API on a SparkContext instance; the method is `textFile`.

Solution for PySpark:

from operator import add

words = ['dog', 'cat', 'tiger', 'lion', 'cheetah']

filePath = sc.textFile("/user/cloudera/input/Hin*/datafile.txt")

# split each line into tokens, keep only the target words, then count them
crimecounts = (filePath
    .flatMap(lambda line: line.split(" "))
    .filter(lambda w: w.lower() in words)
    .map(lambda word: (word, 1))
    .reduceByKey(add))
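The same flatMap/filter/map/reduceByKey pipeline can be checked locally without a Spark cluster using plain Python. This is a minimal sketch; the sample lines are made up stand-ins for datafile.txt:

```python
from collections import Counter

words = ['dog', 'cat', 'tiger', 'lion', 'cheetah']

# hypothetical sample lines standing in for the contents of datafile.txt
lines = ["The Dog chased the cat", "a lion and a cheetah", "dog dog tiger"]

# flatMap: split every line into tokens
tokens = [t for line in lines for t in line.split(" ")]
# filter: keep tokens whose lowercase form is in the target list
kept = [t for t in tokens if t.lower() in words]
# map + reduceByKey: count occurrences per token
crimecounts = Counter(kept)

print(dict(crimecounts))
```

Note one subtlety the sketch makes visible: the filter lowercases the token only for the membership test, but the count is keyed on the original token, so "Dog" and "dog" are counted separately. If a case-insensitive count is wanted, map on `word.lower()` instead.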

Solution for Scala:

val words = Array("dog", "cat", "tiger", "lion", "cheetah")

val filePath = sc.textFile("/user/cloudera/input/Hin*/datafile.txt")

// split each line into tokens, keep only the target words, then count them
val crimecounts = filePath
  .flatMap(line => line.split(" "))
  .filter(w => words.contains(w.toLowerCase))
  .map(word => (word, 1))
  .reduceByKey(_ + _)