
Error while filtering and counting words in Spark

Updated: 2023-11-28 18:56:28

There are several errors in your code. The array creation looks like PySpark, but the rest of the code is Scala. Also, there is no `text_file` API on a SparkContext instance; the method is `textFile`.

Solution for PySpark:

from operator import add

words = ['dog', 'cat', 'tiger', 'lion', 'cheetah']

filePath = sc.textFile("/user/cloudera/input/Hin*/datafile.txt")

# split each line into tokens, keep only the target words, then count them
crimecounts = (filePath
    .flatMap(lambda line: line.split(" "))
    .filter(lambda w: w.lower() in words)
    .map(lambda word: (word, 1))
    .reduceByKey(add))
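The same flatMap/filter/map/reduceByKey pipeline can be checked locally without a Spark cluster using plain Python. This is a minimal sketch; the sample lines are made up stand-ins for datafile.txt:

```python
from collections import Counter

words = ['dog', 'cat', 'tiger', 'lion', 'cheetah']

# hypothetical sample lines standing in for the contents of datafile.txt
lines = ["The Dog chased the cat", "a lion and a cheetah", "dog dog tiger"]

# flatMap: split every line into tokens
tokens = [t for line in lines for t in line.split(" ")]
# filter: keep tokens whose lowercase form is in the target list
kept = [t for t in tokens if t.lower() in words]
# map + reduceByKey: count occurrences per token
crimecounts = Counter(kept)

print(dict(crimecounts))
```

Note one subtlety the sketch makes visible: the filter lowercases the token only for the membership test, but the count is keyed on the original token, so "Dog" and "dog" are counted separately. If a case-insensitive count is wanted, map on `word.lower()` instead.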

Solution for Scala:

val words = Array("dog", "cat", "tiger", "lion", "cheetah")

val filePath = sc.textFile("/user/cloudera/input/Hin*/datafile.txt")

// split each line into tokens, keep only the target words, then count them
val crimecounts = filePath
  .flatMap(line => line.split(" "))
  .filter(w => words.contains(w.toLowerCase))
  .map(word => (word, 1))
  .reduceByKey(_ + _)