How do I filter a Spark HBase RDD and get the result?

Updated: 2022-10-21 23:26:52

I am getting an RDD using Spark and HBase. Now I want to filter that RDD and get a specific value from it. How should I proceed?

Here is what I have done so far:

val sc = new SparkContext(sparkConf)
val conf = HBaseConfiguration.create()
conf.set(TableInputFormat.INPUT_TABLE, "tbl_date")
val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
classOf[ImmutableBytesWritable],
classOf[Result])

Now I want to use that RDD (hBaseRDD) and get the data for a specific column by passing a specific parameter to the RDD. How can I achieve this?

What you already have:

val sc = new SparkContext(sparkConf)
val conf = HBaseConfiguration.create()
conf.set(TableInputFormat.INPUT_TABLE, "tbl_date")
val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
classOf[ImmutableBytesWritable],
classOf[Result])

Add the following:

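import org.apache.hadoop.hbase.CellUtil
import org.apache.hadoop.hbase.util.Bytes

val filteredData = hBaseRDD.map { case (_, result) =>  // each element is a (row key, Result) pair
    // get(0) assumes you want the first cell (it throws if the row lacks this column);
    // otherwise you could also take all of them..
    result.getColumnCells(Bytes.toBytes("MyColFamily"), Bytes.toBytes("MyColName")).get(0)
  }.map { cell => new String(CellUtil.cloneValue(cell)) }  // copy only the value bytes of the cell
  .filter { value => value.startsWith("SomePrefix") }
  .collect()  // Array[String]; filter on the cluster first, since Result is not Java-serializable

The above shows placeholder/dummy functions for:

get(0) You need to decide if you want just first cell or all cells
new String(CellUtil.cloneValue(cell)) You need to convert to proper data type
.startsWith(..) You need to decide what to do with the data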
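
To make the "convert to proper data type" step more concrete, here is a minimal sketch in plain Scala of decoding raw value bytes and keeping only the strings that start with a prefix. The object and method names (CellValueDecoding, asString, asLong, withPrefix) are hypothetical, and the sketch assumes the values were written as UTF-8 text or as 8-byte big-endian longs (the layout HBase's Bytes.toBytes(long) produces); adjust it to however your table was actually written.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

// Hypothetical helpers for decoding the raw bytes of an HBase cell value.
object CellValueDecoding {
  // Decode the bytes as UTF-8 text (assumes the writer stored UTF-8 strings).
  def asString(value: Array[Byte]): String =
    new String(value, StandardCharsets.UTF_8)

  // Decode an 8-byte big-endian long; returns None when the length is not 8.
  def asLong(value: Array[Byte]): Option[Long] =
    if (value.length == java.lang.Long.BYTES) Some(ByteBuffer.wrap(value).getLong)
    else None

  // Decode every value as text and keep only those starting with the prefix.
  def withPrefix(values: Seq[Array[Byte]], prefix: String): Seq[String] =
    values.map(asString).filter(_.startsWith(prefix))
}
```

In the Spark job, the byte arrays passed to these helpers would come from CellUtil.cloneValue(cell). Naming the charset explicitly avoids depending on the platform default that new String(bytes) would use.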

But in any case, the above gives you the flow and outline of how to process the HBase cell data.
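
If the goal is to read only one column, one possible variation is to tell TableInputFormat which column to scan, so that HBase returns less data, and then keep every returned cell instead of only get(0). The sketch below is an assumption-laden outline, not a definitive implementation: the object name and application name are hypothetical, the family, qualifier and prefix are the same placeholders used above, and it assumes the job is launched with spark-submit so the master is supplied externally. With default scan settings HBase typically returns only the newest version of each cell, so "all cells" is often a single cell per row.

```scala
import org.apache.hadoop.hbase.{CellUtil, HBaseConfiguration}
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}
import scala.collection.JavaConverters._

// Hypothetical driver object; the names below are placeholders.
object ColumnPrefixFilter {
  def main(args: Array[String]): Unit = {
    val family = "MyColFamily"
    val qualifier = "MyColName"
    val prefix = "SomePrefix"

    val sc = new SparkContext(new SparkConf().setAppName("hbase-column-filter"))
    val conf = HBaseConfiguration.create()
    conf.set(TableInputFormat.INPUT_TABLE, "tbl_date")
    // Restrict the scan to a single column, written as "family:qualifier".
    conf.set(TableInputFormat.SCAN_COLUMNS, s"$family:$qualifier")

    val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])

    val matches = hBaseRDD
      .flatMap { case (_, result) =>
        // Keep every returned cell of the column and decode its value bytes.
        result.getColumnCells(Bytes.toBytes(family), Bytes.toBytes(qualifier)).asScala
          .map(cell => Bytes.toString(CellUtil.cloneValue(cell)))
      }
      .filter(_.startsWith(prefix)) // runs on the executors, before collect
      .collect()                    // only plain strings travel to the driver

    matches.foreach(println)
    sc.stop()
  }
}
```

Filtering before collect() keeps the non-serializable Result objects on the executors and sends only the matching strings to the driver.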
