且构网

Sharing all things about programming and development...

How to filter a Spark HBase RDD and get the results?

Updated: 2022-10-21 23:26:52




I am getting an RDD using Spark and HBase. Now I want to filter that RDD and get a specific value from it. How can I proceed?

Here is what I have done so far:

val sc = new SparkContext(sparkConf)
val conf = HBaseConfiguration.create()
conf.set(TableInputFormat.INPUT_TABLE, "tbl_date")
val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
classOf[ImmutableBytesWritable],
classOf[Result])

Now I want to use that RDD (hBaseRDD) and get the data of a specific column by passing a specific parameter to the RDD. How can I achieve this?

What you already have:

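import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.SparkContext

val sc = new SparkContext(sparkConf)
val conf = HBaseConfiguration.create()
conf.set(TableInputFormat.INPUT_TABLE, "tbl_date")
val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])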

Add the following:

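import org.apache.hadoop.hbase.CellUtil
import org.apache.hadoop.hbase.util.Bytes
// Result/ImmutableBytesWritable are not Java-serializable, so collecting the raw pairs fails
// with the default serializer: decode on the executors and collect only the Strings.
val filteredData = hBaseRDD.flatMap { case (_, result) =>
  val cells = result.getColumnCells(Bytes.toBytes("MyColFamily"), Bytes.toBytes("MyColName"))
  // assuming you want the first cell; otherwise you could also take all of them
  if (cells.isEmpty) Iterator.empty else Iterator.single(Bytes.toString(CellUtil.cloneValue(cells.get(0))))
}.filter(_.startsWith("SomePrefix")).collect()  // This is an Array[String]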

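The above uses placeholder/dummy logic for the following:

  • get(0): you need to decide whether you want just the first cell or all cells
  • The conversion of the cell value to a String: you need to convert the bytes to the proper data type (e.g. Bytes.toString, Bytes.toLong), reading them with CellUtil.cloneValue(cell) rather than the raw cell.getValueArray(), which returns the cell's whole backing array
  • .startsWith(..): you need to decide what to do with the data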
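The decisions listed above can be wrapped into a small function so that the column and the prefix become parameters instead of hard-coded strings, which is what the question asks for. A minimal sketch, assuming you want all cells (all returned versions) of the column and that the values were written as UTF-8 strings; `HBaseColumnFilter`, `columnValues` and `valuesWithPrefix` are hypothetical names, not part of HBase or Spark:

```scala
import scala.collection.JavaConverters._

import org.apache.hadoop.hbase.CellUtil
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.rdd.RDD

// Hypothetical helper object; it extends Serializable in case Spark's closures capture it.
object HBaseColumnFilter extends Serializable {

  // Every value stored in family:qualifier for one row, decoded as UTF-8
  // (assumption: the values are strings; use Bytes.toLong etc. otherwise).
  def columnValues(result: Result, family: String, qualifier: String): Seq[String] =
    result.getColumnCells(Bytes.toBytes(family), Bytes.toBytes(qualifier))
      .asScala
      .map(cell => Bytes.toString(CellUtil.cloneValue(cell)))
      .toList

  // Decoding and filtering run on the executors; only matching strings reach the driver.
  def valuesWithPrefix(rdd: RDD[(ImmutableBytesWritable, Result)],
                       family: String,
                       qualifier: String,
                       prefix: String): Array[String] =
    rdd.flatMap { case (_, result) => columnValues(result, family, qualifier) }
      .filter(_.startsWith(prefix))
      .collect()
}
```

With the RDD from the question this would be called as `HBaseColumnFilter.valuesWithPrefix(hBaseRDD, "MyColFamily", "MyColName", "SomePrefix")`. Keeping `columnValues` separate from the Spark code means the per-row logic can be checked on a single `Result` without a cluster.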

In any case, the above gives you the flow and outline of how to process the HBase cell data.
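Since only one column is needed, HBase can also be told to return just that column, so the other columns never reach Spark. `TableInputFormat` reads a space-separated list of `family:qualifier` entries from the `TableInputFormat.SCAN_COLUMNS` configuration key. A minimal sketch; `HBaseScanConf` and `columnScanConf` are hypothetical names, and the table and column names are the placeholders used in the answer:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

// Hypothetical helper: builds the input configuration for sc.newAPIHadoopRDD.
object HBaseScanConf {
  def columnScanConf(table: String, family: String, qualifier: String): Configuration = {
    val conf = HBaseConfiguration.create()
    conf.set(TableInputFormat.INPUT_TABLE, table)
    // Scan only family:qualifier instead of every column of the table.
    conf.set(TableInputFormat.SCAN_COLUMNS, s"$family:$qualifier")
    conf
  }
}
```

The result can be passed to `sc.newAPIHadoopRDD` in place of `conf`. The prefix check itself still runs in Spark; this only reduces how much data is read and transferred.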