I am getting an RDD using Spark and HBase. Now I want to filter that RDD and get a specific value from it. How can I proceed?
Here is what I have done so far:
val sc = new SparkContext(sparkConf)
val conf = HBaseConfiguration.create()
conf.set(TableInputFormat.INPUT_TABLE, "tbl_date")
val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
classOf[ImmutableBytesWritable],
classOf[Result])
Now I want to use that RDD (hBaseRDD) and get the data of a specific column by passing a specific parameter to it. How can I achieve this?
What you already have:
val sc = new SparkContext(sparkConf)
val conf = HBaseConfiguration.create()
conf.set(TableInputFormat.INPUT_TABLE, "tbl_date")
val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
classOf[ImmutableBytesWritable],
classOf[Result])
Add the following:
// needs org.apache.hadoop.hbase.util.Bytes and org.apache.hadoop.hbase.CellUtil
val localData = hBaseRDD.collect() // this is an Array[(ImmutableBytesWritable, Result)]
val filteredData = localData.map { case (_, result) =>
  // assuming you want the first cell; otherwise you could also take all of them
  result.getColumnCells(Bytes.toBytes("MyColFamily"), Bytes.toBytes("MyColName")).get(0)
}.filter { cell => new String(CellUtil.cloneValue(cell)).startsWith("SomePrefix") }
The above uses placeholder/dummy values for the column family, column qualifier and prefix (MyColFamily, MyColName, SomePrefix).
But in any case, the above gives you the flow and outline of how to process the HBase cell data.
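Since the question asks about passing a specific parameter, here is a minimal sketch of how the same logic could be wrapped in a helper that takes the column family, qualifier and prefix as arguments and filters on the RDD itself, so the data is not collected to the driver first. The function name and its parameters are illustrative assumptions, not part of the original answer.
import org.apache.hadoop.hbase.CellUtil
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.rdd.RDD

// Hypothetical helper: keeps rows whose first cell in the given column
// starts with `prefix`, returning the cell values as Strings.
def filterColumnByPrefix(rdd: RDD[(ImmutableBytesWritable, Result)],
                         family: String, qualifier: String,
                         prefix: String): RDD[String] = {
  val fam = Bytes.toBytes(family)
  val qual = Bytes.toBytes(qualifier)
  rdd.flatMap { case (_, result) =>
      val cells = result.getColumnCells(fam, qual)
      if (cells.isEmpty) None                                   // row has no such column
      else Some(new String(CellUtil.cloneValue(cells.get(0))))  // value of the first cell
    }
    .filter(_.startsWith(prefix))                                // keep only matching values
}

// Usage with the RDD built above (values here are just the placeholders from the answer):
// val matching = filterColumnByPrefix(hBaseRDD, "MyColFamily", "MyColName", "SomePrefix")
// matching.collect().foreach(println)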