Posted to user@hbase.apache.org by Abir Chokraborty <ab...@gmail.com> on 2017/02/25 07:38:13 UTC

Reading data for a particular column-cell with 2 or more values of a same row-key

HBase table contains the following:

ROW              COLUMN+CELL
Product01        column=cf:ProductFeature, timestamp=1487917201238, value=Feature01
Product01        column=cf:ProductFeature, timestamp=1487917201239, value=Feature02
Product01        column=cf:ProductFeature, timestamp=1487917201240, value=Feature03
Product01        column=cf:Price, timestamp=1487917201242, value=\x012A\xF8
Product01        column=cf:Location, timestamp=1487917201244, value=Texas

Here VERSIONS is 3, so the table keeps three different values for the
ProductFeature column. I wrote the following to create an RDD:

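import org.apache.hadoop.hbase.mapreduce.TableInputFormat

// Each record is a (row key, Result) pair read through TableInputFormat.
val hbaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
  classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
  classOf[org.apache.hadoop.hbase.client.Result])
// Keep only the Result, then turn each Result into a tuple.
val resultRDD = hbaseRDD.map(tuple => tuple._2)
val testRDD = resultRDD.map(Row.parseRow)
val testDF = testRDD.toDF()

Here, parseRow is a method that returns a tuple of
(ROW, ProductFeature, Price, Location). I am only getting: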

+----------------+----------------+---------+---------+
|             Row|  ProductFeature|    Price| Location|
+----------------+----------------+---------+---------+
|       Product01|       Feature03|       65|    Texas|
+----------------+----------------+---------+---------+

What do I have to change in the code so that the DataFrame has one row for
each value of ProductFeature, like the following?

+----------------+----------------+---------+---------+
|             Row|  ProductFeature|    Price| Location|
+----------------+----------------+---------+---------+
|       Product01|       Feature01|       65|    Texas|
|       Product01|       Feature02|       65|    Texas|
|       Product01|       Feature03|       65|    Texas|
+----------------+----------------+---------+---------+



--
View this message in context: http://apache-hbase.679495.n3.nabble.com/Reading-data-for-a-particular-column-cell-with-2-or-more-values-of-a-same-row-key-tp4086420.html
Sent from the HBase User mailing list archive at Nabble.com.

Re: Reading data for a particular column-cell with 2 or more values of a same row-key

Posted by Richard Startin <ri...@outlook.com>.
If you read a cell through the single-value accessors on a Result (for example Result::getValue), you only get the latest version of that cell. To get older versions of cells you have a few options:


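1) Result::getFamilyMap, if you only want cells from a single family. Note that it returns a qualifier-to-value map holding only the most recent version of each qualifier; for every version of one column, Result::getColumnCells(family, qualifier) returns the full list of cells - http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html#getFamilyMap-byte:A-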

2) Result::getMap, if you need versioned cells from all families - http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html#getMap--

3) Get a cell scanner from Result::cellScanner - http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html#cellScanner--

So once you have your rows, add another mapping function using one of the methods above to get multi-version rows.
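
As a rough illustration of that extra mapping step, here is a minimal sketch of the expansion logic built on the nested map that Result::getMap returns (family -> qualifier -> timestamp -> value). To stay runnable without a cluster it works on plain java.util maps of that shape. The names VersionExpansion, expand and sampleFamily are hypothetical, the comparator is only a stand-in for the HBase byte comparator, and decoding every value (including Price) as a UTF-8 string is an assumption; the real Price bytes would need whatever decoding parseRow already applies.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.util.{Comparator, NavigableMap, TreeMap}
import scala.collection.JavaConverters._ // scala.jdk.CollectionConverters._ on Scala 2.13+

object VersionExpansion {
  // Same shape as result.getMap.get(family): qualifier -> (timestamp -> value).
  type Versions = NavigableMap[java.lang.Long, Array[Byte]]
  type Family = NavigableMap[Array[Byte], Versions]

  private def utf8(s: String): Array[Byte] = s.getBytes(UTF_8)
  private def text(b: Array[Byte]): String = new String(b, UTF_8)

  // Value with the highest timestamp for a qualifier, or "" when the column is absent.
  private def latest(family: Family, qualifier: String): String =
    Option(family.get(utf8(qualifier)))
      .filter(!_.isEmpty)
      .map(versions => text(versions.asScala.maxBy(_._1.longValue)._2))
      .getOrElse("")

  // One (Row, ProductFeature, Price, Location) tuple per stored ProductFeature
  // version, oldest first. Price and Location repeat their latest value.
  def expand(rowKey: String, family: Family): Seq[(String, String, String, String)] = {
    val price = latest(family, "Price")
    val location = latest(family, "Location")
    val features = Option(family.get(utf8("ProductFeature")))
      .map(versions => versions.asScala.toSeq.sortBy(_._1.longValue).map(kv => text(kv._2)))
      .getOrElse(Seq.empty[String])
    features.map(feature => (rowKey, feature, price, location))
  }

  // Stand-in for the HBase byte comparator; adequate for the ASCII qualifiers used here.
  private val byteOrder: Comparator[Array[Byte]] = new Comparator[Array[Byte]] {
    override def compare(a: Array[Byte], b: Array[Byte]): Int = text(a).compareTo(text(b))
  }

  // Hypothetical sample mirroring the Product01 row from the question
  // (Price is stored as the string "65" purely for illustration).
  def sampleFamily(): Family = {
    val family: Family = new TreeMap[Array[Byte], Versions](byteOrder)
    def put(qualifier: String, timestamp: Long, value: String): Unit = {
      val key = utf8(qualifier)
      if (!family.containsKey(key)) family.put(key, new TreeMap[java.lang.Long, Array[Byte]]())
      family.get(key).put(timestamp, utf8(value))
      ()
    }
    put("ProductFeature", 1487917201238L, "Feature01")
    put("ProductFeature", 1487917201239L, "Feature02")
    put("ProductFeature", 1487917201240L, "Feature03")
    put("Price", 1487917201242L, "65")
    put("Location", 1487917201244L, "Texas")
    family
  }

  def main(args: Array[String]): Unit =
    expand("Product01", sampleFamily()).foreach(println)
}
```

In the Spark job this would presumably be wired in with something like resultRDD.flatMap(r => VersionExpansion.expand(Bytes.toString(r.getRow), r.getMap.get(Bytes.toBytes("cf")))) in place of the map over parseRow. The scan also has to request more than one version, since a scan returns only the newest version by default; with TableInputFormat that is normally done by setting TableInputFormat.SCAN_MAXVERSIONS to "3" on conf before creating the RDD.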



https://richardstartin.com/

