Posted to user@hbase.apache.org by Mich Talebzadeh <mi...@gmail.com> on 2016/12/29 12:10:02 UTC

Reading specific column family and columns in Hbase table through spark

Hi,

I have a routine in Spark that iterates through HBase rows and tries to
read columns.

My question is: how can I read the columns in the correct order?

example

import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes

val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
      classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
      classOf[org.apache.hadoop.hbase.client.Result])

// Takes the first four cells of each row, in the order the Result returns them
val parsed = hBaseRDD.map { case (_, result) =>
  val iter = result.list().iterator()
  (Bytes.toString(result.getRow()),
   Bytes.toString(iter.next().getValue()),
   Bytes.toString(iter.next().getValue()),
   Bytes.toString(iter.next().getValue()),
   Bytes.toString(iter.next().getValue()))
}

The above reads the column family columns sequentially. How can I force it
to read specific columns only?


Thanks


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.

Re: Reading specific column family and columns in Hbase table through spark

Posted by Nkechi Achara <nk...@googlemail.com>.
Hey Mich,

Are you setting the column family / qualifier values in the config?

e.g.

config.set(TableInputFormat.SCAN_COLUMN_FAMILY, "cf") // column family
config.set(TableInputFormat.SCAN_COLUMNS, "cf1:cq1 cf1:cq2") // column qualifiers
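
The SCAN_COLUMNS value above is a space-delimited list of family:qualifier
pairs. As a minimal sketch, a small helper could assemble that string from
pairs instead of writing it by hand. ScanColumns and build are hypothetical
names for illustration only; they are not part of HBase.

```scala
// Hypothetical helper, not part of HBase: builds a space-delimited list of
// "family:qualifier" entries, the format used in the SCAN_COLUMNS example.
object ScanColumns {
  def build(columns: Seq[(String, String)]): String =
    columns
      .map { case (family, qualifier) => s"$family:$qualifier" }
      .mkString(" ")
}
```

It could then be used as
config.set(TableInputFormat.SCAN_COLUMNS, ScanColumns.build(Seq("cf1" -> "cq1", "cf1" -> "cq2"))),
which should produce the same string as the hand-written example.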

Since newAPIHadoopRDD already gives you a Result for each row, you can also
pass it to a conversion function, like:

val r: Result
r.getValue(<Column Family as Bytes>, <column Qualifier as Bytes>)

This will either retrieve the value as bytes or return null if the column
does not exist.
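
Because getValue can return null for a missing column, one option is to wrap
the bytes in an Option before decoding them. The following is only a sketch:
CellDecoder and decode are hypothetical names, and it assumes the stored
values are UTF-8 text.

```scala
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical helper, not part of HBase: null-safe decoding of a cell value.
object CellDecoder {
  // Returns None when the byte array is null (column absent),
  // otherwise Some(text) with the bytes decoded as UTF-8.
  def decode(bytes: Array[Byte]): Option[String] =
    Option(bytes).map(b => new String(b, UTF_8))
}
```

Inside the map over the RDD it could be called as
CellDecoder.decode(r.getValue(Bytes.toBytes("cf1"), Bytes.toBytes("cq1"))),
where cf1 and cq1 are the placeholder names from the config example. Reading
each column by family and qualifier this way does not depend on the order in
which cells come back.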


Thanks,

K
