Posted to user@spark.apache.org by Kristoffer Sjögren <st...@gmail.com> on 2016/01/14 14:04:00 UTC

Spark and HBase RDD join/get

Hi

We have an RDD<UserId> that needs to be enriched with information from
HBase, where the row key is exactly the user id.

What are the alternatives for doing this?

- Is it possible to do HBase get() requests from a map function in Spark?
- Or should we join the RDD against a full HBase table scan?

I ask because full table scans feel inefficient, especially if the
input RDD<UserId> is really small compared to the full table. But I
realize that a full table scan may not be what actually happens?
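For what it's worth, the first option can be sketched as a mapPartitions
over the ids, opening one HBase connection per partition rather than per
record. The table name "users" and column "d:name" below are made up for
illustration; this is a sketch against the standard HBase client API,
not a tested implementation:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.rdd.RDD

// Hypothetical table "users" with column family "d", qualifier "name".
def lookupUsers(userIds: RDD[String]): RDD[(String, Option[String])] =
  userIds.mapPartitions { ids =>
    // One connection per partition, shared by all gets in it.
    val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = conn.getTable(TableName.valueOf("users"))
    val results = ids.map { id =>
      val result = table.get(new Get(Bytes.toBytes(id)))
      val name = Option(result.getValue(Bytes.toBytes("d"), Bytes.toBytes("name")))
        .map(Bytes.toString)
      (id, name)
    }.toList // materialize before closing the connection
    table.close()
    conn.close()
    results.iterator
  }
```

Each get is a point lookup keyed on the user id, so only the requested
rows are read, with no table scan involved.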

Cheers,
-Kristoffer

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Spark and HBase RDD join/get

Posted by Kristoffer Sjögren <st...@gmail.com>.
Thanks Ted!



Re: Spark and HBase RDD join/get

Posted by Ted Yu <yu...@gmail.com>.
For #1, yes it is possible.

You can find examples in the hbase-spark module of HBase, which
provides HBase as a Spark DataSource, e.g.

https://github.com/apache/hbase/blob/master/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseRDDFunctions.scala
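Against that module, a batched get over the RDD of keys might look
roughly like the following. The table name, column names, and the exact
hbaseBulkGet signature are paraphrased from the linked file and may
differ across HBase versions, so treat this as a sketch:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{Get, Result}
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.hadoop.hbase.spark.HBaseRDDFunctions._
import org.apache.hadoop.hbase.util.Bytes

// Assumed: sc is an existing SparkContext; "users" table with
// column family "d", qualifier "name" are made up for illustration.
val hbaseContext = new HBaseContext(sc, HBaseConfiguration.create())
val userIds = sc.parallelize(Seq("user-1", "user-2"))

// hbaseBulkGet groups the point lookups into batches (100 per batch
// here) instead of scanning the whole table.
val pairs = userIds.hbaseBulkGet(
  hbaseContext,
  TableName.valueOf("users"),
  100,
  (id: String) => new Get(Bytes.toBytes(id)),
  (result: Result) =>
    (Bytes.toString(result.getRow),
     Option(result.getValue(Bytes.toBytes("d"), Bytes.toBytes("name")))
       .map(Bytes.toString)))
```

So the answer to your efficiency concern is that this path issues
batched gets for exactly the keys in the RDD; no full scan happens.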

Cheers
