Posted to user@hbase.apache.org by Christopher Dorner <ch...@gmail.com> on 2011/10/16 11:48:21 UTC

Reduce-side-join, input from hbase and hdfs

Hi,

I am considering doing reduce-side joins where one input would be read
from HDFS and the other from an HBase table.

Is it somehow possible to use

TableMapReduceUtil.initTableMapperJob(table, scan, Mapper_HBase.class, 
..., job);

and

MultipleInputs.addInputPath(job, path, ..., Mapper_HDFS.class);

at the same time in one job?
When I tried to use both, MultipleInputs(...) seemed to take priority:
Mapper_HBase was not executed. It does execute when I remove the
MultipleInputs call.


Also, is there something equivalent to MultipleInputs() for HBase tables,
e.g. MultipleTableInputs()? I saw there was a request for it here:
https://issues.apache.org/jira/browse/HBASE-2965


A workaround would be to write the scan results to HDFS first and then do
the reduce-side join using MultipleInputs, but I wanted to avoid this
additional I/O overhead.

Thanks,
Christopher
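
For readers unfamiliar with the pattern, the reduce-side join described above can be sketched without any Hadoop or HBase dependency: each mapper tags its records with the source they came from, the shuffle groups them by join key, and the reducer pairs records from the two sources. This is only an illustrative model. The class and method names are hypothetical, and in-memory lists stand in for the HDFS file and the HBase scan results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, dependency-free model of a reduce-side join.
// In-memory lists stand in for the HDFS file and the HBase scan results.
final class ReduceSideJoinSketch {

    // A map output value tagged with the source it came from.
    static final class Tagged {
        final String source; // "HDFS" or "HBASE"
        final String value;

        Tagged(String source, String value) {
            this.source = source;
            this.value = value;
        }
    }

    // "Map" phase: emit (joinKey, taggedValue) for every input record.
    // Each record is a two-element array {joinKey, value}.
    static void map(String source, List<String[]> records,
                    Map<String, List<Tagged>> shuffle) {
        for (String[] r : records) {
            shuffle.computeIfAbsent(r[0], k -> new ArrayList<>())
                   .add(new Tagged(source, r[1]));
        }
    }

    // "Reduce" phase: for one key, pair every HDFS value with every HBase value.
    static List<String> reduce(String key, List<Tagged> values) {
        List<String> left = new ArrayList<>();
        List<String> right = new ArrayList<>();
        for (Tagged t : values) {
            if (t.source.equals("HDFS")) {
                left.add(t.value);
            } else {
                right.add(t.value);
            }
        }
        List<String> out = new ArrayList<>();
        for (String l : left) {
            for (String r : right) {
                out.add(key + "\t" + l + "\t" + r);
            }
        }
        return out;
    }

    // Runs both "mappers", groups by key, and reduces each group (inner join).
    static List<String> join(List<String[]> hdfs, List<String[]> hbase) {
        Map<String, List<Tagged>> shuffle = new TreeMap<>(); // keys sorted, as after a shuffle
        map("HDFS", hdfs, shuffle);
        map("HBASE", hbase, shuffle);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> e : shuffle.entrySet()) {
            out.addAll(reduce(e.getKey(), e.getValue()));
        }
        return out;
    }
}
```

Keys that appear in only one of the two inputs produce no output here, so the sketch behaves like an inner join. An outer join would need the reducer to emit unmatched records as well.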




Re: Reduce-side-join, input from hbase and hdfs

Posted by Jean-Daniel Cryans <jd...@apache.org>.
You cannot have two input formats in one job, so at this point you need to
write your own input format that handles both HDFS files and HBase.

Currently there's no MultipleTableInputFormat, and even if there were, it
wouldn't solve your problem because it wouldn't take HDFS inputs.

Your other option (writing the scan results to HDFS first) sounds right,
although it is slower, as you mentioned.

J-D
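
One way to picture the "write your own input format" suggestion is as a wrapper that returns the union of the splits of both underlying inputs and, when asked to read a split, dispatches to the input that produced it. The sketch below models only that idea in plain Java. None of its types are Hadoop or HBase classes, all names are hypothetical, and a real implementation would instead extend Hadoop's InputFormat and implement its getSplits and createRecordReader methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of a combined input format.
// None of these types are Hadoop or HBase classes.
final class CombinedInputSketch {

    // Stand-in for an input format: it lists its splits and reads one split.
    interface Source {
        List<String> splits();
        Iterator<String> read(String split);
    }

    // A split that remembers which underlying source produced it.
    static final class TaggedSplit {
        final int sourceIndex;
        final String split;

        TaggedSplit(int sourceIndex, String split) {
            this.sourceIndex = sourceIndex;
            this.split = split;
        }
    }

    // Wraps several sources and presents them as a single input.
    static final class Combined {
        private final List<Source> sources;

        Combined(List<Source> sources) {
            this.sources = sources;
        }

        // Union of the splits of all sources, each tagged with its origin.
        List<TaggedSplit> splits() {
            List<TaggedSplit> all = new ArrayList<>();
            for (int i = 0; i < sources.size(); i++) {
                for (String s : sources.get(i).splits()) {
                    all.add(new TaggedSplit(i, s));
                }
            }
            return all;
        }

        // Dispatch to the source that owns the split.
        Iterator<String> read(TaggedSplit s) {
            return sources.get(s.sourceIndex).read(s.split);
        }
    }

    // Demo source: each split yields one record of the form "prefix:splitName".
    static Source fixed(String prefix, String... splitNames) {
        List<String> names = Arrays.asList(splitNames);
        return new Source() {
            @Override
            public List<String> splits() {
                return names;
            }

            @Override
            public Iterator<String> read(String split) {
                return Collections.singletonList(prefix + ":" + split).iterator();
            }
        };
    }
}
```

Because every split carries the index of its source, the mapper side could use the same tag to decide how to parse a record, which is the information a reduce-side join needs.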
