Posted to user@hbase.apache.org by Christopher Dorner <ch...@gmail.com> on 2011/10/16 11:48:21 UTC
Reduce-side-join, input from hbase and hdfs
Hi,
I am considering a reduce-side join in which one input is read
from HDFS and the other from an HBase table.
Is it somehow possible to use
TableMapReduceUtil.initTableMapperJob(table, scan, Mapper_HBase.class,
..., job);
and
MultipleInputs.addInputPath(job, path, ..., Mapper_HDFS.class)
at the same time in one job?
It seems that MultipleInputs takes priority when I use both:
Mapper_HBase was not executed. It does run when I remove the
MultipleInputs call.
Also, is there an equivalent of MultipleInputs for HBase tables,
e.g. MultipleTableInputs? I saw there was a request here:
https://issues.apache.org/jira/browse/HBASE-2965
A workaround would be to write the scan results to HDFS first and then
do the reduce-side join using MultipleInputs, but I wanted to avoid this
additional I/O overhead.
Thanks,
Christopher
Re: Reduce-side-join, input from hbase and hdfs
Posted by Jean-Daniel Cryans <jd...@apache.org>.
You cannot have two input formats, so at this point you would need to write
your own input format that handles both HDFS files and HBase tables.
There is currently no MultipleTableInputFormat, and it would not solve
your problem anyway because it would not take HDFS inputs.
Your other option sounds right, although slower as you mentioned.
J-D
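For readers unfamiliar with the join itself, here is a minimal sketch of the reduce-side logic discussed in this thread, written as plain Java with no Hadoop or HBase dependency so it can run on its own. It assumes each mapper tags its output with the source it came from, and that the reducer separates the values for a key by tag and pairs them up (an inner join). The class name, the tags "F" and "T", and the sample keys and values are all hypothetical and chosen only for illustration; this is not the API of either library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSideJoinSketch {
    // Hypothetical source tags: each mapper would attach one to every value it emits.
    public static final String HDFS_TAG = "F";   // record came from the HDFS file
    public static final String HBASE_TAG = "T";  // record came from the HBase table

    /** One mapper output record: join key, source tag, payload. */
    public static final class Tagged {
        public final String key;
        public final String tag;
        public final String value;

        public Tagged(String key, String tag, String value) {
            this.key = key;
            this.tag = tag;
            this.value = value;
        }
    }

    /** Stands in for the shuffle phase: groups records by join key, keys sorted. */
    public static Map<String, List<Tagged>> shuffle(List<Tagged> mapOutput) {
        Map<String, List<Tagged>> grouped = new TreeMap<>();
        for (Tagged t : mapOutput) {
            grouped.computeIfAbsent(t.key, k -> new ArrayList<>()).add(t);
        }
        return grouped;
    }

    /** Reducer logic for one key: split values by tag, then emit every file/table pair. */
    public static List<String> reduce(String key, List<Tagged> values) {
        List<String> fromFile = new ArrayList<>();
        List<String> fromTable = new ArrayList<>();
        for (Tagged t : values) {
            if (HDFS_TAG.equals(t.tag)) {
                fromFile.add(t.value);
            } else if (HBASE_TAG.equals(t.tag)) {
                fromTable.add(t.value);
            }
        }
        List<String> joined = new ArrayList<>();
        for (String f : fromFile) {
            for (String t : fromTable) {
                joined.add(key + "\t" + f + "\t" + t);
            }
        }
        return joined;
    }

    /** Runs the simulated shuffle and reduce over the combined output of both mappers. */
    public static List<String> join(List<Tagged> mapOutput) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> e : shuffle(mapOutput).entrySet()) {
            out.addAll(reduce(e.getKey(), e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tagged> mapOutput = new ArrayList<>();
        mapOutput.add(new Tagged("k1", HDFS_TAG, "a"));
        mapOutput.add(new Tagged("k2", HDFS_TAG, "b"));
        mapOutput.add(new Tagged("k1", HBASE_TAG, "x"));
        mapOutput.add(new Tagged("k3", HBASE_TAG, "y"));
        for (String line : join(mapOutput)) {
            System.out.println(line);
        }
    }
}
```

In a real job the two mappers (Mapper_HDFS and Mapper_HBase in the question) would emit the tagged values and the framework would do the grouping. Keys present on only one side produce no output here, since the sketch is an inner join.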