Posted to dev@hbase.apache.org by Irfan Mohammed <ir...@gmail.com> on 2009/06/23 05:27:55 UTC

ChainMapper/ChainReducer for hbase Tables

I don't know whether this is an M/R question or an HBase question, but I need help solving the following problem.

I have records with some dimensions and some metrics. In this case, let's say the dimension is DD and the metrics are [ M1, M2 ].

DD - M1 - M2
------------
D1 - N1 - N2
D2 - N1 - N2

For a given record, I want to store each of the metrics into its own hbase table.

T_M1
----
     DD:D1 DD:D2
R1 -  N1    N1

T_M2
----
     DD:D1 DD:D2
R1 -  N2    N2
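
To make the target layout concrete, here is a minimal sketch in plain Java that pivots the sample records into the two per-metric tables shown above. It models the tables with ordinary maps only; MetricLayout and pivot are hypothetical names for illustration and are not part of HBase or Hadoop.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not an HBase class): models the desired layout with maps.
class MetricLayout {
    // Each record is {dimension, m1, m2}.
    // Returns table name -> (column "DD:<dimension>" -> metric value) for one row.
    static Map<String, Map<String, String>> pivot(String[][] records) {
        Map<String, Map<String, String>> tables = new LinkedHashMap<>();
        for (String[] rec : records) {
            String column = "DD:" + rec[0];
            tables.computeIfAbsent("T_M1", t -> new LinkedHashMap<>()).put(column, rec[1]);
            tables.computeIfAbsent("T_M2", t -> new LinkedHashMap<>()).put(column, rec[2]);
        }
        return tables;
    }

    public static void main(String[] args) {
        String[][] records = { {"D1", "N1", "N2"}, {"D2", "N1", "N2"} };
        // Prints one line per table with its columns for row R1.
        pivot(records).forEach((table, cols) -> System.out.println(table + " R1 " + cols));
    }
}
```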

Right now, I have a mapper that converts the plain record into <ImmutableBytesWritable, HbaseMapWritable<byte[], byte[]>> and a reducer that puts it into the HBase table using <ImmutableBytesWritable, Put>. But the whole process starts over for the same record in order to put the data into the second metrics table.

I wanted to use ChainMapper/ChainReducer but was not successful. ChainReducer allows only one reducer [setReducer], and only mappers can be added after the reducer. Attached is the example code; in the test case, I am looping over the whole map/reduce for each table.

Is there a way to be able to do the following?

1. map
2. reduce --> t_m1
3. reduce --> t_m2

Thanks,
Irfan

Re: ChainMapper/ChainReducer for hbase Tables

Posted by jason hadoop <ja...@gmail.com>.
Replace "file" with "table" in my prior message, and delete the multi output
format reference. I missed that this was hbase-dev.

On Tue, Jun 23, 2009 at 6:27 AM, jason hadoop <ja...@gmail.com> wrote:

> Can you not just open two output files and output both data sets from one
> reduce, either via multi output format or by opening the second file directly
> in the configure method of your reduce and closing it in the close?



-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals

Re: ChainMapper/ChainReducer for hbase Tables

Posted by jason hadoop <ja...@gmail.com>.
Can you not just open two output files and output both data sets from one
reduce, either via multi output format or by opening the second file directly
in the configure method of your reduce and closing it in the close?
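
A rough, dependency-free sketch of that idea, applied to tables rather than files: one reducer opens both destinations when it is configured, writes each metric to its own destination during reduce, and closes both at the end. TableSink, InMemorySink and TwoTableReducer are hypothetical stand-ins so the example runs without a cluster; in a real job the sinks would presumably be HBase table handles opened in the reducer's configure method, which is an assumption and not something shown in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Iterator;
import java.util.function.Function;

// Hypothetical stand-in for a table handle; not a real HBase type.
interface TableSink {
    void put(String row, String column, String value);
    void close();
}

// In-memory sink so the sketch can run anywhere.
class InMemorySink implements TableSink {
    final List<String> cells = new ArrayList<>();
    boolean closed = false;

    @Override
    public void put(String row, String column, String value) {
        cells.add(row + "/" + column + "=" + value);
    }

    @Override
    public void close() {
        closed = true;
    }
}

// Mirrors a configure/reduce/close lifecycle: one reduce pass, two outputs.
class TwoTableReducer {
    private TableSink m1Table;
    private TableSink m2Table;

    // Open both destinations once, before any reduce call.
    void configure(Function<String, TableSink> opener) {
        m1Table = opener.apply("T_M1");
        m2Table = opener.apply("T_M2");
    }

    // Each value is {dimension, m1, m2}; every metric goes to its own table.
    void reduce(String rowKey, Iterator<String[]> values) {
        while (values.hasNext()) {
            String[] v = values.next();
            String column = "DD:" + v[0];
            m1Table.put(rowKey, column, v[1]);
            m2Table.put(rowKey, column, v[2]);
        }
    }

    // Release both destinations after the last reduce call.
    void close() {
        m1Table.close();
        m2Table.close();
    }
}

class TwoTableReduceDemo {
    public static void main(String[] args) {
        List<InMemorySink> opened = new ArrayList<>();
        TwoTableReducer reducer = new TwoTableReducer();
        reducer.configure(name -> {
            InMemorySink sink = new InMemorySink();
            opened.add(sink);
            return sink;
        });
        List<String[]> values = Arrays.asList(
            new String[] {"D1", "N1", "N2"},
            new String[] {"D2", "N1", "N2"});
        reducer.reduce("R1", values.iterator());
        reducer.close();
        System.out.println("T_M1: " + opened.get(0).cells);
        System.out.println("T_M2: " + opened.get(1).cells);
    }
}
```

The point of the design is that the input is mapped and shuffled only once; the second table costs one extra write per value instead of a second full map/reduce pass.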




