Posted to user@hbase.apache.org by Weishung Chung <we...@gmail.com> on 2012/08/10 15:10:43 UTC

multitable query

Hi HBase users,

I need to pull data from two HBase tables in a MapReduce job. For a single-table
input, I use TableMapReduceUtil.initTableMapperJob. Is there another method
for multi-table input?

Thank you,
Wei Shung

Re: multitable query

Posted by Weishung Chung <we...@gmail.com>.
Yes, this looks like a good solution, but I am running CDH3 and the upgrade is
not scheduled until next year.

On Fri, Aug 10, 2012 at 7:20 AM, Jerry Lam <ch...@gmail.com> wrote:

> Hi Wei:
>
> There is a JIRA, HBASE-3996. Does this sound like what you are looking for?
>
> Regards,
>
> Jerry

Re: multitable query

Posted by Jerry Lam <ch...@gmail.com>.
Hi Wei:

There is a JIRA, HBASE-3996. Does this sound like what you are looking for?

Regards,

Jerry

On Friday, August 10, 2012, Bryan Beaudreault wrote:

> Use 3 jobs: 1 to scan each table. The third could do a map-side join. Make
> sure to use the same sort and partitions on the first two.

Re: multitable query

Posted by Bryan Beaudreault <bb...@hubspot.com>.
Use 3 jobs: one to scan each table, and a third to do a map-side join on their output. Make sure the first two jobs use the same sort order and partitioning.
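
The reason the sort order and partitioning must match can be sketched in plain Java. This is only an illustration under stated assumptions, not HBase or Hadoop API code: the class MergeJoinSketch, its methods, and the sample row keys are hypothetical, in-memory sorted maps stand in for the sorted output files of the two scan jobs, and String ordering stands in for HBase's byte-wise row key ordering.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical stand-in for the third job: a merge join over two inputs
// that are already sorted by row key and partitioned the same way.
final class MergeJoinSketch {

    // Both scan jobs must use the same function and the same partition count,
    // so a given row key lands in the same partition number on both sides.
    static int partitionFor(String rowKey, int numPartitions) {
        return (rowKey.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    private static <T> T nextOrNull(Iterator<T> it) {
        return it.hasNext() ? it.next() : null;
    }

    // Walks both sorted inputs once, emitting a record only for keys present in both.
    static List<String> mergeJoin(SortedMap<String, String> left,
                                  SortedMap<String, String> right) {
        List<String> joined = new ArrayList<>();
        Iterator<Map.Entry<String, String>> l = left.entrySet().iterator();
        Iterator<Map.Entry<String, String>> r = right.entrySet().iterator();
        Map.Entry<String, String> a = nextOrNull(l);
        Map.Entry<String, String> b = nextOrNull(r);
        while (a != null && b != null) {
            int cmp = a.getKey().compareTo(b.getKey());
            if (cmp == 0) {
                joined.add(a.getKey() + "|" + a.getValue() + "|" + b.getValue());
                a = nextOrNull(l);
                b = nextOrNull(r);
            } else if (cmp < 0) {
                a = nextOrNull(l);   // left key has no match yet, advance left
            } else {
                b = nextOrNull(r);   // right key has no match yet, advance right
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        SortedMap<String, String> tableA = new TreeMap<>();
        tableA.put("row1", "a1");
        tableA.put("row2", "a2");
        tableA.put("row4", "a4");
        SortedMap<String, String> tableB = new TreeMap<>();
        tableB.put("row2", "b2");
        tableB.put("row3", "b3");
        tableB.put("row4", "b4");
        System.out.println(mergeJoin(tableA, tableB)); // prints [row2|a2|b2, row4|a4|b4]
    }
}
```

Because each side is read once in key order, the join needs no random lookups. If the two sides were sorted or partitioned differently, matching keys could sit in different partitions and the single pass would miss them.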

Sent from iPhone.

On Aug 10, 2012, at 9:41 AM, Weishung Chung <we...@gmail.com> wrote:

> but they are in production now

Re: multitable query

Posted by Weishung Chung <we...@gmail.com>.
But they are in production now.

On Fri, Aug 10, 2012 at 6:39 AM, Weishung Chung <we...@gmail.com> wrote:

> Thank you. I am trying to avoid fetching by gets and would like to do
> something like Hadoop MultipleInputs.
> Yes, it would be nice if I could denormalize and remodel the schema.

Re: multitable query

Posted by Weishung Chung <we...@gmail.com>.
Thank you. I am trying to avoid fetching by gets and would like to do
something like Hadoop MultipleInputs.
Yes, it would be nice if I could denormalize and remodel the schema.
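
A MultipleInputs-style job usually amounts to a reduce-side join: each input gets its own mapper, every record is tagged with its source, and the reducer pairs up values that share a row key. The plain-Java sketch below only illustrates that flow under stated assumptions. The class TaggedJoinSketch, the tags "A" and "B", and the sample data are hypothetical, and an in-memory sorted map stands in for the shuffle; none of it is HBase or Hadoop API code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a reduce-side join over two tagged inputs.
final class TaggedJoinSketch {

    // Map phase stand-in: emit (rowKey, [tag, value]) for every row of one table.
    static void emitTagged(Map<String, String> table, String tag,
                           Map<String, List<String[]>> shuffle) {
        for (Map.Entry<String, String> row : table.entrySet()) {
            shuffle.computeIfAbsent(row.getKey(), k -> new ArrayList<>())
                   .add(new String[] {tag, row.getValue()});
        }
    }

    // Reduce phase stand-in: for each row key, join only if both sources contributed.
    static List<String> reduce(Map<String, List<String[]>> shuffle) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String[]>> group : shuffle.entrySet()) {
            String a = null;
            String b = null;
            for (String[] tagged : group.getValue()) {
                if ("A".equals(tagged[0])) {
                    a = tagged[1];
                } else if ("B".equals(tagged[0])) {
                    b = tagged[1];
                }
            }
            if (a != null && b != null) {
                out.add(group.getKey() + "|" + a + "|" + b);
            }
        }
        return out;
    }

    static List<String> join(Map<String, String> tableA, Map<String, String> tableB) {
        // The TreeMap groups values by row key in sorted order, as a shuffle would.
        Map<String, List<String[]>> shuffle = new TreeMap<>();
        emitTagged(tableA, "A", shuffle);
        emitTagged(tableB, "B", shuffle);
        return reduce(shuffle);
    }

    public static void main(String[] args) {
        Map<String, String> tableA = new HashMap<>();
        tableA.put("row1", "a1");
        tableA.put("row2", "a2");
        Map<String, String> tableB = new HashMap<>();
        tableB.put("row2", "b2");
        tableB.put("row3", "b3");
        System.out.println(join(tableA, tableB)); // prints [row2|a2|b2]
    }
}
```

Compared with per-row gets, this reads both tables sequentially, at the cost of moving every row through the shuffle.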

On Fri, Aug 10, 2012 at 6:29 AM, Amandeep Khurana <am...@gmail.com> wrote:

> You can scan over one of the tables (using TableInputFormat) and do simple
> gets on the other table for every row that you want to join.
>
> An interesting question to address here would be - why even need a join.
> Can you talk more about the data and what you are trying to do? In general
> you really want to denormalize and not need joins when working with HBase
> (or for that matter most NoSQL stores).

Re: multitable query

Posted by Amandeep Khurana <am...@gmail.com>.
You can scan over one of the tables (using TableInputFormat) and do simple
gets on the other table for every row that you want to join.
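
The scan-plus-get approach can be sketched in plain Java as follows. This is a rough illustration, not HBase client code: the class ScanAndGetJoinSketch and the sample data are hypothetical, a sorted map stands in for the scanned table, and a hash map lookup stands in for a Get against the second table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical stand-in for "scan one table, get from the other".
final class ScanAndGetJoinSketch {

    static List<String> join(SortedMap<String, String> scannedTable,
                             Map<String, String> lookupTable) {
        List<String> joined = new ArrayList<>();
        // "Scan": iterate over the first table in row key order.
        for (Map.Entry<String, String> row : scannedTable.entrySet()) {
            // "Get": one point lookup in the second table per scanned row.
            String other = lookupTable.get(row.getKey());
            if (other != null) {
                joined.add(row.getKey() + "|" + row.getValue() + "|" + other);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        SortedMap<String, String> tableA = new TreeMap<>();
        tableA.put("row1", "a1");
        tableA.put("row2", "a2");
        Map<String, String> tableB = new HashMap<>();
        tableB.put("row2", "b2");
        tableB.put("row3", "b3");
        System.out.println(join(tableA, tableB)); // prints [row2|a2|b2]
    }
}
```

In a real cluster each lookup would likely be a separate request to a region server, so the cost grows with the number of scanned rows. That is the main trade-off of this approach.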

An interesting question to address here would be: why do you even need a join?
Can you talk more about the data and what you are trying to do? In general,
you really want to denormalize and avoid joins when working with HBase
(or, for that matter, most NoSQL stores).

On Fri, Aug 10, 2012 at 6:52 PM, Weishung Chung <we...@gmail.com> wrote:

> Basically a join of two data sets on the same row key.

Re: multitable query

Posted by Weishung Chung <we...@gmail.com>.
Basically a join of two data sets on the same row key.

On Fri, Aug 10, 2012 at 6:12 AM, Amandeep Khurana <am...@gmail.com> wrote:

> How do you want to use two tables? Can you explain your algo a bit?

Re: multitable query

Posted by Amandeep Khurana <am...@gmail.com>.
How do you want to use two tables? Can you explain your algo a bit?

On Fri, Aug 10, 2012 at 6:40 PM, Weishung Chung <we...@gmail.com> wrote:

> Hi HBase users,
>
> I need to pull data from two HBase tables in a MapReduce job. For a single-table
> input, I use TableMapReduceUtil.initTableMapperJob. Is there another method
> for multi-table input?
>
> Thank you,
> Wei Shung
>