Posted to user@hbase.apache.org by Rakhi Khatwani <ra...@gmail.com> on 2009/04/21 12:19:29 UTC

Bulk read in a single map task.

Hi,
      I have a scenario:
      I have a table which has to be read into, say, 'n' maps.
      Now, in each map, I need to access, say, 'm' records at once, so
that I can spawn threads over them to increase parallel processing.
      Is this feasible? I am using Hadoop 0.19.0 and HBase 0.19.0.

Thanks
Raakhi

Re: Bulk read in a single map task.

Posted by Rakhi Khatwani <ra...@gmail.com>.
Thanks Stack,
will try that tomorrow.

Regards,
Raakhi

On Wed, Apr 22, 2009 at 10:33 PM, stack <st...@duboce.net> wrote:

> On Wed, Apr 22, 2009 at 9:53 AM, Rakhi Khatwani <rakhi.khatwani@gmail.com
> >wrote:
>
> > Hi Stack,
> >              In the traditional scenario, an InputSplit is given to the
> > map, and the map iterates through its records sequentially, right?
> > Is there any way in which I can have 5 (for example) records in each map
> > iteration?
>
>
>
> If you can't change your database schema so rows have all you need per map,
> or if Scanner.next(int count) won't work for you -- i.e. get 'count'
> items on each next invocation (perhaps this will work, I don't know, I just
> saw it in the Scanner interface) -- then you might want to play with
> org.apache.hadoop.mapred.MapRunner.  It's the thing that invokes maps.  You
> can subclass it, grab a bunch of rows, and feed them all in a lump to an
> amended map.
>
>  St.Ack
>
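
A minimal sketch of the Scanner.next(int count) idea from the reply quoted
above, written against the HBase 0.19 client API as best recalled (a Scanner
handing back an array of RowResult); the table name "mytable" and column
family "colfam:" are made-up placeholders:

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scanner;
import org.apache.hadoop.hbase.io.RowResult;
import org.apache.hadoop.hbase.util.Bytes;

public class BulkScanSketch {
  public static void main(String[] args) throws IOException {
    // "mytable" and "colfam:" are placeholders, not real names.
    HTable table = new HTable(new HBaseConfiguration(), "mytable");
    Scanner scanner =
        table.getScanner(new byte[][] { Bytes.toBytes("colfam:") });
    try {
      RowResult[] batch;
      // next(5) returns up to 5 rows; a null or empty array should
      // mean the scan is exhausted, so check both to be safe.
      while ((batch = scanner.next(5)) != null && batch.length > 0) {
        for (RowResult row : batch) {
          // hand the lump of rows off to worker threads here
          System.out.println(Bytes.toString(row.getRow()));
        }
      }
    } finally {
      scanner.close();
    }
  }
}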

Re: Bulk read in a single map task.

Posted by stack <st...@duboce.net>.
On Wed, Apr 22, 2009 at 9:53 AM, Rakhi Khatwani <ra...@gmail.com> wrote:

> Hi Stack,
>              In the traditional scenario, an InputSplit is given to the map,
> and the map iterates through its records sequentially, right?
> Is there any way in which I can have 5 (for example) records in each map
> iteration?



If you can't change your database schema so rows have all you need per map,
or if Scanner.next(int count) won't work for you -- i.e. get 'count'
items on each next invocation (perhaps this will work, I don't know, I just
saw it in the Scanner interface) -- then you might want to play with
org.apache.hadoop.mapred.MapRunner.  It's the thing that invokes maps.  You
can subclass it, grab a bunch of rows, and feed them all in a lump to an
amended map.

 St.Ack
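
A minimal sketch of the MapRunner subclassing suggested above, against the
old org.apache.hadoop.mapred API from Hadoop 0.19; BatchMapRunner and
processBatch are made-up names, and the batch size is hard-coded for
illustration:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.mapred.MapRunner;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class BatchMapRunner<K1, V1, K2, V2> extends MapRunner<K1, V1, K2, V2> {
  private static final int BATCH = 5;

  @Override
  public void run(RecordReader<K1, V1> input,
                  OutputCollector<K2, V2> output,
                  Reporter reporter) throws IOException {
    List<K1> keys = new ArrayList<K1>(BATCH);
    List<V1> values = new ArrayList<V1>(BATCH);
    K1 key = input.createKey();
    V1 value = input.createValue();
    while (input.next(key, value)) {
      keys.add(key);
      values.add(value);
      // createKey/createValue give fresh objects each pass, so the
      // batch does not alias a single reused instance.
      key = input.createKey();
      value = input.createValue();
      if (keys.size() == BATCH) {
        processBatch(keys, values, output, reporter);
        keys.clear();
        values.clear();
      }
    }
    if (!keys.isEmpty()) {
      processBatch(keys, values, output, reporter);  // trailing partial batch
    }
  }

  // Stub standing in for the "amended map" that takes a lump of rows.
  protected void processBatch(List<K1> keys, List<V1> values,
                              OutputCollector<K2, V2> output,
                              Reporter reporter) throws IOException {
    // application-specific: e.g. fan the rows out to worker threads
  }
}

The runner would be wired in on the job with
JobConf.setMapRunnerClass(BatchMapRunner.class) in place of the default
MapRunner.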

Re: Bulk read in a single map task.

Posted by Rakhi Khatwani <ra...@gmail.com>.
Hi Stack,
              In the traditional scenario, an InputSplit is given to the map,
and the map iterates through its records sequentially, right?
Is there any way in which I can have 5 (for example) records in each map
iteration?
Sorry for not being very clear last time.

Thanks,
Rakhi

On Wed, Apr 22, 2009 at 10:13 PM, stack <st...@duboce.net> wrote:

> Sorry.  I'm having trouble following your question below.  Want to have
> another go at it?
> Thanks,
> St.Ack
>
> On Tue, Apr 21, 2009 at 3:19 AM, Rakhi Khatwani <rakhi.khatwani@gmail.com
> >wrote:
>
> > Hi,
> >      I have a scenario:
> >      I have a table which has to be read into, say, 'n' maps.
> >      Now, in each map, I need to access, say, 'm' records at once, so
> > that I can spawn threads over them to increase parallel processing.
> >      Is this feasible? I am using Hadoop 0.19.0 and HBase 0.19.0.
> >
> > Thanks
> > Raakhi
> >
>
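
As for spawning threads over the 'm' records of a batch, a generic
java.util.concurrent sketch (BatchWorkers and processRow are made-up
placeholder names, not part of any Hadoop or HBase API):

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.hbase.io.RowResult;

public class BatchWorkers {
  // Process each row of the batch on its own thread, then wait for all
  // of them to finish before moving on to the next batch.
  static void processInParallel(List<RowResult> batch)
      throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(batch.size());
    for (final RowResult row : batch) {
      pool.submit(new Runnable() {
        public void run() {
          processRow(row);
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
  }

  static void processRow(RowResult row) {
    // made-up placeholder for the real per-record logic
  }
}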

Re: Bulk read in a single map task.

Posted by stack <st...@duboce.net>.
Sorry.  I'm having trouble following your question below.  Want to have
another go at it?
Thanks,
St.Ack

On Tue, Apr 21, 2009 at 3:19 AM, Rakhi Khatwani <ra...@gmail.com> wrote:

> Hi,
>      I have a scenario:
>      I have a table which has to be read into, say, 'n' maps.
>      Now, in each map, I need to access, say, 'm' records at once, so
> that I can spawn threads over them to increase parallel processing.
>      Is this feasible? I am using Hadoop 0.19.0 and HBase 0.19.0.
>
> Thanks
> Raakhi
>