Posted to user@accumulo.apache.org by Aji Janis <aj...@gmail.com> on 2013/03/08 19:11:39 UTC

Mappers for Accumulo

Hello,

 I am trying to figure out how I can configure the number of mappers (if it's
even possible) based on an Accumulo row range. My Accumulo rowid uses the
format:

abc/1
abc/2
...
def/3
....
xyz/13...

If I want to specify three ranges: [abc/1 to abc/3], [def/1 to def/5],
[jkl/13 to klm/15], and have one mapper work on one range, is there a way I
can do this? How do I even set up my MapReduce job to accept these
ranges? Thank you for all feedback.

Re: Mappers for Accumulo

Posted by William Slacum <wi...@accumulo.net>.
Depending on the size of the tablet, you can lower the split threshold
and/or set new split points on the table.

On Mon, Mar 11, 2013 at 5:39 PM, Aji Janis <aj...@gmail.com> wrote:

> So we realized that all my data for the table of interest fits onto one
> tablet (a HUGE tablet, isn't it?), i.e. we always had ONE mapper. So we said
> let's split the table by range so that we can have more mappers. The next
> problem is: what if someone puts in the first row as the start of the range
> and the last row as the end? Now I am back to one mapper. So what I need is
> some way to take in a range and split it into a List<Range>.

Re: Mappers for Accumulo

Posted by Aji Janis <aj...@gmail.com>.
So we realized that all my data for the table of interest fits onto one
tablet (a HUGE tablet, isn't it?), i.e. we always had ONE mapper. So we said
let's split the table by range so that we can have more mappers. The next
problem is: what if someone puts in the first row as the start of the range
and the last row as the end? Now I am back to one mapper. So what I need is
some way to take in a range and split it into a List<Range>.
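
For anyone following the thread, here is a minimal sketch of "take in a range and split it into a list", written in plain Java so it runs without Accumulo on the classpath. RangeSplitter, splitAt and the split rows are hypothetical names and values chosen for illustration; in a real job each returned pair would be turned into an Accumulo Range. It compares rows as Java strings, which matches Accumulo's byte ordering only for ASCII row IDs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

final class RangeSplitter {
    // Cuts [start, end] at every split row that falls strictly inside it.
    // Each returned pair is {lo, hi}; adjacent pieces share a boundary row, so
    // the caller must decide which side includes it when building real ranges.
    static List<String[]> splitAt(String start, String end, Collection<String> splitRows) {
        List<String[]> pieces = new ArrayList<String[]>();
        String lo = start;
        for (String row : new TreeSet<String>(splitRows)) { // ascending order
            if (row.compareTo(start) > 0 && row.compareTo(end) < 0) {
                pieces.add(new String[] {lo, row});
                lo = row;
            }
        }
        pieces.add(new String[] {lo, end});
        return pieces;
    }

    public static void main(String[] args) {
        // Hypothetical split rows; only "def" and "jkl" lie inside abc/1 .. xyz/13.
        List<String> splits = Arrays.asList("jkl", "def", "zzz");
        for (String[] p : splitAt("abc/1", "xyz/13", splits)) {
            System.out.println(p[0] + " .. " + p[1]);
        }
    }
}
```

With the sample values, the whole-table range is cut into three pieces, so a "first row to last row" request would no longer collapse to a single mapper, provided each piece is submitted as its own range.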



On Mon, Mar 11, 2013 at 5:13 PM, William Slacum <
wilhelm.von.cloud@accumulo.net> wrote:

> So you want both auto adjusting and not auto adjusting depending on the
> size of a range? I suppose you could lift the code for doing the adjusting,
> and do some introspection on the ranges (such as "how many tablets do I have
> in this range?") and apply as necessary.

Re: Mappers for Accumulo

Posted by William Slacum <wi...@accumulo.net>.
So you want both auto adjusting and not auto adjusting depending on the
size of a range? I suppose you could lift the code for doing the adjusting,
and do some introspection on the ranges (such as "how many tablets do I have
in this range?") and apply as necessary.

On Mon, Mar 11, 2013 at 4:47 PM, Aji Janis <aj...@gmail.com> wrote:

> So it looks like a List<Range> is what I need so that I can have a
> mapper per range. However, a more interesting scenario is when, given a
> big range, I want to split it into multiple ranges. In other words, if my
> rowids were 1_hello, 2_hello, ... 9_hello, 10_hello, and the range given was
> 2 to 5, I want one mapper per integer, so 4 mappers in this case. Any
> ideas on how I can accomplish that?
>
>
> Thanks all for suggestions.

Re: Mappers for Accumulo

Posted by Aji Janis <aj...@gmail.com>.
So it looks like a List<Range> is what I need so that I can have a
mapper per range. However, a more interesting scenario is when, given a
big range, I want to split it into multiple ranges. In other words, if my
rowids were 1_hello, 2_hello, ... 9_hello, 10_hello, and the range given was
2 to 5, I want one mapper per integer, so 4 mappers in this case. Any
ideas on how I can accomplish that?


Thanks all for suggestions.
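
One possible way to get "one mapper per integer" is to expand the numeric bounds into one row prefix per integer and submit each prefix as its own range. The sketch below only does the expansion, in plain Java; PerIntegerRanges and rowPrefixes are hypothetical names, and it assumes every rowid starts with the integer followed by an underscore, as in the example above.

```java
import java.util.ArrayList;
import java.util.List;

final class PerIntegerRanges {
    // Returns one row prefix per integer in [first, last], e.g. "2_", "3_", ...
    static List<String> rowPrefixes(int first, int last) {
        List<String> prefixes = new ArrayList<String>();
        for (int i = first; i <= last; i++) {
            prefixes.add(i + "_");
        }
        return prefixes;
    }

    public static void main(String[] args) {
        // Each prefix would become its own Range in the real job. Range.prefix(...)
        // is assumed to be available in the Accumulo version in use.
        System.out.println(rowPrefixes(2, 5)); // prints [2_, 3_, 4_, 5_]
    }
}
```

Working from prefixes also sidesteps a sorting issue: rows sort lexicographically, so 10_hello comes before 2_hello, and a single range built from numeric bounds such as 2 to 12 would not cover the rows a reader might expect.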

On Fri, Mar 8, 2013 at 7:02 PM, Keith Turner <ke...@deenlo.com> wrote:

> On Fri, Mar 8, 2013 at 4:17 PM, Aji Janis <aj...@gmail.com> wrote:
> > Thank you. Follow up question.
> >
> > Would this enforce one mapper per range even if all the data (from three
> > ranges) is on one node/tablet?
>
> Look at disableAutoAdjustRanges(). This determines whether it creates a
> mapper per tablet per range OR per range.

Re: Mappers for Accumulo

Posted by Keith Turner <ke...@deenlo.com>.
On Fri, Mar 8, 2013 at 4:17 PM, Aji Janis <aj...@gmail.com> wrote:
> Thank you. Follow up question.
>
> Would this enforce one mapper per range even if all the data (from three
> ranges) is on one node/tablet?

Look at disableAutoAdjustRanges(). This determines whether it creates a
mapper per tablet per range OR per range.
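
To make the two modes concrete, here is a simplified model of the mapper count in plain Java. It is only a sketch of the rule stated above, not Accumulo's actual split computation: MapperCountModel and its methods are hypothetical names, the split rows are made up, ranges are assumed to be inclusive and not to overlap, and a tablet is assumed to end at its split row (inclusive).

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

final class MapperCountModel {
    // Number of tablets an inclusive row range [start, end] touches. Every split
    // row s with start <= s < end pushes part of the range into another tablet.
    static int tabletsTouched(String start, String end, Collection<String> splitRows) {
        TreeSet<String> sorted = new TreeSet<String>(splitRows);
        return sorted.subSet(start, true, end, false).size() + 1;
    }

    // Each range is a {start, end} pair.
    static int mapperCount(List<String[]> ranges, Collection<String> splitRows, boolean autoAdjust) {
        if (!autoAdjust) {
            return ranges.size(); // one mapper per range
        }
        int total = 0;
        for (String[] r : ranges) {
            total += tabletsTouched(r[0], r[1], splitRows); // one per tablet per range
        }
        return total;
    }

    public static void main(String[] args) {
        List<String[]> ranges = Arrays.asList(
            new String[] {"abc/1", "abc/3"},
            new String[] {"def/1", "def/5"},
            new String[] {"jkl/13", "klm/15"});
        List<String> splits = Arrays.asList("abc/2", "k"); // hypothetical split rows
        System.out.println("auto-adjust on:  " + mapperCount(ranges, splits, true));
        System.out.println("auto-adjust off: " + mapperCount(ranges, splits, false));
    }
}
```

Under this model, three disjoint ranges that all sit on a single tablet (no split rows) give three mappers in either mode; the modes only diverge once a range spans more than one tablet.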



Re: Mappers for Accumulo

Posted by Aji Janis <aj...@gmail.com>.
Thank you. Follow up question.

Would this enforce one mapper per range even if all the data (from three
ranges) is on one node/tablet?



On Fri, Mar 8, 2013 at 1:17 PM, Mike Hugo <mi...@piragua.com> wrote:

> See AccumuloInputFormat
>
> ArrayList<Range> ranges = new ArrayList<Range>();
> // populate array list of row ranges ...
> AccumuloInputFormat.setRanges(job, ranges);
>
>
> You should get one mapper per range.

Re: Mappers for Accumulo

Posted by Mike Hugo <mi...@piragua.com>.
See AccumuloInputFormat

ArrayList<Range> ranges = new ArrayList<Range>();
// populate array list of row ranges ...
AccumuloInputFormat.setRanges(job, ranges);


You should get one mapper per range.
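
As an illustration of the "populate" step, the sketch below builds the three ranges from the question. It uses a small stand-in class instead of Accumulo's Range so that it runs on its own; RowRange and RangeListSketch are hypothetical names, and the end rows def/5 and klm/15 assume the question meant a slash where it wrote "def 5" and "klm 15".

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for org.apache.accumulo.core.data.Range so the sketch runs without
// Accumulo on the classpath.
final class RowRange {
    final String startRow; // inclusive
    final String endRow;   // inclusive

    RowRange(String startRow, String endRow) {
        this.startRow = startRow;
        this.endRow = endRow;
    }

    @Override
    public String toString() {
        return "[" + startRow + ", " + endRow + "]";
    }
}

final class RangeListSketch {
    // Builds the three row ranges from the original question.
    static List<RowRange> questionRanges() {
        List<RowRange> ranges = new ArrayList<RowRange>();
        ranges.add(new RowRange("abc/1", "abc/3"));
        ranges.add(new RowRange("def/1", "def/5"));
        ranges.add(new RowRange("jkl/13", "klm/15"));
        return ranges;
    }

    public static void main(String[] args) {
        // In a real job each entry would be an Accumulo Range, and the list would
        // be handed to AccumuloInputFormat.setRanges(job, ranges) as shown above.
        for (RowRange r : questionRanges()) {
            System.out.println(r);
        }
    }
}
```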




On Fri, Mar 8, 2013 at 12:11 PM, Aji Janis <aj...@gmail.com> wrote:

> Hello,
>
>  I am trying to figure out how I can configure the number of mappers (if it's
> even possible) based on an Accumulo row range. My Accumulo rowid uses the
> format:
>
> abc/1
> abc/2
> ...
> def/3
> ....
> xyz/13...
>
> If I want to specify three ranges: [abc/1 to abc/3], [def/1 to def/5],
> [jkl/13 to klm/15], and have one mapper work on one range, is there a way I
> can do this? How do I even set up my MapReduce job to accept these
> ranges? Thank you for all feedback.