Posted to dev@accumulo.apache.org by Sean Busbey <bu...@cloudera.com> on 2014/06/20 20:58:28 UTC

better presplitting

One thing that jumped out from the most recent D4M paper was this quote:

  One issue that was encountered is that after creating the pre-splits,
they all started out on one server. Accumulo load balanced the splits
across its servers at a rate of ~50 splits/second, which is more than
adequate for normal operation, but can take ~20 minutes for 50,000 pre-
splits.[1]

Do we already have an open ticket that would help this? I think maybe
there's one about being able to presplit a table that is offline?

I believe our recommended sweet spot is like 100-200 tablets per server
(though I can't find the reference for *why* I believe this ATM), which
means for clusters in the ~100s of nodes this would be in the ballpark for
an expected number of pre-splits.


[1]:  arXiv:1406.4923v1 [cs.DB]

-- 
Sean

Re: better presplitting

Posted by David Medinets <da...@gmail.com>.
Would it make sense for the test framework to wait for, say, 500
tablets to move off the first server and then ingest specifically on
those servers? Or is it better for the test framework to wait for all
tablets to migrate?
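
For reference, a sketch of how such a wait could check placement: count
each server's tablets for the table by scanning the metadata table's
"loc" column family. This is a rough guess at the mechanics, not a
tested recipe; the metadata table is accumulo.metadata as of 1.6
(!METADATA before that), and the range handling below is simplified.

import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.io.Text;

public class TabletSpread {

  // Count how many of a table's tablets each tserver is hosting.
  static Map<String, Integer> tabletsPerServer(Connector conn, String table)
      throws Exception {
    String tableId = conn.tableOperations().tableIdMap().get(table);
    Scanner scan = conn.createScanner("accumulo.metadata", Authorizations.EMPTY);
    // Metadata rows for a table run from "<id>;<endRow>" up to the
    // default tablet row "<id><", so this range covers all its tablets.
    scan.setRange(new Range(new Text(tableId + ";"), new Text(tableId + "<")));
    scan.fetchColumnFamily(new Text("loc"));

    Map<String, Integer> counts = new HashMap<String, Integer>();
    for (Entry<Key, Value> e : scan) {
      String server = e.getValue().toString(); // host:port of hosting tserver
      Integer c = counts.get(server);
      counts.put(server, c == null ? 1 : c + 1);
    }
    return counts;
  }
}

A harness could poll that map until fewer than N tablets remain on the
original server, then target ingest at the servers it lists.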

On Fri, Jun 20, 2014 at 3:41 PM, Sean Busbey <bu...@cloudera.com> wrote:
> When you add splits, they definitely start out on the server that is
> hosting the tablet that has to split apart.  They have to, since the tablet
> that hosted the previous key extent is the only one that can properly
> handle requests for the new key extents.
>
> We've run into this consistently when doing any testing that requires
> pre-splitting for perf reasons.
>
> In the case of YCSB tests, Mike scripted some nice manual pre-splitting in
> waves:
>
> * split table into X parts
> * wait for balancing
> * split each X part into Y parts
> * wait for balancing
>
> Presuming the goal is to end up with X*Y presplits, this was way faster
> than just asking for the total right off the bat.
>
> We could generally look at improving the migration code to handle these
> reassignments faster, but how often does this situation come up for people
> who aren't making a new table? If the "do this offline" feature speeds up
> the new table use case enough, I'm not sure optimizing the migration path
> is worth the time investment right now.
>
>
> On Fri, Jun 20, 2014 at 3:09 PM, Josh Elser <jo...@gmail.com> wrote:
>
>> bq. They all started out on one server
>>
>> This seems... weird. Would be good to start addressing this by identifying
>> what the actual balancer code does so we can immediately start to test the
>> assertions. We can then use the results to identify the deficiencies that
>> exist.
>>
>> I think the 200 splits per server was an Eric quote from some time ago
>> (1.4-ish, maybe 1.5). I think this is relative to a bunch of things,
>> workload and memory available most notably, and would be good to quantify
>> too.
>>
>>
>> On 6/20/14, 11:58 AM, Sean Busbey wrote:
>>
>>> One thing that jumped out from the most recent D4M paper was this quote:
>>>
>>>    One issue that was encountered is that after creating the pre-splits,
>>> they all started out on one server. Accumulo load balanced the splits
>>> across its servers at a rate of ~50 splits/second, which is more than
>>> adequate for normal operation, but can take ~20 minutes for 50,000 pre-
>>> splits.[1]
>>>
>>> Do we already have an open ticket that would help this? I think maybe
>>> there's one about being able to presplit a table that is offline?
>>>
>>> I believe our recommended sweet spot is like 100-200 tablets per server
>>> (though I can't find the reference for *why* I believe this ATM), which
>>> means for clusters in the ~100s of nodes this would be in the ballpark for
>>> an expected number of pre-splits.
>>>
>>>
>>> [1]:  arXiv:1406.4923v1 [cs.DB]
>>>
>>>
>
>
> --
> Sean

Re: better presplitting

Posted by Josh Elser <jo...@gmail.com>.
On Jun 20, 2014 12:41 PM, "Sean Busbey" <bu...@cloudera.com> wrote:
>
> When you add splits, they definitely start out on the server that is
> hosting the tablet that has to split apart.  They have to, since the
> tablet that hosted the previous key extent is the only one that can
> properly handle requests for the new key extents.
>
> We've run into this consistently when doing any testing that requires
> pre-splitting for perf reasons.

I'd have to pull up the split code, but it seems like a simple fix could be
to let only one child of a split remain local. That way the current server
doesn't get bogged down, and the master would just use the regular
assignment path for the other child instead of waiting for the balancer to
kick in.

Maybe there's a reason this doesn't work though :)
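
To make that concrete, here's a purely illustrative toy of the idea; none
of these types are Accumulo's (the real logic lives in the tserver split
code and the master's assignment path):

class SplitSketch {

  // Hypothetical stand-ins; Accumulo's real types are KeyExtent, the
  // tserver, and the master's assignment logic.
  static class Tablet {
    final String prevEndRow, endRow;
    Tablet(String prev, String end) { prevEndRow = prev; endRow = end; }
  }

  interface Master { void assign(Tablet t); } // the regular assignment path

  // Split a tablet at the given point: one child stays on the current
  // server, the other goes straight to normal assignment rather than
  // waiting for the balancer to migrate it later.
  static Tablet split(Tablet parent, String point, Master master) {
    Tablet low = new Tablet(parent.prevEndRow, point);
    Tablet high = new Tablet(point, parent.endRow);
    master.assign(high); // spreads load immediately
    return low;          // keeps serving locally, no handoff needed
  }
}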

> In the case of YCSB tests, Mike scripted some nice manual pre-splitting in
> waves:
>
> * split table into X parts
> * wait for balancing
> * split each X part into Y parts
> * wait for balancing
>
> Presuming the goal is to end up with X*Y presplits, this was way faster
> than just asking for the total right off the bat.
>
> We could generally look at improving the migration code to handle these
> reassignments faster, but how often does this situation come up for people
> who aren't making a new table? If the "do this offline" feature speeds up
> the new table use case enough, I'm not sure optimizing the migration path
> is worth the time investment right now.
>
>
> On Fri, Jun 20, 2014 at 3:09 PM, Josh Elser <jo...@gmail.com> wrote:
>
> > bq. They all started out on one server
> >
> > This seems... weird. Would be good to start addressing this by
> > identifying what the actual balancer code does so we can immediately
> > start to test the assertions. We can then use the results to identify
> > the deficiencies that exist.
> >
> > I think the 200 splits per server was an Eric quote from some time ago
> > (1.4-ish, maybe 1.5). I think this is relative to a bunch of things,
> > workload and memory available most notably, and would be good to
> > quantify too.
> >
> >
> > On 6/20/14, 11:58 AM, Sean Busbey wrote:
> >
> >> One thing that jumped out from the most recent D4M paper was this quote:
> >>
> >>    One issue that was encountered is that after creating the pre-splits,
> >> they all started out on one server. Accumulo load balanced the splits
> >> across its servers at a rate of ~50 splits/second, which is more than
> >> adequate for normal operation, but can take ~20 minutes for 50,000 pre-
> >> splits.[1]
> >>
> >> Do we already have an open ticket that would help this? I think maybe
> >> there's one about being able to presplit a table that is offline?
> >>
> >> I believe our recommended sweet spot is like 100-200 tablets per server
> >> (though I can't find the reference for *why* I believe this ATM), which
> >> means for clusters in the ~100s of nodes this would be in the ballpark for
> >> an expected number of pre-splits.
> >>
> >>
> >> [1]:  arXiv:1406.4923v1 [cs.DB]
> >>
> >>
>
>
> --
> Sean

Re: better presplitting

Posted by Sean Busbey <bu...@cloudera.com>.
When you add splits, they definitely start out on the server that is
hosting the tablet that has to split apart.  They have to, since the tablet
that hosted the previous key extent is the only one that can properly
handle requests for the new key extents.

We've run into this consistently when doing any testing that requires
pre-splitting for perf reasons.

In the case of YCSB tests, Mike scripted some nice manual pre-splitting in
waves:

* split table into X parts
* wait for balancing
* split each X part into Y parts
* wait for balancing

Presuming the goal is to end up with X*Y presplits, this was way faster
than just asking for the total right off the bat.
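
For concreteness, here's a rough sketch of that two-wave approach against
the Java client API. The split counts and the keyspace are made up, and
waitForBalance() is a stand-in: there's no public API to ask whether
balancing has finished, so a real script would poll tablet locations or
the monitor.

import java.util.SortedSet;
import java.util.TreeSet;

import org.apache.accumulo.core.client.Connector;
import org.apache.hadoop.io.Text;

public class StagedPresplit {

  static void presplitInWaves(Connector conn, String table, int x, int y)
      throws Exception {
    // Wave 1: x coarse splits, then let them spread across the cluster.
    conn.tableOperations().addSplits(table, evenSplits(x));
    waitForBalance();

    // Wave 2: ask for all x*y splits; each of the x tablets, now on its
    // own server, does its share of the splitting in parallel.
    conn.tableOperations().addSplits(table, evenSplits(x * y));
    waitForBalance();
  }

  // Evenly spaced split points over a 10-digit numeric keyspace,
  // zero-padded so they sort lexicographically.
  static SortedSet<Text> evenSplits(int n) {
    SortedSet<Text> splits = new TreeSet<Text>();
    long max = 10000000000L;
    for (int i = 1; i < n; i++)
      splits.add(new Text(String.format("%010d", max / n * i)));
    return splits;
  }

  // Made-up stand-in: no public "is balancing done" API exists, so a
  // real script would poll tablet locations instead of sleeping.
  static void waitForBalance() throws InterruptedException {
    Thread.sleep(60 * 1000);
  }
}

With x=200 and y=250 that yields the 50,000 tablets from the paper, but
with the second wave of splitting happening in parallel across servers.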

We could generally look at improving the migration code to handle these
reassignments faster, but how often does this situation come up for people
who aren't making a new table? If the "do this offline" feature speeds up
the new table use case enough, I'm not sure optimizing the migration path
is worth the time investment right now.


On Fri, Jun 20, 2014 at 3:09 PM, Josh Elser <jo...@gmail.com> wrote:

> bq. They all started out on one server
>
> This seems... weird. Would be good to start addressing this by identifying
> what the actual balancer code does so we can immediately start to test the
> assertions. We can then use the results to identify the deficiencies that
> exist.
>
> I think the 200 splits per server was an Eric quote from some time ago
> (1.4-ish, maybe 1.5). I think this is relative to a bunch of things,
> workload and memory available most notably, and would be good to quantify
> too.
>
>
> On 6/20/14, 11:58 AM, Sean Busbey wrote:
>
>> One thing that jumped out from the most recent D4M paper was this quote:
>>
>>    One issue that was encountered is that after creating the pre-splits,
>> they all started out on one server. Accumulo load balanced the splits
>> across its servers at a rate of ~50 splits/second, which is more than
>> adequate for normal operation, but can take ~20 minutes for 50,000 pre-
>> splits.[1]
>>
>> Do we already have an open ticket that would help this? I think maybe
>> there's one about being able to presplit a table that is offline?
>>
>> I believe our recommended sweet spot is like 100-200 tablets per server
>> (though I can't find the reference for *why* I believe this ATM), which
>> means for clusters in the ~100s of nodes this would be in the ballpark for
>> an expected number of pre-splits.
>>
>>
>> [1]:  arXiv:1406.4923v1 [cs.DB]
>>
>>


-- 
Sean

Re: better presplitting

Posted by Josh Elser <jo...@gmail.com>.
bq. They all started out on one server

This seems... weird. Would be good to start addressing this by 
identifying what the actual balancer code does so we can immediately 
start to test the assertions. We can then use the results to identify 
the deficiencies that exist.

I think the 200 splits per server was an Eric quote from some time ago 
(1.4-ish, maybe 1.5). I think this is relative to a bunch of things, 
workload and memory available most notably, and would be good to 
quantify too.

On 6/20/14, 11:58 AM, Sean Busbey wrote:
> One thing that jumped out from the most recent D4M paper was this quote:
>
>    One issue that was encountered is that after creating the pre-splits,
> they all started out on one server. Accumulo load balanced the splits
> across its servers at a rate of ~50 splits/second, which is more than
> adequate for normal operation, but can take ~20 minutes for 50,000 pre-
> splits.[1]
>
> Do we already have an open ticket that would help this? I think maybe
> there's one about being able to presplit a table that is offline?
>
> I believe our recommended sweet spot is like 100-200 tablets per server
> (though I can't find the reference for *why* I believe this ATM), which
> means for clusters in the ~100s of nodes this would be in the ballpark for
> an expected number of pre-splits.
>
>
> [1]:  arXiv:1406.4923v1 [cs.DB]
>

Re: better presplitting

Posted by Sean Busbey <bu...@cloudera.com>.
The offline presplit ticket is

https://issues.apache.org/jira/browse/ACCUMULO-2368




On Fri, Jun 20, 2014 at 2:58 PM, Sean Busbey <bu...@cloudera.com> wrote:

> One thing that jumped out from the most recent D4M paper was this quote:
>
>   One issue that was encountered is that after creating the pre-splits,
> they all started out on one server. Accumulo load balanced the splits
> across its servers at a rate of ~50 splits/second, which is more than
> adequate for normal operation, but can take ~20 minutes for 50,000 pre-
> splits.[1]
>
> Do we already have an open ticket that would help this? I think maybe
> there's one about being able to presplit a table that is offline?
>
> I believe our recommended sweet spot is like 100-200 tablets per server
> (though I can't find the reference for *why* I believe this ATM), which
> means for clusters in the ~100s of nodes this would be in the ballpark for
> an expected number of pre-splits.
>
>
> [1]:  arXiv:1406.4923v1 [cs.DB]
>
> --
> Sean
>



-- 
Sean

Re: better presplitting

Posted by Keith Turner <ke...@deenlo.com>.
On Wed, Jun 25, 2014 at 4:50 PM, Keith Turner <ke...@deenlo.com> wrote:

> I wrote a little utility to time splitting and subsequent balancing.  I
> will post some numbers from running this on EC2.
>
> https://gist.github.com/keith-turner/5c561e438cb04c501b6e
>

Posted some performance numbers on

https://issues.apache.org/jira/browse/ACCUMULO-2368


>
>
> On Fri, Jun 20, 2014 at 2:58 PM, Sean Busbey <bu...@cloudera.com> wrote:
>
>> One thing that jumped out from the most recent D4M paper was this quote:
>>
>>   One issue that was encountered is that after creating the pre-splits,
>> they all started out on one server. Accumulo load balanced the splits
>> across its servers at a rate of ~50 splits/second, which is more than
>> adequate for normal operation, but can take ~20 minutes for 50,000 pre-
>> splits.[1]
>>
>> Do we already have an open ticket that would help this? I think maybe
>> there's one about being able to presplit a table that is offline?
>>
>> I believe our recommended sweet spot is like 100-200 tablets per server
>> (though I can't find the reference for *why* I believe this ATM), which
>> means for clusters in the ~100s of nodes this would be in the ballpark for
>> an expected number of pre-splits.
>>
>>
>> [1]:  arXiv:1406.4923v1 [cs.DB]
>>
>> --
>> Sean
>>
>
>

Re: better presplitting

Posted by Keith Turner <ke...@deenlo.com>.
I wrote a little utility to time splitting and subsequent balancing.  I
will post some numbers from running this on EC2.

https://gist.github.com/keith-turner/5c561e438cb04c501b6e


On Fri, Jun 20, 2014 at 2:58 PM, Sean Busbey <bu...@cloudera.com> wrote:

> One thing that jumped out from the most recent D4M paper was this quote:
>
>   One issue that was encountered is that after creating the pre-splits,
> they all started out on one server. Accumulo load balanced the splits
> across its servers at a rate of ~50 splits/second, which is more than
> adequate for normal operation, but can take ~20 minutes for 50,000 pre-
> splits.[1]
>
> Do we already have an open ticket that would help this? I think maybe
> there's one about being able to presplit a table that is offline?
>
> I believe our recommended sweet spot is like 100-200 tablets per server
> (though I can't find the reference for *why* I believe this ATM), which
> means for clusters in the ~100s of nodes this would be in the ballpark for
> an expected number of pre-splits.
>
>
> [1]:  arXiv:1406.4923v1 [cs.DB]
>
> --
> Sean
>