Posted to dev@accumulo.apache.org by z11373 <z1...@outlook.com> on 2015/10/20 16:39:53 UTC

tablet split

As I understand it, Accumulo keeps data sorted by row ID, and as the number
of rows grows, it will split the tablet at some point.
For example, let's say I have the following row IDs:

1_abcxxx
1_abdxxx
1_abexxx
1_abfxxx
1_abgxxx
1_abhxxx
1_abixxx
...
1_zzzxxx
2_abcxxx
2_abdxxx
2_abexxx
2_abfxxx
2_abgxxx
2_abhxxx
...

Let's say the data whose row IDs start with "1_" has a million rows and, for
the sake of example, that the tablet size is 400K rows, so in this case the
"1_" data will be split into 3 tablets.
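The arithmetic above treats a split as cutting the data into fixed 400K-row pieces. Below is a minimal Java sketch of a different simplified model. It assumes an oversized tablet is cut roughly at its middle row and each half is checked again. Accumulo actually compares a tablet's size in bytes against the `table.split.threshold` table property rather than counting rows, so the row-based limit and the `splitUntilUnderThreshold` helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class MidpointSplitSketch {
    /**
     * Hypothetical model: a tablet is represented only by its row count.
     * Real Accumulo compares a tablet's size in bytes against the
     * table.split.threshold property; it does not count rows.
     */
    static List<Long> splitUntilUnderThreshold(long rows, long maxRowsPerTablet) {
        if (maxRowsPerTablet < 1) {
            throw new IllegalArgumentException("maxRowsPerTablet must be >= 1");
        }
        List<Long> tablets = new ArrayList<>();
        if (rows <= maxRowsPerTablet) {
            tablets.add(rows); // small enough: no split needed
            return tablets;
        }
        // Oversized: cut roughly at the middle row, then check each half again.
        long left = rows / 2;
        long right = rows - left;
        tablets.addAll(splitUntilUnderThreshold(left, maxRowsPerTablet));
        tablets.addAll(splitUntilUnderThreshold(right, maxRowsPerTablet));
        return tablets;
    }

    public static void main(String[] args) {
        // The example from the post: a million "1_" rows, 400K rows per tablet.
        List<Long> tablets = splitUntilUnderThreshold(1_000_000L, 400_000L);
        System.out.println(tablets.size() + " tablets: " + tablets);
        // prints "4 tablets: [250000, 250000, 250000, 250000]"
    }
}
```

Under this model the same million rows end up in four 250K-row tablets instead of three. The exact count therefore depends on where splits fall. Either way, the "1_" rows span several tablets.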

My question is: will Accumulo distribute those 3 tablets across different
tablet server nodes, or will two or all of them remain on the original
tablet server?


Thanks,
Z




--
View this message in context: http://apache-accumulo.1065345.n5.nabble.com/tablet-split-tp15399.html
Sent from the Developers mailing list archive at Nabble.com.

Re: tablet split

Posted by z11373 <z1...@outlook.com>.
Thanks, Josh!
Unfortunately, the monitor only shows which tablet servers host a specific
table's tablets and the number of entries on each. I actually need
additional information.
For example, this is the current info shown on the monitor for table A
(with 6 tablets):

Tablet Server   # of tablets   # of entries
1               1              10.78M
2               1              22.92M
3               1              13.52M
4               2              135.12M
5               1              34.58M
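The listing above shows nearly even tablet counts but very uneven entry counts. The small Java sketch below quantifies this from the same figures. It assumes the balancer evens out tablet counts rather than entries. The class and method names are hypothetical.

```java
public class TabletSkewSketch {
    // Figures from the monitor listing above, one entry per tablet server
    // (servers 1-5); entries are in millions.
    static final int[] TABLETS = {1, 1, 1, 2, 1};
    static final double[] ENTRIES_MILLIONS = {10.78, 22.92, 13.52, 135.12, 34.58};

    static double total(double[] values) {
        double sum = 0;
        for (double v : values) sum += v;
        return sum;
    }

    // Difference between the most and least loaded server, counted in tablets.
    static int tabletCountSpread(int[] tablets) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int t : tablets) {
            min = Math.min(min, t);
            max = Math.max(max, t);
        }
        return max - min;
    }

    // Fraction of all entries hosted by the server at the given index.
    static double entryShare(double[] entries, int server) {
        return entries[server] / total(entries);
    }

    public static void main(String[] args) {
        System.out.println("tablet-count spread: " + tabletCountSpread(TABLETS));
        for (int i = 0; i < TABLETS.length; i++) {
            System.out.printf("Tablet Server %d: %d tablet(s), %.1f%% of entries%n",
                    i + 1, TABLETS[i], 100 * entryShare(ENTRIES_MILLIONS, i));
        }
    }
}
```

Server 4 hosts two of the six tablets but about 62% of the entries. This is consistent with balancing by tablet count. The numbers alone do not say which row IDs those two tablets cover.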

The stored row IDs look something like this:
1_xxxxxx
.....
.....
2_xxxxxx
.....
.....
3_xxxxxx
.....

In some cases there will be ten times more rows with row ID 1_xxxxx than
with 2_xxxxx or 3_xxxxx. In that case, I am afraid that tablet server 4
(with 135.12M entries in the example above) could be where all the
'1_xxxxx' rows live. Or is it very likely that Accumulo will distribute data
with the same row ID prefix to other tablet servers?


Thanks,
Z





--
View this message in context: http://apache-accumulo.1065345.n5.nabble.com/tablet-split-tp15399p15568.html
Sent from the Developers mailing list archive at Nabble.com.

Re: tablet split

Posted by Josh Elser <jo...@gmail.com>.
The quickest check is to look at the monitor. Click on your table and
you should be able to drill down into where its tablets are hosted.
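Beyond the monitor, the same question can be answered by mapping row IDs to tablets directly. The minimal Java sketch below assumes you already know each tablet's end row and hosting server. Split points can be listed with `TableOperations.listSplits`, and the locations might be read off the monitor. The split points, server names, and helper methods below are hypothetical. The lookup relies on each tablet covering the rows after the previous tablet's end row up to and including its own end row. The last tablet is open-ended.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class PrefixLocationSketch {
    /**
     * Finds the server hosting a row. Keys of endRowToServer are tablet end rows
     * (inclusive); each tablet covers (previous end row, its end row]. The last
     * tablet has no end row, so its server is passed in separately.
     */
    static String serverFor(String row, TreeMap<String, String> endRowToServer,
                            String lastTabletServer) {
        // The owning tablet is the one with the smallest end row >= row.
        Map.Entry<String, String> owner = endRowToServer.ceilingEntry(row);
        return owner == null ? lastTabletServer : owner.getValue();
    }

    // Distinct servers hosting at least one of the given rows with the prefix.
    static Set<String> serversForPrefix(List<String> rows, String prefix,
                                        TreeMap<String, String> endRowToServer,
                                        String lastTabletServer) {
        Set<String> servers = new TreeSet<>();
        for (String row : rows) {
            if (row.startsWith(prefix)) {
                servers.add(serverFor(row, endRowToServer, lastTabletServer));
            }
        }
        return servers;
    }

    public static void main(String[] args) {
        // Hypothetical split points and locations, for illustration only.
        TreeMap<String, String> endRowToServer = new TreeMap<>();
        endRowToServer.put("1_h", "tserver1");
        endRowToServer.put("1_p", "tserver2");
        endRowToServer.put("2_", "tserver3");
        List<String> rows = List.of("1_abcxxx", "1_abixxx", "1_mmmxxx", "1_zzzxxx", "2_abcxxx");
        System.out.println(serversForPrefix(rows, "1_", endRowToServer, "tserver4"));
        // prints "[tserver1, tserver2, tserver3]"
    }
}
```

With these made-up splits, the "1_" rows land on three different servers. Java compares strings by UTF-16 code unit, while Accumulo compares rows as bytes. The two orders agree for ASCII row IDs like these.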

z11373 wrote:
> If I have 1M rows of "1_xxxx" data and 5 tablet servers, how can I
> check whether those million rows got distributed across all the tablet
> servers? If they didn't, I may have a performance problem when the client
> reads the data.

Re: tablet split

Posted by z11373 <z1...@outlook.com>.
Back to this thread again...
If I have 1M rows of "1_xxxx" data and 5 tablet servers, how can I
check whether those million rows got distributed across all the tablet
servers? If they didn't, I may have a performance problem when the client
reads the data.


Thanks,
Z



--
View this message in context: http://apache-accumulo.1065345.n5.nabble.com/tablet-split-tp15399p15536.html
Sent from the Developers mailing list archive at Nabble.com.

Re: tablet split

Posted by Eric Newton <er...@gmail.com>.
Accumulo will balance the tablets based on the configured balancer.

Without getting down into the details, the splits will be moved to other
nodes.

The Details:

It depends. The default balancer tries to even out the number of tablets
among servers, table by table. So if this table goes from 1 tablet to 3,
and there are at least 3 servers, each split will eventually be moved to a
separate server. But if you add one split among hundreds of tablets, it may
not make enough of a difference to bother moving the tablet.
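The default-balancer behavior described above can be illustrated with a toy model. The minimal Java sketch below assumes a single table and a balancer that only compares tablet counts. It is not Accumulo's actual balancer implementation, and the class, method, and server names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class EvenCountBalancerSketch {
    /**
     * Toy per-table balancer: repeatedly moves one tablet from the server with the
     * most tablets to the server with the fewest, until no two servers differ by
     * more than one tablet. It ignores entry counts, data size and locality.
     */
    static Map<String, Integer> balance(Map<String, Integer> tabletsPerServer) {
        Map<String, Integer> counts = new HashMap<>(tabletsPerServer);
        if (counts.isEmpty()) {
            return counts;
        }
        while (true) {
            String most = null;
            String fewest = null;
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                if (most == null || e.getValue() > counts.get(most)) most = e.getKey();
                if (fewest == null || e.getValue() < counts.get(fewest)) fewest = e.getKey();
            }
            if (counts.get(most) - counts.get(fewest) <= 1) {
                return counts; // already as even as tablet counts allow
            }
            counts.merge(most, -1, Integer::sum);  // migrate one tablet away...
            counts.merge(fewest, 1, Integer::sum); // ...to the least loaded server
        }
    }

    public static void main(String[] args) {
        // The table just went from 1 tablet to 3; all three are still on tserver1.
        Map<String, Integer> before = new HashMap<>();
        before.put("tserver1", 3);
        before.put("tserver2", 0);
        before.put("tserver3", 0);
        System.out.println(balance(before)); // every server ends up with 1 tablet
    }
}
```

The stopping rule (no two servers differ by more than one tablet) is an assumption made for this sketch. It is not a description of when Accumulo's balancer decides that a move is worthwhile.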

-Eric


On Tue, Oct 20, 2015 at 10:39 AM, z11373 <z1...@outlook.com> wrote:

> My question is: will Accumulo distribute those 3 tablets across different
> tablet server nodes, or will two or all of them remain on the original
> tablet server?

Re: tablet split

Posted by Josh Elser <jo...@gmail.com>.
IIRC, both halves of a split tablet will remain on the same node as the
parent; however, the next invocation of the configured balancer might
move them according to its policy.

z11373 wrote:
> My question is: will Accumulo distribute those 3 tablets across different
> tablet server nodes, or will two or all of them remain on the original
> tablet server?