Posted to dev@kudu.apache.org by Binglin Chang <de...@gmail.com> on 2016/03/17 04:42:44 UTC
Re: confusion about "the primary key intervals of different RowSets
may intersect"
How can this be "bootstrapped"?
At the beginning there is no DRS, only one MRS.
It is hard to pre-split DRSs if you don't know the key distribution, and the
distribution may change over time.
On Thu, Mar 17, 2016 at 11:36 AM, 曾 杰南 <ze...@hotmail.com> wrote:
>
> Hi all,
> I have been studying the Kudu paper "Kudu: Storage for Fast Analytics on Fast
> Data" closely to find out why HBase's random-query performance is superior to
> Kudu's. "The primary key intervals of different RowSets may intersect" may be
> one of the reasons.
>
> My question is: why not keep DiskRowSets globally ordered by primary key?
> When the MemRowSet is flushed, its rows would be dispatched to the
> DeltaMemStores of the corresponding DiskRowSets. The negative side effect is
> fragmentation of the DiskRowSets, but that seems worth it for a global
> ordering of DiskRowSets.
>
> Best,
> jie
>
>
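To make the quoted concern concrete, here is a minimal sketch (not Kudu code) of why intersecting key intervals can cost more on a point lookup. It assumes a toy model in which a rowset is only a (min_key, max_key) pair; the function names and the sample intervals are hypothetical.

```python
from bisect import bisect_left

# Toy model (an assumption, not Kudu's data structures):
# a rowset is represented only by its (min_key, max_key) interval.

def candidates_overlapping(rowsets, key):
    """Intervals may intersect, so every rowset covering `key` may need a probe."""
    return [rs for rs in rowsets if rs[0] <= key <= rs[1]]

def candidate_disjoint(rowsets, key):
    """Intervals are sorted and disjoint, so binary search yields at most one rowset."""
    max_keys = [rs[1] for rs in rowsets]
    i = bisect_left(max_keys, key)  # first rowset whose max_key >= key
    if i < len(rowsets) and rowsets[i][0] <= key:
        return [rowsets[i]]
    return []

overlapping = [(1, 10), (5, 8), (6, 11)]   # hypothetical intersecting intervals
disjoint = [(1, 6), (7, 10), (11, 11)]     # hypothetical globally ordered intervals

print(len(candidates_overlapping(overlapping, 7)))  # rowsets to probe for key 7
print(len(candidate_disjoint(disjoint, 7)))         # rowsets to probe for key 7
```

In this toy model the number of rowsets to probe grows with the amount of overlap, which is the cost the question is pointing at.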
Re: confusion about "the primary key intervals of different RowSets
may intersect"
Posted by 曾 杰南 <ze...@hotmail.com>.
We don't need to pre-split DRSs.
For example:
Roll (split) a DRS once its size exceeds 3 rows.
"row5" means the row with primary key 5.
1. init
MRS: {}
DRS0(min, max): {}
2. insert three rows
MRS: {row5, row6, row8}
DRS0(min, max): {}
3. flush
MRS: {}
DRS0(min, max): {row5, row6, row8}
4. insert
MRS: {row1, row9, row10, row11}
DRS0(min, max): {row5, row6, row8}
5. flush && split
MRS: {}
DRS0(min, max): {row1, row5, row6, row8, row9, row10, row11} ->
DRS0(min, 6]: {row1, row5, row6}  DRS1(6, 10]: {row8, row9, row10}  DRS2(10, max): {row11}
Negative side effects:
1. fragmentation of DiskRowSets
2. splitting the redo log is complicated
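The walk-through above can be sketched roughly as follows. This is only an illustration of the proposal, not how Kudu works: a DRS is modeled as a pair (inclusive upper bound, sorted keys), with None standing for "max" on the last DRS, and the names flush_and_split and MAX_ROWS are hypothetical.

```python
from bisect import bisect_left

MAX_ROWS = 3  # roll threshold taken from the example above

def flush_and_split(drs_list, mrs):
    """Route each MRS key to the DRS covering it, then split oversized DRSs.

    drs_list: list of (upper_bound, sorted_keys), ordered by key range;
              the last upper_bound is None, meaning "max".
    mrs:      iterable of keys currently buffered in the MemRowSet.
    """
    bounds = [ub for ub, _ in drs_list[:-1]]      # inclusive upper bounds
    merged = [list(keys) for _, keys in drs_list]
    for key in mrs:
        # First DRS whose upper bound is >= key; falls through to the last DRS.
        merged[bisect_left(bounds, key)].append(key)

    result = []
    for (ub, _), keys in zip(drs_list, merged):
        keys.sort()
        chunks = [keys[i:i + MAX_ROWS] for i in range(0, len(keys), MAX_ROWS)] or [[]]
        for j, chunk in enumerate(chunks):
            is_last = j == len(chunks) - 1
            # The final chunk keeps the original bound; earlier chunks end at their last key.
            result.append((ub if is_last else chunk[-1], chunk))
    return result

drs = [(None, [])]                               # init: one empty DRS0(min, max)
drs = flush_and_split(drs, [5, 6, 8])            # first flush
drs = flush_and_split(drs, [1, 9, 10, 11])       # flush && split
print(drs)
```

Flushing one more key into the middle of the range, for example flush_and_split(drs, [7]), splits the second DRS again and leaves a one-row DRS behind, which is the fragmentation listed as the first side effect.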
Re: confusion about "the primary key intervals of different RowSets
may intersect"
Posted by Binglin Chang <de...@gmail.com>.
I'm not an expert here, but my guess is that automatic splitting and merging of
DRSs is not available right now and would probably be very complex, so the
compaction approach seems simpler and more reasonable?