Posted to dev@kudu.apache.org by Binglin Chang <de...@gmail.com> on 2016/03/17 04:42:44 UTC

Re: confusion about " the primary key intervals of different RowSets may intersect"

How can this be "bootstrapped"?
At the beginning there is no DRS, only one MRS.
It's hard to pre-split DRSs if you don't know the key distribution, and the
key distribution may change over time.

On Thu, Mar 17, 2016 at 11:36 AM, 曾 杰南 <ze...@hotmail.com> wrote:

> Hi all,
> I have been studying the Kudu paper, "Kudu: Storage for Fast Analytics on
> Fast Data", trying to find out why HBase's random-query performance is
> better than Kudu's. "The primary key intervals of different RowSets may
> intersect" may be one of the reasons.
>
> My question is: why not keep DiskRowSets globally ordered by primary key?
> When a MemRowSet is flushed, its rows would be dispatched to the
> DeltaMemStore of the corresponding DiskRowSets. The negative side effect
> is fragmentation of DiskRowSets, but that seems worth it to keep the
> DiskRowSets globally ordered.
>
> Best,
> Jie

Re: confusion about " the primary key intervals of different RowSets may intersect"

Posted by 曾杰南 <ze...@hotmail.com>.

We don't need to pre-split DRSs.
For example:
    split a DRS once it holds more than 3 rows
    row5 means the row with primary key 5

1. init
     MRS: {}
     DRS0(min, max): {}

2. insert three rows
     MRS: {row5, row6, row8}
     DRS0(min, max): {}

3. flush
     MRS: {}
     DRS0(min, max): {row5, row6, row8}

4. insert
     MRS: {row1, row9, row10, row11}
     DRS0(min, max): {row5, row6, row8}

5. flush && split
     MRS: {}
     DRS0(min, max): {row1, row5, row6, row8, row9, row10, row11}  ->
         DRS0(min, 6]: {row1, row5, row6}  DRS1(6, 10]: {row8, row9, row10}  DRS2(10, max): {row11}
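
The flush-and-split steps above can be sketched in a few lines of Python. This is only an illustration of the proposal in this message, not Kudu code: the function name `flush_and_split`, the `(upper_bound, keys)` representation of a DRS, and the `max_rows` threshold are all assumptions made up for the example.

```python
import bisect

def flush_and_split(mrs_keys, drs_list, max_rows=3):
    """Flush MRS keys into globally ordered DRSs, splitting oversized ones.

    Hypothetical model: each DRS is (upper_bound, sorted_keys); an
    upper_bound of None means "max". The last DRS must have upper_bound None.
    """
    # Copy so the caller's lists are not mutated.
    drs_list = [(upper, list(keys)) for upper, keys in drs_list]

    # Route every flushed key to the DRS whose key range contains it.
    for key in mrs_keys:
        for upper, keys in drs_list:
            if upper is None or key <= upper:
                bisect.insort(keys, key)
                break

    # Split any DRS that now holds more than max_rows rows.
    result = []
    for upper, keys in drs_list:
        if len(keys) <= max_rows:
            result.append((upper, keys))
            continue
        chunks = [keys[i:i + max_rows] for i in range(0, len(keys), max_rows)]
        for chunk in chunks[:-1]:
            # Each new DRS ends at its largest key.
            result.append((chunk[-1], chunk))
        # The last chunk inherits the original upper bound.
        result.append((upper, chunks[-1]))
    return result

drs = [(None, [])]                            # step 1: one empty DRS0(min, max)
drs = flush_and_split([5, 6, 8], drs)         # steps 2-3: insert and flush
drs = flush_and_split([1, 9, 10, 11], drs)    # steps 4-5: insert, flush and split
print(drs)
```

With the rows from the example, the final state is three DRSs with upper bounds 6, 10 and max, matching step 5.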


Negative side effects:
1. fragmentation of DiskRowSets
2. splitting the redo log is complicated


Re: confusion about " the primary key intervals of different RowSets may intersect"

Posted by Binglin Chang <de...@gmail.com>.

I'm not an expert here, but my guess is that automatic splitting and merging
of DRSs is not available right now and would probably be very complex, so the
compaction approach seems simpler and more reasonable?
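
To make the trade-off concrete, here is a minimal sketch of why intersecting key intervals affect random lookups and what a compaction would buy. It is not Kudu code: the function `rowsets_to_check` and the interval values are hypothetical, chosen only to mirror the keys used earlier in this thread.

```python
def rowsets_to_check(key, intervals):
    """Return the indexes of RowSets whose [min_key, max_key] range contains key."""
    return [i for i, (lo, hi) in enumerate(intervals) if lo <= key <= hi]

# Hypothetical RowSets produced by independent flushes: the ranges intersect.
overlapping = [(1, 11), (5, 8), (6, 10)]

# The same key space after a hypothetical merge compaction: the ranges are disjoint.
compacted = [(1, 5), (6, 8), (9, 11)]

print(rowsets_to_check(7, overlapping))  # every RowSet may hold key 7
print(rowsets_to_check(7, compacted))    # only one RowSet may hold key 7
```

In this toy model a lookup for key 7 has to consult all three overlapping RowSets but only one of the compacted ones, so compaction approaches the benefit of global ordering without splitting DRSs at flush time.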
