Posted to dev@doris.apache.org by Chen Zhang <ch...@gmail.com> on 2022/06/23 06:43:54 UTC

[Discuss][DSIP] Support Merge-On-Write implementation for UNIQUE KEY data model

Hi devs,

The Unique-Key data model is widely used in scenarios such as Flink-CDC, user
profiles, and e-commerce orders, but query performance with the current
Merge-On-Read implementation is poor, for the following reasons:

   1. Doris can't determine whether a row in a segment file is the latest or
   outdated, so it has to do an extra merge sort before returning the
   latest data, and key comparison is quite CPU-intensive.
   2. Aggregate function predicate push-down is not supported by the
   Unique-Key data model because of reason (1).

I'd like to propose a Merge-On-Write implementation for the
Unique-Key data model. It relies on a new segment-file-level primary
key index (used for point lookups on write) and a delete bitmap (which marks
some row IDs as deleted), and it can improve read performance significantly.
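
To make the idea concrete, here is a minimal sketch in Python. It is not
Doris code: the names Segment, write_segment and scan are hypothetical, the
primary key index is modeled as a dict and the delete bitmap as a set of row
IDs, and keys are assumed to be unique within one write batch.

```python
class Segment:
    """Hypothetical in-memory stand-in for one segment file."""

    def __init__(self, rows):
        self.rows = list(rows)  # (key, value) pairs; list position is the row ID
        # Primary key index: key -> row ID, used for point lookups on write.
        self.pk_index = {key: rid for rid, (key, _) in enumerate(self.rows)}
        # Delete bitmap, modeled here as a set of deleted row IDs.
        self.deleted = set()


def write_segment(segments, new_rows):
    """Merge on write: mark older copies of each key as deleted, then append."""
    new_seg = Segment(new_rows)
    for key, _ in new_seg.rows:
        for seg in segments:
            rid = seg.pk_index.get(key)  # point lookup in the older segment
            if rid is not None:
                seg.deleted.add(rid)
    segments.append(new_seg)


def scan(segments):
    """Read path: no merge sort, just skip rows flagged in the delete bitmap."""
    return [
        row
        for seg in segments
        for rid, row in enumerate(seg.rows)
        if rid not in seg.deleted
    ]


segments = []
write_segment(segments, [("k1", "a"), ("k2", "b")])
write_segment(segments, [("k2", "c"), ("k3", "d")])
print(scan(segments))
```

The cost of resolving duplicates moves to the write path (one index lookup
per key), so the read path only has to filter by the bitmap.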

Initially, we wanted to add a separate Primary-Key data model with a
Merge-On-Write implementation, but after a lot of discussion we prefer
to improve the Unique-Key data model rather than add another one.

I'll add the detailed design and related research to the DSIP doc later.

Re: Re: [Discuss][DSIP] Support Merge-On-Write implementation for UNIQUE KEY data model

Posted by Chen Zhang <ch...@gmail.com>.
I've updated the schedule in the DSIP doc:
https://cwiki.apache.org/confluence/display/DORIS/DSIP-018%3A+Support+Merge-On-Write+implementation+for+UNIQUE+KEY+data+model

Best
Chen Zhang
On Jun 27, 2022, 11:59 +0800, Chen Zhang <ch...@gmail.com> wrote:
> Hi devs, I updated the DSIP last weekend. If you are interested in this feature, you are welcome to review and comment. Thanks.
> https://cwiki.apache.org/confluence/display/DORIS/DSIP-018%3A+Support+Merge-On-Write+implementation+for+UNIQUE+KEY+data+model
>
> Best
> Chen Zhang
> On Jun 24, 2022, 10:13 +0800, zhg yang <ya...@gmail.com> wrote:
> > @Chen Zhang For more important features, it is best to send a DSIP
> > first so that everyone can discuss the design.
> > Thanks
> > Yang Zhengguo
> >
> >
> > On Thu, Jun 23, 2022 at 22:30, Chen Zhang <ch...@gmail.com> wrote:
> >
> > > @Minghong We'll use a multi-version delete bitmap that only saves the
> > > delta for each version.
> > > For example, suppose we have a rowset with version [0-98], and
> > > transactions 99, 100, and 101 each update some rows in that rowset. There
> > > would then be 3 delete bitmaps on that rowset, corresponding to the rows
> > > updated by versions 99, 100, and 101. A query at version x will only see
> > > the bitmaps up to version x. There are more details about space saving and
> > > cache acceleration; let's discuss them in the DSIP.
> > >
> > > @Xiaoli, our team has finished most of the development work for the basic
> > > functionality in our private repository, but there's still a lot of work
> > > to do. You're welcome to get involved.
> > >
> > > @Mingyu, could you help create a DSIP doc? I don't seem to have
> > > permission.
> > >
> > > Best
> > > Chen Zhang
> > > On Jun 23, 2022, 21:41 +0800, Zhou Minghong <mi...@163.com> wrote:
> > > > Hi Chen Zhang,
> > > > One question about "and a delete bitmap (marks some rowid as deleted)":
> > > > how can transaction information be handled by a bitmap?
> > > > For example, if transaction_100 deletes a row, the row is still visible to
> > > > transaction_99 but not visible to transaction_101. How is this case handled?
> > > >
> > > >
> > > > Br/Minghong
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > At 2022-06-23 19:14:58, "Zhu,Xiaoli" <zh...@baidu.com> wrote:
> > > > > > Hi Chen Zhang,
> > > > > >
> > > > > > I am very interested in this topic and would like to participate in
> > > > > > the development.
> > > > > >
> > > > > > On 2022/6/23 at 2:44 PM, "Chen Zhang" <ch...@gmail.com> wrote:
> > > > > >
> > > > > > Hi devs,
> > > > > >
> > > > > > The Unique-Key data model is widely used in scenarios such as Flink-CDC, user
> > > > > > profiles, and e-commerce orders, but query performance with the current
> > > > > > Merge-On-Read implementation is poor, for the following reasons:
> > > > > >
> > > > > > 1. Doris can't determine whether a row in a segment file is the latest or
> > > > > > outdated, so it has to do an extra merge sort before returning the
> > > > > > latest data, and key comparison is quite CPU-intensive.
> > > > > > 2. Aggregate function predicate push-down is not supported by the
> > > > > > Unique-Key data model because of reason (1).
> > > > > >
> > > > > > I'd like to propose a Merge-On-Write implementation for the
> > > > > > Unique-Key data model. It relies on a new segment-file-level primary
> > > > > > key index (used for point lookups on write) and a delete bitmap (which marks
> > > > > > some row IDs as deleted), and it can improve read performance significantly.
> > > > > >
> > > > > > Initially, we wanted to add a separate Primary-Key data model with a
> > > > > > Merge-On-Write implementation, but after a lot of discussion we prefer
> > > > > > to improve the Unique-Key data model rather than add another one.
> > > > > >
> > > > > > I'll add the detailed design and related research to the DSIP doc later.
> > > > >
> > > > >
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: dev-unsubscribe@doris.apache.org
> > > > > For additional commands, e-mail: dev-help@doris.apache.org
> > > > >
> > >

Re: Re: [Discuss][DSIP] Support Merge-On-Write implementation for UNIQUE KEY data model

Posted by Chen Zhang <ch...@gmail.com>.
Hi devs, I updated the DSIP last weekend. If you are interested in this feature, you are welcome to review and comment. Thanks.
https://cwiki.apache.org/confluence/display/DORIS/DSIP-018%3A+Support+Merge-On-Write+implementation+for+UNIQUE+KEY+data+model

Best
Chen Zhang

Re: Re: [Discuss][DSIP] Support Merge-On-Write implementation for UNIQUE KEY data model

Posted by zhg yang <ya...@gmail.com>.
@Chen Zhang For more important features, it is best to send a DSIP
first so that everyone can discuss the design.
Thanks
Yang Zhengguo



Re:Re:Re:Re: [Discuss][DSIP] Support Merge-On-Write implementation for UNIQUE KEY data model

Posted by 陈明雨 <mo...@163.com>.
Done!




--

Best Regards
陈明雨 Mingyu Chen

Email:
morningman@apache.org





At 2022-06-24 08:48:29, "Chen Zhang" <ch...@gmail.com> wrote:
>@Mingyu, my username: zhannngchen. Thanks~
>
>Best
>Chen Zhang
>On Jun 24, 2022, 00:56 +0800, 陈明雨 <mo...@163.com>, wrote:
>> Hi Zhang Chen:
>> I have created DSIP-018 for this [1], but you need to create an account and tell me your username.
>>
>>
>> [1] https://cwiki.apache.org/confluence/display/DORIS/DSIP-018%3A+Support+Merge-On-Write+implementation+for+UNIQUE+KEY+data+model
>>
>>
>>
>>
>> --
>>
>> Best Regards
>> 陈明雨 Mingyu Chen
>>
>> Email:
>> morningman@apache.org
>>
>>
>>
>>
>>

Re:Re:Re: [Discuss][DSIP] Support Merge-On-Write implementation for UNIQUE KEY data model

Posted by Chen Zhang <ch...@gmail.com>.
@Mingyu, my username: zhannngchen. Thanks~

Best
Chen Zhang

Re:Re:Re: [Discuss][DSIP] Support Merge-On-Write implementation for UNIQUE KEY data model

Posted by 陈明雨 <mo...@163.com>.
Hi Zhang Chen:
I have created DSIP-018 for this [1], but you need to create an account and tell me your username.


[1] https://cwiki.apache.org/confluence/display/DORIS/DSIP-018%3A+Support+Merge-On-Write+implementation+for+UNIQUE+KEY+data+model




--

Best Regards
陈明雨 Mingyu Chen

Email:
morningman@apache.org






Re:Re: [Discuss][DSIP] Support Merge-On-Write implementation for UNIQUE KEY data model

Posted by Chen Zhang <ch...@gmail.com>.
@Minghong We'll use a multi-version delete bitmap that only saves the delta for each version.
For example, suppose we have a rowset with version [0-98], and transactions 99, 100, and 101 each update some rows in that rowset. There would then be 3 delete bitmaps on that rowset, corresponding to the rows updated by versions 99, 100, and 101. A query at version x will only see the bitmaps up to version x. There are more details about space saving and cache acceleration; let's discuss them in the DSIP.
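
A rough sketch of the visibility rule in Python, only as an illustration: the RowsetDeleteBitmaps class and its method names are hypothetical, each per-version delta is modeled as a plain set of row IDs, and the row IDs used below are made up.

```python
class RowsetDeleteBitmaps:
    """Hypothetical per-rowset store of delete-bitmap deltas, keyed by version."""

    def __init__(self):
        # version -> set of row IDs in this rowset deleted by that version
        self.deltas = {}

    def mark_deleted(self, version, row_ids):
        """Record only the delta introduced by one version."""
        self.deltas.setdefault(version, set()).update(row_ids)

    def visible_deletes(self, query_version):
        """Union of all deltas with version <= query_version."""
        merged = set()
        for version, row_ids in self.deltas.items():
            if version <= query_version:
                merged |= row_ids
        return merged


# Rowset [0-98]; transactions 99, 100 and 101 each update some of its rows.
bitmaps = RowsetDeleteBitmaps()
bitmaps.mark_deleted(99, {3})
bitmaps.mark_deleted(100, {7})
bitmaps.mark_deleted(101, {9})

# A query at version 99 still sees row 7; a query at version 101 does not.
print(bitmaps.visible_deletes(99))
print(bitmaps.visible_deletes(101))
```

This covers the case you asked about: the row deleted by transaction 100 stays visible to a reader at version 99 and is hidden from a reader at version 101.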

@Xiaoli, our team has finished most of the development work for the basic functionality in our private repository, but there's still a lot of work to do. You're welcome to get involved.

@Mingyu, could you help create a DSIP doc? I don't seem to have permission.

Best
Chen Zhang

Re:Re: [Discuss][DSIP] Support Merge-On-Write implementation for UNIQUE KEY data model

Posted by Zhou Minghong <mi...@163.com>.
Hi Chen Zhang,
One question about "and a delete bitmap (marks some rowid as deleted)":
how can transaction information be handled by a bitmap?
For example, if transaction_100 deletes a row, the row is still visible to transaction_99 but not visible to transaction_101. How is this case handled?


Br/Minghong


















Re: [Discuss][DSIP] Support Merge-On-Write implementation for UNIQUE KEY data model

Posted by "Zhu,Xiaoli" <zh...@baidu.com>.
Hi Chen Zhang,

I am very interested in this topic and would like to participate in the development.
