Posted to dev@ozone.apache.org by "timmycheng (程力)" <ti...@tencent.com> on 2020/01/13 08:24:57 UTC

[DISCUSS] - Merge Multi-Raft Support - HDDS-1564

Hey all,

Happy to present the multi-raft feature to the Ozone community (https://issues.apache.org/jira/browse/HDDS-1564). This feature allows every datanode to host more than one pipeline, based on user configuration, to make better use of each datanode’s disk I/O.
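
To make the idea concrete, here is a minimal sketch in Python. It is not Ozone's placement code: the function name place_pipelines, the pipeline_limit parameter, and the greedy least-loaded strategy are all assumptions made for illustration. It only shows how raising a per-datanode pipeline limit lets the same set of datanodes serve more three-node pipelines.

```python
from itertools import combinations


def place_pipelines(datanodes, pipeline_limit, replication=3):
    """Greedily build pipelines of `replication` datanodes each.

    Hypothetical model: a datanode may join at most `pipeline_limit`
    pipelines, and no two pipelines share the exact same member set.
    """
    load = {dn: 0 for dn in datanodes}
    used = set()
    pipelines = []
    while True:
        # Datanodes that still have room, least loaded first.
        open_nodes = sorted(
            (dn for dn in datanodes if load[dn] < pipeline_limit),
            key=lambda dn: (load[dn], dn),
        )
        # First group of open datanodes that is not already a pipeline.
        group = next(
            (g for g in combinations(open_nodes, replication)
             if frozenset(g) not in used),
            None,
        )
        if group is None:
            break
        used.add(frozenset(group))
        pipelines.append(group)
        for dn in group:
            load[dn] += 1
    return pipelines


nodes = ["dn1", "dn2", "dn3", "dn4", "dn5", "dn6"]
single_raft = place_pipelines(nodes, pipeline_limit=1)
multi_raft = place_pipelines(nodes, pipeline_limit=2)
print(len(single_raft), "pipelines with a limit of 1")
print(len(multi_raft), "pipelines with a limit of 2")
```

In this toy model, six datanodes carry twice as many pipelines once each datanode may join two of them, which is the extra write parallelism the feature is after.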

All dev work has been done and I’ve conducted performance tests in different scenarios. Based on my testing, a multi-raft Ozone cluster can bring write latency down to as low as 1/3 of that of a single-raft cluster. Please check the attachments in the above JIRA for the test brief, more details, and the code patch.

I would like to use this thread to discuss this feature and its merge back to master.

-Li

Re: [DISCUSS] - Merge Multi-Raft Support - HDDS-1564

Posted by Anu Engineer <ae...@cloudera.com.INVALID>.
+1 from me too

—Anu

> On Feb 7, 2020, at 1:47 PM, Siddharth Wagle <sw...@cloudera.com.invalid> wrote:
> 
> Agree with Xiaoyu, +1 for the merge.
> 
> Thanks, Li Cheng, for working on this feature and taking it to completion.
> 
> Best,
> Sid
> 



Re: [DISCUSS] - Merge Multi-Raft Support - HDDS-1564

Posted by Siddharth Wagle <sw...@cloudera.com.INVALID>.
Agree with Xiaoyu, +1 for the merge.

Thanks, Li Cheng, for working on this feature and taking it to completion.

Best,
Sid

On Fri, Feb 7, 2020 at 1:38 PM Xiaoyu Yao <xy...@cloudera.com.invalid> wrote:

> Thanks for sharing the data. Given that the issues raised earlier have been
> addressed in the follow-up JIRA, I'm +1 for the merge.
>
> Xiaoyu
>

Re: [DISCUSS] - Merge Multi-Raft Support - HDDS-1564

Posted by Xiaoyu Yao <xy...@cloudera.com.INVALID>.
Thanks for sharing the data. Given that the issues raised earlier have been
addressed in the follow-up JIRA, I'm +1 for the merge.

Xiaoyu

On Fri, Feb 7, 2020 at 8:34 AM timmycheng(程力) <ti...@tencent.com>
wrote:

> [...]

Re: [DISCUSS] - Merge Multi-Raft Support - HDDS-1564

Posted by "timmycheng (程力)" <ti...@tencent.com>.
Hey all,

Just want to follow up on the multi-raft feature progress. I’ve collected feedback from Xiaoyu, Anu and Sid (https://docs.google.com/document/d/1NxCiHhn0u9BqgjuUXB8zxGtny69Qek4yTFe1QqUHiqM/edit) and addressed all of it in HDDS-2913. Shout out to Xiaoyu, Anu and Sid for the feedback and for their help in resolving the issues. I would also like to know if there are other comments or reviews.

We at Tencent have already deployed the multi-raft version to our internal production cluster, and it’s serving a reasonable amount of traffic now. So far there have been over 16K writes into our Ozone cluster, and I compared the results with the single-raft version’s performance. Both were measured under similar daily traffic patterns.

Write finishes in    Single raft    Multi raft
> 3s                 0.009%         0.006%
2s ~ 3s              27.4%          1.46%
1s ~ 2s              1.64%          0.07%
0.2s ~ 1s            2.7%           0.53%
< 0.2s               68.2%          97.9%


Our internal customer writes to Ozone every day, with scheduled jobs as well as on-demand jobs. The size of a write ranges from KB to GB, but every day’s traffic shares the same pattern. We therefore see that the multi-raft version lets ~98% of writes finish within 0.2s, about 30 percentage points more than the single-raft version’s 68.2%. At the same time, the share of writes that take 2s to 3s drops from 27.4% to 1.46%. Multi-raft has made our internal cluster more stable, and latency fluctuates far less, which is pretty helpful.
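
The comparison can be rechecked from the table with a few lines of Python. This is only a sketch of the arithmetic: the percentages are copied from the table above, while the variable names and the 1s cutoff for "slow" writes are choices made for this illustration.

```python
# Share of writes per latency bucket: (single-raft %, multi-raft %),
# copied from the table above.
buckets = {
    "> 3s": (0.009, 0.006),
    "2s ~ 3s": (27.4, 1.46),
    "1s ~ 2s": (1.64, 0.07),
    "0.2s ~ 1s": (2.7, 0.53),
    "< 0.2s": (68.2, 97.9),
}

SINGLE, MULTI = 0, 1


def share_slower_than_1s(column):
    """Sum the buckets whose latency is above 1s for one column."""
    return sum(buckets[b][column] for b in ("> 3s", "2s ~ 3s", "1s ~ 2s"))


# Gain in the fastest bucket, in percentage points.
fast_gain_points = buckets["< 0.2s"][MULTI] - buckets["< 0.2s"][SINGLE]

slow_single = share_slower_than_1s(SINGLE)
slow_multi = share_slower_than_1s(MULTI)

print(f"writes under 0.2s: +{fast_gain_points:.1f} percentage points")
print(f"writes over 1s: {slow_single:.3f}% -> {slow_multi:.3f}%")
```

Summing the buckets above 1s gives a single figure for the slow tail, which complements the per-bucket numbers quoted in the text.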

Cheers,
Li
