Posted to hdfs-dev@hadoop.apache.org by Chen Liang <va...@gmail.com> on 2020/11/04 20:07:52 UTC

Re: Cost Based FairCallQueue latency issue

Hi Fengnan,

We have been testing the cost-based FairCallQueue internally. We also saw a
latency increase, and we are trying to debug the issue as well. Our current
suspicion is that the way the metrics are generated might be introducing too
much overhead. We are in the process of trying to reproduce this using
Dynamometer. If this is something you would be interested in, we can follow up
and work on this issue together.

Best,
Chen

Fengnan Li <lo...@gmail.com> wrote on Friday, October 30, 2020 at 1:51 PM:

> Hi all,
>
>
>
> Has anyone deployed the Cost Based Fair Call Queue in a production
> cluster? We ran into some RPC queue latency degradation at ~30k-40k rps.
> I tried to debug it but didn’t find anything suspicious. It is worth
> mentioning that the extra heap used to store the call cost has not caused
> any memory issues.
>
>
>
> Thanks,
>
> Fengnan
>
>

Re: Cost Based FairCallQueue latency issue

Posted by Fengnan Li <lo...@gmail.com>.

Thanks for replying, Chen! There is a lot of context around this, so it is probably better to set up a meeting about it. Do you have time this week? I am interested to know under what circumstances you ran into the queue latency issue.

 

Some more context from my side:

I did the following debugging to try to figure this out:
- Double-checked the RPC processing time, but didn’t find an obvious increase.
- Did some flame graph profiling, but didn’t catch any obvious stacks in the cost-based code paths.
- Replayed the workload with Dynamometer, but was not able to find a clear increase in that environment.
 

Thanks,

Fengnan

 
