Posted to users@kafka.apache.org by Sam Lendle <sl...@pandora.com> on 2018/07/11 18:42:09 UTC

Kafka Streams processor node metrics process rate with multiple stream threads

Hello!

Using kafka-streams 1.1.0, I noticed that when I sum the process rate metric for a given processor node, the rate is many times higher than the number of incoming messages. Digging further, it looks like the rate metric associated with each thread in a given application instance is always the same, and if I average by instance and then sum the rates, I recover the incoming message rate. So it looks like the rate metric for each stream thread is actually reporting the rate for all threads on the instance.
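
To make the arithmetic above concrete, here is a minimal sketch in plain Java. It does not call any Kafka API: the class name, the `Reading` type, and the numbers (two instances with four stream threads each, processing 100 and 60 messages per second) are all hypothetical, chosen only to mimic the reported behavior where every thread on an instance reports the instance-wide rate.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Kafka Streams.
class ProcessRateAggregation {

    /** One reported rate value, tagged with the instance and thread it came from. */
    static final class Reading {
        final String instance;
        final String thread;
        final double rate;

        Reading(String instance, String thread, double rate) {
            this.instance = instance;
            this.thread = thread;
            this.rate = rate;
        }
    }

    /** Sums every per-thread value as if each thread reported only its own work. */
    static double naiveSum(List<Reading> readings) {
        double total = 0.0;
        for (Reading r : readings) {
            total += r.rate;
        }
        return total;
    }

    /** Averages the values within each instance, then sums the per-instance averages. */
    static double averagePerInstanceThenSum(List<Reading> readings) {
        Map<String, double[]> perInstance = new LinkedHashMap<>(); // instance -> {sum, count}
        for (Reading r : readings) {
            double[] acc = perInstance.computeIfAbsent(r.instance, k -> new double[2]);
            acc[0] += r.rate;
            acc[1] += 1;
        }
        double total = 0.0;
        for (double[] acc : perInstance.values()) {
            total += acc[0] / acc[1];
        }
        return total;
    }

    /** Made-up data: each of the four threads reports its instance's total rate. */
    static List<Reading> sampleReadings() {
        List<Reading> readings = new ArrayList<>();
        for (int t = 1; t <= 4; t++) {
            readings.add(new Reading("instance-a", "StreamThread-" + t, 100.0));
            readings.add(new Reading("instance-b", "StreamThread-" + t, 60.0));
        }
        return readings;
    }

    public static void main(String[] args) {
        List<Reading> readings = sampleReadings();
        System.out.println("naive sum: " + naiveSum(readings));                  // prints 640.0
        System.out.println("average per instance, then sum: "
                + averagePerInstanceThenSum(readings));                          // prints 160.0
    }
}
```

With these made-up numbers the true incoming rate is 160 messages per second. The naive sum overstates it by the number of threads per instance, while averaging within each instance first recovers it.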

Is this a known issue, or am I misusing the metric? I’m not sure if this affects other metrics, but it does look like the average latency metric is identical for all threads on the same instance, so I suspect it does.

Thanks,
Sam

Re: Kafka Streams processor node metrics process rate with multiple stream threads

Posted by Guozhang Wang <wa...@gmail.com>.
Thanks Sam! Please feel free to assign the ticket to yourself, and I will
review your PR if you create one:

https://cwiki.apache.org/confluence/display/KAFKA/Contributing+Code+Changes#ContributingCodeChanges-PullRequest

On Tue, Jul 17, 2018 at 6:29 PM, Sam Lendle <sl...@pandora.com> wrote:

> https://issues.apache.org/jira/browse/KAFKA-7176
>
> If I have a chance I will give trunk a try.


-- 
-- Guozhang

Re: Kafka Streams processor node metrics process rate with multiple stream threads

Posted by Sam Lendle <sl...@pandora.com>.
https://issues.apache.org/jira/browse/KAFKA-7176

If I have a chance I will give trunk a try.

On 7/16/18, 2:14 PM, "Guozhang Wang" <wa...@gmail.com> wrote:

    Hmm, this seems new to me. I checked the source code and it looks right to me.
    
    Could you try out the latest trunk (built from source) and see if you
    hit the same issue?
    
    > In addition to that, though, I also see state store metrics for tasks
    > that have been migrated to another instance, and their values continue to
    > be updated, even after seeing messages in the logs indicating that local
    > state for those tasks has been cleaned. Is this also fixed, or a separate
    > issue?
    
    This may be an issue that is not yet resolved; I'd need to double-check. In
    the meantime, could you create a JIRA for it?
    
    
    Guozhang
    
    
    


Re: Kafka Streams processor node metrics process rate with multiple stream threads

Posted by Guozhang Wang <wa...@gmail.com>.
Hmm, this seems new to me. I checked the source code and it looks right to me.

Could you try out the latest trunk (built from source) and see if you
hit the same issue?

> In addition to that, though, I also see state store metrics for tasks
> that have been migrated to another instance, and their values continue to
> be updated, even after seeing messages in the logs indicating that local
> state for those tasks has been cleaned. Is this also fixed, or a separate
> issue?

This may be an issue that is not yet resolved; I'd need to double-check. In
the meantime, could you create a JIRA for it?


Guozhang


On Thu, Jul 12, 2018 at 4:04 PM, Sam Lendle <sl...@pandora.com> wrote:

> Ah great, thanks Guozhang.
>
> I also noticed a similar issue with state store metrics, where rate
> metrics for each thread/task appear to be the total rate across all
> threads/tasks on that instance.
>
> In addition to that, though, I also see state store metrics for tasks that
> have been migrated to another instance, and their values continue to be
> updated, even after seeing messages in the logs indicating that local state
> for those tasks has been cleaned. Is this also fixed, or a separate issue?
>
> Best,
> Sam


-- 
-- Guozhang

Re: Kafka Streams processor node metrics process rate with multiple stream threads

Posted by Sam Lendle <sl...@pandora.com>.
Ah great, thanks Guozhang.

I also noticed a similar issue with state store metrics, where rate metrics for each thread/task appear to be the total rate across all threads/tasks on that instance.

In addition to that, though, I also see state store metrics for tasks that have been migrated to another instance, and their values continue to be updated, even after seeing messages in the logs indicating that local state for those tasks has been cleaned. Is this also fixed, or a separate issue?

Best,
Sam

On 7/11/18, 10:51 PM, "Guozhang Wang" <wa...@gmail.com> wrote:

    Hello Sam,
    
    It is a known issue that should have been fixed in 2.0; the corresponding fix
    has also been cherry-picked to the 1.1.1 bug-fix release:
    
    https://github.com/apache/kafka/pull/5277
    
    
    Guozhang
    
    


Re: Kafka Streams processor node metrics process rate with multiple stream threads

Posted by Guozhang Wang <wa...@gmail.com>.
Hello Sam,

It is a known issue that should have been fixed in 2.0; the corresponding fix
has also been cherry-picked to the 1.1.1 bug-fix release:

https://github.com/apache/kafka/pull/5277


Guozhang

On Wed, Jul 11, 2018 at 11:42 AM, Sam Lendle <sl...@pandora.com> wrote:

> Hello!
>
> Using kafka-streams 1.1.0, I noticed when I sum the process rate metric
> for a given processor node, the rate is many times higher than the number
> of incoming messages. Digging further, it looks like the rate metric
> associated with each thread in a given application instance is always the
> same, and if I average by instance and then sum the rates, I recover the
> incoming message rate.  So it looks like the rate metric for each stream
> thread is actually reporting the rate for all threads on the instance.
>
> Is this a known issue, or am I misusing the metric? I’m not sure if this
> affects other metrics, but it does look like the average latency metric is
> identical for all threads on the same instance, so I suspect it does.
>
> Thanks,
> Sam
>



-- 
-- Guozhang