You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/10/09 22:49:20 UTC

[jira] [Commented] (KAFKA-4281) Should be able to forward aggregation values immediately

    [ https://issues.apache.org/jira/browse/KAFKA-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15560764#comment-15560764 ] 

ASF GitHub Bot commented on KAFKA-4281:
---------------------------------------

GitHub user gfodor opened a pull request:

    https://github.com/apache/kafka/pull/1998

    KAFKA-4281: Should be able to forward aggregation values immediately

    https://issues.apache.org/jira/browse/KAFKA-4281

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/AltspaceVR/kafka KAFKA-4281

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/1998.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1998
    
----
commit dfe004a24ff6491f286ac9fd405b6a1cae8ae2f5
Author: Greg Fodor <gf...@gmail.com>
Date:   2016-10-09T22:46:02Z

    Added forwardImmediately argument to various grouping APIs to allow users to specify that records should be immediately forwarded during aggregations, etc

----


> Should be able to forward aggregation values immediately
> --------------------------------------------------------
>
>                 Key: KAFKA-4281
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4281
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.10.1.0
>            Reporter: Greg Fodor
>            Assignee: Guozhang Wang
>
> KIP-63 introduced changes to the behavior of aggregations such that the result of aggregations will not appear to subsequent processors until a state store flush occurs. This is problematic for latency sensitive aggregations since flushes occur generally at commit.interval.ms, which is usually a few seconds. Combined with several aggregations, this can result in several seconds of latency through a topology for steps dependent upon aggregations.
> Two potential solutions:
> - Allow finer control over the state store flushing intervals
> - Allow users to change the behavior so that certain aggregations will immediately forward records to the next step (as was the case pre-KIP-63)
> A PR is attached that takes the second approach. To add this unfortunately a large number of files needed to be touched, and this effectively doubles the number of method signatures around grouping on KTable and KStream. I tried an alternative approach that let the user opt-in to immediate forwarding via an additional builder method on KGroupedStream/Table but this didn't work as expected because in order for the latency to go away, the KTableImpl itself must also mark its source as forward immediate (otherwise we will still see latency due to the materialization of the KTableSource still relying upon state store flushes to propagate.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)