Posted to jira@kafka.apache.org by "Andrew (JIRA)" <ji...@apache.org> on 2019/05/03 13:04:00 UTC

[jira] [Comment Edited] (KAFKA-8315) Cannot pass Materialized into a join operation - hence cant set retention period independent of grace

    [ https://issues.apache.org/jira/browse/KAFKA-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832325#comment-16832325 ] 

Andrew edited comment on KAFKA-8315 at 5/3/19 1:03 PM:
-------------------------------------------------------

The fudge factor is windowstore.changelog.additional.retention.ms, which is set to 24h by default, so that seems to add up to 5 days (120 hours).

You understand my use case well. I limited the grace period to reduce the performance overhead during the large historical processing.

I definitely have lots of data beyond this that should join, and the joins and aggregations within the latest 120 hours seem to be correct and complete. This seems to be related only to retention. I will experiment with increasing windowstore.changelog.additional.retention.ms to see if it brings in more data.
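A minimal sketch of that experiment, assuming the application is configured through a plain java.util.Properties object that is later handed to Kafka Streams. Only the config key comes from the comment above; the class name, the helper method and the 7-day value are hypothetical.

```java
import java.time.Duration;
import java.util.Properties;

public class RetentionConfigSketch {
    // Config key quoted in the comment; its default is described there as 24h.
    static final String ADDITIONAL_RETENTION = "windowstore.changelog.additional.retention.ms";

    // Hypothetical helper: builds properties with a larger additional retention.
    static Properties withAdditionalRetention(Duration extra) {
        Properties props = new Properties();
        // The value is expressed in milliseconds, passed as a string.
        props.setProperty(ADDITIONAL_RETENTION, Long.toString(extra.toMillis()));
        return props;
    }

    public static void main(String[] args) {
        // 7 days is an arbitrary value chosen for the experiment, not a recommendation.
        Properties props = withAdditionalRetention(Duration.ofDays(7));
        System.out.println(props.getProperty(ADDITIONAL_RETENTION));
    }
}
```

The remaining application settings (application id, bootstrap servers and so on) would be added to the same Properties object before starting the topology.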

One oddity of our left stream is that it contains records in batches from different devices. Each batch is about 1000 records and contiguous within the stream. Within a batch the records are in increasing timestamp order. Subsequent batches from different devices will be within 1-2 days of the previous batch (we don't have, say, a batch for 1/1/2019 followed by a batch for 4/1/2019 or a batch for 24/12/2019, for example). The right stream consists of single records, not batches, with similar date ordering (i.e. subsequent records should be within 1-2 days of each other).

My understanding is that a window is closed when streamtime - windowstart > windowsize + grace. So, as stream time increases with the arrival of newer batches, joins/aggregations should continue until a batch arrives after windowsize + grace. However, we are not seeing this.

[Correction]

I think I meant:

streamtime - windowend > grace
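
To make the two formulations concrete, here is a small sketch that encodes them exactly as written in this comment. It reflects my stated understanding only, not the actual Kafka Streams internals; the class name, method names and the hour values are hypothetical.

```java
public class WindowCloseSketch {
    static final long HOUR_MS = 60L * 60L * 1000L;

    // First formulation: streamtime - windowstart > windowsize + grace
    static boolean closedByStart(long streamTime, long windowStart, long windowSize, long grace) {
        return streamTime - windowStart > windowSize + grace;
    }

    // Corrected formulation: streamtime - windowend > grace
    static boolean closedByEnd(long streamTime, long windowEnd, long grace) {
        return streamTime - windowEnd > grace;
    }

    public static void main(String[] args) {
        long windowStart = 0L;
        long windowSize = 24 * HOUR_MS;   // hypothetical window size
        long grace = 48 * HOUR_MS;        // hypothetical grace period
        long windowEnd = windowStart + windowSize;

        // Compare both formulations just before and just after the boundary.
        for (long hours : new long[] {72, 73}) {
            long streamTime = hours * HOUR_MS;
            System.out.println(hours + "h: byStart="
                    + closedByStart(streamTime, windowStart, windowSize, grace)
                    + " byEnd=" + closedByEnd(streamTime, windowEnd, grace));
        }
    }
}
```

When windowend = windowstart + windowsize the two conditions are algebraically the same, so the correction changes the wording rather than the point at which the window closes.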



> Cannot pass Materialized into a join operation - hence cant set retention period independent of grace
> -----------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-8315
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8315
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Andrew
>            Assignee: John Roesler
>            Priority: Major
>
> The documentation says to use `Materialized` rather than `JoinWindows.until()` ([https://kafka.apache.org/22/javadoc/org/apache/kafka/streams/kstream/JoinWindows.html#until-long-]), but there is nowhere to pass a `Materialized` instance to the join operation; it seems only the group operation supports it.
>  
> Slack conversation here : [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1556799561287300]
> [Additional]
> From what I understand, the retention period should be independent of the grace period, so I think this is more than a documentation fix (see comments below)


