You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@samza.apache.org by "Cameron Lee (JIRA)" <ji...@apache.org> on 2018/04/09 23:30:00 UTC

[jira] [Created] (SAMZA-1642) Retention time for Kafka changelog topic does not get updated when TTL for high-level API join gets changed

Cameron Lee created SAMZA-1642:
----------------------------------

             Summary: Retention time for Kafka changelog topic does not get updated when TTL for high-level API join gets changed
                 Key: SAMZA-1642
                 URL: https://issues.apache.org/jira/browse/SAMZA-1642
             Project: Samza
          Issue Type: Bug
          Components: kafka
            Reporter: Cameron Lee


*Context:*

When using the high-level API to join streams, Samza automatically sets up a couple of RocksDB stores in order to keep track of each side of the join. The retention time of the RocksDB stores is set to the join TTL. These RocksDB stores are backed up by Kafka changelogs. Samza will automatically create these changelogs in Kafka, and the retention time of the changelogs is set to the join TTL as well.

*Issue:*

If the Samza job is initially deployed with a certain join TTL, then the Kafka changelogs will be created with the retention time set to that initial join TTL. If the Samza job is then redeployed with a different join TTL, then the retention time for the Kafka changelog will not get updated to the new value. However, the RocksDB TTL will get updated. This means that there will be an inconsistency between the RocksDB TTL and the Kafka changelog retention time. This will cause an issue when the Kafka changelog is needed to bootstrap a container, because the Kafka changelog will not properly reflect the data that existed in the corresponding RocksDB store on the previous container.

*Potential resources for solution:*

Kafka has an AdminUtils class ([https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/admin/AdminUtils.scala|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/admin/AdminUtils.scala)]) to fetch and change topic configurations (although these seem to currently be deprecated and replaced by AdminClient. Kafka 0.11 has an AdminClient ([https://kafka.apache.org/0110/javadoc/index.html?org/apache/kafka/clients/admin/AdminClient.html|https://kafka.apache.org/0110/javadoc/index.html?org/apache/kafka/clients/admin/AdminClient.html)]) which allows for describing and altering configs for topics. One potential solution is that on startup, the Samza job could check the retention time config of the changelog topic and update it if necessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)