You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@kafka.apache.org by "Balaji Rao (Jira)" <ji...@apache.org> on 2023/01/14 12:28:00 UTC

[jira] [Created] (KAFKA-14624) State restoration is broken with standby tasks and cache-enabled stores in processor API

Balaji Rao created KAFKA-14624:
----------------------------------

             Summary: State restoration is broken with standby tasks and cache-enabled stores in processor API
                 Key: KAFKA-14624
                 URL: https://issues.apache.org/jira/browse/KAFKA-14624
             Project: Kafka
          Issue Type: Bug
          Components: streams
    Affects Versions: 3.3.1
            Reporter: Balaji Rao


I found that cache-enabled state stores in PAPI with standby tasks sometimes returns stale data when a partition moves from one app instance to another and back. [Here's|https://github.com/balajirrao/kafka-streams-multi-runner] a small project that I used to reproduce the issue.

I dug around a bit and it seems like it's a bug in standby task state restoration when caching is enabled. If a partition moves from instance 1 to 2 and then back to instance 1,  since the `CachingKeyValueStore` doesn't register a restore callback, it can return potentially stale data for non-dirty keys. 

I could fix the issue by modifying the `CachingKeyValueStore` to register a restore callback in which the cache restored keys are added to the cache. Is this fix in the right direction?
{code:java}
        // register the store
        context.register(
                root,
                (RecordBatchingStateRestoreCallback) records -> {
                    for (final ConsumerRecord<byte[], byte[]> record : records) {
                        put(Bytes.wrap(record.key()), record.value());
                    }
                }
        );
{code}
 
I would like to contribute a fix, if I can get some help!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)