You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@kafka.apache.org by "A. Sophie Blee-Goldman (Jira)" <ji...@apache.org> on 2021/03/19 02:39:00 UTC

[jira] [Resolved] (KAFKA-12503) Resizing the thread cache in a non thread safe way can cause records to be redirected throughout the topology

     [ https://issues.apache.org/jira/browse/KAFKA-12503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

A. Sophie Blee-Goldman resolved KAFKA-12503.
--------------------------------------------
    Resolution: Fixed

> Resizing the thread cache in a non thread safe way can cause records to be redirected throughout the topology
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-12503
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12503
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.8.0
>            Reporter: Walker Carlson
>            Assignee: Walker Carlson
>            Priority: Blocker
>             Fix For: 2.8.0
>
>
> When a thread is added, removed or replaced the cache is resized. When the thread cache was resized it was being done so from the thread initiating these calls. This can cause the record to be redirected to the wrong processor via the call to `evict` in the cache. The evict flushes records downstream to the next processor after the cache. But if this is on the wrong thread the wrong processor receives them. 
> This can cause 3 problems.
> 1) When the owner finishes processing the record it set the current node to null in the processor context a this then causes the other processor to throw an exception `StreamsException: Current node is unknown.`. 
> 2) Depending on the type it can cause a class cast exception as the record is a different type. Mostly this happened when the value types were different inside of the map node from the toStream method
> 3) A silent issue is it could cause data to be processed by the wrong node and cause data corruption. We have not been able to confirm this last one but it is the most dangerous in many ways.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)