Posted to jira@kafka.apache.org by "A. Sophie Blee-Goldman (Jira)" <ji...@apache.org> on 2022/12/10 03:35:00 UTC

[jira] [Created] (KAFKA-14460) In-memory store iterators can return results with null values

A. Sophie Blee-Goldman created KAFKA-14460:
----------------------------------------------

             Summary: In-memory store iterators can return results with null values
                 Key: KAFKA-14460
                 URL: https://issues.apache.org/jira/browse/KAFKA-14460
             Project: Kafka
          Issue Type: Bug
          Components: streams
            Reporter: A. Sophie Blee-Goldman


Because of the thread-safety model we adopted in our in-memory stores to avoid scaling issues, we synchronize all read/write methods, and during range scans we copy the keyset of all results rather than returning a direct iterator over the underlying map. When users call #next to read out the iterator results, we issue a point lookup on the next key and simply return a new KeyValue<>(key, get(key)).

This lets the range scan return results without blocking other threads' access to the store and without risk of a ConcurrentModificationException, as a writer can modify the real store without affecting the iterator's copy of the keyset. It also means that those changes won't be reflected in the set of keys the iterator sees or returns, which in itself is fine as we don't guarantee consistency semantics of any kind.

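However, we _do_ guarantee that range scans "must not return null values", and this contract may be violated if the StreamThread deletes a record that the iterator was going to return: the key is still in the iterator's keyset copy, but the point lookup in #next finds nothing, so get(key) returns null.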
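
A minimal sketch of how the violation can arise. NaiveStore is a hypothetical, heavily simplified stand-in for an in-memory store (not the actual Kafka Streams class), and java.util.Map.Entry is used in place of Kafka's KeyValue so the example is self-contained:

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical stand-in for an in-memory store; names and structure are assumptions.
class NaiveStore {
    private final TreeMap<String, String> map = new TreeMap<>();

    synchronized void put(final String key, final String value) { map.put(key, value); }
    synchronized String get(final String key) { return map.get(key); }
    synchronized void delete(final String key) { map.remove(key); }

    // Copies the keyset under the lock, then resolves each value lazily in next().
    synchronized Iterator<Map.Entry<String, String>> all() {
        final Iterator<String> keys = new TreeSet<String>(map.keySet()).iterator();
        return new Iterator<Map.Entry<String, String>>() {
            @Override
            public boolean hasNext() {
                return keys.hasNext();
            }

            @Override
            public Map.Entry<String, String> next() {
                final String key = keys.next();
                // Point lookup at read time: the value is null if the key
                // was deleted after the keyset was copied.
                return new SimpleImmutableEntry<>(key, get(key));
            }
        };
    }
}
```

If a caller obtains all(), and another thread then deletes a key that has not been read yet, hasNext still returns true for that key and next returns an entry whose value is null.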

tl;dr: we should check get(key) for null and skip to the next result if necessary in the in-memory store iterators. See for example InMemoryKeyValueIterator (note that we'll probably need to buffer one record in advance before we return true from #hasNext, since we can't know that a non-null result remains without doing the lookup).
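
One possible shape for the fix, as a sketch only. NullSkippingIterator and its lookup parameter are hypothetical names, not the actual InMemoryKeyValueIterator, and java.util.Map.Entry again stands in for KeyValue:

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical sketch: wraps a keyset copy and skips keys whose lookup returns null.
class NullSkippingIterator<K, V> implements Iterator<Map.Entry<K, V>> {
    private final Iterator<K> keys;        // iterator over the copied keyset
    private final Function<K, V> lookup;   // point lookup into the live store, e.g. store::get
    private Map.Entry<K, V> next;          // one buffered record, or null if none is buffered

    NullSkippingIterator(final Iterator<K> keys, final Function<K, V> lookup) {
        this.keys = keys;
        this.lookup = lookup;
    }

    @Override
    public boolean hasNext() {
        // Advance until a live record is buffered or the keys run out.
        while (next == null && keys.hasNext()) {
            final K key = keys.next();
            final V value = lookup.apply(key);
            if (value != null) {
                next = new SimpleImmutableEntry<>(key, value);
            }
        }
        return next != null;
    }

    @Override
    public Map.Entry<K, V> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        final Map.Entry<K, V> result = next;
        next = null;
        return result;
    }
}
```

Because the value is fetched in hasNext and held in the buffer, a record deleted between hasNext and next is still returned with the value it had at lookup time. That seems acceptable given that no consistency semantics are guaranteed, and it keeps the "no null values" contract.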



--
This message was sent by Atlassian Jira
(v8.20.10#820010)