Posted to dev@kafka.apache.org by "Guozhang Wang (JIRA)" <ji...@apache.org> on 2016/11/11 22:09:58 UTC

[jira] [Commented] (KAFKA-4120) byte[] keys in RocksDB state stores do not work as expected

    [ https://issues.apache.org/jira/browse/KAFKA-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15658318#comment-15658318 ] 

Guozhang Wang commented on KAFKA-4120:
--------------------------------------

Marking it as resolved after 0.10.1.0. [~elevy] [~gfodor] please feel free to re-open if your use cases still hit similar issues.

> byte[] keys in RocksDB state stores do not work as expected
> -----------------------------------------------------------
>
>                 Key: KAFKA-4120
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4120
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.10.0.1
>            Reporter: Greg Fodor
>            Assignee: Guozhang Wang
>
> We ran into an issue using a byte[] key in a RocksDB state store (with the byte array serde). Internally, the RocksDB store keeps an LRUCache, backed by a LinkedHashMap, that sits between callers and the actual db. The problem is that while the underlying RocksDB persists byte arrays with equal contents under the same key, the LinkedHashMap uses byte[] reference equality inherited from Object.equals/hashCode. So the cache can end up holding multiple entries for different byte arrays that have the same contents and are backed by the same key in the db, resulting in unexpected behavior. 
> One way this manifests: if you store a value in the state store under a specific key and then re-read it with the same byte array instance, you get the new value; but if you re-read it with a different byte array containing the same bytes, you get a stale value until the db is flushed. (This made it particularly tricky to track down what was happening :))
> The workaround for us is to convert the keys from raw byte arrays to a deserialized Avro structure that provides proper hashCode/equals semantics for the intermediate cache. In general this seems like good practice, so one of the proposed solutions is to simply emit a warning or exception if a key type with broken semantics like this is provided.
> A few proposed solutions:
> - When the state store is defined on array keys, ensure that the cache map does proper comparisons on array values, not array references. This would fix the problem, but seems a bit strange to special-case. However, I have a hard time thinking of other examples where this behavior would burn users.
> - Change the LRU cache to deserialize and serialize all keys to bytes and use a value based comparison for the map. This would be the most correct, as it would ensure that both the rocks db and the cache have identical key spaces and equality/hashing semantics. However, this is probably slow, and since the general case of using avro record types as keys works fine, it will largely be unnecessary overhead.
> - Don't change anything about the behavior, but log a warning or fail to start if a state store is defined on array keys (or possibly any key type that fails to properly override Object.equals/hashCode.)
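
The identity-vs-content equality pitfall described in the report can be reproduced in a few lines of plain Java, independent of Kafka. This is an illustrative sketch (class name and values are made up, not from the Kafka codebase): arrays inherit identity-based equals/hashCode from Object, so a LinkedHashMap treats two equal-content byte[] instances as distinct keys; wrapping keys in ByteBuffer is one standard way to get content-based semantics.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

public class ByteArrayKeyDemo {
    public static void main(String[] args) {
        byte[] k1 = new byte[]{1, 2, 3};
        byte[] k2 = new byte[]{1, 2, 3};

        // byte[] does not override equals/hashCode, so equal contents
        // still mean different keys as far as the map is concerned.
        Map<byte[], String> cache = new LinkedHashMap<>();
        cache.put(k1, "v1");
        cache.put(k2, "v2");
        System.out.println(cache.size()); // 2: two entries for the same logical key

        // ByteBuffer.wrap gives content-based equals/hashCode, so both
        // puts hit the same entry, matching RocksDB's own key semantics.
        Map<ByteBuffer, String> fixed = new LinkedHashMap<>();
        fixed.put(ByteBuffer.wrap(k1), "v1");
        fixed.put(ByteBuffer.wrap(k2), "v2");
        System.out.println(fixed.size()); // 1: second put overwrote the first
    }
}
```

This mirrors the stale-read symptom above: the first map can answer a lookup on k2 with a value that differs from what was last written under k1, even though both name the same RocksDB key.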



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)