Posted to jira@kafka.apache.org by "Peter Davis (JIRA)" <ji...@apache.org> on 2017/12/15 03:09:00 UTC

[jira] [Comment Edited] (KAFKA-5285) optimize upper / lower byte range for key range scan on windowed stores

    [ https://issues.apache.org/jira/browse/KAFKA-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16291962#comment-16291962 ] 

Peter Davis edited comment on KAFKA-5285 at 12/15/17 3:08 AM:
--------------------------------------------------------------

In debugging a recent performance blocker in one of my apps, I suspect that {{ReadOnlySessionStore.fetch(from, to)}}, which uses a timestamp of {{Long.MAX_VALUE}}, calls {{upperRange}} with a {{maxSuffix}} consisting mostly of 0xFF bytes.  The resulting upperRange is therefore also mostly 0xFF, and the resulting RocksDB iterator effectively iterates over (binaryKeyFrom...infinity)!  With a large number of keys, this is much worse than a mere performance issue (though the results appear "correct", since {{SessionKeySchema.hasNextCondition}} filters out the bogus entries): it iterates over thousands of unnecessary records and is slow as molasses.
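The mechanism I believe is at play can be sketched as follows. This is a minimal, self-contained paraphrase of the masking loop as I read it in {{OrderedBytes.upperRange}} -- the class name, method shape, and exact comparison here are my own reconstruction, not verbatim Kafka code: key bytes are copied into the range end only while each byte compares greater than or equal to the first byte of {{maxSuffix}}, and then {{maxSuffix}} is appended.

```java
import java.nio.ByteBuffer;

public class UpperRangeSketch {
    // Simplified paraphrase (NOT the actual Kafka source) of the masking
    // loop in OrderedBytes.upperRange: key bytes are retained only while
    // each byte is >= the first byte of maxSuffix; then maxSuffix follows.
    static byte[] upperRange(byte[] key, byte[] maxSuffix) {
        ByteBuffer rangeEnd = ByteBuffer.allocate(key.length + maxSuffix.length);
        int i = 0;
        while (i < key.length
                && (i < 1 || (key[i] & 0xFF) >= (maxSuffix[0] & 0xFF))) {
            rangeEnd.put(key[i++]);
        }
        rangeEnd.put(maxSuffix);
        byte[] result = new byte[rangeEnd.position()];
        rangeEnd.flip();
        rangeEnd.get(result);
        return result;
    }

    public static void main(String[] args) {
        byte[] key = "some-key".getBytes();
        // Long.MAX_VALUE serialized big-endian: 0x7F followed by seven 0xFF bytes
        byte[] maxSuffix =
            ByteBuffer.allocate(Long.BYTES).putLong(Long.MAX_VALUE).array();
        byte[] range = upperRange(key, maxSuffix);
        // Every ASCII byte is < 0x7F, so only the first key byte survives:
        // the computed upper bound is ('s', 0x7F, 0xFF x 7), which sits far
        // beyond any real session key that starts with "some-key".
        System.out.println(range.length); // 1 key byte + 8 suffix bytes = 9
    }
}
```

Under this model, a suffix derived from {{Long.MAX_VALUE}} causes the upper bound to degenerate to nearly the whole keyspace, which would explain the iterator scanning to "infinity".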

It looks like the issue dates to KIP-155.

In [{{SessionKeySchema#upperRange}}|https://github.com/apache/kafka/commit/e28752357705568219315375c666f8e500db9c12#diff-52e7d2701ecab21b32621d9b13b7f33bR57], why is {{putLong(to)}} (the timestamp) written twice, and why does it not use the actual {{key}} to build the {{maxRange}}?

When using a timestamp less than {{Long.MAX_VALUE}}, the issue is avoided because the first timestamp put into {{maxRange}} begins mostly with 0 bytes, so {{OrderedBytes.upperRange}} copies more of the real key in its masking loop.  But {{ReadOnlySessionStore.fetch}} does not let one specify a different timestamp.
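The contrast between a small timestamp and {{Long.MAX_VALUE}} can be seen with the same simplified model of the masking loop (again my paraphrase of the logic in the linked commit, not actual Kafka source):

```java
import java.nio.ByteBuffer;

public class TimestampContrastSketch {
    // Simplified paraphrase of OrderedBytes.upperRange's masking loop (an
    // assumption, not the actual Kafka source): counts how many key bytes
    // are kept, i.e. compare >= the first byte of maxSuffix.
    static int keyBytesKept(byte[] key, byte[] maxSuffix) {
        int i = 0;
        while (i < key.length
                && (i < 1 || (key[i] & 0xFF) >= (maxSuffix[0] & 0xFF))) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        byte[] key = "some-key".getBytes();

        // A small timestamp serializes big-endian to a suffix starting with
        // 0x00, so every key byte passes the comparison: the whole key is
        // kept and the range stays tight.
        byte[] smallTs = ByteBuffer.allocate(Long.BYTES).putLong(1L).array();
        System.out.println(keyBytesKept(key, smallTs)); // 8: the full key

        // Long.MAX_VALUE serializes to a suffix starting with 0x7F, so only
        // the first byte of an ASCII key survives and the range balloons.
        byte[] maxTs =
            ByteBuffer.allocate(Long.BYTES).putLong(Long.MAX_VALUE).array();
        System.out.println(keyBytesKept(key, maxTs)); // 1
    }
}
```

This would explain why the problem only shows up through {{ReadOnlySessionStore.fetch}}, which pins the timestamp to {{Long.MAX_VALUE}}.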



> optimize upper / lower byte range for key range scan on windowed stores
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-5285
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5285
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Xavier Léauté
>            Assignee: Xavier Léauté
>              Labels: performance
>
> The current implementation of {{WindowKeySchema}} / {{SessionKeySchema}} {{upperRange}} and {{lowerRange}} does not make any assumptions with respect to the other key bound (e.g. the upper byte bound does not depend on the lower key bound).
> It should be possible to optimize the byte range somewhat further using the information provided by the lower bound.
> More specifically, by incorporating that information, we should be able to eliminate the corresponding {{upperRangeFixedSize}} and {{lowerRangeFixedSize}}, since the result should be the same if we implement that optimization.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)