Posted to dev@kafka.apache.org by "Michal Borowiecki (JIRA)" <ji...@apache.org> on 2017/05/23 09:05:04 UTC

[jira] [Commented] (KAFKA-5243) Request to add row limit in ReadOnlyKeyValueStore range function

    [ https://issues.apache.org/jira/browse/KAFKA-5243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020911#comment-16020911 ] 

Michal Borowiecki commented on KAFKA-5243:
------------------------------------------

Just a note: replacing the second argument is not an option, IMO, as it would clash with the current range(K from, K to) method. K is a type parameter that could itself be instantiated as Integer, which would make the two overloads indistinguishable at some call sites, I think.
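To illustrate the clash, here is a minimal sketch using a hypothetical stand-in interface (OverloadedStore, not a Kafka type) that keeps range(K, K) and also declares a range(K, int) limit overload. With K bound to Integer, passing an Integer variable and an int literal compiles but silently picks the limit overload, because Java first looks for a match that needs no boxing. As far as I can tell, a call with two int literals, such as range(1, 10), would not compile at all because it is ambiguous.

```java
import java.util.ArrayList;
import java.util.List;

public class OverloadClash {

    // Hypothetical stand-in: the existing key-range method plus a
    // "replacement second argument" overload taking a row limit.
    interface OverloadedStore<K> {
        String range(K from, K to);      // existing key-range overload
        String range(K from, int limit); // hypothetical limit overload
    }

    // Records which overload the compiler actually selected.
    static class RecordingStore implements OverloadedStore<Integer> {
        final List<String> calls = new ArrayList<>();

        @Override
        public String range(Integer from, Integer to) {
            calls.add("range(from, to)");
            return "key range";
        }

        @Override
        public String range(Integer from, int limit) {
            calls.add("range(from, limit)");
            return "limit";
        }
    }

    public static void main(String[] args) {
        RecordingStore store = new RecordingStore();
        Integer from = 1;

        // Meant as "keys 1..10", but (Integer, int) matches without boxing,
        // so the limit overload wins over (Integer, Integer).
        store.range(from, 10);

        // Only an explicitly boxed upper bound reaches the key-range overload.
        store.range(from, Integer.valueOf(10));

        System.out.println(store.calls); // [range(from, limit), range(from, to)]
    }
}
```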

Secondly, the existing range() and all() methods expressly do not guarantee the ordering of the returned iterator. I think the new range(from, to, limit) method would only make sense if the order of the returned iterator is consistent across invocations; otherwise the first limit entries would be an arbitrary subset of the range, and successive pages could skip or repeat keys. This is probably not a problem for the built-in stores, but given that these stores are meant to be pluggable, perhaps it would be better not to force other store implementations to take on those guarantees? Instead, a new interface with stronger guarantees could be added, e.g. ReadOnlyOrderedKeyValueStore, extending ReadOnlyKeyValueStore and adding this extra method. It could also add the consistent ordering promise to the inherited range(from, to) and all() methods. Just a thought.
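A rough sketch of that interface split is below. To keep it self-contained it uses simplified stand-ins rather than the real Kafka Streams types: the hypothetical SimpleReadOnlyKeyValueStore plays the role of ReadOnlyKeyValueStore, and iterators are plain java.util.Iterator<Map.Entry<K, V>> instead of KeyValueIterator. The ordered sub-interface promises ascending key order, which is what gives a limit a well-defined meaning; the TreeMap-backed class is only an illustration, not how the built-in stores work.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.NoSuchElementException;
import java.util.TreeMap;

public class OrderedStoreSketch {

    // Simplified stand-in for ReadOnlyKeyValueStore: no ordering promised.
    interface SimpleReadOnlyKeyValueStore<K, V> {
        V get(K key);
        Iterator<Map.Entry<K, V>> range(K from, K to); // inclusive bounds
        Iterator<Map.Entry<K, V>> all();
    }

    // Hypothetical ReadOnlyOrderedKeyValueStore: the inherited range() and
    // all() must return entries in ascending key order, and the new overload
    // returns at most `limit` entries following that same order.
    interface ReadOnlyOrderedKeyValueStore<K, V> extends SimpleReadOnlyKeyValueStore<K, V> {
        Iterator<Map.Entry<K, V>> range(K from, K to, int limit);
    }

    // In-memory illustration backed by a TreeMap, which keeps keys sorted.
    static class InMemoryOrderedStore<K extends Comparable<K>, V>
            implements ReadOnlyOrderedKeyValueStore<K, V> {
        private final NavigableMap<K, V> map = new TreeMap<>();

        void put(K key, V value) {
            map.put(key, value);
        }

        @Override
        public V get(K key) {
            return map.get(key);
        }

        @Override
        public Iterator<Map.Entry<K, V>> range(K from, K to) {
            // TreeMap.subMap throws IllegalArgumentException if from > to.
            return map.subMap(from, true, to, true).entrySet().iterator();
        }

        @Override
        public Iterator<Map.Entry<K, V>> all() {
            return map.entrySet().iterator();
        }

        @Override
        public Iterator<Map.Entry<K, V>> range(K from, K to, int limit) {
            if (limit < 0) {
                throw new IllegalArgumentException("limit must be >= 0");
            }
            Iterator<Map.Entry<K, V>> inner = range(from, to);
            // Stops lazily after `limit` entries instead of materialising the range.
            return new Iterator<Map.Entry<K, V>>() {
                private int remaining = limit;

                @Override
                public boolean hasNext() {
                    return remaining > 0 && inner.hasNext();
                }

                @Override
                public Map.Entry<K, V> next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    remaining--;
                    return inner.next();
                }
            };
        }
    }

    public static void main(String[] args) {
        InMemoryOrderedStore<String, Integer> store = new InMemoryOrderedStore<>();
        store.put("c", 3);
        store.put("a", 1);
        store.put("d", 4);
        store.put("b", 2);
        Iterator<Map.Entry<String, Integer>> it = store.range("a", "d", 2);
        while (it.hasNext()) {
            System.out.println(it.next()); // a=1, then b=2
        }
    }
}
```

Keeping the limit overload on a separate interface means a pluggable store that cannot promise ordering simply does not implement it, rather than implementing it with undefined results.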

Probably best to raise a KIP and discuss it on the mailing list. Since this is a public API change, a KIP is required anyway:
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals 

> Request to add row limit in ReadOnlyKeyValueStore range function
> ----------------------------------------------------------------
>
>                 Key: KAFKA-5243
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5243
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>    Affects Versions: 0.10.1.1
>            Reporter: Joe Wood
>
> When using distributed queries across a cluster of stream stores, it's quite common to use query pagination to limit the number of rows returned. The {{range}} function on {{ReadOnlyKeyValueStore}} only accepts the {{from}} and {{to}} keys. This means that the query either unnecessarily retrieves the entire range and limits the rows manually, or estimates the range based on the key values. Neither option is ideal for processing distributed queries.
> This suggestion is to add an overload of the {{range}} function with a third argument (or a replacement for the second) giving a suggested row limit, so that the number of entries returned will not exceed the supplied count.
> {code:java}
> // Get an iterator over a given range of keys, returning at most limit entries.
> KeyValueIterator<K, V> range(K from, K to, int limit);
> {code}
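For the pagination use case in the description, a client could page through a key range roughly as follows. This is a sketch under stated assumptions: a TreeMap stands in for a store with consistent ascending key order, the static range(store, from, to, limit) helper only mimics the proposed overload, and paginate is a hypothetical helper, not a Kafka API. Because the lower bound is inclusive, each page after the first restarts at the last key already returned and drops it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class RangePagination {

    // Hypothetical stand-in for the proposed range(from, to, limit):
    // inclusive bounds, ascending key order, at most `limit` entries.
    static <K, V> List<Map.Entry<K, V>> range(NavigableMap<K, V> store, K from, K to, int limit) {
        List<Map.Entry<K, V>> out = new ArrayList<>();
        Iterator<Map.Entry<K, V>> it = store.subMap(from, true, to, true).entrySet().iterator();
        while (out.size() < limit && it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    // Splits the keys in [from, to] into pages of at most pageSize keys.
    // Later pages request one extra entry starting at the previous page's
    // last key and skip that key, so no key is returned twice.
    static <K, V> List<List<K>> paginate(NavigableMap<K, V> store, K from, K to, int pageSize) {
        if (pageSize < 1) {
            throw new IllegalArgumentException("pageSize must be >= 1");
        }
        List<List<K>> pages = new ArrayList<>();
        K start = from;
        boolean resuming = false;
        while (true) {
            int limit = resuming ? pageSize + 1 : pageSize;
            List<K> page = new ArrayList<>();
            for (Map.Entry<K, V> e : range(store, start, to, limit)) {
                if (resuming && e.getKey().equals(start)) {
                    continue; // already returned on the previous page
                }
                if (page.size() < pageSize) {
                    page.add(e.getKey());
                }
            }
            if (page.isEmpty()) {
                break;
            }
            pages.add(page);
            if (page.size() < pageSize) {
                break; // a short page means the range is exhausted
            }
            start = page.get(page.size() - 1);
            resuming = true;
        }
        return pages;
    }

    public static void main(String[] args) {
        NavigableMap<Integer, String> store = new TreeMap<>();
        for (int i = 1; i <= 7; i++) {
            store.put(i, "v" + i);
        }
        System.out.println(paginate(store, 1, 7, 3)); // [[1, 2, 3], [4, 5, 6], [7]]
    }
}
```

Resuming from the last key seen only produces stable page boundaries if the store returns keys in the same order on every call, which is the ordering concern raised in the comment above.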



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)