You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Jorge Esteban Quilcate Otoya <qu...@gmail.com> on 2020/07/01 22:01:40 UTC

Re: [DISCUSS] KIP-617: Allow Kafka Streams State Stores to be iterated backwards

Quick update: KIP is updated with latest changes now.
Will leave it open this week while working on the PR.

Hope to open a new vote thread over the next few days if no additional
feedback is provided.

Cheers,
Jorge.

On Mon, Jun 29, 2020 at 11:30 AM Jorge Esteban Quilcate Otoya <
quilcate.jorge@gmail.com> wrote:

> Thanks, John!
>
> Make sense to reconsider the current approach. I was heading in a similar
> direction while drafting the implementation. Metered, Caching, and other
> layers will also have to get duplicated to build up new methods in `Stores`
> factory, and class casting issues would appear on stores created by DSL.
>
> I will draft a proposal with new methods (move methods from proposed
> interfaces to existing ones) with default implementation in a KIP update
> and wait for Matthias to chime in to validate this approach.
>
> Jorge.
>
>
> On Sat, Jun 27, 2020 at 4:01 PM John Roesler <vv...@apache.org> wrote:
>
>> Hi Jorge,
>>
>> Sorry for my silence, I've been absorbed with the 2.6 and 2.5.1 releases.
>>
>> The idea to separate the new methods into "mixin" interfaces seems
>> like a good one, but as we've discovered in KIP-614, it doesn't work
>> out that way in practice. The problem is that the store implementations
>> are just the base layer that get composed with other layers in Streams
>> before they can be accessed in the DSL. This is extremely subtle, so
>> I'm going to put everyone to sleep with a detailed explanation:
>>
>> For example, this is the mechanism by which all KeyValueStore
>> implementations get added to Streams:
>> org.apache.kafka.streams.state.internals.KeyValueStoreBuilder#build
>> return new MeteredKeyValueStore<>(
>>   maybeWrapCaching(maybeWrapLogging(storeSupplier.get())),
>>   storeSupplier.metricsScope(),
>>   time,
>>   keySerde,
>>   valueSerde
>> );
>>
>> In the DSL, the store that a processor gets from the context would be
>> the result of this composition. So even if the storeSupplier.get() returns
>> a store that implements the "reverse" interface, when you try to use it
>> from a processor like:
>> org.apache.kafka.streams.kstream.ValueTransformerWithKey#init
>> ReadOnlyBackwardWindowStore<K, V> store =
>>   (ReadOnlyBackwardWindowStore<K, V>) context.getStateStore(..)
>>
>> You'd just get a ClassCastException because it's actually a
>> MeteredKeyValueStore, which doesn't implement
>> ReadOnlyBackwardWindowStore.
>>
>> The only way to make this work would be to make the Metered,
>> Caching, and Logging layers also implement the new interfaces,
>> but this effectively forces implementations to also implement
>> the interface. Otherwise, the intermediate layers would have to
>> cast the store in each method, like this:
>> MeteredWindowStore#backwardFetch {
>>   ((ReadOnlyBackwardWindowStore<K, V>) innerStore).backwardFetch(..)
>> }
>>
>> And then if the implementation doesn't "opt in" by implementing
>> the interface, you'd get a ClassCastException, not when you get the
>> store, but when you try to use it.
>>
>> The fact that we get ClassCastExceptions no matter which way we
>> turn here indicates that we're really not getting any benefit from the
>> type system, which makes the extra interfaces seem not worth all the
>> code involved.
>>
>> Where we landed in KIP-614 is that, unless we want to completely
>> revamp the way that StateStores work in the DSL, you might as
>> well just add the new methods to the existing interfaces. To prevent
>> compilation errors, we can add default implementations that throw
>> UnsupportedOperationException. If a store doesn't opt in by
>> implementing the methods, you'd get an UnsupportedOperationException,
>> which seems no worse, and maybe better, than the ClassCastException
>> you'd get if we go with the "mixin interface" approach.
>>
>> A quick note: This entire discussion focuses on the DSL. If you're just
>> using the Processor API by directly adding the a custom store to the
>> Topology:
>> org.apache.kafka.streams.Topology#addStateStore
>> and then retrieving it in the processor via:
>> org.apache.kafka.streams.processor.ProcessorContext#getStateStore
>> in
>> org.apache.kafka.streams.processor.Processor#init
>>
>> Then, you can both register and retrieve _any_ StateStore implementation.
>> There's no need to use KeyValueStore or any other built-in interface.
>> In other words, KeyValueStore and company are only part of the DSL,
>> not the PAPI. So, discussions about the build-in store interfaces are only
>> really relevant in the context of the DSL, Transformers, and Materialized.
>>
>> So, in conclusion, I'd really recommend just adding any new methods to
>> the existing store interfaces. We might be able to revamp the API in the
>> future to support mixins, but it's a much larger scope of work than this
>> KIP.
>> A more minor comment is that we don't need to add Deprecated variants
>> of new methods.
>>
>> Thanks again, and once again, I'm sorry I tuned out and didn't offer this
>> feedback before you revised the KIP.
>> -John
>>
>>
>>
>>
>> On Mon, Jun 22, 2020, at 06:11, Jorge Esteban Quilcate Otoya wrote:
>> > Hi everyone,
>> >
>> > I've updated the KIP, applying Matthias' feedback regarding interface
>> > hierarchy.
>> >
>> > Also, following the last email, I think we can consider reverse
>> operations
>> > on KeyValue range as well, as implementation supports lexicographic
>> order.
>> >
>> > I considered different naming between Key-based ranges and Time-based
>> > ranges, and mitigate confusion when fetching keys and time ranges as
>> > WindowStore does:
>> >
>> > Key-based ranges: reverseRange(), reverseAll()
>> > Time-based ranges: backwardFetch()
>> >
>> > Then, key-based changes apply to KeyValueStore, and time-based changes
>> to
>> > Window and Session stores.
>> >
>> > Let me know if you have any questions.
>> >
>> > Thanks,
>> > Jorge.
>> >
>> >
>> > On Tue, Jun 16, 2020 at 12:47 AM Jorge Esteban Quilcate Otoya <
>> > quilcate.jorge@gmail.com> wrote:
>> >
>> > > Hi everyone, sorry for the late reply.
>> > >
>> > > Thanks Matthias for your feedback. I think it makes sense to
>> reconsider
>> > > the current design based on your input.
>> > >
>> > > After digging deeper into the current implementation, I'd like to
>> bring my
>> > > current understanding to be double-checked as it might be redefining
>> the
>> > > KIP's scope:
>> > >
>> > > 1. There are 2 ranges been exposed by different stores:
>> > >
>> > > a. Key Range
>> > > b. Timestamp Range
>> > >
>> > > So far, we have discussed covering both.
>> > >
>> > > 2. Key Range functions do not provide ordering guarantees by design:
>> > >
>> > > ```ReadOnlyKeyValueStore.java
>> > >     /**
>> > >      * Get an iterator over a given range of keys. This iterator must
>> be
>> > > closed after use.
>> > >      * The returned iterator must be safe from {@link
>> > > java.util.ConcurrentModificationException}s
>> > >      * and must not return null values. No ordering guarantees are
>> > > provided.
>> > >      * ...
>> > >      */
>> > >      KeyValueIterator<K, V> range(K from, K to);
>> > > ```
>> > >
>> > > Therefore, I'd propose removing Key range operations from the scope.
>> > >
>> > > 3. Timestamp Range operations happen at the SegmentsStore level
>> (internal)
>> > > API
>> > >
>> > > AFAICT, Segments wrappers handle all Timestamp ranges queries.
>> > >
>> > > I'd propose extending `Segments#segments(timeFrom, timeTo, backwards)`
>> > > with a flag for backwards operations.
>> > >
>> > > As segments returned will be processed backwards, I'm not extending
>> > > KeyValueStores to query each segment backwards as previous point 2.
>> > >
>> > > 4. Extend WindowStores implementations with a new
>> > > WindowBackwardStore/ReadOnlyBackwardStore:
>> > >
>> > > ```java
>> > > public interface ReadOnlyWindowBackwardStore<K, V> {
>> > >     WindowStoreIterator<V> backwardFetch(K key, Instant from, Instant
>> to)
>> > > throws IllegalArgumentException;
>> > >
>> > >     KeyValueIterator<Windowed<K>, V> backwardFetch(K from, K to,
>> Instant
>> > > fromTime, Instant toTime)
>> > >         throws IllegalArgumentException;
>> > >
>> > >     KeyValueIterator<Windowed<K>, V> backwardFetchAll(Instant from,
>> > > Instant to) throws IllegalArgumentException;
>> > > ```
>> > >
>> > > 5. SessionStore is a bit different as it has fetch/find sessions
>> spread
>> > > between SessionStore and ReadOnlySessionStore.
>> > >
>> > > I'd propose a new interface `SessionBackwardStore` to expose backward
>> find
>> > > operations:
>> > >
>> > > ```java
>> > > public interface SessionBackwardStore<K, AGG> {
>> > >     KeyValueIterator<Windowed<K>, AGG> backwardFindSessions(final K
>> key,
>> > > final long earliestSessionEndTime, final long latestSessionStartTime);
>> > >
>> > >     KeyValueIterator<Windowed<K>, AGG> backwardFindSessions(final K
>> > > keyFrom, final K keyTo, final long earliestSessionEndTime, final long
>> > > latestSessionStartTime);
>> > > }
>> > > ```
>> > >
>> > > If this understanding is correct I'll proceed to update the KIP based
>> on
>> > > this.
>> > >
>> > > Looking forward to your feedback.
>> > >
>> > > Thanks,
>> > > Jorge.
>> > >
>> > > On Fri, May 29, 2020 at 3:32 AM Matthias J. Sax <mj...@apache.org>
>> wrote:
>> > >
>> > >> Hey,
>> > >>
>> > >> Sorry that I am late to the game. I am not 100% convinced about the
>> > >> current proposal. Using a new config as feature flag seems to be
>> rather
>> > >> "nasty" to me, and flipping from/to is a little bit too fancy for my
>> > >> personal taste.
>> > >>
>> > >> I agree, that the original proposal using a "ReadDirection" enum is
>> not
>> > >> ideal either.
>> > >>
>> > >> Thus, I would like to put out a new idea: We could add a new
>> interface
>> > >> that offers new methods that return revers iterators.
>> > >>
>> > >> The KIP already proposes to add `reverseAll()` and it seems backward
>> > >> incompatible to just add this method to `ReadOnlyKeyValueStore` and
>> > >> `ReadOnlyWindowStore`. I don't think we could provide a useful
>> default
>> > >> implementation for custom stores and thus either break compatibility
>> or
>> > >> need add a default that just throws an exception. Neither seems to
>> be a
>> > >> good option.
>> > >>
>> > >> Using a new interface avoid this issue and allows users implementing
>> > >> custom stores to opt-in by adding the interface to their stores.
>> > >> Furthermore, we don't need any config. In the end, we encapsulte the
>> > >> change into the store, and our runtime is agnostic to it (as it
>> should
>> > >> be).
>> > >>
>> > >> The hierarchy becomes a little complex (but uses would not really see
>> > >> the complexity):
>> > >>
>> > >> // exsiting
>> > >> ReadOnlyKeyValueStore
>> > >> KeyValueStore extend StateStore, ReadOnlyKeyValueStore
>> > >>
>> > >>
>> > >> // helper interface; users don't care
>> > >> // need similar ones for other stores
>> > >> ReverseReadOnlyKeyValueStore {
>> > >>     KeyValueIterator<K, V> reverseRange(K from, K to);
>> > >>     KeyValueIterator<K, V> reverseAll();
>> > >> }
>> > >>
>> > >>
>> > >> // two new user facing interfaces for kv-store
>> > >> // need similar ones for other stores
>> > >> ReadOnlyKeyValueStoreWithReverseIterators extends
>> ReadOnlyKeyValueStore,
>> > >> ReverseReadOnlyKeyValueStore
>> > >>
>> > >> KeyValueStoreWithReverseIterators extends KeyValueStore,
>> > >> ReverseReadOnlyKeyValueStore
>> > >>
>> > >>
>> > >> // updated (also internal)
>> > >> // also need to update other built-in stores
>> > >> RocksDB implements KeyValueStoreWithReverseIterators,
>> BulkLoadingStore
>> > >>
>> > >>
>> > >> In the end, user would only care about the two (for kv-case) new
>> > >> interface that offer revers iterator (read only and regular) and can
>> > >> cast stores accordingly in their Processors/Transformers or via IQ.
>> > >>
>> > >>
>> > >> Btw: if we add revers iterator for KeyValue and Window store, should
>> we
>> > >> do the same for Session store?
>> > >>
>> > >>
>> > >>
>> > >> This might be more code to write, but I believe it provides the
>> better
>> > >> user experience. Thoughts?
>> > >>
>> > >>
>> > >>
>> > >> -Matthias
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> On 5/26/20 8:47 PM, John Roesler wrote:
>> > >> > Sorry for my silence, Jorge,
>> > >> >
>> > >> > I've just taken another look, and I'm personally happy with the
>> KIP.
>> > >> >
>> > >> > Thanks,
>> > >> > -John
>> > >> >
>> > >> > On Tue, May 26, 2020, at 16:17, Jorge Esteban Quilcate Otoya wrote:
>> > >> >> If no additional comments, I will proceed to start the a vote
>> thread.
>> > >> >>
>> > >> >> Thanks a lot for your feedback!
>> > >> >>
>> > >> >> On Fri, May 22, 2020 at 9:25 AM Jorge Esteban Quilcate Otoya <
>> > >> >> quilcate.jorge@gmail.com> wrote:
>> > >> >>
>> > >> >>> Thanks Sophie. I like the `reverseAll()` idea.
>> > >> >>>
>> > >> >>> I updated the KIP with your feedback.
>> > >> >>>
>> > >> >>>
>> > >> >>>
>> > >> >>> On Fri, May 22, 2020 at 4:22 AM Sophie Blee-Goldman <
>> > >> sophie@confluent.io>
>> > >> >>> wrote:
>> > >> >>>
>> > >> >>>> Hm, the case of `all()` does seem to present a dilemma in the
>> case of
>> > >> >>>> variable-length keys.
>> > >> >>>>
>> > >> >>>> In the case of fixed-length keys, you can just compute the keys
>> that
>> > >> >>>> correspond
>> > >> >>>> to the maximum and minimum serialized bytes, then perform a
>> `range()`
>> > >> >>>> query
>> > >> >>>> instead of an `all()`. If your keys don't have a well-defined
>> > >> ordering
>> > >> >>>> such
>> > >> >>>> that
>> > >> >>>> you can't determine the MAX_KEY, then you probably don't care
>> about
>> > >> the
>> > >> >>>> iterator order anyway.
>> > >> >>>>
>> > >> >>>>  But with variable-length keys, there is no MAX_KEY. If all your
>> > >> keys were
>> > >> >>>> just
>> > >> >>>> of the form 'a', 'aa', 'aaaaa', 'aaaaaaa' then in fact the only
>> way
>> > >> to
>> > >> >>>> figure out the
>> > >> >>>> maximum key in the store is by using `all()` -- and without a
>> reverse
>> > >> >>>> iterator, you're
>> > >> >>>> doomed to iterate through every single key just to answer that
>> simple
>> > >> >>>> question.
>> > >> >>>>
>> > >> >>>> That said, I still think determining the iterator order based
>> on the
>> > >> >>>> to/from bytes
>> > >> >>>> makes a lot of intuitive sense and gives the API a nice
>> symmetry.
>> > >> What if
>> > >> >>>> we
>> > >> >>>> solved the `all()` problem by just giving `all()` a reverse
>> form to
>> > >> >>>> complement it?
>> > >> >>>> Ie we would have `all()` and `reverseAll()`, or something to
>> that
>> > >> effect.
>> > >> >>>>
>> > >> >>>> On Thu, May 21, 2020 at 3:41 PM Jorge Esteban Quilcate Otoya <
>> > >> >>>> quilcate.jorge@gmail.com> wrote:
>> > >> >>>>
>> > >> >>>>> Thanks John.
>> > >> >>>>>
>> > >> >>>>> Agree. I like the first approach as well, with StreamsConfig
>> flag
>> > >> >>>> passing
>> > >> >>>>> by via ProcessorContext.
>> > >> >>>>>
>> > >> >>>>> Another positive effect with "reverse parameters" is that in
>> the
>> > >> case of
>> > >> >>>>> `fetch(keyFrom, keyTo, timeFrom, timeTo)` users can decide
>> _which_
>> > >> pair
>> > >> >>>> to
>> > >> >>>>> flip, whether with `ReadDirection` enum it apply to both.
>> > >> >>>>>
>> > >> >>>>> The only issue I've found while reviewing the KIP is that
>> `all()`
>> > >> won't
>> > >> >>>> fit
>> > >> >>>>> within this approach.
>> > >> >>>>>
>> > >> >>>>> We could remove it from the KIP and argue that for WindowStore,
>> > >> >>>>> `fetchAll(0, Long.MAX_VALUE)` can be used to get all in reverse
>> > >> order,
>> > >> >>>> and
>> > >> >>>>> for KeyValueStore, no ordering guarantees are provided.
>> > >> >>>>>
>> > >> >>>>> If there is consensus with this changes, I will go and update
>> the
>> > >> KIP.
>> > >> >>>>>
>> > >> >>>>> On Thu, May 21, 2020 at 3:33 PM John Roesler <
>> vvcephei@apache.org>
>> > >> >>>> wrote:
>> > >> >>>>>
>> > >> >>>>>> Hi Jorge,
>> > >> >>>>>>
>> > >> >>>>>> Thanks for that idea. I agree, a feature flag would protect
>> anyone
>> > >> >>>>>> who may be depending on the current behavior.
>> > >> >>>>>>
>> > >> >>>>>> It seems better to locate the feature flag in the
>> initialization
>> > >> >>>> logic of
>> > >> >>>>>> the store, rather than have a method on the "live" store that
>> > >> changes
>> > >> >>>>>> its behavior on the fly.
>> > >> >>>>>>
>> > >> >>>>>> It seems like there are two options here, one is to add a new
>> > >> config:
>> > >> >>>>>>
>> > >> >>>>>> StreamsConfig.ENABLE_BACKWARDS_ITERATION =
>> > >> >>>>>>   "enable.backwards.iteration
>> > >> >>>>>>
>> > >> >>>>>> Or we can add a feature flag in Materialized, like
>> > >> >>>>>>
>> > >> >>>>>> Materialized.enableBackwardsIteration()
>> > >> >>>>>>
>> > >> >>>>>> I think I'd personally lean toward the config, for the
>> following
>> > >> >>>> reason.
>> > >> >>>>>> The concern that Sophie raised is that someone's program may
>> depend
>> > >> >>>>>> on the existing contract of getting an empty iterator. We
>> don't
>> > >> want
>> > >> >>>> to
>> > >> >>>>>> switch behavior when they aren't expecting it, so we provide
>> them a
>> > >> >>>>>> config to assert that they _are_ expecting the new behavior,
>> which
>> > >> >>>>>> means they take responsibility for updating their code to
>> expect
>> > >> the
>> > >> >>>> new
>> > >> >>>>>> behavior.
>> > >> >>>>>>
>> > >> >>>>>> There doesn't seem to be a reason to offer a choice of
>> behaviors
>> > >> on a
>> > >> >>>>>> per-query, or per-store basis. We just want people to be not
>> > >> surprised
>> > >> >>>>>> by this change in general.
>> > >> >>>>>>
>> > >> >>>>>> What do you think?
>> > >> >>>>>> Thanks,
>> > >> >>>>>> -John
>> > >> >>>>>>
>> > >> >>>>>> On Wed, May 20, 2020, at 17:37, Jorge Quilcate wrote:
>> > >> >>>>>>> Thank you both for the great feedback.
>> > >> >>>>>>>
>> > >> >>>>>>> I like the "fancy" proposal :), and how it removes the need
>> for
>> > >> >>>>>>> additional API methods. And with a feature flag on
>> `StateStore`,
>> > >> >>>>>>> disabled by default, should no break current users.
>> > >> >>>>>>>
>> > >> >>>>>>> The only side-effect I can think of is that: by moving the
>> flag
>> > >> >>>>> upwards,
>> > >> >>>>>>> all later operations become affected; which might be ok for
>> most
>> > >> >>>> (all?)
>> > >> >>>>>>> cases. I can't think of an scenario where this would be an
>> issue,
>> > >> >>>> just
>> > >> >>>>>>> want to point this out.
>> > >> >>>>>>>
>> > >> >>>>>>> If moving to this approach, I'd like to check if I got this
>> right
>> > >> >>>>> before
>> > >> >>>>>>> updating the KIP:
>> > >> >>>>>>>
>> > >> >>>>>>> - only `StateStore` will change by having a new method:
>> > >> >>>>>>> `backwardIteration()`, `false` by default to keep things
>> > >> compatible.
>> > >> >>>>>>> - then all `*Stores` will have to update their implementation
>> > >> based
>> > >> >>>> on
>> > >> >>>>>>> this flag.
>> > >> >>>>>>>
>> > >> >>>>>>>
>> > >> >>>>>>> On 20/05/2020 21:02, Sophie Blee-Goldman wrote:
>> > >> >>>>>>>>> There's no possibility that someone could be relying
>> > >> >>>>>>>>> on iterating over that range in increasing order, because
>> that's
>> > >> >>>> not
>> > >> >>>>>> what
>> > >> >>>>>>>>> happens. However, they could indeed be relying on getting
>> an
>> > >> >>>> empty
>> > >> >>>>>>>> iterator
>> > >> >>>>>>>>
>> > >> >>>>>>>> I just meant that they might be relying on the assumption
>> that
>> > >> the
>> > >> >>>>>> range
>> > >> >>>>>>>> query
>> > >> >>>>>>>> will never return results with decreasing keys. The empty
>> > >> iterator
>> > >> >>>>>> wouldn't
>> > >> >>>>>>>> break that contract, but of course a surprise reverse
>> iterator
>> > >> >>>> would.
>> > >> >>>>>>>>
>> > >> >>>>>>>> FWIW I actually am in favor of automatically converting to a
>> > >> >>>> reverse
>> > >> >>>>>>>> iterator,
>> > >> >>>>>>>> I just thought we should consider whether this should be
>> off by
>> > >> >>>>>> default or
>> > >> >>>>>>>> even possible to disable at all.
>> > >> >>>>>>>>
>> > >> >>>>>>>> On Tue, May 19, 2020 at 7:42 PM John Roesler <
>> > >> vvcephei@apache.org
>> > >> >>>>>
>> > >> >>>>>> wrote:
>> > >> >>>>>>>>
>> > >> >>>>>>>>> Thanks for the response, Sophie,
>> > >> >>>>>>>>>
>> > >> >>>>>>>>> I wholeheartedly agree we should take as much into account
>> as
>> > >> >>>>> possible
>> > >> >>>>>>>>> up front, rather than regretting our decisions later. I
>> actually
>> > >> >>>> do
>> > >> >>>>>> share
>> > >> >>>>>>>>> your vague sense of worry, which was what led me to say
>> > >> initially
>> > >> >>>>>> that I
>> > >> >>>>>>>>> thought my counterproposal might be "too fancy".
>> Sometimes, it's
>> > >> >>>>>> better
>> > >> >>>>>>>>> to be explicit instead of "elegant", if we think more
>> people
>> > >> >>>> will be
>> > >> >>>>>>>>> confused
>> > >> >>>>>>>>> than not.
>> > >> >>>>>>>>>
>> > >> >>>>>>>>> I really don't think that there's any danger of "relying
>> on a
>> > >> >>>> bug"
>> > >> >>>>>> here,
>> > >> >>>>>>>>> although
>> > >> >>>>>>>>> people certainly could be relying on current behavior. One
>> thing
>> > >> >>>> to
>> > >> >>>>> be
>> > >> >>>>>>>>> clear
>> > >> >>>>>>>>> about (which I just left a more detailed comment in
>> KAFKA-8159
>> > >> >>>>> about)
>> > >> >>>>>> is
>> > >> >>>>>>>>> that
>> > >> >>>>>>>>> when we say something like key1 > key2, this ordering is
>> defined
>> > >> >>>> by
>> > >> >>>>>> the
>> > >> >>>>>>>>> serde's output and nothing else.
>> > >> >>>>>>>>>
>> > >> >>>>>>>>> Currently, thanks to your fix in
>> > >> >>>>>> https://github.com/apache/kafka/pull/6521
>> > >> >>>>>>>>> ,
>> > >> >>>>>>>>> the store contract is that for range scans, if from > to,
>> then
>> > >> >>>> the
>> > >> >>>>>> store
>> > >> >>>>>>>>> must
>> > >> >>>>>>>>> return an empty iterator. There's no possibility that
>> someone
>> > >> >>>> could
>> > >> >>>>> be
>> > >> >>>>>>>>> relying
>> > >> >>>>>>>>> on iterating over that range in increasing order, because
>> that's
>> > >> >>>> not
>> > >> >>>>>> what
>> > >> >>>>>>>>> happens. However, they could indeed be relying on getting
>> an
>> > >> >>>> empty
>> > >> >>>>>>>>> iterator.
>> > >> >>>>>>>>>
>> > >> >>>>>>>>> My counterproposal was to actually change this contract to
>> say
>> > >> >>>> that
>> > >> >>>>>> the
>> > >> >>>>>>>>> store
>> > >> >>>>>>>>> must return an iterator over the keys in that range, but
>> in the
>> > >> >>>>>> reverse
>> > >> >>>>>>>>> order.
>> > >> >>>>>>>>> So, in addition to considering whether this idea is "too
>> fancy"
>> > >> >>>> (aka
>> > >> >>>>>>>>> confusing),
>> > >> >>>>>>>>> we should also consider the likelihood of breaking an
>> existing
>> > >> >>>>>> program with
>> > >> >>>>>>>>> this behavior/contract change.
>> > >> >>>>>>>>>
>> > >> >>>>>>>>> To echo your clarification, I'm also not advocating
>> strongly in
>> > >> >>>>> favor
>> > >> >>>>>> of my
>> > >> >>>>>>>>> proposal. I just wanted to present it for consideration
>> > >> alongside
>> > >> >>>>>> Jorge's
>> > >> >>>>>>>>> original one.
>> > >> >>>>>>>>>
>> > >> >>>>>>>>> Thanks for raising these very good points,
>> > >> >>>>>>>>> -John
>> > >> >>>>>>>>>
>> > >> >>>>>>>>> On Tue, May 19, 2020, at 20:49, Sophie Blee-Goldman wrote:
>> > >> >>>>>>>>>>> Rather than working around it, I think we should just
>> fix it
>> > >> >>>>>>>>>> Now *that's* a "fancy" idea :P
>> > >> >>>>>>>>>>
>> > >> >>>>>>>>>> That was my primary concern, although I do have a vague
>> sense
>> > >> of
>> > >> >>>>>> worry
>> > >> >>>>>>>>>> that we might be allowing users to get into trouble
>> without
>> > >> >>>>>> realizing it.
>> > >> >>>>>>>>>> For example if their custom serdes suffer a similar bug
>> as the
>> > >> >>>>> above,
>> > >> >>>>>>>>>> and/or
>> > >> >>>>>>>>>> they rely on getting results in increasing order (of the
>> keys)
>> > >> >>>> even
>> > >> >>>>>> when
>> > >> >>>>>>>>>> to < from. Maybe they're relying on the fact that the
>> range
>> > >> >>>> query
>> > >> >>>>>> returns
>> > >> >>>>>>>>>> nothing in that case.
>> > >> >>>>>>>>>>
>> > >> >>>>>>>>>> Not sure if that qualifies as relying on a bug or not,
>> but in
>> > >> >>>> that
>> > >> >>>>>> past
>> > >> >>>>>>>>>> we've
>> > >> >>>>>>>>>> taken the stance that we should not break compatibility
>> even if
>> > >> >>>> the
>> > >> >>>>>> user
>> > >> >>>>>>>>>> was relying on bugs or unintentional behavior.
>> > >> >>>>>>>>>>
>> > >> >>>>>>>>>> Just to clarify I'm not advocating strongly against this
>> > >> >>>> proposal,
>> > >> >>>>>> just
>> > >> >>>>>>>>>> laying
>> > >> >>>>>>>>>> out some considerations we should take into account. At
>> the end
>> > >> >>>> of
>> > >> >>>>>> the
>> > >> >>>>>>>>> day
>> > >> >>>>>>>>>> we should do what's right rather than maintain
>> compatibility
>> > >> >>>> with
>> > >> >>>>>>>>> existing
>> > >> >>>>>>>>>> bugs, but sometimes there's a reasonable middle ground.
>> > >> >>>>>>>>>>
>> > >> >>>>>>>>>> On Tue, May 19, 2020 at 6:15 PM John Roesler <
>> > >> >>>> vvcephei@apache.org>
>> > >> >>>>>>>>> wrote:
>> > >> >>>>>>>>>>> Thanks Sophie,
>> > >> >>>>>>>>>>>
>> > >> >>>>>>>>>>> Woah, that’s a nasty bug. Rather than working around it,
>> I
>> > >> >>>> think
>> > >> >>>>> we
>> > >> >>>>>>>>> should
>> > >> >>>>>>>>>>> just fix it. I’ll leave some comments on the Jira.
>> > >> >>>>>>>>>>>
>> > >> >>>>>>>>>>> It doesn’t seem like it should be this KIP’s concern
>> that some
>> > >> >>>>>> serdes
>> > >> >>>>>>>>>>> might be incorrectly written.
>> > >> >>>>>>>>>>>
>> > >> >>>>>>>>>>> Were there other practical concerns that you had in mind?
>> > >> >>>>>>>>>>>
>> > >> >>>>>>>>>>> Thanks,
>> > >> >>>>>>>>>>> John
>> > >> >>>>>>>>>>>
>> > >> >>>>>>>>>>> On Tue, May 19, 2020, at 19:10, Sophie Blee-Goldman
>> wrote:
>> > >> >>>>>>>>>>>> I like this "fancy idea" to just flip the to/from bytes
>> but I
>> > >> >>>>> think
>> > >> >>>>>>>>> there
>> > >> >>>>>>>>>>>> are some practical limitations to implementing this. In
>> > >> >>>>> particular
>> > >> >>>>>>>>>>>> I'm thinking about this issue
>> > >> >>>>>>>>>>>> <https://issues.apache.org/jira/browse/KAFKA-8159>
>> with the
>> > >> >>>>>> built-in
>> > >> >>>>>>>>>>> signed
>> > >> >>>>>>>>>>>> number serdes.
>> > >> >>>>>>>>>>>>
>> > >> >>>>>>>>>>>> This trick would actually fix the problem for
>> > >> >>>> negative-negative
>> > >> >>>>>>>>> queries
>> > >> >>>>>>>>>>>> (ie where to & from are negative) but would cause
>> > >> undetectable
>> > >> >>>>>>>>>>>> incorrect results for negative-positive queries. For
>> example,
>> > >> >>>> say
>> > >> >>>>>> you
>> > >> >>>>>>>>>>>> call #range with from = -1 and to = 1, using the Short
>> > >> serdes.
>> > >> >>>>> The
>> > >> >>>>>>>>>>>> serialized bytes for that are
>> > >> >>>>>>>>>>>>
>> > >> >>>>>>>>>>>> from = 1111111111111111
>> > >> >>>>>>>>>>>> to = 0000000000000001
>> > >> >>>>>>>>>>>>
>> > >> >>>>>>>>>>>> so we would end up flipping those and iterating over
>> all keys
>> > >> >>>>> from
>> > >> >>>>>>>>>>>> 0000000000000001 to 1111111111111111. Iterating in
>> > >> >>>>> lexicographical
>> > >> >>>>>>>>>>>> order means we would iterate over every key in the space
>> > >> >>>> *except*
>> > >> >>>>>> for
>> > >> >>>>>>>>>>>> 0, but 0 is actually the *only* other key we meant to be
>> > >> >>>> included
>> > >> >>>>>> in
>> > >> >>>>>>>>> the
>> > >> >>>>>>>>>>>> range query.
>> > >> >>>>>>>>>>>>
>> > >> >>>>>>>>>>>> Currently we just log a warning and return an empty
>> iterator
>> > >> >>>> when
>> > >> >>>>>>>>>>>> to < from, which is obviously also incorrect but feels
>> > >> >>>> slightly
>> > >> >>>>>> more
>> > >> >>>>>>>>>>>> palatable. If we start automatically converting to
>> reverse
>> > >> >>>>> queries
>> > >> >>>>>> we
>> > >> >>>>>>>>>>>> can't even log a warning in this case unless we wanted
>> to log
>> > >> >>>> a
>> > >> >>>>>>>>> warning
>> > >> >>>>>>>>>>>> every time, which would be weird to do for a valid
>> usage of a
>> > >> >>>> new
>> > >> >>>>>>>>>>>> feature.
>> > >> >>>>>>>>>>>>
>> > >> >>>>>>>>>>>> All that said, I still like the idea overall. Off the
>> top of
>> > >> >>>> my
>> > >> >>>>>> head
>> > >> >>>>>>>>> I
>> > >> >>>>>>>>>>> guess
>> > >> >>>>>>>>>>>> we could add a store config to enable/disable automatic
>> > >> >>>> reverse
>> > >> >>>>>>>>>>> iteration,
>> > >> >>>>>>>>>>>> which is off by default?
>> > >> >>>>>>>>>>>>
>> > >> >>>>>>>>>>>> Thanks for the KIP! This will be a nice addition
>> > >> >>>>>>>>>>>>
>> > >> >>>>>>>>>>>> Sophie
>> > >> >>>>>>>>>>>>
>> > >> >>>>>>>>>>>>
>> > >> >>>>>>>>>>>> On Tue, May 19, 2020 at 3:21 PM John Roesler <
>> > >> >>>>> vvcephei@apache.org>
>> > >> >>>>>>>>>>> wrote:
>> > >> >>>>>>>>>>>>> Hi there Jorge,
>> > >> >>>>>>>>>>>>>
>> > >> >>>>>>>>>>>>> Thanks for the KIP!
>> > >> >>>>>>>>>>>>>
>> > >> >>>>>>>>>>>>> I think this feature sounds very reasonable.
>> > >> >>>>>>>>>>>>>
>> > >> >>>>>>>>>>>>> I'm not 100% sure if this is "too fancy", but what do
>> you
>> > >> >>>> think
>> > >> >>>>>>>>>>>>> about avoiding the enum by instead allowing people to
>> flip
>> > >> >>>>>>>>>>>>> the "from" and "to" endpoints? I.e., reading from "A"
>> to "Z"
>> > >> >>>>> would
>> > >> >>>>>>>>>>>>> be a forward scan, and from "Z" to "A" would be a
>> backward
>> > >> >>>> one?
>> > >> >>>>>>>>>>>>>
>> > >> >>>>>>>>>>>>> Thanks,
>> > >> >>>>>>>>>>>>> -John
>> > >> >>>>>>>>>>>>>
>> > >> >>>>>>>>>>>>> On Tue, May 19, 2020, at 16:20, Jorge Quilcate wrote:
>> > >> >>>>>>>>>>>>>> Hi everyone,
>> > >> >>>>>>>>>>>>>>
>> > >> >>>>>>>>>>>>>> I would like to start the discussion for KIP-617:
>> > >> >>>>>>>>>>>>>>
>> > >> >>>>>>>>>
>> > >> >>>>>>
>> > >> >>>>>
>> > >> >>>>
>> > >>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-617%3A+Allow+Kafka+Streams+State+Stores+to+be+iterated+backwards
>> > >> >>>>>>>>>>>>>> Looking forward to your feedback.
>> > >> >>>>>>>>>>>>>>
>> > >> >>>>>>>>>>>>>> Thanks!
>> > >> >>>>>>>>>>>>>> Jorge.
>> > >> >>>>>>>>>>>>>>
>> > >> >>>>>>>>>>>>>>
>> > >> >>>>>>>>>>>>>> Attachments:
>> > >> >>>>>>>>>>>>>> * 0x5F2C6E22064982DF.asc
>> > >> >>>>>>>
>> > >> >>>>>>>
>> > >> >>>>>>> Attachments:
>> > >> >>>>>>> * 0x5F2C6E22064982DF.asc
>> > >> >>>>>>
>> > >> >>>>>
>> > >> >>>>
>> > >> >>>
>> > >> >>
>> > >>
>> > >>
>> >
>>
>

Re: [DISCUSS] KIP-617: Allow Kafka Streams State Stores to be iterated backwards

Posted by Jorge Esteban Quilcate Otoya <qu...@gmail.com>.
Thanks Matthias.

1. PR looks good to me.

2. About adding default methods, I think for correctness and backward
compatibility default methods should be added to avoid breaking potential
clients.
I have updated the KIP accordingly.


Jorge.

On Thu, Jul 23, 2020 at 5:19 AM Matthias J. Sax <mj...@apache.org> wrote:

> Finally cycling back to this.
>
> Overall I like the KIP.
>
> Two comments:
>
>  - I tried to figure out why the two InMemoerySessionStore methods are
> deprecated and it seems those annotations are there since the class was
> added; as this seems to be a bug, and there are no backward
> compatibility concerns, I just did a PR to remove those annotations:
>
> https://github.com/apache/kafka/pull/9061
>
>
>  - about moving the "read" methods from SessionStore to
> ReadOnlySessionsStore: while I agree that this make sense, it is
> strictly specking backward incompatible if we don't add default
> implementations. On the other hand, it seems that the likelihood that
> one only implement ReadOnlySessionStore is basically zero, so I am not
> sure if its worth to bother?
>
>
> -Matthias
>
> On 7/4/20 7:02 AM, John Roesler wrote:
> > Thanks Jorge,
> >
> > This KIP looks good to me!
> >
> > -John
> >
> > On Fri, Jul 3, 2020, at 03:19, Jorge Esteban Quilcate Otoya wrote:
> >> Hi John,
> >>
> >> Thanks for the feedback.
> >>
> >> I'd be happy to take the third option and consider moving methods to
> >> ReadOnlySessionStore as part of the KIP.
> >> Docs is updated to reflect these changes.
> >>
> >> Cheers,
> >> Jorge.
> >>
> >> On Thu, Jul 2, 2020 at 10:06 PM John Roesler <vv...@apache.org>
> wrote:
> >>
> >>> Hey Jorge,
> >>>
> >>> Thanks for the details. That sounds like a mistake to me on both
> counts.
> >>>
> >>> I don’t think you need to worry about those depreciations. If the
> >>> interface methods aren’t deprecated, then the methods are not
> deprecated.
> >>> We should remove the annotations, but it doesn’t need to be in the kip.
> >>>
> >>> I think any query methods should have been in the ReadOnly interface. I
> >>> guess it’s up to you whether you want to:
> >>> 1. Add the reverse methods next to the existing methods (what you have
> in
> >>> the kip right now)
> >>> 2. Partially fix it by adding your new methods to the ReadOnly
> interface
> >>> 3. Fully fix the problem by moving the existing methods as well as your
> >>> new ones. Since  SessionStore extends ReadOnlySessionStore, it’s ok
> just to
> >>> move the definitions.
> >>>
> >>> I’m ok with whatever you prefer.
> >>>
> >>> Thanks,
> >>> John
> >>>
> >>> On Thu, Jul 2, 2020, at 11:29, Jorge Esteban Quilcate Otoya wrote:
> >>>> (sorry for the spam)
> >>>>
> >>>> Actually `findSessions` are only deprecated on `InMemorySessionStore`,
> >>>> which seems strange as RocksDB and interfaces haven't marked these
> >>> methods
> >>>> as deprecated.
> >>>>
> >>>> Any hint on how to handle this?
> >>>>
> >>>> Cheers,
> >>>>
> >>>> On Thu, Jul 2, 2020 at 4:57 PM Jorge Esteban Quilcate Otoya <
> >>>> quilcate.jorge@gmail.com> wrote:
> >>>>
> >>>>> @John: I can see there are some deprecations in there as well. Will
> >>> update
> >>>>> the KIP.
> >>>>>
> >>>>> Thanks,
> >>>>> Jorge
> >>>>>
> >>>>>
> >>>>> On Thu, Jul 2, 2020 at 3:29 PM Jorge Esteban Quilcate Otoya <
> >>>>> quilcate.jorge@gmail.com> wrote:
> >>>>>
> >>>>>> Thanks John.
> >>>>>>
> >>>>>>> it looks like there’s a revision error in which two methods are
> >>>>>> proposed for SessionStore, but seem like they should be in
> >>>>>> ReadOnlySessionStore. Do I read that right?
> >>>>>>
> >>>>>> Yes, I've opted to keep the new methods alongside the existing ones.
> >>> In
> >>>>>> the case of SessionStore, `findSessions` are in `SessionStore`, and
> >>> `fetch`
> >>>>>> are in `ReadOnlySessionStore`. If it makes more sense, I can move
> all
> >>> of
> >>>>>> them to ReadOnlySessionStore.
> >>>>>> Let me know what you think.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Jorge.
> >>>>>>
> >>>>>> On Thu, Jul 2, 2020 at 2:36 PM John Roesler <vv...@apache.org>
> >>> wrote:
> >>>>>>
> >>>>>>> Hi Jorge,
> >>>>>>>
> >>>>>>> Thanks for the update. I think this is a good plan.
> >>>>>>>
> >>>>>>> I just took a look at the KIP again, and it looks like there’s a
> >>>>>>> revision error in which two methods are proposed for SessionStore,
> >>> but seem
> >>>>>>> like they should be in ReadOnlySessionStore. Do I read that right?
> >>>>>>>
> >>>>>>> Otherwise, I’m happy with your proposal.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> John
> >>>>>>>
> >>>>>>> On Wed, Jul 1, 2020, at 17:01, Jorge Esteban Quilcate Otoya wrote:
> >>>>>>>> Quick update: KIP is updated with latest changes now.
> >>>>>>>> Will leave it open this week while working on the PR.
> >>>>>>>>
> >>>>>>>> Hope to open a new vote thread over the next few days if no
> >>> additional
> >>>>>>>> feedback is provided.
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>> Jorge.
> >>>>>>>>
> >>>>>>>> On Mon, Jun 29, 2020 at 11:30 AM Jorge Esteban Quilcate Otoya <
> >>>>>>>> quilcate.jorge@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Thanks, John!
> >>>>>>>>>
> >>>>>>>>> Make sense to reconsider the current approach. I was heading in a
> >>>>>>> similar
> >>>>>>>>> direction while drafting the implementation. Metered, Caching,
> >>> and
> >>>>>>> other
> >>>>>>>>> layers will also have to get duplicated to build up new methods
> >>> in
> >>>>>>> `Stores`
> >>>>>>>>> factory, and class casting issues would appear on stores created
> >>> by
> >>>>>>> DSL.
> >>>>>>>>>
> >>>>>>>>> I will draft a proposal with new methods (move methods from
> >>> proposed
> >>>>>>>>> interfaces to existing ones) with default implementation in a KIP
> >>>>>>> update
> >>>>>>>>> and wait for Matthias to chime in to validate this approach.
> >>>>>>>>>
> >>>>>>>>> Jorge.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Sat, Jun 27, 2020 at 4:01 PM John Roesler <
> >>> vvcephei@apache.org>
> >>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi Jorge,
> >>>>>>>>>>
> >>>>>>>>>> Sorry for my silence, I've been absorbed with the 2.6 and 2.5.1
> >>>>>>> releases.
> >>>>>>>>>>
> >>>>>>>>>> The idea to separate the new methods into "mixin" interfaces
> >>> seems
> >>>>>>>>>> like a good one, but as we've discovered in KIP-614, it doesn't
> >>> work
> >>>>>>>>>> out that way in practice. The problem is that the store
> >>>>>>> implementations
> >>>>>>>>>> are just the base layer that get composed with other layers in
> >>>>>>> Streams
> >>>>>>>>>> before they can be accessed in the DSL. This is extremely
> >>> subtle, so
> >>>>>>>>>> I'm going to put everyone to sleep with a detailed explanation:
> >>>>>>>>>>
> >>>>>>>>>> For example, this is the mechanism by which all KeyValueStore
> >>>>>>>>>> implementations get added to Streams:
> >>>>>>>>>>
> >>> org.apache.kafka.streams.state.internals.KeyValueStoreBuilder#build
> >>>>>>>>>> return new MeteredKeyValueStore<>(
> >>>>>>>>>>   maybeWrapCaching(maybeWrapLogging(storeSupplier.get())),
> >>>>>>>>>>   storeSupplier.metricsScope(),
> >>>>>>>>>>   time,
> >>>>>>>>>>   keySerde,
> >>>>>>>>>>   valueSerde
> >>>>>>>>>> );
> >>>>>>>>>>
> >>>>>>>>>> In the DSL, the store that a processor gets from the context
> >>> would
> >>>>>>> be
> >>>>>>>>>> the result of this composition. So even if the
> >>> storeSupplier.get()
> >>>>>>> returns
> >>>>>>>>>> a store that implements the "reverse" interface, when you try to
> >>>>>>> use it
> >>>>>>>>>> from a processor like:
> >>>>>>>>>> org.apache.kafka.streams.kstream.ValueTransformerWithKey#init
> >>>>>>>>>> ReadOnlyBackwardWindowStore<K, V> store =
> >>>>>>>>>>   (ReadOnlyBackwardWindowStore<K, V>) context.getStateStore(..)
> >>>>>>>>>>
> >>>>>>>>>> You'd just get a ClassCastException because it's actually a
> >>>>>>>>>> MeteredKeyValueStore, which doesn't implement
> >>>>>>>>>> ReadOnlyBackwardWindowStore.
> >>>>>>>>>>
> >>>>>>>>>> The only way to make this work would be to make the Metered,
> >>>>>>>>>> Caching, and Logging layers also implement the new interfaces,
> >>>>>>>>>> but this effectively forces implementations to also implement
> >>>>>>>>>> the interface. Otherwise, the intermediate layers would have to
> >>>>>>>>>> cast the store in each method, like this:
> >>>>>>>>>> MeteredWindowStore#backwardFetch {
> >>>>>>>>>>   ((ReadOnlyBackwardWindowStore<K, V>)
> >>> innerStore).backwardFetch(..)
> >>>>>>>>>> }
> >>>>>>>>>>
> >>>>>>>>>> And then if the implementation doesn't "opt in" by implementing
> >>>>>>>>>> the interface, you'd get a ClassCastException, not when you get
> >>> the
> >>>>>>>>>> store, but when you try to use it.
> >>>>>>>>>>
> >>>>>>>>>> The fact that we get ClassCastExceptions no matter which way we
> >>>>>>>>>> turn here indicates that we're really not getting any benefit
> >>> from
> >>>>>>> the
> >>>>>>>>>> type system, which makes the extra interfaces seem not worth
> >>> all the
> >>>>>>>>>> code involved.
> >>>>>>>>>>
> >>>>>>>>>> Where we landed in KIP-614 is that, unless we want to completely
> >>>>>>>>>> revamp the way that StateStores work in the DSL, you might as
> >>>>>>>>>> well just add the new methods to the existing interfaces. To
> >>> prevent
> >>>>>>>>>> compilation errors, we can add default implementations that
> >>> throw
> >>>>>>>>>> UnsupportedOperationException. If a store doesn't opt in by
> >>>>>>>>>> implementing the methods, you'd get an
> >>>>>>> UnsupportedOperationException,
> >>>>>>>>>> which seems no worse, and maybe better, than the
> >>> ClassCastException
> >>>>>>>>>> you'd get if we go with the "mixin interface" approach.
> >>>>>>>>>>
> >>>>>>>>>> A quick note: This entire discussion focuses on the DSL. If
> >>> you're
> >>>>>>> just
> >>>>>>>>>> using the Processor API by directly adding the a custom store
> >>> to the
> >>>>>>>>>> Topology:
> >>>>>>>>>> org.apache.kafka.streams.Topology#addStateStore
> >>>>>>>>>> and then retrieving it in the processor via:
> >>>>>>>>>>
> >>> org.apache.kafka.streams.processor.ProcessorContext#getStateStore
> >>>>>>>>>> in
> >>>>>>>>>> org.apache.kafka.streams.processor.Processor#init
> >>>>>>>>>>
> >>>>>>>>>> Then, you can both register and retrieve _any_ StateStore
> >>>>>>> implementation.
> >>>>>>>>>> There's no need to use KeyValueStore or any other built-in
> >>>>>>> interface.
> >>>>>>>>>> In other words, KeyValueStore and company are only part of the
> >>> DSL,
> >>>>>>>>>> not the PAPI. So, discussions about the build-in store
> >>> interfaces
> >>>>>>> are only
> >>>>>>>>>> really relevant in the context of the DSL, Transformers, and
> >>>>>>> Materialized.
> >>>>>>>>>>
> >>>>>>>>>> So, in conclusion, I'd really recommend just adding any new
> >>> methods
> >>>>>>> to
> >>>>>>>>>> the existing store interfaces. We might be able to revamp the
> >>> API
> >>>>>>> in the
> >>>>>>>>>> future to support mixins, but it's a much larger scope of work
> >>> than
> >>>>>>> this
> >>>>>>>>>> KIP.
> >>>>>>>>>> A more minor comment is that we don't need to add Deprecated
> >>>>>>> variants
> >>>>>>>>>> of new methods.
> >>>>>>>>>>
> >>>>>>>>>> Thanks again, and once again, I'm sorry I tuned out and didn't
> >>>>>>> offer this
> >>>>>>>>>> feedback before you revised the KIP.
> >>>>>>>>>> -John
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Mon, Jun 22, 2020, at 06:11, Jorge Esteban Quilcate Otoya
> >>> wrote:
> >>>>>>>>>>> Hi everyone,
> >>>>>>>>>>>
> >>>>>>>>>>> I've updated the KIP, applying Matthias' feedback regarding
> >>>>>>> interface
> >>>>>>>>>>> hierarchy.
> >>>>>>>>>>>
> >>>>>>>>>>> Also, following the last email, I think we can consider
> >>> reverse
> >>>>>>>>>> operations
> >>>>>>>>>>> on KeyValue range as well, as implementation supports
> >>>>>>> lexicographic
> >>>>>>>>>> order.
> >>>>>>>>>>>
> >>>>>>>>>>> I considered different naming between Key-based ranges and
> >>>>>>> Time-based
> >>>>>>>>>>> ranges, and mitigate confusion when fetching keys and time
> >>> ranges
> >>>>>>> as
> >>>>>>>>>>> WindowStore does:
> >>>>>>>>>>>
> >>>>>>>>>>> Key-based ranges: reverseRange(), reverseAll()
> >>>>>>>>>>> Time-based ranges: backwardFetch()
> >>>>>>>>>>>
> >>>>>>>>>>> Then, key-based changes apply to KeyValueStore, and time-based
> >>>>>>> changes
> >>>>>>>>>> to
> >>>>>>>>>>> Window and Session stores.
> >>>>>>>>>>>
> >>>>>>>>>>> Let me know if you have any questions.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> Jorge.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, Jun 16, 2020 at 12:47 AM Jorge Esteban Quilcate Otoya
> >>> <
> >>>>>>>>>>> quilcate.jorge@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi everyone, sorry for the late reply.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks Matthias for your feedback. I think it makes sense to
> >>>>>>>>>> reconsider
> >>>>>>>>>>>> the current design based on your input.
> >>>>>>>>>>>>
> >>>>>>>>>>>> After digging deeper into the current implementation, I'd
> >>> like
> >>>>>>> to
> >>>>>>>>>> bring my
> >>>>>>>>>>>> current understanding to be double-checked as it might be
> >>>>>>> redefining
> >>>>>>>>>> the
> >>>>>>>>>>>> KIP's scope:
> >>>>>>>>>>>>
> >>>>>>>>>>>> 1. There are 2 ranges been exposed by different stores:
> >>>>>>>>>>>>
> >>>>>>>>>>>> a. Key Range
> >>>>>>>>>>>> b. Timestamp Range
> >>>>>>>>>>>>
> >>>>>>>>>>>> So far, we have discussed covering both.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2. Key Range functions do not provide ordering guarantees by
> >>>>>>> design:
> >>>>>>>>>>>>
> >>>>>>>>>>>> ```ReadOnlyKeyValueStore.java
> >>>>>>>>>>>>     /**
> >>>>>>>>>>>>      * Get an iterator over a given range of keys. This
> >>>>>>> iterator must
> >>>>>>>>>> be
> >>>>>>>>>>>> closed after use.
> >>>>>>>>>>>>      * The returned iterator must be safe from {@link
> >>>>>>>>>>>> java.util.ConcurrentModificationException}s
> >>>>>>>>>>>>      * and must not return null values. No ordering
> >>> guarantees
> >>>>>>> are
> >>>>>>>>>>>> provided.
> >>>>>>>>>>>>      * ...
> >>>>>>>>>>>>      */
> >>>>>>>>>>>>      KeyValueIterator<K, V> range(K from, K to);
> >>>>>>>>>>>> ```
> >>>>>>>>>>>>
> >>>>>>>>>>>> Therefore, I'd propose removing Key range operations from
> >>> the
> >>>>>>> scope.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 3. Timestamp Range operations happen at the SegmentsStore
> >>> level
> >>>>>>>>>> (internal)
> >>>>>>>>>>>> API
> >>>>>>>>>>>>
> >>>>>>>>>>>> AFAICT, Segments wrappers handle all Timestamp ranges
> >>> queries.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'd propose extending `Segments#segments(timeFrom, timeTo,
> >>>>>>> backwards)`
> >>>>>>>>>>>> with a flag for backwards operations.
> >>>>>>>>>>>>
> >>>>>>>>>>>> As segments returned will be processed backwards, I'm not
> >>>>>>> extending
> >>>>>>>>>>>> KeyValueStores to query each segment backwards as previous
> >>>>>>> point 2.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 4. Extend WindowStores implementations with a new
> >>>>>>>>>>>> WindowBackwardStore/ReadOnlyBackwardStore:
> >>>>>>>>>>>>
> >>>>>>>>>>>> ```java
> >>>>>>>>>>>> public interface ReadOnlyWindowBackwardStore<K, V> {
> >>>>>>>>>>>>     WindowStoreIterator<V> backwardFetch(K key, Instant
> >>> from,
> >>>>>>> Instant
> >>>>>>>>>> to)
> >>>>>>>>>>>> throws IllegalArgumentException;
> >>>>>>>>>>>>
> >>>>>>>>>>>>     KeyValueIterator<Windowed<K>, V> backwardFetch(K from,
> >>> K to,
> >>>>>>>>>> Instant
> >>>>>>>>>>>> fromTime, Instant toTime)
> >>>>>>>>>>>>         throws IllegalArgumentException;
> >>>>>>>>>>>>
> >>>>>>>>>>>>     KeyValueIterator<Windowed<K>, V>
> >>> backwardFetchAll(Instant
> >>>>>>> from,
> >>>>>>>>>>>> Instant to) throws IllegalArgumentException;
> >>>>>>>>>>>> ```
> >>>>>>>>>>>>
> >>>>>>>>>>>> 5. SessionStore is a bit different as it has fetch/find
> >>> sessions
> >>>>>>>>>> spread
> >>>>>>>>>>>> between SessionStore and ReadOnlySessionStore.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'd propose a new interface `SessionBackwardStore` to expose
> >>>>>>> backward
> >>>>>>>>>> find
> >>>>>>>>>>>> operations:
> >>>>>>>>>>>>
> >>>>>>>>>>>> ```java
> >>>>>>>>>>>> public interface SessionBackwardStore<K, AGG> {
> >>>>>>>>>>>>     KeyValueIterator<Windowed<K>, AGG>
> >>>>>>> backwardFindSessions(final K
> >>>>>>>>>> key,
> >>>>>>>>>>>> final long earliestSessionEndTime, final long
> >>>>>>> latestSessionStartTime);
> >>>>>>>>>>>>
> >>>>>>>>>>>>     KeyValueIterator<Windowed<K>, AGG>
> >>>>>>> backwardFindSessions(final K
> >>>>>>>>>>>> keyFrom, final K keyTo, final long earliestSessionEndTime,
> >>>>>>> final long
> >>>>>>>>>>>> latestSessionStartTime);
> >>>>>>>>>>>> }
> >>>>>>>>>>>> ```
> >>>>>>>>>>>>
> >>>>>>>>>>>> If this understanding is correct I'll proceed to update the
> >>> KIP
> >>>>>>> based
> >>>>>>>>>> on
> >>>>>>>>>>>> this.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Looking forward to your feedback.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>> Jorge.
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Fri, May 29, 2020 at 3:32 AM Matthias J. Sax <
> >>>>>>> mjsax@apache.org>
> >>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Hey,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Sorry that I am late to the game. I am not 100% convinced
> >>>>>>> about the
> >>>>>>>>>>>>> current proposal. Using a new config as feature flag seems
> >>> to
> >>>>>>> be
> >>>>>>>>>> rather
> >>>>>>>>>>>>> "nasty" to me, and flipping from/to is a little bit too
> >>> fancy
> >>>>>>> for my
> >>>>>>>>>>>>> personal taste.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I agree, that the original proposal using a "ReadDirection"
> >>>>>>> enum is
> >>>>>>>>>> not
> >>>>>>>>>>>>> ideal either.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thus, I would like to put out a new idea: We could add a
> >>> new
> >>>>>>>>>> interface
> >>>>>>>>>>>>> that offers new methods that return revers iterators.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The KIP already proposes to add `reverseAll()` and it seems
> >>>>>>> backward
> >>>>>>>>>>>>> incompatible to just add this method to
> >>>>>>> `ReadOnlyKeyValueStore` and
> >>>>>>>>>>>>> `ReadOnlyWindowStore`. I don't think we could provide a
> >>> useful
> >>>>>>>>>> default
> >>>>>>>>>>>>> implementation for custom stores and thus either break
> >>>>>>> compatibility
> >>>>>>>>>> or
> >>>>>>>>>>>>> need add a default that just throws an exception. Neither
> >>>>>>> seems to
> >>>>>>>>>> be a
> >>>>>>>>>>>>> good option.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Using a new interface avoid this issue and allows users
> >>>>>>> implementing
> >>>>>>>>>>>>> custom stores to opt-in by adding the interface to their
> >>>>>>> stores.
> >>>>>>>>>>>>> Furthermore, we don't need any config. In the end, we
> >>>>>>> encapsulte the
> >>>>>>>>>>>>> change into the store, and our runtime is agnostic to it
> >>> (as it
> >>>>>>>>>> should
> >>>>>>>>>>>>> be).
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The hierarchy becomes a little complex (but uses would not
> >>>>>>> really see
> >>>>>>>>>>>>> the complexity):
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> // exsiting
> >>>>>>>>>>>>> ReadOnlyKeyValueStore
> >>>>>>>>>>>>> KeyValueStore extend StateStore, ReadOnlyKeyValueStore
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> // helper interface; users don't care
> >>>>>>>>>>>>> // need similar ones for other stores
> >>>>>>>>>>>>> ReverseReadOnlyKeyValueStore {
> >>>>>>>>>>>>>     KeyValueIterator<K, V> reverseRange(K from, K to);
> >>>>>>>>>>>>>     KeyValueIterator<K, V> reverseAll();
> >>>>>>>>>>>>> }
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> // two new user facing interfaces for kv-store
> >>>>>>>>>>>>> // need similar ones for other stores
> >>>>>>>>>>>>> ReadOnlyKeyValueStoreWithReverseIterators extends
> >>>>>>>>>> ReadOnlyKeyValueStore,
> >>>>>>>>>>>>> ReverseReadOnlyKeyValueStore
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> KeyValueStoreWithReverseIterators extends KeyValueStore,
> >>>>>>>>>>>>> ReverseReadOnlyKeyValueStore
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> // updated (also internal)
> >>>>>>>>>>>>> // also need to update other built-in stores
> >>>>>>>>>>>>> RocksDB implements KeyValueStoreWithReverseIterators,
> >>>>>>>>>> BulkLoadingStore
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> In the end, user would only care about the two (for
> >>> kv-case)
> >>>>>>> new
> >>>>>>>>>>>>> interface that offer revers iterator (read only and
> >>> regular)
> >>>>>>> and can
> >>>>>>>>>>>>> cast stores accordingly in their Processors/Transformers or
> >>>>>>> via IQ.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Btw: if we add revers iterator for KeyValue and Window
> >>> store,
> >>>>>>> should
> >>>>>>>>>> we
> >>>>>>>>>>>>> do the same for Session store?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> This might be more code to write, but I believe it
> >>> provides the
> >>>>>>>>>> better
> >>>>>>>>>>>>> user experience. Thoughts?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 5/26/20 8:47 PM, John Roesler wrote:
> >>>>>>>>>>>>>> Sorry for my silence, Jorge,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I've just taken another look, and I'm personally happy
> >>> with
> >>>>>>> the
> >>>>>>>>>> KIP.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>> -John
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Tue, May 26, 2020, at 16:17, Jorge Esteban Quilcate
> >>> Otoya
> >>>>>>> wrote:
> >>>>>>>>>>>>>>> If no additional comments, I will proceed to start the a
> >>>>>>> vote
> >>>>>>>>>> thread.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Thanks a lot for your feedback!
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Fri, May 22, 2020 at 9:25 AM Jorge Esteban Quilcate
> >>>>>>> Otoya <
> >>>>>>>>>>>>>>> quilcate.jorge@gmail.com> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thanks Sophie. I like the `reverseAll()` idea.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I updated the KIP with your feedback.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Fri, May 22, 2020 at 4:22 AM Sophie Blee-Goldman <
> >>>>>>>>>>>>> sophie@confluent.io>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hm, the case of `all()` does seem to present a
> >>> dilemma in
> >>>>>>> the
> >>>>>>>>>> case of
> >>>>>>>>>>>>>>>>> variable-length keys.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> In the case of fixed-length keys, you can just compute
> >>>>>>> the keys
> >>>>>>>>>> that
> >>>>>>>>>>>>>>>>> correspond
> >>>>>>>>>>>>>>>>> to the maximum and minimum serialized bytes, then
> >>> perform
> >>>>>>> a
> >>>>>>>>>> `range()`
> >>>>>>>>>>>>>>>>> query
> >>>>>>>>>>>>>>>>> instead of an `all()`. If your keys don't have a
> >>>>>>> well-defined
> >>>>>>>>>>>>> ordering
> >>>>>>>>>>>>>>>>> such
> >>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>> you can't determine the MAX_KEY, then you probably
> >>> don't
> >>>>>>> care
> >>>>>>>>>> about
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> iterator order anyway.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>  But with variable-length keys, there is no MAX_KEY.
> >>> If
> >>>>>>> all your
> >>>>>>>>>>>>> keys were
> >>>>>>>>>>>>>>>>> just
> >>>>>>>>>>>>>>>>> of the form 'a', 'aa', 'aaaaa', 'aaaaaaa' then in fact
> >>>>>>> the only
> >>>>>>>>>> way
> >>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>> figure out the
> >>>>>>>>>>>>>>>>> maximum key in the store is by using `all()` -- and
> >>>>>>> without a
> >>>>>>>>>> reverse
> >>>>>>>>>>>>>>>>> iterator, you're
> >>>>>>>>>>>>>>>>> doomed to iterate through every single key just to
> >>> answer
> >>>>>>> that
> >>>>>>>>>> simple
> >>>>>>>>>>>>>>>>> question.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> That said, I still think determining the iterator
> >>> order
> >>>>>>> based
> >>>>>>>>>> on the
> >>>>>>>>>>>>>>>>> to/from bytes
> >>>>>>>>>>>>>>>>> makes a lot of intuitive sense and gives the API a
> >>> nice
> >>>>>>>>>> symmetry.
> >>>>>>>>>>>>> What if
> >>>>>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>> solved the `all()` problem by just giving `all()` a
> >>>>>>> reverse
> >>>>>>>>>> form to
> >>>>>>>>>>>>>>>>> complement it?
> >>>>>>>>>>>>>>>>> Ie we would have `all()` and `reverseAll()`, or
> >>> something
> >>>>>>> to
> >>>>>>>>>> that
> >>>>>>>>>>>>> effect.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Thu, May 21, 2020 at 3:41 PM Jorge Esteban Quilcate
> >>>>>>> Otoya <
> >>>>>>>>>>>>>>>>> quilcate.jorge@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Thanks John.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Agree. I like the first approach as well, with
> >>>>>>> StreamsConfig
> >>>>>>>>>> flag
> >>>>>>>>>>>>>>>>> passing
> >>>>>>>>>>>>>>>>>> by via ProcessorContext.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Another positive effect with "reverse parameters" is
> >>>>>>> that in
> >>>>>>>>>> the
> >>>>>>>>>>>>> case of
> >>>>>>>>>>>>>>>>>> `fetch(keyFrom, keyTo, timeFrom, timeTo)` users can
> >>>>>>> decide
> >>>>>>>>>> _which_
> >>>>>>>>>>>>> pair
> >>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>> flip, whether with `ReadDirection` enum it apply to
> >>> both.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> The only issue I've found while reviewing the KIP is
> >>> that
> >>>>>>>>>> `all()`
> >>>>>>>>>>>>> won't
> >>>>>>>>>>>>>>>>> fit
> >>>>>>>>>>>>>>>>>> within this approach.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> We could remove it from the KIP and argue that for
> >>>>>>> WindowStore,
> >>>>>>>>>>>>>>>>>> `fetchAll(0, Long.MAX_VALUE)` can be used to get all
> >>> in
> >>>>>>> reverse
> >>>>>>>>>>>>> order,
> >>>>>>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>>>> for KeyValueStore, no ordering guarantees are
> >>> provided.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> If there is consensus with this changes, I will go
> >>> and
> >>>>>>> update
> >>>>>>>>>> the
> >>>>>>>>>>>>> KIP.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Thu, May 21, 2020 at 3:33 PM John Roesler <
> >>>>>>>>>> vvcephei@apache.org>
> >>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Hi Jorge,
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Thanks for that idea. I agree, a feature flag would
> >>>>>>> protect
> >>>>>>>>>> anyone
> >>>>>>>>>>>>>>>>>>> who may be depending on the current behavior.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> It seems better to locate the feature flag in the
> >>>>>>>>>> initialization
> >>>>>>>>>>>>>>>>> logic of
> >>>>>>>>>>>>>>>>>>> the store, rather than have a method on the "live"
> >>>>>>> store that
> >>>>>>>>>>>>> changes
> >>>>>>>>>>>>>>>>>>> its behavior on the fly.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> It seems like there are two options here, one is to
> >>> add
> >>>>>>> a new
> >>>>>>>>>>>>> config:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> StreamsConfig.ENABLE_BACKWARDS_ITERATION =
> >>>>>>>>>>>>>>>>>>>   "enable.backwards.iteration
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Or we can add a feature flag in Materialized, like
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Materialized.enableBackwardsIteration()
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I think I'd personally lean toward the config, for
> >>> the
> >>>>>>>>>> following
> >>>>>>>>>>>>>>>>> reason.
> >>>>>>>>>>>>>>>>>>> The concern that Sophie raised is that someone's
> >>>>>>> program may
> >>>>>>>>>> depend
> >>>>>>>>>>>>>>>>>>> on the existing contract of getting an empty
> >>> iterator.
> >>>>>>> We
> >>>>>>>>>> don't
> >>>>>>>>>>>>> want
> >>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>>> switch behavior when they aren't expecting it, so we
> >>>>>>> provide
> >>>>>>>>>> them a
> >>>>>>>>>>>>>>>>>>> config to assert that they _are_ expecting the new
> >>>>>>> behavior,
> >>>>>>>>>> which
> >>>>>>>>>>>>>>>>>>> means they take responsibility for updating their
> >>> code
> >>>>>>> to
> >>>>>>>>>> expect
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> new
> >>>>>>>>>>>>>>>>>>> behavior.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> There doesn't seem to be a reason to offer a choice
> >>> of
> >>>>>>>>>> behaviors
> >>>>>>>>>>>>> on a
> >>>>>>>>>>>>>>>>>>> per-query, or per-store basis. We just want people
> >>> to
> >>>>>>> be not
> >>>>>>>>>>>>> surprised
> >>>>>>>>>>>>>>>>>>> by this change in general.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> What do you think?
> >>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>> -John
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Wed, May 20, 2020, at 17:37, Jorge Quilcate
> >>> wrote:
> >>>>>>>>>>>>>>>>>>>> Thank you both for the great feedback.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> I like the "fancy" proposal :), and how it removes
> >>> the
> >>>>>>> need
> >>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>> additional API methods. And with a feature flag on
> >>>>>>>>>> `StateStore`,
> >>>>>>>>>>>>>>>>>>>> disabled by default, should no break current users.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> The only side-effect I can think of is that: by
> >>> moving
> >>>>>>> the
> >>>>>>>>>> flag
> >>>>>>>>>>>>>>>>>> upwards,
> >>>>>>>>>>>>>>>>>>>> all later operations become affected; which might
> >>> be
> >>>>>>> ok for
> >>>>>>>>>> most
> >>>>>>>>>>>>>>>>> (all?)
> >>>>>>>>>>>>>>>>>>>> cases. I can't think of an scenario where this
> >>> would
> >>>>>>> be an
> >>>>>>>>>> issue,
> >>>>>>>>>>>>>>>>> just
> >>>>>>>>>>>>>>>>>>>> want to point this out.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> If moving to this approach, I'd like to check if I
> >>> got
> >>>>>>> this
> >>>>>>>>>> right
> >>>>>>>>>>>>>>>>>> before
> >>>>>>>>>>>>>>>>>>>> updating the KIP:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> - only `StateStore` will change by having a new
> >>> method:
> >>>>>>>>>>>>>>>>>>>> `backwardIteration()`, `false` by default to keep
> >>>>>>> things
> >>>>>>>>>>>>> compatible.
> >>>>>>>>>>>>>>>>>>>> - then all `*Stores` will have to update their
> >>>>>>> implementation
> >>>>>>>>>>>>> based
> >>>>>>>>>>>>>>>>> on
> >>>>>>>>>>>>>>>>>>>> this flag.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On 20/05/2020 21:02, Sophie Blee-Goldman wrote:
> >>>>>>>>>>>>>>>>>>>>>> There's no possibility that someone could be
> >>> relying
> >>>>>>>>>>>>>>>>>>>>>> on iterating over that range in increasing order,
> >>>>>>> because
> >>>>>>>>>> that's
> >>>>>>>>>>>>>>>>> not
> >>>>>>>>>>>>>>>>>>> what
> >>>>>>>>>>>>>>>>>>>>>> happens. However, they could indeed be relying on
> >>>>>>> getting
> >>>>>>>>>> an
> >>>>>>>>>>>>>>>>> empty
> >>>>>>>>>>>>>>>>>>>>> iterator
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> I just meant that they might be relying on the
> >>>>>>> assumption
> >>>>>>>>>> that
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> range
> >>>>>>>>>>>>>>>>>>>>> query
> >>>>>>>>>>>>>>>>>>>>> will never return results with decreasing keys.
> >>> The
> >>>>>>> empty
> >>>>>>>>>>>>> iterator
> >>>>>>>>>>>>>>>>>>> wouldn't
> >>>>>>>>>>>>>>>>>>>>> break that contract, but of course a surprise
> >>> reverse
> >>>>>>>>>> iterator
> >>>>>>>>>>>>>>>>> would.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> FWIW I actually am in favor of automatically
> >>>>>>> converting to a
> >>>>>>>>>>>>>>>>> reverse
> >>>>>>>>>>>>>>>>>>>>> iterator,
> >>>>>>>>>>>>>>>>>>>>> I just thought we should consider whether this
> >>> should
> >>>>>>> be
> >>>>>>>>>> off by
> >>>>>>>>>>>>>>>>>>> default or
> >>>>>>>>>>>>>>>>>>>>> even possible to disable at all.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Tue, May 19, 2020 at 7:42 PM John Roesler <
> >>>>>>>>>>>>> vvcephei@apache.org
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Thanks for the response, Sophie,
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> I wholeheartedly agree we should take as much
> >>> into
> >>>>>>> account
> >>>>>>>>>> as
> >>>>>>>>>>>>>>>>>> possible
> >>>>>>>>>>>>>>>>>>>>>> up front, rather than regretting our decisions
> >>>>>>> later. I
> >>>>>>>>>> actually
> >>>>>>>>>>>>>>>>> do
> >>>>>>>>>>>>>>>>>>> share
> >>>>>>>>>>>>>>>>>>>>>> your vague sense of worry, which was what led me
> >>> to
> >>>>>>> say
> >>>>>>>>>>>>> initially
> >>>>>>>>>>>>>>>>>>> that I
> >>>>>>>>>>>>>>>>>>>>>> thought my counterproposal might be "too fancy".
> >>>>>>>>>> Sometimes, it's
> >>>>>>>>>>>>>>>>>>> better
> >>>>>>>>>>>>>>>>>>>>>> to be explicit instead of "elegant", if we think
> >>> more
> >>>>>>>>>> people
> >>>>>>>>>>>>>>>>> will be
> >>>>>>>>>>>>>>>>>>>>>> confused
> >>>>>>>>>>>>>>>>>>>>>> than not.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> I really don't think that there's any danger of
> >>>>>>> "relying
> >>>>>>>>>> on a
> >>>>>>>>>>>>>>>>> bug"
> >>>>>>>>>>>>>>>>>>> here,
> >>>>>>>>>>>>>>>>>>>>>> although
> >>>>>>>>>>>>>>>>>>>>>> people certainly could be relying on current
> >>>>>>> behavior. One
> >>>>>>>>>> thing
> >>>>>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>>> clear
> >>>>>>>>>>>>>>>>>>>>>> about (which I just left a more detailed comment
> >>> in
> >>>>>>>>>> KAFKA-8159
> >>>>>>>>>>>>>>>>>> about)
> >>>>>>>>>>>>>>>>>>> is
> >>>>>>>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>>>>> when we say something like key1 > key2, this
> >>>>>>> ordering is
> >>>>>>>>>> defined
> >>>>>>>>>>>>>>>>> by
> >>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>> serde's output and nothing else.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Currently, thanks to your fix in
> >>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/6521
> >>>>>>>>>>>>>>>>>>>>>> ,
> >>>>>>>>>>>>>>>>>>>>>> the store contract is that for range scans, if
> >>> from
> >>>>>>>> to,
> >>>>>>>>>> then
> >>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> store
> >>>>>>>>>>>>>>>>>>>>>> must
> >>>>>>>>>>>>>>>>>>>>>> return an empty iterator. There's no possibility
> >>> that
> >>>>>>>>>> someone
> >>>>>>>>>>>>>>>>> could
> >>>>>>>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>>>>>>>> relying
> >>>>>>>>>>>>>>>>>>>>>> on iterating over that range in increasing order,
> >>>>>>> because
> >>>>>>>>>> that's
> >>>>>>>>>>>>>>>>> not
> >>>>>>>>>>>>>>>>>>> what
> >>>>>>>>>>>>>>>>>>>>>> happens. However, they could indeed be relying on
> >>>>>>> getting
> >>>>>>>>>> an
> >>>>>>>>>>>>>>>>> empty
> >>>>>>>>>>>>>>>>>>>>>> iterator.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> My counterproposal was to actually change this
> >>>>>>> contract to
> >>>>>>>>>> say
> >>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>> store
> >>>>>>>>>>>>>>>>>>>>>> must return an iterator over the keys in that
> >>> range,
> >>>>>>> but
> >>>>>>>>>> in the
> >>>>>>>>>>>>>>>>>>> reverse
> >>>>>>>>>>>>>>>>>>>>>> order.
> >>>>>>>>>>>>>>>>>>>>>> So, in addition to considering whether this idea
> >>> is
> >>>>>>> "too
> >>>>>>>>>> fancy"
> >>>>>>>>>>>>>>>>> (aka
> >>>>>>>>>>>>>>>>>>>>>> confusing),
> >>>>>>>>>>>>>>>>>>>>>> we should also consider the likelihood of
> >>> breaking an
> >>>>>>>>>> existing
> >>>>>>>>>>>>>>>>>>> program with
> >>>>>>>>>>>>>>>>>>>>>> this behavior/contract change.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> To echo your clarification, I'm also not
> >>> advocating
> >>>>>>>>>> strongly in
> >>>>>>>>>>>>>>>>>> favor
> >>>>>>>>>>>>>>>>>>> of my
> >>>>>>>>>>>>>>>>>>>>>> proposal. I just wanted to present it for
> >>>>>>> consideration
> >>>>>>>>>>>>> alongside
> >>>>>>>>>>>>>>>>>>> Jorge's
> >>>>>>>>>>>>>>>>>>>>>> original one.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Thanks for raising these very good points,
> >>>>>>>>>>>>>>>>>>>>>> -John
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> On Tue, May 19, 2020, at 20:49, Sophie
> >>> Blee-Goldman
> >>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>> Rather than working around it, I think we
> >>> should
> >>>>>>> just
> >>>>>>>>>> fix it
> >>>>>>>>>>>>>>>>>>>>>>> Now *that's* a "fancy" idea :P
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> That was my primary concern, although I do have
> >>> a
> >>>>>>> vague
> >>>>>>>>>> sense
> >>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>> worry
> >>>>>>>>>>>>>>>>>>>>>>> that we might be allowing users to get into
> >>> trouble
> >>>>>>>>>> without
> >>>>>>>>>>>>>>>>>>> realizing it.
> >>>>>>>>>>>>>>>>>>>>>>> For example if their custom serdes suffer a
> >>> similar
> >>>>>>> bug
> >>>>>>>>>> as the
> >>>>>>>>>>>>>>>>>> above,
> >>>>>>>>>>>>>>>>>>>>>>> and/or
> >>>>>>>>>>>>>>>>>>>>>>> they rely on getting results in increasing order
> >>>>>>> (of the
> >>>>>>>>>> keys)
> >>>>>>>>>>>>>>>>> even
> >>>>>>>>>>>>>>>>>>> when
> >>>>>>>>>>>>>>>>>>>>>>> to < from. Maybe they're relying on the fact
> >>> that
> >>>>>>> the
> >>>>>>>>>> range
> >>>>>>>>>>>>>>>>> query
> >>>>>>>>>>>>>>>>>>> returns
> >>>>>>>>>>>>>>>>>>>>>>> nothing in that case.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Not sure if that qualifies as relying on a bug
> >>> or
> >>>>>>> not,
> >>>>>>>>>> but in
> >>>>>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>>>>>> past
> >>>>>>>>>>>>>>>>>>>>>>> we've
> >>>>>>>>>>>>>>>>>>>>>>> taken the stance that we should not break
> >>>>>>> compatibility
> >>>>>>>>>> even if
> >>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>> user
> >>>>>>>>>>>>>>>>>>>>>>> was relying on bugs or unintentional behavior.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> Just to clarify I'm not advocating strongly
> >>> against
> >>>>>>> this
> >>>>>>>>>>>>>>>>> proposal,
> >>>>>>>>>>>>>>>>>>> just
> >>>>>>>>>>>>>>>>>>>>>>> laying
> >>>>>>>>>>>>>>>>>>>>>>> out some considerations we should take into
> >>>>>>> account. At
> >>>>>>>>>> the end
> >>>>>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>> day
> >>>>>>>>>>>>>>>>>>>>>>> we should do what's right rather than maintain
> >>>>>>>>>> compatibility
> >>>>>>>>>>>>>>>>> with
> >>>>>>>>>>>>>>>>>>>>>> existing
> >>>>>>>>>>>>>>>>>>>>>>> bugs, but sometimes there's a reasonable middle
> >>>>>>> ground.
> >>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>> On Tue, May 19, 2020 at 6:15 PM John Roesler <
> >>>>>>>>>>>>>>>>> vvcephei@apache.org>
> >>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>> Thanks Sophie,
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Woah, that’s a nasty bug. Rather than working
> >>>>>>> around it,
> >>>>>>>>>> I
> >>>>>>>>>>>>>>>>> think
> >>>>>>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>>>>>> should
> >>>>>>>>>>>>>>>>>>>>>>>> just fix it. I’ll leave some comments on the
> >>> Jira.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> It doesn’t seem like it should be this KIP’s
> >>>>>>> concern
> >>>>>>>>>> that some
> >>>>>>>>>>>>>>>>>>> serdes
> >>>>>>>>>>>>>>>>>>>>>>>> might be incorrectly written.
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Were there other practical concerns that you
> >>> had
> >>>>>>> in mind?
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>>>>>>> John
> >>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>> On Tue, May 19, 2020, at 19:10, Sophie
> >>> Blee-Goldman
> >>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>> I like this "fancy idea" to just flip the
> >>> to/from
> >>>>>>> bytes
> >>>>>>>>>> but I
> >>>>>>>>>>>>>>>>>> think
> >>>>>>>>>>>>>>>>>>>>>> there
> >>>>>>>>>>>>>>>>>>>>>>>>> are some practical limitations to implementing
> >>>>>>> this. In
> >>>>>>>>>>>>>>>>>> particular
> >>>>>>>>>>>>>>>>>>>>>>>>> I'm thinking about this issue
> >>>>>>>>>>>>>>>>>>>>>>>>> <
> >>> https://issues.apache.org/jira/browse/KAFKA-8159
> >>>>>>>>
> >>>>>>>>>> with the
> >>>>>>>>>>>>>>>>>>> built-in
> >>>>>>>>>>>>>>>>>>>>>>>> signed
> >>>>>>>>>>>>>>>>>>>>>>>>> number serdes.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> This trick would actually fix the problem for
> >>>>>>>>>>>>>>>>> negative-negative
> >>>>>>>>>>>>>>>>>>>>>> queries
> >>>>>>>>>>>>>>>>>>>>>>>>> (ie where to & from are negative) but would
> >>> cause
> >>>>>>>>>>>>> undetectable
> >>>>>>>>>>>>>>>>>>>>>>>>> incorrect results for negative-positive
> >>> queries.
> >>>>>>> For
> >>>>>>>>>> example,
> >>>>>>>>>>>>>>>>> say
> >>>>>>>>>>>>>>>>>>> you
> >>>>>>>>>>>>>>>>>>>>>>>>> call #range with from = -1 and to = 1, using
> >>> the
> >>>>>>> Short
> >>>>>>>>>>>>> serdes.
> >>>>>>>>>>>>>>>>>> The
> >>>>>>>>>>>>>>>>>>>>>>>>> serialized bytes for that are
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> from = 1111111111111111
> >>>>>>>>>>>>>>>>>>>>>>>>> to = 0000000000000001
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> so we would end up flipping those and
> >>> iterating
> >>>>>>> over
> >>>>>>>>>> all keys
> >>>>>>>>>>>>>>>>>> from
> >>>>>>>>>>>>>>>>>>>>>>>>> 0000000000000001 to 1111111111111111.
> >>> Iterating in
> >>>>>>>>>>>>>>>>>> lexicographical
> >>>>>>>>>>>>>>>>>>>>>>>>> order means we would iterate over every key in
> >>>>>>> the space
> >>>>>>>>>>>>>>>>> *except*
> >>>>>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>>>>>>>>> 0, but 0 is actually the *only* other key we
> >>>>>>> meant to be
> >>>>>>>>>>>>>>>>> included
> >>>>>>>>>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>>>>>>>>> range query.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Currently we just log a warning and return an
> >>>>>>> empty
> >>>>>>>>>> iterator
> >>>>>>>>>>>>>>>>> when
> >>>>>>>>>>>>>>>>>>>>>>>>> to < from, which is obviously also incorrect
> >>> but
> >>>>>>> feels
> >>>>>>>>>>>>>>>>> slightly
> >>>>>>>>>>>>>>>>>>> more
> >>>>>>>>>>>>>>>>>>>>>>>>> palatable. If we start automatically
> >>> converting to
> >>>>>>>>>> reverse
> >>>>>>>>>>>>>>>>>> queries
> >>>>>>>>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>>>>>>>>>>> can't even log a warning in this case unless
> >>> we
> >>>>>>> wanted
> >>>>>>>>>> to log
> >>>>>>>>>>>>>>>>> a
> >>>>>>>>>>>>>>>>>>>>>> warning
> >>>>>>>>>>>>>>>>>>>>>>>>> every time, which would be weird to do for a
> >>> valid
> >>>>>>>>>> usage of a
> >>>>>>>>>>>>>>>>> new
> >>>>>>>>>>>>>>>>>>>>>>>>> feature.
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> All that said, I still like the idea overall.
> >>> Off
> >>>>>>> the
> >>>>>>>>>> top of
> >>>>>>>>>>>>>>>>> my
> >>>>>>>>>>>>>>>>>>> head
> >>>>>>>>>>>>>>>>>>>>>> I
> >>>>>>>>>>>>>>>>>>>>>>>> guess
> >>>>>>>>>>>>>>>>>>>>>>>>> we could add a store config to enable/disable
> >>>>>>> automatic
> >>>>>>>>>>>>>>>>> reverse
> >>>>>>>>>>>>>>>>>>>>>>>> iteration,
> >>>>>>>>>>>>>>>>>>>>>>>>> which is off by default?
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP! This will be a nice
> >>> addition
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> Sophie
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>> On Tue, May 19, 2020 at 3:21 PM John Roesler <
> >>>>>>>>>>>>>>>>>> vvcephei@apache.org>
> >>>>>>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>> Hi there Jorge,
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP!
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> I think this feature sounds very reasonable.
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> I'm not 100% sure if this is "too fancy", but
> >>>>>>> what do
> >>>>>>>>>> you
> >>>>>>>>>>>>>>>>> think
> >>>>>>>>>>>>>>>>>>>>>>>>>> about avoiding the enum by instead allowing
> >>>>>>> people to
> >>>>>>>>>> flip
> >>>>>>>>>>>>>>>>>>>>>>>>>> the "from" and "to" endpoints? I.e., reading
> >>>>>>> from "A"
> >>>>>>>>>> to "Z"
> >>>>>>>>>>>>>>>>>> would
> >>>>>>>>>>>>>>>>>>>>>>>>>> be a forward scan, and from "Z" to "A" would
> >>> be a
> >>>>>>>>>> backward
> >>>>>>>>>>>>>>>>> one?
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>>>>>>>>> -John
> >>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, May 19, 2020, at 16:20, Jorge
> >>> Quilcate
> >>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Hi everyone,
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> I would like to start the discussion for
> >>>>>>> KIP-617:
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>
> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-617%3A+Allow+Kafka+Streams+State+Stores+to+be+iterated+backwards
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Looking forward to your feedback.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Jorge.
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>>>>>> Attachments:
> >>>>>>>>>>>>>>>>>>>>>>>>>>> * 0x5F2C6E22064982DF.asc
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Attachments:
> >>>>>>>>>>>>>>>>>>>> * 0x5F2C6E22064982DF.asc
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>>
> >>
>
>

Re: [DISCUSS] KIP-617: Allow Kafka Streams State Stores to be iterated backwards

Posted by "Matthias J. Sax" <mj...@apache.org>.
Finally cycling back to this.

Overall I like the KIP.

Two comments:

 - I tried to figure out why the two InMemoerySessionStore methods are
deprecated and it seems those annotations are there since the class was
added; as this seems to be a bug, and there are no backward
compatibility concerns, I just did a PR to remove those annotations:

https://github.com/apache/kafka/pull/9061


 - about moving the "read" methods from SessionStore to
ReadOnlySessionsStore: while I agree that this make sense, it is
strictly specking backward incompatible if we don't add default
implementations. On the other hand, it seems that the likelihood that
one only implement ReadOnlySessionStore is basically zero, so I am not
sure if its worth to bother?


-Matthias

On 7/4/20 7:02 AM, John Roesler wrote:
> Thanks Jorge,
> 
> This KIP looks good to me!
> 
> -John
> 
> On Fri, Jul 3, 2020, at 03:19, Jorge Esteban Quilcate Otoya wrote:
>> Hi John,
>>
>> Thanks for the feedback.
>>
>> I'd be happy to take the third option and consider moving methods to
>> ReadOnlySessionStore as part of the KIP.
>> Docs is updated to reflect these changes.
>>
>> Cheers,
>> Jorge.
>>
>> On Thu, Jul 2, 2020 at 10:06 PM John Roesler <vv...@apache.org> wrote:
>>
>>> Hey Jorge,
>>>
>>> Thanks for the details. That sounds like a mistake to me on both counts.
>>>
>>> I don’t think you need to worry about those depreciations. If the
>>> interface methods aren’t deprecated, then the methods are not deprecated.
>>> We should remove the annotations, but it doesn’t need to be in the kip.
>>>
>>> I think any query methods should have been in the ReadOnly interface. I
>>> guess it’s up to you whether you want to:
>>> 1. Add the reverse methods next to the existing methods (what you have in
>>> the kip right now)
>>> 2. Partially fix it by adding your new methods to the ReadOnly interface
>>> 3. Fully fix the problem by moving the existing methods as well as your
>>> new ones. Since  SessionStore extends ReadOnlySessionStore, it’s ok just to
>>> move the definitions.
>>>
>>> I’m ok with whatever you prefer.
>>>
>>> Thanks,
>>> John
>>>
>>> On Thu, Jul 2, 2020, at 11:29, Jorge Esteban Quilcate Otoya wrote:
>>>> (sorry for the spam)
>>>>
>>>> Actually `findSessions` are only deprecated on `InMemorySessionStore`,
>>>> which seems strange as RocksDB and interfaces haven't marked these
>>> methods
>>>> as deprecated.
>>>>
>>>> Any hint on how to handle this?
>>>>
>>>> Cheers,
>>>>
>>>> On Thu, Jul 2, 2020 at 4:57 PM Jorge Esteban Quilcate Otoya <
>>>> quilcate.jorge@gmail.com> wrote:
>>>>
>>>>> @John: I can see there are some deprecations in there as well. Will
>>> update
>>>>> the KIP.
>>>>>
>>>>> Thanks,
>>>>> Jorge
>>>>>
>>>>>
>>>>> On Thu, Jul 2, 2020 at 3:29 PM Jorge Esteban Quilcate Otoya <
>>>>> quilcate.jorge@gmail.com> wrote:
>>>>>
>>>>>> Thanks John.
>>>>>>
>>>>>>> it looks like there’s a revision error in which two methods are
>>>>>> proposed for SessionStore, but seem like they should be in
>>>>>> ReadOnlySessionStore. Do I read that right?
>>>>>>
>>>>>> Yes, I've opted to keep the new methods alongside the existing ones.
>>> In
>>>>>> the case of SessionStore, `findSessions` are in `SessionStore`, and
>>> `fetch`
>>>>>> are in `ReadOnlySessionStore`. If it makes more sense, I can move all
>>> of
>>>>>> them to ReadOnlySessionStore.
>>>>>> Let me know what you think.
>>>>>>
>>>>>> Thanks,
>>>>>> Jorge.
>>>>>>
>>>>>> On Thu, Jul 2, 2020 at 2:36 PM John Roesler <vv...@apache.org>
>>> wrote:
>>>>>>
>>>>>>> Hi Jorge,
>>>>>>>
>>>>>>> Thanks for the update. I think this is a good plan.
>>>>>>>
>>>>>>> I just took a look at the KIP again, and it looks like there’s a
>>>>>>> revision error in which two methods are proposed for SessionStore,
>>> but seem
>>>>>>> like they should be in ReadOnlySessionStore. Do I read that right?
>>>>>>>
>>>>>>> Otherwise, I’m happy with your proposal.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> John
>>>>>>>
>>>>>>> On Wed, Jul 1, 2020, at 17:01, Jorge Esteban Quilcate Otoya wrote:
>>>>>>>> Quick update: KIP is updated with latest changes now.
>>>>>>>> Will leave it open this week while working on the PR.
>>>>>>>>
>>>>>>>> Hope to open a new vote thread over the next few days if no
>>> additional
>>>>>>>> feedback is provided.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Jorge.
>>>>>>>>
>>>>>>>> On Mon, Jun 29, 2020 at 11:30 AM Jorge Esteban Quilcate Otoya <
>>>>>>>> quilcate.jorge@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks, John!
>>>>>>>>>
>>>>>>>>> Make sense to reconsider the current approach. I was heading in a
>>>>>>> similar
>>>>>>>>> direction while drafting the implementation. Metered, Caching,
>>> and
>>>>>>> other
>>>>>>>>> layers will also have to get duplicated to build up new methods
>>> in
>>>>>>> `Stores`
>>>>>>>>> factory, and class casting issues would appear on stores created
>>> by
>>>>>>> DSL.
>>>>>>>>>
>>>>>>>>> I will draft a proposal with new methods (move methods from
>>> proposed
>>>>>>>>> interfaces to existing ones) with default implementation in a KIP
>>>>>>> update
>>>>>>>>> and wait for Matthias to chime in to validate this approach.
>>>>>>>>>
>>>>>>>>> Jorge.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, Jun 27, 2020 at 4:01 PM John Roesler <
>>> vvcephei@apache.org>
>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Jorge,
>>>>>>>>>>
>>>>>>>>>> Sorry for my silence, I've been absorbed with the 2.6 and 2.5.1
>>>>>>> releases.
>>>>>>>>>>
>>>>>>>>>> The idea to separate the new methods into "mixin" interfaces
>>> seems
>>>>>>>>>> like a good one, but as we've discovered in KIP-614, it doesn't
>>> work
>>>>>>>>>> out that way in practice. The problem is that the store
>>>>>>> implementations
>>>>>>>>>> are just the base layer that get composed with other layers in
>>>>>>> Streams
>>>>>>>>>> before they can be accessed in the DSL. This is extremely
>>> subtle, so
>>>>>>>>>> I'm going to put everyone to sleep with a detailed explanation:
>>>>>>>>>>
>>>>>>>>>> For example, this is the mechanism by which all KeyValueStore
>>>>>>>>>> implementations get added to Streams:
>>>>>>>>>>
>>> org.apache.kafka.streams.state.internals.KeyValueStoreBuilder#build
>>>>>>>>>> return new MeteredKeyValueStore<>(
>>>>>>>>>>   maybeWrapCaching(maybeWrapLogging(storeSupplier.get())),
>>>>>>>>>>   storeSupplier.metricsScope(),
>>>>>>>>>>   time,
>>>>>>>>>>   keySerde,
>>>>>>>>>>   valueSerde
>>>>>>>>>> );
>>>>>>>>>>
>>>>>>>>>> In the DSL, the store that a processor gets from the context
>>> would
>>>>>>> be
>>>>>>>>>> the result of this composition. So even if the
>>> storeSupplier.get()
>>>>>>> returns
>>>>>>>>>> a store that implements the "reverse" interface, when you try to
>>>>>>> use it
>>>>>>>>>> from a processor like:
>>>>>>>>>> org.apache.kafka.streams.kstream.ValueTransformerWithKey#init
>>>>>>>>>> ReadOnlyBackwardWindowStore<K, V> store =
>>>>>>>>>>   (ReadOnlyBackwardWindowStore<K, V>) context.getStateStore(..)
>>>>>>>>>>
>>>>>>>>>> You'd just get a ClassCastException because it's actually a
>>>>>>>>>> MeteredKeyValueStore, which doesn't implement
>>>>>>>>>> ReadOnlyBackwardWindowStore.
>>>>>>>>>>
>>>>>>>>>> The only way to make this work would be to make the Metered,
>>>>>>>>>> Caching, and Logging layers also implement the new interfaces,
>>>>>>>>>> but this effectively forces implementations to also implement
>>>>>>>>>> the interface. Otherwise, the intermediate layers would have to
>>>>>>>>>> cast the store in each method, like this:
>>>>>>>>>> MeteredWindowStore#backwardFetch {
>>>>>>>>>>   ((ReadOnlyBackwardWindowStore<K, V>)
>>> innerStore).backwardFetch(..)
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> And then if the implementation doesn't "opt in" by implementing
>>>>>>>>>> the interface, you'd get a ClassCastException, not when you get
>>> the
>>>>>>>>>> store, but when you try to use it.
>>>>>>>>>>
>>>>>>>>>> The fact that we get ClassCastExceptions no matter which way we
>>>>>>>>>> turn here indicates that we're really not getting any benefit
>>> from
>>>>>>> the
>>>>>>>>>> type system, which makes the extra interfaces seem not worth
>>> all the
>>>>>>>>>> code involved.
>>>>>>>>>>
>>>>>>>>>> Where we landed in KIP-614 is that, unless we want to completely
>>>>>>>>>> revamp the way that StateStores work in the DSL, you might as
>>>>>>>>>> well just add the new methods to the existing interfaces. To
>>> prevent
>>>>>>>>>> compilation errors, we can add default implementations that
>>> throw
>>>>>>>>>> UnsupportedOperationException. If a store doesn't opt in by
>>>>>>>>>> implementing the methods, you'd get an
>>>>>>> UnsupportedOperationException,
>>>>>>>>>> which seems no worse, and maybe better, than the
>>> ClassCastException
>>>>>>>>>> you'd get if we go with the "mixin interface" approach.
>>>>>>>>>>
>>>>>>>>>> A quick note: This entire discussion focuses on the DSL. If
>>> you're
>>>>>>> just
>>>>>>>>>> using the Processor API by directly adding the a custom store
>>> to the
>>>>>>>>>> Topology:
>>>>>>>>>> org.apache.kafka.streams.Topology#addStateStore
>>>>>>>>>> and then retrieving it in the processor via:
>>>>>>>>>>
>>> org.apache.kafka.streams.processor.ProcessorContext#getStateStore
>>>>>>>>>> in
>>>>>>>>>> org.apache.kafka.streams.processor.Processor#init
>>>>>>>>>>
>>>>>>>>>> Then, you can both register and retrieve _any_ StateStore
>>>>>>> implementation.
>>>>>>>>>> There's no need to use KeyValueStore or any other built-in
>>>>>>> interface.
>>>>>>>>>> In other words, KeyValueStore and company are only part of the
>>> DSL,
>>>>>>>>>> not the PAPI. So, discussions about the build-in store
>>> interfaces
>>>>>>> are only
>>>>>>>>>> really relevant in the context of the DSL, Transformers, and
>>>>>>> Materialized.
>>>>>>>>>>
>>>>>>>>>> So, in conclusion, I'd really recommend just adding any new
>>> methods
>>>>>>> to
>>>>>>>>>> the existing store interfaces. We might be able to revamp the
>>> API
>>>>>>> in the
>>>>>>>>>> future to support mixins, but it's a much larger scope of work
>>> than
>>>>>>> this
>>>>>>>>>> KIP.
>>>>>>>>>> A more minor comment is that we don't need to add Deprecated
>>>>>>> variants
>>>>>>>>>> of new methods.
>>>>>>>>>>
>>>>>>>>>> Thanks again, and once again, I'm sorry I tuned out and didn't
>>>>>>> offer this
>>>>>>>>>> feedback before you revised the KIP.
>>>>>>>>>> -John
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Jun 22, 2020, at 06:11, Jorge Esteban Quilcate Otoya
>>> wrote:
>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>
>>>>>>>>>>> I've updated the KIP, applying Matthias' feedback regarding
>>>>>>> interface
>>>>>>>>>>> hierarchy.
>>>>>>>>>>>
>>>>>>>>>>> Also, following the last email, I think we can consider
>>> reverse
>>>>>>>>>> operations
>>>>>>>>>>> on KeyValue range as well, as implementation supports
>>>>>>> lexicographic
>>>>>>>>>> order.
>>>>>>>>>>>
>>>>>>>>>>> I considered different naming between Key-based ranges and
>>>>>>> Time-based
>>>>>>>>>>> ranges, and mitigate confusion when fetching keys and time
>>> ranges
>>>>>>> as
>>>>>>>>>>> WindowStore does:
>>>>>>>>>>>
>>>>>>>>>>> Key-based ranges: reverseRange(), reverseAll()
>>>>>>>>>>> Time-based ranges: backwardFetch()
>>>>>>>>>>>
>>>>>>>>>>> Then, key-based changes apply to KeyValueStore, and time-based
>>>>>>> changes
>>>>>>>>>> to
>>>>>>>>>>> Window and Session stores.
>>>>>>>>>>>
>>>>>>>>>>> Let me know if you have any questions.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Jorge.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 16, 2020 at 12:47 AM Jorge Esteban Quilcate Otoya
>>> <
>>>>>>>>>>> quilcate.jorge@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi everyone, sorry for the late reply.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks Matthias for your feedback. I think it makes sense to
>>>>>>>>>> reconsider
>>>>>>>>>>>> the current design based on your input.
>>>>>>>>>>>>
>>>>>>>>>>>> After digging deeper into the current implementation, I'd
>>> like
>>>>>>> to
>>>>>>>>>> bring my
>>>>>>>>>>>> current understanding to be double-checked as it might be
>>>>>>> redefining
>>>>>>>>>> the
>>>>>>>>>>>> KIP's scope:
>>>>>>>>>>>>
>>>>>>>>>>>> 1. There are 2 ranges been exposed by different stores:
>>>>>>>>>>>>
>>>>>>>>>>>> a. Key Range
>>>>>>>>>>>> b. Timestamp Range
>>>>>>>>>>>>
>>>>>>>>>>>> So far, we have discussed covering both.
>>>>>>>>>>>>
>>>>>>>>>>>> 2. Key Range functions do not provide ordering guarantees by
>>>>>>> design:
>>>>>>>>>>>>
>>>>>>>>>>>> ```ReadOnlyKeyValueStore.java
>>>>>>>>>>>>     /**
>>>>>>>>>>>>      * Get an iterator over a given range of keys. This
>>>>>>> iterator must
>>>>>>>>>> be
>>>>>>>>>>>> closed after use.
>>>>>>>>>>>>      * The returned iterator must be safe from {@link
>>>>>>>>>>>> java.util.ConcurrentModificationException}s
>>>>>>>>>>>>      * and must not return null values. No ordering
>>> guarantees
>>>>>>> are
>>>>>>>>>>>> provided.
>>>>>>>>>>>>      * ...
>>>>>>>>>>>>      */
>>>>>>>>>>>>      KeyValueIterator<K, V> range(K from, K to);
>>>>>>>>>>>> ```
>>>>>>>>>>>>
>>>>>>>>>>>> Therefore, I'd propose removing Key range operations from
>>> the
>>>>>>> scope.
>>>>>>>>>>>>
>>>>>>>>>>>> 3. Timestamp Range operations happen at the SegmentsStore
>>> level
>>>>>>>>>> (internal)
>>>>>>>>>>>> API
>>>>>>>>>>>>
>>>>>>>>>>>> AFAICT, Segments wrappers handle all Timestamp ranges
>>> queries.
>>>>>>>>>>>>
>>>>>>>>>>>> I'd propose extending `Segments#segments(timeFrom, timeTo,
>>>>>>> backwards)`
>>>>>>>>>>>> with a flag for backwards operations.
>>>>>>>>>>>>
>>>>>>>>>>>> As segments returned will be processed backwards, I'm not
>>>>>>> extending
>>>>>>>>>>>> KeyValueStores to query each segment backwards as previous
>>>>>>> point 2.
>>>>>>>>>>>>
>>>>>>>>>>>> 4. Extend WindowStores implementations with a new
>>>>>>>>>>>> WindowBackwardStore/ReadOnlyBackwardStore:
>>>>>>>>>>>>
>>>>>>>>>>>> ```java
>>>>>>>>>>>> public interface ReadOnlyWindowBackwardStore<K, V> {
>>>>>>>>>>>>     WindowStoreIterator<V> backwardFetch(K key, Instant
>>> from,
>>>>>>> Instant
>>>>>>>>>> to)
>>>>>>>>>>>> throws IllegalArgumentException;
>>>>>>>>>>>>
>>>>>>>>>>>>     KeyValueIterator<Windowed<K>, V> backwardFetch(K from,
>>> K to,
>>>>>>>>>> Instant
>>>>>>>>>>>> fromTime, Instant toTime)
>>>>>>>>>>>>         throws IllegalArgumentException;
>>>>>>>>>>>>
>>>>>>>>>>>>     KeyValueIterator<Windowed<K>, V>
>>> backwardFetchAll(Instant
>>>>>>> from,
>>>>>>>>>>>> Instant to) throws IllegalArgumentException;
>>>>>>>>>>>> ```
>>>>>>>>>>>>
>>>>>>>>>>>> 5. SessionStore is a bit different as it has fetch/find
>>> sessions
>>>>>>>>>> spread
>>>>>>>>>>>> between SessionStore and ReadOnlySessionStore.
>>>>>>>>>>>>
>>>>>>>>>>>> I'd propose a new interface `SessionBackwardStore` to expose
>>>>>>> backward
>>>>>>>>>> find
>>>>>>>>>>>> operations:
>>>>>>>>>>>>
>>>>>>>>>>>> ```java
>>>>>>>>>>>> public interface SessionBackwardStore<K, AGG> {
>>>>>>>>>>>>     KeyValueIterator<Windowed<K>, AGG>
>>>>>>> backwardFindSessions(final K
>>>>>>>>>> key,
>>>>>>>>>>>> final long earliestSessionEndTime, final long
>>>>>>> latestSessionStartTime);
>>>>>>>>>>>>
>>>>>>>>>>>>     KeyValueIterator<Windowed<K>, AGG>
>>>>>>> backwardFindSessions(final K
>>>>>>>>>>>> keyFrom, final K keyTo, final long earliestSessionEndTime,
>>>>>>> final long
>>>>>>>>>>>> latestSessionStartTime);
>>>>>>>>>>>> }
>>>>>>>>>>>> ```
>>>>>>>>>>>>
>>>>>>>>>>>> If this understanding is correct I'll proceed to update the
>>> KIP
>>>>>>> based
>>>>>>>>>> on
>>>>>>>>>>>> this.
>>>>>>>>>>>>
>>>>>>>>>>>> Looking forward to your feedback.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Jorge.
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, May 29, 2020 at 3:32 AM Matthias J. Sax <
>>>>>>> mjsax@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hey,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sorry that I am late to the game. I am not 100% convinced
>>>>>>> about the
>>>>>>>>>>>>> current proposal. Using a new config as feature flag seems
>>> to
>>>>>>> be
>>>>>>>>>> rather
>>>>>>>>>>>>> "nasty" to me, and flipping from/to is a little bit too
>>> fancy
>>>>>>> for my
>>>>>>>>>>>>> personal taste.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I agree, that the original proposal using a "ReadDirection"
>>>>>>> enum is
>>>>>>>>>> not
>>>>>>>>>>>>> ideal either.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thus, I would like to put out a new idea: We could add a
>>> new
>>>>>>>>>> interface
>>>>>>>>>>>>> that offers new methods that return revers iterators.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The KIP already proposes to add `reverseAll()` and it seems
>>>>>>> backward
>>>>>>>>>>>>> incompatible to just add this method to
>>>>>>> `ReadOnlyKeyValueStore` and
>>>>>>>>>>>>> `ReadOnlyWindowStore`. I don't think we could provide a
>>> useful
>>>>>>>>>> default
>>>>>>>>>>>>> implementation for custom stores and thus either break
>>>>>>> compatibility
>>>>>>>>>> or
>>>>>>>>>>>>> need add a default that just throws an exception. Neither
>>>>>>> seems to
>>>>>>>>>> be a
>>>>>>>>>>>>> good option.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Using a new interface avoid this issue and allows users
>>>>>>> implementing
>>>>>>>>>>>>> custom stores to opt-in by adding the interface to their
>>>>>>> stores.
>>>>>>>>>>>>> Furthermore, we don't need any config. In the end, we
>>>>>>> encapsulte the
>>>>>>>>>>>>> change into the store, and our runtime is agnostic to it
>>> (as it
>>>>>>>>>> should
>>>>>>>>>>>>> be).
>>>>>>>>>>>>>
>>>>>>>>>>>>> The hierarchy becomes a little complex (but uses would not
>>>>>>> really see
>>>>>>>>>>>>> the complexity):
>>>>>>>>>>>>>
>>>>>>>>>>>>> // exsiting
>>>>>>>>>>>>> ReadOnlyKeyValueStore
>>>>>>>>>>>>> KeyValueStore extend StateStore, ReadOnlyKeyValueStore
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> // helper interface; users don't care
>>>>>>>>>>>>> // need similar ones for other stores
>>>>>>>>>>>>> ReverseReadOnlyKeyValueStore {
>>>>>>>>>>>>>     KeyValueIterator<K, V> reverseRange(K from, K to);
>>>>>>>>>>>>>     KeyValueIterator<K, V> reverseAll();
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> // two new user facing interfaces for kv-store
>>>>>>>>>>>>> // need similar ones for other stores
>>>>>>>>>>>>> ReadOnlyKeyValueStoreWithReverseIterators extends
>>>>>>>>>> ReadOnlyKeyValueStore,
>>>>>>>>>>>>> ReverseReadOnlyKeyValueStore
>>>>>>>>>>>>>
>>>>>>>>>>>>> KeyValueStoreWithReverseIterators extends KeyValueStore,
>>>>>>>>>>>>> ReverseReadOnlyKeyValueStore
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> // updated (also internal)
>>>>>>>>>>>>> // also need to update other built-in stores
>>>>>>>>>>>>> RocksDB implements KeyValueStoreWithReverseIterators,
>>>>>>>>>> BulkLoadingStore
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> In the end, user would only care about the two (for
>>> kv-case)
>>>>>>> new
>>>>>>>>>>>>> interface that offer revers iterator (read only and
>>> regular)
>>>>>>> and can
>>>>>>>>>>>>> cast stores accordingly in their Processors/Transformers or
>>>>>>> via IQ.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Btw: if we add revers iterator for KeyValue and Window
>>> store,
>>>>>>> should
>>>>>>>>>> we
>>>>>>>>>>>>> do the same for Session store?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> This might be more code to write, but I believe it
>>> provides the
>>>>>>>>>> better
>>>>>>>>>>>>> user experience. Thoughts?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 5/26/20 8:47 PM, John Roesler wrote:
>>>>>>>>>>>>>> Sorry for my silence, Jorge,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've just taken another look, and I'm personally happy
>>> with
>>>>>>> the
>>>>>>>>>> KIP.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, May 26, 2020, at 16:17, Jorge Esteban Quilcate
>>> Otoya
>>>>>>> wrote:
>>>>>>>>>>>>>>> If no additional comments, I will proceed to start the a
>>>>>>> vote
>>>>>>>>>> thread.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks a lot for your feedback!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, May 22, 2020 at 9:25 AM Jorge Esteban Quilcate
>>>>>>> Otoya <
>>>>>>>>>>>>>>> quilcate.jorge@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks Sophie. I like the `reverseAll()` idea.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I updated the KIP with your feedback.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, May 22, 2020 at 4:22 AM Sophie Blee-Goldman <
>>>>>>>>>>>>> sophie@confluent.io>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hm, the case of `all()` does seem to present a
>>> dilemma in
>>>>>>> the
>>>>>>>>>> case of
>>>>>>>>>>>>>>>>> variable-length keys.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In the case of fixed-length keys, you can just compute
>>>>>>> the keys
>>>>>>>>>> that
>>>>>>>>>>>>>>>>> correspond
>>>>>>>>>>>>>>>>> to the maximum and minimum serialized bytes, then
>>> perform
>>>>>>> a
>>>>>>>>>> `range()`
>>>>>>>>>>>>>>>>> query
>>>>>>>>>>>>>>>>> instead of an `all()`. If your keys don't have a
>>>>>>> well-defined
>>>>>>>>>>>>> ordering
>>>>>>>>>>>>>>>>> such
>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>> you can't determine the MAX_KEY, then you probably
>>> don't
>>>>>>> care
>>>>>>>>>> about
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> iterator order anyway.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  But with variable-length keys, there is no MAX_KEY.
>>> If
>>>>>>> all your
>>>>>>>>>>>>> keys were
>>>>>>>>>>>>>>>>> just
>>>>>>>>>>>>>>>>> of the form 'a', 'aa', 'aaaaa', 'aaaaaaa' then in fact
>>>>>>> the only
>>>>>>>>>> way
>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> figure out the
>>>>>>>>>>>>>>>>> maximum key in the store is by using `all()` -- and
>>>>>>> without a
>>>>>>>>>> reverse
>>>>>>>>>>>>>>>>> iterator, you're
>>>>>>>>>>>>>>>>> doomed to iterate through every single key just to
>>> answer
>>>>>>> that
>>>>>>>>>> simple
>>>>>>>>>>>>>>>>> question.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> That said, I still think determining the iterator
>>> order
>>>>>>> based
>>>>>>>>>> on the
>>>>>>>>>>>>>>>>> to/from bytes
>>>>>>>>>>>>>>>>> makes a lot of intuitive sense and gives the API a
>>> nice
>>>>>>>>>> symmetry.
>>>>>>>>>>>>> What if
>>>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>> solved the `all()` problem by just giving `all()` a
>>>>>>> reverse
>>>>>>>>>> form to
>>>>>>>>>>>>>>>>> complement it?
>>>>>>>>>>>>>>>>> Ie we would have `all()` and `reverseAll()`, or
>>> something
>>>>>>> to
>>>>>>>>>> that
>>>>>>>>>>>>> effect.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, May 21, 2020 at 3:41 PM Jorge Esteban Quilcate
>>>>>>> Otoya <
>>>>>>>>>>>>>>>>> quilcate.jorge@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks John.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Agree. I like the first approach as well, with
>>>>>>> StreamsConfig
>>>>>>>>>> flag
>>>>>>>>>>>>>>>>> passing
>>>>>>>>>>>>>>>>>> by via ProcessorContext.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Another positive effect with "reverse parameters" is
>>>>>>> that in
>>>>>>>>>> the
>>>>>>>>>>>>> case of
>>>>>>>>>>>>>>>>>> `fetch(keyFrom, keyTo, timeFrom, timeTo)` users can
>>>>>>> decide
>>>>>>>>>> _which_
>>>>>>>>>>>>> pair
>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>> flip, whether with `ReadDirection` enum it apply to
>>> both.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The only issue I've found while reviewing the KIP is
>>> that
>>>>>>>>>> `all()`
>>>>>>>>>>>>> won't
>>>>>>>>>>>>>>>>> fit
>>>>>>>>>>>>>>>>>> within this approach.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We could remove it from the KIP and argue that for
>>>>>>> WindowStore,
>>>>>>>>>>>>>>>>>> `fetchAll(0, Long.MAX_VALUE)` can be used to get all
>>> in
>>>>>>> reverse
>>>>>>>>>>>>> order,
>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>> for KeyValueStore, no ordering guarantees are
>>> provided.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If there is consensus with this changes, I will go
>>> and
>>>>>>> update
>>>>>>>>>> the
>>>>>>>>>>>>> KIP.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, May 21, 2020 at 3:33 PM John Roesler <
>>>>>>>>>> vvcephei@apache.org>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Jorge,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks for that idea. I agree, a feature flag would
>>>>>>> protect
>>>>>>>>>> anyone
>>>>>>>>>>>>>>>>>>> who may be depending on the current behavior.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It seems better to locate the feature flag in the
>>>>>>>>>> initialization
>>>>>>>>>>>>>>>>> logic of
>>>>>>>>>>>>>>>>>>> the store, rather than have a method on the "live"
>>>>>>> store that
>>>>>>>>>>>>> changes
>>>>>>>>>>>>>>>>>>> its behavior on the fly.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It seems like there are two options here, one is to
>>> add
>>>>>>> a new
>>>>>>>>>>>>> config:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> StreamsConfig.ENABLE_BACKWARDS_ITERATION =
>>>>>>>>>>>>>>>>>>>   "enable.backwards.iteration
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Or we can add a feature flag in Materialized, like
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Materialized.enableBackwardsIteration()
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I think I'd personally lean toward the config, for
>>> the
>>>>>>>>>> following
>>>>>>>>>>>>>>>>> reason.
>>>>>>>>>>>>>>>>>>> The concern that Sophie raised is that someone's
>>>>>>> program may
>>>>>>>>>> depend
>>>>>>>>>>>>>>>>>>> on the existing contract of getting an empty
>>> iterator.
>>>>>>> We
>>>>>>>>>> don't
>>>>>>>>>>>>> want
>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>> switch behavior when they aren't expecting it, so we
>>>>>>> provide
>>>>>>>>>> them a
>>>>>>>>>>>>>>>>>>> config to assert that they _are_ expecting the new
>>>>>>> behavior,
>>>>>>>>>> which
>>>>>>>>>>>>>>>>>>> means they take responsibility for updating their
>>> code
>>>>>>> to
>>>>>>>>>> expect
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> new
>>>>>>>>>>>>>>>>>>> behavior.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> There doesn't seem to be a reason to offer a choice
>>> of
>>>>>>>>>> behaviors
>>>>>>>>>>>>> on a
>>>>>>>>>>>>>>>>>>> per-query, or per-store basis. We just want people
>>> to
>>>>>>> be not
>>>>>>>>>>>>> surprised
>>>>>>>>>>>>>>>>>>> by this change in general.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, May 20, 2020, at 17:37, Jorge Quilcate
>>> wrote:
>>>>>>>>>>>>>>>>>>>> Thank you both for the great feedback.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I like the "fancy" proposal :), and how it removes
>>> the
>>>>>>> need
>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>> additional API methods. And with a feature flag on
>>>>>>>>>> `StateStore`,
>>>>>>>>>>>>>>>>>>>> disabled by default, should no break current users.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The only side-effect I can think of is that: by
>>> moving
>>>>>>> the
>>>>>>>>>> flag
>>>>>>>>>>>>>>>>>> upwards,
>>>>>>>>>>>>>>>>>>>> all later operations become affected; which might
>>> be
>>>>>>> ok for
>>>>>>>>>> most
>>>>>>>>>>>>>>>>> (all?)
>>>>>>>>>>>>>>>>>>>> cases. I can't think of an scenario where this
>>> would
>>>>>>> be an
>>>>>>>>>> issue,
>>>>>>>>>>>>>>>>> just
>>>>>>>>>>>>>>>>>>>> want to point this out.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> If moving to this approach, I'd like to check if I
>>> got
>>>>>>> this
>>>>>>>>>> right
>>>>>>>>>>>>>>>>>> before
>>>>>>>>>>>>>>>>>>>> updating the KIP:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> - only `StateStore` will change by having a new
>>> method:
>>>>>>>>>>>>>>>>>>>> `backwardIteration()`, `false` by default to keep
>>>>>>> things
>>>>>>>>>>>>> compatible.
>>>>>>>>>>>>>>>>>>>> - then all `*Stores` will have to update their
>>>>>>> implementation
>>>>>>>>>>>>> based
>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>>> this flag.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On 20/05/2020 21:02, Sophie Blee-Goldman wrote:
>>>>>>>>>>>>>>>>>>>>>> There's no possibility that someone could be
>>> relying
>>>>>>>>>>>>>>>>>>>>>> on iterating over that range in increasing order,
>>>>>>> because
>>>>>>>>>> that's
>>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>> what
>>>>>>>>>>>>>>>>>>>>>> happens. However, they could indeed be relying on
>>>>>>> getting
>>>>>>>>>> an
>>>>>>>>>>>>>>>>> empty
>>>>>>>>>>>>>>>>>>>>> iterator
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I just meant that they might be relying on the
>>>>>>> assumption
>>>>>>>>>> that
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> range
>>>>>>>>>>>>>>>>>>>>> query
>>>>>>>>>>>>>>>>>>>>> will never return results with decreasing keys.
>>> The
>>>>>>> empty
>>>>>>>>>>>>> iterator
>>>>>>>>>>>>>>>>>>> wouldn't
>>>>>>>>>>>>>>>>>>>>> break that contract, but of course a surprise
>>> reverse
>>>>>>>>>> iterator
>>>>>>>>>>>>>>>>> would.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> FWIW I actually am in favor of automatically
>>>>>>> converting to a
>>>>>>>>>>>>>>>>> reverse
>>>>>>>>>>>>>>>>>>>>> iterator,
>>>>>>>>>>>>>>>>>>>>> I just thought we should consider whether this
>>> should
>>>>>>> be
>>>>>>>>>> off by
>>>>>>>>>>>>>>>>>>> default or
>>>>>>>>>>>>>>>>>>>>> even possible to disable at all.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Tue, May 19, 2020 at 7:42 PM John Roesler <
>>>>>>>>>>>>> vvcephei@apache.org
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks for the response, Sophie,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I wholeheartedly agree we should take as much
>>> into
>>>>>>> account
>>>>>>>>>> as
>>>>>>>>>>>>>>>>>> possible
>>>>>>>>>>>>>>>>>>>>>> up front, rather than regretting our decisions
>>>>>>> later. I
>>>>>>>>>> actually
>>>>>>>>>>>>>>>>> do
>>>>>>>>>>>>>>>>>>> share
>>>>>>>>>>>>>>>>>>>>>> your vague sense of worry, which was what led me
>>> to
>>>>>>> say
>>>>>>>>>>>>> initially
>>>>>>>>>>>>>>>>>>> that I
>>>>>>>>>>>>>>>>>>>>>> thought my counterproposal might be "too fancy".
>>>>>>>>>> Sometimes, it's
>>>>>>>>>>>>>>>>>>> better
>>>>>>>>>>>>>>>>>>>>>> to be explicit instead of "elegant", if we think
>>> more
>>>>>>>>>> people
>>>>>>>>>>>>>>>>> will be
>>>>>>>>>>>>>>>>>>>>>> confused
>>>>>>>>>>>>>>>>>>>>>> than not.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I really don't think that there's any danger of
>>>>>>> "relying
>>>>>>>>>> on a
>>>>>>>>>>>>>>>>> bug"
>>>>>>>>>>>>>>>>>>> here,
>>>>>>>>>>>>>>>>>>>>>> although
>>>>>>>>>>>>>>>>>>>>>> people certainly could be relying on current
>>>>>>> behavior. One
>>>>>>>>>> thing
>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>> clear
>>>>>>>>>>>>>>>>>>>>>> about (which I just left a more detailed comment
>>> in
>>>>>>>>>> KAFKA-8159
>>>>>>>>>>>>>>>>>> about)
>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>> when we say something like key1 > key2, this
>>>>>>> ordering is
>>>>>>>>>> defined
>>>>>>>>>>>>>>>>> by
>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> serde's output and nothing else.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Currently, thanks to your fix in
>>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/6521
>>>>>>>>>>>>>>>>>>>>>> ,
>>>>>>>>>>>>>>>>>>>>>> the store contract is that for range scans, if
>>> from
>>>>>>>> to,
>>>>>>>>>> then
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> store
>>>>>>>>>>>>>>>>>>>>>> must
>>>>>>>>>>>>>>>>>>>>>> return an empty iterator. There's no possibility
>>> that
>>>>>>>>>> someone
>>>>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>> relying
>>>>>>>>>>>>>>>>>>>>>> on iterating over that range in increasing order,
>>>>>>> because
>>>>>>>>>> that's
>>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>> what
>>>>>>>>>>>>>>>>>>>>>> happens. However, they could indeed be relying on
>>>>>>> getting
>>>>>>>>>> an
>>>>>>>>>>>>>>>>> empty
>>>>>>>>>>>>>>>>>>>>>> iterator.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> My counterproposal was to actually change this
>>>>>>> contract to
>>>>>>>>>> say
>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> store
>>>>>>>>>>>>>>>>>>>>>> must return an iterator over the keys in that
>>> range,
>>>>>>> but
>>>>>>>>>> in the
>>>>>>>>>>>>>>>>>>> reverse
>>>>>>>>>>>>>>>>>>>>>> order.
>>>>>>>>>>>>>>>>>>>>>> So, in addition to considering whether this idea
>>> is
>>>>>>> "too
>>>>>>>>>> fancy"
>>>>>>>>>>>>>>>>> (aka
>>>>>>>>>>>>>>>>>>>>>> confusing),
>>>>>>>>>>>>>>>>>>>>>> we should also consider the likelihood of
>>> breaking an
>>>>>>>>>> existing
>>>>>>>>>>>>>>>>>>> program with
>>>>>>>>>>>>>>>>>>>>>> this behavior/contract change.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> To echo your clarification, I'm also not
>>> advocating
>>>>>>>>>> strongly in
>>>>>>>>>>>>>>>>>> favor
>>>>>>>>>>>>>>>>>>> of my
>>>>>>>>>>>>>>>>>>>>>> proposal. I just wanted to present it for
>>>>>>> consideration
>>>>>>>>>>>>> alongside
>>>>>>>>>>>>>>>>>>> Jorge's
>>>>>>>>>>>>>>>>>>>>>> original one.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks for raising these very good points,
>>>>>>>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Tue, May 19, 2020, at 20:49, Sophie
>>> Blee-Goldman
>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> Rather than working around it, I think we
>>> should
>>>>>>> just
>>>>>>>>>> fix it
>>>>>>>>>>>>>>>>>>>>>>> Now *that's* a "fancy" idea :P
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> That was my primary concern, although I do have
>>> a
>>>>>>> vague
>>>>>>>>>> sense
>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>> worry
>>>>>>>>>>>>>>>>>>>>>>> that we might be allowing users to get into
>>> trouble
>>>>>>>>>> without
>>>>>>>>>>>>>>>>>>> realizing it.
>>>>>>>>>>>>>>>>>>>>>>> For example if their custom serdes suffer a
>>> similar
>>>>>>> bug
>>>>>>>>>> as the
>>>>>>>>>>>>>>>>>> above,
>>>>>>>>>>>>>>>>>>>>>>> and/or
>>>>>>>>>>>>>>>>>>>>>>> they rely on getting results in increasing order
>>>>>>> (of the
>>>>>>>>>> keys)
>>>>>>>>>>>>>>>>> even
>>>>>>>>>>>>>>>>>>> when
>>>>>>>>>>>>>>>>>>>>>>> to < from. Maybe they're relying on the fact
>>> that
>>>>>>> the
>>>>>>>>>> range
>>>>>>>>>>>>>>>>> query
>>>>>>>>>>>>>>>>>>> returns
>>>>>>>>>>>>>>>>>>>>>>> nothing in that case.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Not sure if that qualifies as relying on a bug
>>> or
>>>>>>> not,
>>>>>>>>>> but in
>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>> past
>>>>>>>>>>>>>>>>>>>>>>> we've
>>>>>>>>>>>>>>>>>>>>>>> taken the stance that we should not break
>>>>>>> compatibility
>>>>>>>>>> even if
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> user
>>>>>>>>>>>>>>>>>>>>>>> was relying on bugs or unintentional behavior.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Just to clarify I'm not advocating strongly
>>> against
>>>>>>> this
>>>>>>>>>>>>>>>>> proposal,
>>>>>>>>>>>>>>>>>>> just
>>>>>>>>>>>>>>>>>>>>>>> laying
>>>>>>>>>>>>>>>>>>>>>>> out some considerations we should take into
>>>>>>> account. At
>>>>>>>>>> the end
>>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> day
>>>>>>>>>>>>>>>>>>>>>>> we should do what's right rather than maintain
>>>>>>>>>> compatibility
>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>> existing
>>>>>>>>>>>>>>>>>>>>>>> bugs, but sometimes there's a reasonable middle
>>>>>>> ground.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Tue, May 19, 2020 at 6:15 PM John Roesler <
>>>>>>>>>>>>>>>>> vvcephei@apache.org>
>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>> Thanks Sophie,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Woah, that’s a nasty bug. Rather than working
>>>>>>> around it,
>>>>>>>>>> I
>>>>>>>>>>>>>>>>> think
>>>>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>>>>>>>>> just fix it. I’ll leave some comments on the
>>> Jira.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> It doesn’t seem like it should be this KIP’s
>>>>>>> concern
>>>>>>>>>> that some
>>>>>>>>>>>>>>>>>>> serdes
>>>>>>>>>>>>>>>>>>>>>>>> might be incorrectly written.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Were there other practical concerns that you
>>> had
>>>>>>> in mind?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Tue, May 19, 2020, at 19:10, Sophie
>>> Blee-Goldman
>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>> I like this "fancy idea" to just flip the
>>> to/from
>>>>>>> bytes
>>>>>>>>>> but I
>>>>>>>>>>>>>>>>>> think
>>>>>>>>>>>>>>>>>>>>>> there
>>>>>>>>>>>>>>>>>>>>>>>>> are some practical limitations to implementing
>>>>>>> this. In
>>>>>>>>>>>>>>>>>> particular
>>>>>>>>>>>>>>>>>>>>>>>>> I'm thinking about this issue
>>>>>>>>>>>>>>>>>>>>>>>>> <
>>> https://issues.apache.org/jira/browse/KAFKA-8159
>>>>>>>>
>>>>>>>>>> with the
>>>>>>>>>>>>>>>>>>> built-in
>>>>>>>>>>>>>>>>>>>>>>>> signed
>>>>>>>>>>>>>>>>>>>>>>>>> number serdes.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> This trick would actually fix the problem for
>>>>>>>>>>>>>>>>> negative-negative
>>>>>>>>>>>>>>>>>>>>>> queries
>>>>>>>>>>>>>>>>>>>>>>>>> (ie where to & from are negative) but would
>>> cause
>>>>>>>>>>>>> undetectable
>>>>>>>>>>>>>>>>>>>>>>>>> incorrect results for negative-positive
>>> queries.
>>>>>>> For
>>>>>>>>>> example,
>>>>>>>>>>>>>>>>> say
>>>>>>>>>>>>>>>>>>> you
>>>>>>>>>>>>>>>>>>>>>>>>> call #range with from = -1 and to = 1, using
>>> the
>>>>>>> Short
>>>>>>>>>>>>> serdes.
>>>>>>>>>>>>>>>>>> The
>>>>>>>>>>>>>>>>>>>>>>>>> serialized bytes for that are
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> from = 1111111111111111
>>>>>>>>>>>>>>>>>>>>>>>>> to = 0000000000000001
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> so we would end up flipping those and
>>> iterating
>>>>>>> over
>>>>>>>>>> all keys
>>>>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>>>>>>>>>> 0000000000000001 to 1111111111111111.
>>> Iterating in
>>>>>>>>>>>>>>>>>> lexicographical
>>>>>>>>>>>>>>>>>>>>>>>>> order means we would iterate over every key in
>>>>>>> the space
>>>>>>>>>>>>>>>>> *except*
>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>>>>>> 0, but 0 is actually the *only* other key we
>>>>>>> meant to be
>>>>>>>>>>>>>>>>> included
>>>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>> range query.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Currently we just log a warning and return an
>>>>>>> empty
>>>>>>>>>> iterator
>>>>>>>>>>>>>>>>> when
>>>>>>>>>>>>>>>>>>>>>>>>> to < from, which is obviously also incorrect
>>> but
>>>>>>> feels
>>>>>>>>>>>>>>>>> slightly
>>>>>>>>>>>>>>>>>>> more
>>>>>>>>>>>>>>>>>>>>>>>>> palatable. If we start automatically
>>> converting to
>>>>>>>>>> reverse
>>>>>>>>>>>>>>>>>> queries
>>>>>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>>>>>>>> can't even log a warning in this case unless
>>> we
>>>>>>> wanted
>>>>>>>>>> to log
>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>> warning
>>>>>>>>>>>>>>>>>>>>>>>>> every time, which would be weird to do for a
>>> valid
>>>>>>>>>> usage of a
>>>>>>>>>>>>>>>>> new
>>>>>>>>>>>>>>>>>>>>>>>>> feature.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> All that said, I still like the idea overall.
>>> Off
>>>>>>> the
>>>>>>>>>> top of
>>>>>>>>>>>>>>>>> my
>>>>>>>>>>>>>>>>>>> head
>>>>>>>>>>>>>>>>>>>>>> I
>>>>>>>>>>>>>>>>>>>>>>>> guess
>>>>>>>>>>>>>>>>>>>>>>>>> we could add a store config to enable/disable
>>>>>>> automatic
>>>>>>>>>>>>>>>>> reverse
>>>>>>>>>>>>>>>>>>>>>>>> iteration,
>>>>>>>>>>>>>>>>>>>>>>>>> which is off by default?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP! This will be a nice
>>> addition
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Sophie
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, May 19, 2020 at 3:21 PM John Roesler <
>>>>>>>>>>>>>>>>>> vvcephei@apache.org>
>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> Hi there Jorge,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP!
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> I think this feature sounds very reasonable.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> I'm not 100% sure if this is "too fancy", but
>>>>>>> what do
>>>>>>>>>> you
>>>>>>>>>>>>>>>>> think
>>>>>>>>>>>>>>>>>>>>>>>>>> about avoiding the enum by instead allowing
>>>>>>> people to
>>>>>>>>>> flip
>>>>>>>>>>>>>>>>>>>>>>>>>> the "from" and "to" endpoints? I.e., reading
>>>>>>> from "A"
>>>>>>>>>> to "Z"
>>>>>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>>>>>>>>>>> be a forward scan, and from "Z" to "A" would
>>> be a
>>>>>>>>>> backward
>>>>>>>>>>>>>>>>> one?
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, May 19, 2020, at 16:20, Jorge
>>> Quilcate
>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi everyone,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I would like to start the discussion for
>>>>>>> KIP-617:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-617%3A+Allow+Kafka+Streams+State+Stores+to+be+iterated+backwards
>>>>>>>>>>>>>>>>>>>>>>>>>>> Looking forward to your feedback.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>>>>>>>>>> Jorge.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Attachments:
>>>>>>>>>>>>>>>>>>>>>>>>>>> * 0x5F2C6E22064982DF.asc
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Attachments:
>>>>>>>>>>>>>>>>>>>> * 0x5F2C6E22064982DF.asc
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>


Re: [DISCUSS] KIP-617: Allow Kafka Streams State Stores to be iterated backwards

Posted by John Roesler <vv...@apache.org>.
Thanks Jorge,

This KIP looks good to me!

-John

On Fri, Jul 3, 2020, at 03:19, Jorge Esteban Quilcate Otoya wrote:
> Hi John,
> 
> Thanks for the feedback.
> 
> I'd be happy to take the third option and consider moving methods to
> ReadOnlySessionStore as part of the KIP.
> Docs is updated to reflect these changes.
> 
> Cheers,
> Jorge.
> 
> On Thu, Jul 2, 2020 at 10:06 PM John Roesler <vv...@apache.org> wrote:
> 
> > Hey Jorge,
> >
> > Thanks for the details. That sounds like a mistake to me on both counts.
> >
> > I don’t think you need to worry about those depreciations. If the
> > interface methods aren’t deprecated, then the methods are not deprecated.
> > We should remove the annotations, but it doesn’t need to be in the kip.
> >
> > I think any query methods should have been in the ReadOnly interface. I
> > guess it’s up to you whether you want to:
> > 1. Add the reverse methods next to the existing methods (what you have in
> > the kip right now)
> > 2. Partially fix it by adding your new methods to the ReadOnly interface
> > 3. Fully fix the problem by moving the existing methods as well as your
> > new ones. Since  SessionStore extends ReadOnlySessionStore, it’s ok just to
> > move the definitions.
> >
> > I’m ok with whatever you prefer.
> >
> > Thanks,
> > John
> >
> > On Thu, Jul 2, 2020, at 11:29, Jorge Esteban Quilcate Otoya wrote:
> > > (sorry for the spam)
> > >
> > > Actually `findSessions` are only deprecated on `InMemorySessionStore`,
> > > which seems strange as RocksDB and interfaces haven't marked these
> > methods
> > > as deprecated.
> > >
> > > Any hint on how to handle this?
> > >
> > > Cheers,
> > >
> > > On Thu, Jul 2, 2020 at 4:57 PM Jorge Esteban Quilcate Otoya <
> > > quilcate.jorge@gmail.com> wrote:
> > >
> > > > @John: I can see there are some deprecations in there as well. Will
> > update
> > > > the KIP.
> > > >
> > > > Thanks,
> > > > Jorge
> > > >
> > > >
> > > > On Thu, Jul 2, 2020 at 3:29 PM Jorge Esteban Quilcate Otoya <
> > > > quilcate.jorge@gmail.com> wrote:
> > > >
> > > >> Thanks John.
> > > >>
> > > >> > it looks like there’s a revision error in which two methods are
> > > >> proposed for SessionStore, but seem like they should be in
> > > >> ReadOnlySessionStore. Do I read that right?
> > > >>
> > > >> Yes, I've opted to keep the new methods alongside the existing ones.
> > In
> > > >> the case of SessionStore, `findSessions` are in `SessionStore`, and
> > `fetch`
> > > >> are in `ReadOnlySessionStore`. If it makes more sense, I can move all
> > of
> > > >> them to ReadOnlySessionStore.
> > > >> Let me know what you think.
> > > >>
> > > >> Thanks,
> > > >> Jorge.
> > > >>
> > > >> On Thu, Jul 2, 2020 at 2:36 PM John Roesler <vv...@apache.org>
> > wrote:
> > > >>
> > > >>> Hi Jorge,
> > > >>>
> > > >>> Thanks for the update. I think this is a good plan.
> > > >>>
> > > >>> I just took a look at the KIP again, and it looks like there’s a
> > > >>> revision error in which two methods are proposed for SessionStore,
> > but seem
> > > >>> like they should be in ReadOnlySessionStore. Do I read that right?
> > > >>>
> > > >>> Otherwise, I’m happy with your proposal.
> > > >>>
> > > >>> Thanks,
> > > >>> John
> > > >>>
> > > >>> On Wed, Jul 1, 2020, at 17:01, Jorge Esteban Quilcate Otoya wrote:
> > > >>> > Quick update: KIP is updated with latest changes now.
> > > >>> > Will leave it open this week while working on the PR.
> > > >>> >
> > > >>> > Hope to open a new vote thread over the next few days if no
> > additional
> > > >>> > feedback is provided.
> > > >>> >
> > > >>> > Cheers,
> > > >>> > Jorge.
> > > >>> >
> > > >>> > On Mon, Jun 29, 2020 at 11:30 AM Jorge Esteban Quilcate Otoya <
> > > >>> > quilcate.jorge@gmail.com> wrote:
> > > >>> >
> > > >>> > > Thanks, John!
> > > >>> > >
> > > >>> > > Make sense to reconsider the current approach. I was heading in a
> > > >>> similar
> > > >>> > > direction while drafting the implementation. Metered, Caching,
> > and
> > > >>> other
> > > >>> > > layers will also have to get duplicated to build up new methods
> > in
> > > >>> `Stores`
> > > >>> > > factory, and class casting issues would appear on stores created
> > by
> > > >>> DSL.
> > > >>> > >
> > > >>> > > I will draft a proposal with new methods (move methods from
> > proposed
> > > >>> > > interfaces to existing ones) with default implementation in a KIP
> > > >>> update
> > > >>> > > and wait for Matthias to chime in to validate this approach.
> > > >>> > >
> > > >>> > > Jorge.
> > > >>> > >
> > > >>> > >
> > > >>> > > On Sat, Jun 27, 2020 at 4:01 PM John Roesler <
> > vvcephei@apache.org>
> > > >>> wrote:
> > > >>> > >
> > > >>> > >> Hi Jorge,
> > > >>> > >>
> > > >>> > >> Sorry for my silence, I've been absorbed with the 2.6 and 2.5.1
> > > >>> releases.
> > > >>> > >>
> > > >>> > >> The idea to separate the new methods into "mixin" interfaces
> > seems
> > > >>> > >> like a good one, but as we've discovered in KIP-614, it doesn't
> > work
> > > >>> > >> out that way in practice. The problem is that the store
> > > >>> implementations
> > > >>> > >> are just the base layer that get composed with other layers in
> > > >>> Streams
> > > >>> > >> before they can be accessed in the DSL. This is extremely
> > subtle, so
> > > >>> > >> I'm going to put everyone to sleep with a detailed explanation:
> > > >>> > >>
> > > >>> > >> For example, this is the mechanism by which all KeyValueStore
> > > >>> > >> implementations get added to Streams:
> > > >>> > >>
> > org.apache.kafka.streams.state.internals.KeyValueStoreBuilder#build
> > > >>> > >> return new MeteredKeyValueStore<>(
> > > >>> > >>   maybeWrapCaching(maybeWrapLogging(storeSupplier.get())),
> > > >>> > >>   storeSupplier.metricsScope(),
> > > >>> > >>   time,
> > > >>> > >>   keySerde,
> > > >>> > >>   valueSerde
> > > >>> > >> );
> > > >>> > >>
> > > >>> > >> In the DSL, the store that a processor gets from the context
> > would
> > > >>> be
> > > >>> > >> the result of this composition. So even if the
> > storeSupplier.get()
> > > >>> returns
> > > >>> > >> a store that implements the "reverse" interface, when you try to
> > > >>> use it
> > > >>> > >> from a processor like:
> > > >>> > >> org.apache.kafka.streams.kstream.ValueTransformerWithKey#init
> > > >>> > >> ReadOnlyBackwardWindowStore<K, V> store =
> > > >>> > >>   (ReadOnlyBackwardWindowStore<K, V>) context.getStateStore(..)
> > > >>> > >>
> > > >>> > >> You'd just get a ClassCastException because it's actually a
> > > >>> > >> MeteredKeyValueStore, which doesn't implement
> > > >>> > >> ReadOnlyBackwardWindowStore.
> > > >>> > >>
> > > >>> > >> The only way to make this work would be to make the Metered,
> > > >>> > >> Caching, and Logging layers also implement the new interfaces,
> > > >>> > >> but this effectively forces implementations to also implement
> > > >>> > >> the interface. Otherwise, the intermediate layers would have to
> > > >>> > >> cast the store in each method, like this:
> > > >>> > >> MeteredWindowStore#backwardFetch {
> > > >>> > >>   ((ReadOnlyBackwardWindowStore<K, V>)
> > innerStore).backwardFetch(..)
> > > >>> > >> }
> > > >>> > >>
> > > >>> > >> And then if the implementation doesn't "opt in" by implementing
> > > >>> > >> the interface, you'd get a ClassCastException, not when you get
> > the
> > > >>> > >> store, but when you try to use it.
> > > >>> > >>
> > > >>> > >> The fact that we get ClassCastExceptions no matter which way we
> > > >>> > >> turn here indicates that we're really not getting any benefit
> > from
> > > >>> the
> > > >>> > >> type system, which makes the extra interfaces seem not worth
> > all the
> > > >>> > >> code involved.
> > > >>> > >>
> > > >>> > >> Where we landed in KIP-614 is that, unless we want to completely
> > > >>> > >> revamp the way that StateStores work in the DSL, you might as
> > > >>> > >> well just add the new methods to the existing interfaces. To
> > prevent
> > > >>> > >> compilation errors, we can add default implementations that
> > throw
> > > >>> > >> UnsupportedOperationException. If a store doesn't opt in by
> > > >>> > >> implementing the methods, you'd get an
> > > >>> UnsupportedOperationException,
> > > >>> > >> which seems no worse, and maybe better, than the
> > ClassCastException
> > > >>> > >> you'd get if we go with the "mixin interface" approach.
> > > >>> > >>
> > > >>> > >> A quick note: This entire discussion focuses on the DSL. If
> > you're
> > > >>> just
> > > >>> > >> using the Processor API by directly adding the a custom store
> > to the
> > > >>> > >> Topology:
> > > >>> > >> org.apache.kafka.streams.Topology#addStateStore
> > > >>> > >> and then retrieving it in the processor via:
> > > >>> > >>
> > org.apache.kafka.streams.processor.ProcessorContext#getStateStore
> > > >>> > >> in
> > > >>> > >> org.apache.kafka.streams.processor.Processor#init
> > > >>> > >>
> > > >>> > >> Then, you can both register and retrieve _any_ StateStore
> > > >>> implementation.
> > > >>> > >> There's no need to use KeyValueStore or any other built-in
> > > >>> interface.
> > > >>> > >> In other words, KeyValueStore and company are only part of the
> > DSL,
> > > >>> > >> not the PAPI. So, discussions about the build-in store
> > interfaces
> > > >>> are only
> > > >>> > >> really relevant in the context of the DSL, Transformers, and
> > > >>> Materialized.
> > > >>> > >>
> > > >>> > >> So, in conclusion, I'd really recommend just adding any new
> > methods
> > > >>> to
> > > >>> > >> the existing store interfaces. We might be able to revamp the
> > API
> > > >>> in the
> > > >>> > >> future to support mixins, but it's a much larger scope of work
> > than
> > > >>> this
> > > >>> > >> KIP.
> > > >>> > >> A more minor comment is that we don't need to add Deprecated
> > > >>> variants
> > > >>> > >> of new methods.
> > > >>> > >>
> > > >>> > >> Thanks again, and once again, I'm sorry I tuned out and didn't
> > > >>> offer this
> > > >>> > >> feedback before you revised the KIP.
> > > >>> > >> -John
> > > >>> > >>
> > > >>> > >>
> > > >>> > >>
> > > >>> > >>
> > > >>> > >> On Mon, Jun 22, 2020, at 06:11, Jorge Esteban Quilcate Otoya
> > wrote:
> > > >>> > >> > Hi everyone,
> > > >>> > >> >
> > > >>> > >> > I've updated the KIP, applying Matthias' feedback regarding
> > > >>> interface
> > > >>> > >> > hierarchy.
> > > >>> > >> >
> > > >>> > >> > Also, following the last email, I think we can consider
> > reverse
> > > >>> > >> operations
> > > >>> > >> > on KeyValue range as well, as implementation supports
> > > >>> lexicographic
> > > >>> > >> order.
> > > >>> > >> >
> > > >>> > >> > I considered different naming between Key-based ranges and
> > > >>> Time-based
> > > >>> > >> > ranges, and mitigate confusion when fetching keys and time
> > ranges
> > > >>> as
> > > >>> > >> > WindowStore does:
> > > >>> > >> >
> > > >>> > >> > Key-based ranges: reverseRange(), reverseAll()
> > > >>> > >> > Time-based ranges: backwardFetch()
> > > >>> > >> >
> > > >>> > >> > Then, key-based changes apply to KeyValueStore, and time-based
> > > >>> changes
> > > >>> > >> to
> > > >>> > >> > Window and Session stores.
> > > >>> > >> >
> > > >>> > >> > Let me know if you have any questions.
> > > >>> > >> >
> > > >>> > >> > Thanks,
> > > >>> > >> > Jorge.
> > > >>> > >> >
> > > >>> > >> >
> > > >>> > >> > On Tue, Jun 16, 2020 at 12:47 AM Jorge Esteban Quilcate Otoya
> > <
> > > >>> > >> > quilcate.jorge@gmail.com> wrote:
> > > >>> > >> >
> > > >>> > >> > > Hi everyone, sorry for the late reply.
> > > >>> > >> > >
> > > >>> > >> > > Thanks Matthias for your feedback. I think it makes sense to
> > > >>> > >> reconsider
> > > >>> > >> > > the current design based on your input.
> > > >>> > >> > >
> > > >>> > >> > > After digging deeper into the current implementation, I'd
> > like
> > > >>> to
> > > >>> > >> bring my
> > > >>> > >> > > current understanding to be double-checked as it might be
> > > >>> redefining
> > > >>> > >> the
> > > >>> > >> > > KIP's scope:
> > > >>> > >> > >
> > > >>> > >> > > 1. There are 2 ranges been exposed by different stores:
> > > >>> > >> > >
> > > >>> > >> > > a. Key Range
> > > >>> > >> > > b. Timestamp Range
> > > >>> > >> > >
> > > >>> > >> > > So far, we have discussed covering both.
> > > >>> > >> > >
> > > >>> > >> > > 2. Key Range functions do not provide ordering guarantees by
> > > >>> design:
> > > >>> > >> > >
> > > >>> > >> > > ```ReadOnlyKeyValueStore.java
> > > >>> > >> > >     /**
> > > >>> > >> > >      * Get an iterator over a given range of keys. This
> > > >>> iterator must
> > > >>> > >> be
> > > >>> > >> > > closed after use.
> > > >>> > >> > >      * The returned iterator must be safe from {@link
> > > >>> > >> > > java.util.ConcurrentModificationException}s
> > > >>> > >> > >      * and must not return null values. No ordering
> > guarantees
> > > >>> are
> > > >>> > >> > > provided.
> > > >>> > >> > >      * ...
> > > >>> > >> > >      */
> > > >>> > >> > >      KeyValueIterator<K, V> range(K from, K to);
> > > >>> > >> > > ```
> > > >>> > >> > >
> > > >>> > >> > > Therefore, I'd propose removing Key range operations from
> > the
> > > >>> scope.
> > > >>> > >> > >
> > > >>> > >> > > 3. Timestamp Range operations happen at the SegmentsStore
> > level
> > > >>> > >> (internal)
> > > >>> > >> > > API
> > > >>> > >> > >
> > > >>> > >> > > AFAICT, Segments wrappers handle all Timestamp ranges
> > queries.
> > > >>> > >> > >
> > > >>> > >> > > I'd propose extending `Segments#segments(timeFrom, timeTo,
> > > >>> backwards)`
> > > >>> > >> > > with a flag for backwards operations.
> > > >>> > >> > >
> > > >>> > >> > > As segments returned will be processed backwards, I'm not
> > > >>> extending
> > > >>> > >> > > KeyValueStores to query each segment backwards as previous
> > > >>> point 2.
> > > >>> > >> > >
> > > >>> > >> > > 4. Extend WindowStores implementations with a new
> > > >>> > >> > > WindowBackwardStore/ReadOnlyBackwardStore:
> > > >>> > >> > >
> > > >>> > >> > > ```java
> > > >>> > >> > > public interface ReadOnlyWindowBackwardStore<K, V> {
> > > >>> > >> > >     WindowStoreIterator<V> backwardFetch(K key, Instant
> > from,
> > > >>> Instant
> > > >>> > >> to)
> > > >>> > >> > > throws IllegalArgumentException;
> > > >>> > >> > >
> > > >>> > >> > >     KeyValueIterator<Windowed<K>, V> backwardFetch(K from,
> > K to,
> > > >>> > >> Instant
> > > >>> > >> > > fromTime, Instant toTime)
> > > >>> > >> > >         throws IllegalArgumentException;
> > > >>> > >> > >
> > > >>> > >> > >     KeyValueIterator<Windowed<K>, V>
> > backwardFetchAll(Instant
> > > >>> from,
> > > >>> > >> > > Instant to) throws IllegalArgumentException;
> > > >>> > >> > > ```
> > > >>> > >> > >
> > > >>> > >> > > 5. SessionStore is a bit different as it has fetch/find
> > sessions
> > > >>> > >> spread
> > > >>> > >> > > between SessionStore and ReadOnlySessionStore.
> > > >>> > >> > >
> > > >>> > >> > > I'd propose a new interface `SessionBackwardStore` to expose
> > > >>> backward
> > > >>> > >> find
> > > >>> > >> > > operations:
> > > >>> > >> > >
> > > >>> > >> > > ```java
> > > >>> > >> > > public interface SessionBackwardStore<K, AGG> {
> > > >>> > >> > >     KeyValueIterator<Windowed<K>, AGG>
> > > >>> backwardFindSessions(final K
> > > >>> > >> key,
> > > >>> > >> > > final long earliestSessionEndTime, final long
> > > >>> latestSessionStartTime);
> > > >>> > >> > >
> > > >>> > >> > >     KeyValueIterator<Windowed<K>, AGG>
> > > >>> backwardFindSessions(final K
> > > >>> > >> > > keyFrom, final K keyTo, final long earliestSessionEndTime,
> > > >>> final long
> > > >>> > >> > > latestSessionStartTime);
> > > >>> > >> > > }
> > > >>> > >> > > ```
> > > >>> > >> > >
> > > >>> > >> > > If this understanding is correct I'll proceed to update the
> > KIP
> > > >>> based
> > > >>> > >> on
> > > >>> > >> > > this.
> > > >>> > >> > >
> > > >>> > >> > > Looking forward to your feedback.
> > > >>> > >> > >
> > > >>> > >> > > Thanks,
> > > >>> > >> > > Jorge.
> > > >>> > >> > >
> > > >>> > >> > > On Fri, May 29, 2020 at 3:32 AM Matthias J. Sax <
> > > >>> mjsax@apache.org>
> > > >>> > >> wrote:
> > > >>> > >> > >
> > > >>> > >> > >> Hey,
> > > >>> > >> > >>
> > > >>> > >> > >> Sorry that I am late to the game. I am not 100% convinced
> > > >>> about the
> > > >>> > >> > >> current proposal. Using a new config as feature flag seems
> > to
> > > >>> be
> > > >>> > >> rather
> > > >>> > >> > >> "nasty" to me, and flipping from/to is a little bit too
> > fancy
> > > >>> for my
> > > >>> > >> > >> personal taste.
> > > >>> > >> > >>
> > > >>> > >> > >> I agree, that the original proposal using a "ReadDirection"
> > > >>> enum is
> > > >>> > >> not
> > > >>> > >> > >> ideal either.
> > > >>> > >> > >>
> > > >>> > >> > >> Thus, I would like to put out a new idea: We could add a
> > new
> > > >>> > >> interface
> > > >>> > >> > >> that offers new methods that return revers iterators.
> > > >>> > >> > >>
> > > >>> > >> > >> The KIP already proposes to add `reverseAll()` and it seems
> > > >>> backward
> > > >>> > >> > >> incompatible to just add this method to
> > > >>> `ReadOnlyKeyValueStore` and
> > > >>> > >> > >> `ReadOnlyWindowStore`. I don't think we could provide a
> > useful
> > > >>> > >> default
> > > >>> > >> > >> implementation for custom stores and thus either break
> > > >>> compatibility
> > > >>> > >> or
> > > >>> > >> > >> need add a default that just throws an exception. Neither
> > > >>> seems to
> > > >>> > >> be a
> > > >>> > >> > >> good option.
> > > >>> > >> > >>
> > > >>> > >> > >> Using a new interface avoid this issue and allows users
> > > >>> implementing
> > > >>> > >> > >> custom stores to opt-in by adding the interface to their
> > > >>> stores.
> > > >>> > >> > >> Furthermore, we don't need any config. In the end, we
> > > >>> encapsulte the
> > > >>> > >> > >> change into the store, and our runtime is agnostic to it
> > (as it
> > > >>> > >> should
> > > >>> > >> > >> be).
> > > >>> > >> > >>
> > > >>> > >> > >> The hierarchy becomes a little complex (but uses would not
> > > >>> really see
> > > >>> > >> > >> the complexity):
> > > >>> > >> > >>
> > > >>> > >> > >> // exsiting
> > > >>> > >> > >> ReadOnlyKeyValueStore
> > > >>> > >> > >> KeyValueStore extend StateStore, ReadOnlyKeyValueStore
> > > >>> > >> > >>
> > > >>> > >> > >>
> > > >>> > >> > >> // helper interface; users don't care
> > > >>> > >> > >> // need similar ones for other stores
> > > >>> > >> > >> ReverseReadOnlyKeyValueStore {
> > > >>> > >> > >>     KeyValueIterator<K, V> reverseRange(K from, K to);
> > > >>> > >> > >>     KeyValueIterator<K, V> reverseAll();
> > > >>> > >> > >> }
> > > >>> > >> > >>
> > > >>> > >> > >>
> > > >>> > >> > >> // two new user facing interfaces for kv-store
> > > >>> > >> > >> // need similar ones for other stores
> > > >>> > >> > >> ReadOnlyKeyValueStoreWithReverseIterators extends
> > > >>> > >> ReadOnlyKeyValueStore,
> > > >>> > >> > >> ReverseReadOnlyKeyValueStore
> > > >>> > >> > >>
> > > >>> > >> > >> KeyValueStoreWithReverseIterators extends KeyValueStore,
> > > >>> > >> > >> ReverseReadOnlyKeyValueStore
> > > >>> > >> > >>
> > > >>> > >> > >>
> > > >>> > >> > >> // updated (also internal)
> > > >>> > >> > >> // also need to update other built-in stores
> > > >>> > >> > >> RocksDB implements KeyValueStoreWithReverseIterators,
> > > >>> > >> BulkLoadingStore
> > > >>> > >> > >>
> > > >>> > >> > >>
> > > >>> > >> > >> In the end, user would only care about the two (for
> > kv-case)
> > > >>> new
> > > >>> > >> > >> interface that offer revers iterator (read only and
> > regular)
> > > >>> and can
> > > >>> > >> > >> cast stores accordingly in their Processors/Transformers or
> > > >>> via IQ.
> > > >>> > >> > >>
> > > >>> > >> > >>
> > > >>> > >> > >> Btw: if we add revers iterator for KeyValue and Window
> > store,
> > > >>> should
> > > >>> > >> we
> > > >>> > >> > >> do the same for Session store?
> > > >>> > >> > >>
> > > >>> > >> > >>
> > > >>> > >> > >>
> > > >>> > >> > >> This might be more code to write, but I believe it
> > provides the
> > > >>> > >> better
> > > >>> > >> > >> user experience. Thoughts?
> > > >>> > >> > >>
> > > >>> > >> > >>
> > > >>> > >> > >>
> > > >>> > >> > >> -Matthias
> > > >>> > >> > >>
> > > >>> > >> > >>
> > > >>> > >> > >>
> > > >>> > >> > >>
> > > >>> > >> > >> On 5/26/20 8:47 PM, John Roesler wrote:
> > > >>> > >> > >> > Sorry for my silence, Jorge,
> > > >>> > >> > >> >
> > > >>> > >> > >> > I've just taken another look, and I'm personally happy
> > with
> > > >>> the
> > > >>> > >> KIP.
> > > >>> > >> > >> >
> > > >>> > >> > >> > Thanks,
> > > >>> > >> > >> > -John
> > > >>> > >> > >> >
> > > >>> > >> > >> > On Tue, May 26, 2020, at 16:17, Jorge Esteban Quilcate
> > Otoya
> > > >>> wrote:
> > > >>> > >> > >> >> If no additional comments, I will proceed to start the a
> > > >>> vote
> > > >>> > >> thread.
> > > >>> > >> > >> >>
> > > >>> > >> > >> >> Thanks a lot for your feedback!
> > > >>> > >> > >> >>
> > > >>> > >> > >> >> On Fri, May 22, 2020 at 9:25 AM Jorge Esteban Quilcate
> > > >>> Otoya <
> > > >>> > >> > >> >> quilcate.jorge@gmail.com> wrote:
> > > >>> > >> > >> >>
> > > >>> > >> > >> >>> Thanks Sophie. I like the `reverseAll()` idea.
> > > >>> > >> > >> >>>
> > > >>> > >> > >> >>> I updated the KIP with your feedback.
> > > >>> > >> > >> >>>
> > > >>> > >> > >> >>>
> > > >>> > >> > >> >>>
> > > >>> > >> > >> >>> On Fri, May 22, 2020 at 4:22 AM Sophie Blee-Goldman <
> > > >>> > >> > >> sophie@confluent.io>
> > > >>> > >> > >> >>> wrote:
> > > >>> > >> > >> >>>
> > > >>> > >> > >> >>>> Hm, the case of `all()` does seem to present a
> > dilemma in
> > > >>> the
> > > >>> > >> case of
> > > >>> > >> > >> >>>> variable-length keys.
> > > >>> > >> > >> >>>>
> > > >>> > >> > >> >>>> In the case of fixed-length keys, you can just compute
> > > >>> the keys
> > > >>> > >> that
> > > >>> > >> > >> >>>> correspond
> > > >>> > >> > >> >>>> to the maximum and minimum serialized bytes, then
> > perform
> > > >>> a
> > > >>> > >> `range()`
> > > >>> > >> > >> >>>> query
> > > >>> > >> > >> >>>> instead of an `all()`. If your keys don't have a
> > > >>> well-defined
> > > >>> > >> > >> ordering
> > > >>> > >> > >> >>>> such
> > > >>> > >> > >> >>>> that
> > > >>> > >> > >> >>>> you can't determine the MAX_KEY, then you probably
> > don't
> > > >>> care
> > > >>> > >> about
> > > >>> > >> > >> the
> > > >>> > >> > >> >>>> iterator order anyway.
> > > >>> > >> > >> >>>>
> > > >>> > >> > >> >>>>  But with variable-length keys, there is no MAX_KEY.
> > If
> > > >>> all your
> > > >>> > >> > >> keys were
> > > >>> > >> > >> >>>> just
> > > >>> > >> > >> >>>> of the form 'a', 'aa', 'aaaaa', 'aaaaaaa' then in fact
> > > >>> the only
> > > >>> > >> way
> > > >>> > >> > >> to
> > > >>> > >> > >> >>>> figure out the
> > > >>> > >> > >> >>>> maximum key in the store is by using `all()` -- and
> > > >>> without a
> > > >>> > >> reverse
> > > >>> > >> > >> >>>> iterator, you're
> > > >>> > >> > >> >>>> doomed to iterate through every single key just to
> > answer
> > > >>> that
> > > >>> > >> simple
> > > >>> > >> > >> >>>> question.
> > > >>> > >> > >> >>>>
> > > >>> > >> > >> >>>> That said, I still think determining the iterator
> > order
> > > >>> based
> > > >>> > >> on the
> > > >>> > >> > >> >>>> to/from bytes
> > > >>> > >> > >> >>>> makes a lot of intuitive sense and gives the API a
> > nice
> > > >>> > >> symmetry.
> > > >>> > >> > >> What if
> > > >>> > >> > >> >>>> we
> > > >>> > >> > >> >>>> solved the `all()` problem by just giving `all()` a
> > > >>> reverse
> > > >>> > >> form to
> > > >>> > >> > >> >>>> complement it?
> > > >>> > >> > >> >>>> Ie we would have `all()` and `reverseAll()`, or
> > something
> > > >>> to
> > > >>> > >> that
> > > >>> > >> > >> effect.
> > > >>> > >> > >> >>>>
> > > >>> > >> > >> >>>> On Thu, May 21, 2020 at 3:41 PM Jorge Esteban Quilcate
> > > >>> Otoya <
> > > >>> > >> > >> >>>> quilcate.jorge@gmail.com> wrote:
> > > >>> > >> > >> >>>>
> > > >>> > >> > >> >>>>> Thanks John.
> > > >>> > >> > >> >>>>>
> > > >>> > >> > >> >>>>> Agree. I like the first approach as well, with
> > > >>> StreamsConfig
> > > >>> > >> flag
> > > >>> > >> > >> >>>> passing
> > > >>> > >> > >> >>>>> by via ProcessorContext.
> > > >>> > >> > >> >>>>>
> > > >>> > >> > >> >>>>> Another positive effect with "reverse parameters" is
> > > >>> that in
> > > >>> > >> the
> > > >>> > >> > >> case of
> > > >>> > >> > >> >>>>> `fetch(keyFrom, keyTo, timeFrom, timeTo)` users can
> > > >>> decide
> > > >>> > >> _which_
> > > >>> > >> > >> pair
> > > >>> > >> > >> >>>> to
> > > >>> > >> > >> >>>>> flip, whether with `ReadDirection` enum it apply to
> > both.
> > > >>> > >> > >> >>>>>
> > > >>> > >> > >> >>>>> The only issue I've found while reviewing the KIP is
> > that
> > > >>> > >> `all()`
> > > >>> > >> > >> won't
> > > >>> > >> > >> >>>> fit
> > > >>> > >> > >> >>>>> within this approach.
> > > >>> > >> > >> >>>>>
> > > >>> > >> > >> >>>>> We could remove it from the KIP and argue that for
> > > >>> WindowStore,
> > > >>> > >> > >> >>>>> `fetchAll(0, Long.MAX_VALUE)` can be used to get all
> > in
> > > >>> reverse
> > > >>> > >> > >> order,
> > > >>> > >> > >> >>>> and
> > > >>> > >> > >> >>>>> for KeyValueStore, no ordering guarantees are
> > provided.
> > > >>> > >> > >> >>>>>
> > > >>> > >> > >> >>>>> If there is consensus with this changes, I will go
> > and
> > > >>> update
> > > >>> > >> the
> > > >>> > >> > >> KIP.
> > > >>> > >> > >> >>>>>
> > > >>> > >> > >> >>>>> On Thu, May 21, 2020 at 3:33 PM John Roesler <
> > > >>> > >> vvcephei@apache.org>
> > > >>> > >> > >> >>>> wrote:
> > > >>> > >> > >> >>>>>
> > > >>> > >> > >> >>>>>> Hi Jorge,
> > > >>> > >> > >> >>>>>>
> > > >>> > >> > >> >>>>>> Thanks for that idea. I agree, a feature flag would
> > > >>> protect
> > > >>> > >> anyone
> > > >>> > >> > >> >>>>>> who may be depending on the current behavior.
> > > >>> > >> > >> >>>>>>
> > > >>> > >> > >> >>>>>> It seems better to locate the feature flag in the
> > > >>> > >> initialization
> > > >>> > >> > >> >>>> logic of
> > > >>> > >> > >> >>>>>> the store, rather than have a method on the "live"
> > > >>> store that
> > > >>> > >> > >> changes
> > > >>> > >> > >> >>>>>> its behavior on the fly.
> > > >>> > >> > >> >>>>>>
> > > >>> > >> > >> >>>>>> It seems like there are two options here, one is to
> > add
> > > >>> a new
> > > >>> > >> > >> config:
> > > >>> > >> > >> >>>>>>
> > > >>> > >> > >> >>>>>> StreamsConfig.ENABLE_BACKWARDS_ITERATION =
> > > >>> > >> > >> >>>>>>   "enable.backwards.iteration
> > > >>> > >> > >> >>>>>>
> > > >>> > >> > >> >>>>>> Or we can add a feature flag in Materialized, like
> > > >>> > >> > >> >>>>>>
> > > >>> > >> > >> >>>>>> Materialized.enableBackwardsIteration()
> > > >>> > >> > >> >>>>>>
> > > >>> > >> > >> >>>>>> I think I'd personally lean toward the config, for
> > the
> > > >>> > >> following
> > > >>> > >> > >> >>>> reason.
> > > >>> > >> > >> >>>>>> The concern that Sophie raised is that someone's
> > > >>> program may
> > > >>> > >> depend
> > > >>> > >> > >> >>>>>> on the existing contract of getting an empty
> > iterator.
> > > >>> We
> > > >>> > >> don't
> > > >>> > >> > >> want
> > > >>> > >> > >> >>>> to
> > > >>> > >> > >> >>>>>> switch behavior when they aren't expecting it, so we
> > > >>> provide
> > > >>> > >> them a
> > > >>> > >> > >> >>>>>> config to assert that they _are_ expecting the new
> > > >>> behavior,
> > > >>> > >> which
> > > >>> > >> > >> >>>>>> means they take responsibility for updating their
> > code
> > > >>> to
> > > >>> > >> expect
> > > >>> > >> > >> the
> > > >>> > >> > >> >>>> new
> > > >>> > >> > >> >>>>>> behavior.
> > > >>> > >> > >> >>>>>>
> > > >>> > >> > >> >>>>>> There doesn't seem to be a reason to offer a choice
> > of
> > > >>> > >> behaviors
> > > >>> > >> > >> on a
> > > >>> > >> > >> >>>>>> per-query, or per-store basis. We just want people
> > to
> > > >>> be not
> > > >>> > >> > >> surprised
> > > >>> > >> > >> >>>>>> by this change in general.
> > > >>> > >> > >> >>>>>>
> > > >>> > >> > >> >>>>>> What do you think?
> > > >>> > >> > >> >>>>>> Thanks,
> > > >>> > >> > >> >>>>>> -John
> > > >>> > >> > >> >>>>>>
> > > >>> > >> > >> >>>>>> On Wed, May 20, 2020, at 17:37, Jorge Quilcate
> > wrote:
> > > >>> > >> > >> >>>>>>> Thank you both for the great feedback.
> > > >>> > >> > >> >>>>>>>
> > > >>> > >> > >> >>>>>>> I like the "fancy" proposal :), and how it removes
> > the
> > > >>> need
> > > >>> > >> for
> > > >>> > >> > >> >>>>>>> additional API methods. And with a feature flag on
> > > >>> > >> `StateStore`,
> > > >>> > >> > >> >>>>>>> disabled by default, should no break current users.
> > > >>> > >> > >> >>>>>>>
> > > >>> > >> > >> >>>>>>> The only side-effect I can think of is that: by
> > moving
> > > >>> the
> > > >>> > >> flag
> > > >>> > >> > >> >>>>> upwards,
> > > >>> > >> > >> >>>>>>> all later operations become affected; which might
> > be
> > > >>> ok for
> > > >>> > >> most
> > > >>> > >> > >> >>>> (all?)
> > > >>> > >> > >> >>>>>>> cases. I can't think of an scenario where this
> > would
> > > >>> be an
> > > >>> > >> issue,
> > > >>> > >> > >> >>>> just
> > > >>> > >> > >> >>>>>>> want to point this out.
> > > >>> > >> > >> >>>>>>>
> > > >>> > >> > >> >>>>>>> If moving to this approach, I'd like to check if I
> > got
> > > >>> this
> > > >>> > >> right
> > > >>> > >> > >> >>>>> before
> > > >>> > >> > >> >>>>>>> updating the KIP:
> > > >>> > >> > >> >>>>>>>
> > > >>> > >> > >> >>>>>>> - only `StateStore` will change by having a new
> > method:
> > > >>> > >> > >> >>>>>>> `backwardIteration()`, `false` by default to keep
> > > >>> things
> > > >>> > >> > >> compatible.
> > > >>> > >> > >> >>>>>>> - then all `*Stores` will have to update their
> > > >>> implementation
> > > >>> > >> > >> based
> > > >>> > >> > >> >>>> on
> > > >>> > >> > >> >>>>>>> this flag.
> > > >>> > >> > >> >>>>>>>
> > > >>> > >> > >> >>>>>>>
> > > >>> > >> > >> >>>>>>> On 20/05/2020 21:02, Sophie Blee-Goldman wrote:
> > > >>> > >> > >> >>>>>>>>> There's no possibility that someone could be
> > relying
> > > >>> > >> > >> >>>>>>>>> on iterating over that range in increasing order,
> > > >>> because
> > > >>> > >> that's
> > > >>> > >> > >> >>>> not
> > > >>> > >> > >> >>>>>> what
> > > >>> > >> > >> >>>>>>>>> happens. However, they could indeed be relying on
> > > >>> getting
> > > >>> > >> an
> > > >>> > >> > >> >>>> empty
> > > >>> > >> > >> >>>>>>>> iterator
> > > >>> > >> > >> >>>>>>>>
> > > >>> > >> > >> >>>>>>>> I just meant that they might be relying on the
> > > >>> assumption
> > > >>> > >> that
> > > >>> > >> > >> the
> > > >>> > >> > >> >>>>>> range
> > > >>> > >> > >> >>>>>>>> query
> > > >>> > >> > >> >>>>>>>> will never return results with decreasing keys.
> > The
> > > >>> empty
> > > >>> > >> > >> iterator
> > > >>> > >> > >> >>>>>> wouldn't
> > > >>> > >> > >> >>>>>>>> break that contract, but of course a surprise
> > reverse
> > > >>> > >> iterator
> > > >>> > >> > >> >>>> would.
> > > >>> > >> > >> >>>>>>>>
> > > >>> > >> > >> >>>>>>>> FWIW I actually am in favor of automatically
> > > >>> converting to a
> > > >>> > >> > >> >>>> reverse
> > > >>> > >> > >> >>>>>>>> iterator,
> > > >>> > >> > >> >>>>>>>> I just thought we should consider whether this
> > should
> > > >>> be
> > > >>> > >> off by
> > > >>> > >> > >> >>>>>> default or
> > > >>> > >> > >> >>>>>>>> even possible to disable at all.
> > > >>> > >> > >> >>>>>>>>
> > > >>> > >> > >> >>>>>>>> On Tue, May 19, 2020 at 7:42 PM John Roesler <
> > > >>> > >> > >> vvcephei@apache.org
> > > >>> > >> > >> >>>>>
> > > >>> > >> > >> >>>>>> wrote:
> > > >>> > >> > >> >>>>>>>>
> > > >>> > >> > >> >>>>>>>>> Thanks for the response, Sophie,
> > > >>> > >> > >> >>>>>>>>>
> > > >>> > >> > >> >>>>>>>>> I wholeheartedly agree we should take as much
> > into
> > > >>> account
> > > >>> > >> as
> > > >>> > >> > >> >>>>> possible
> > > >>> > >> > >> >>>>>>>>> up front, rather than regretting our decisions
> > > >>> later. I
> > > >>> > >> actually
> > > >>> > >> > >> >>>> do
> > > >>> > >> > >> >>>>>> share
> > > >>> > >> > >> >>>>>>>>> your vague sense of worry, which was what led me
> > to
> > > >>> say
> > > >>> > >> > >> initially
> > > >>> > >> > >> >>>>>> that I
> > > >>> > >> > >> >>>>>>>>> thought my counterproposal might be "too fancy".
> > > >>> > >> Sometimes, it's
> > > >>> > >> > >> >>>>>> better
> > > >>> > >> > >> >>>>>>>>> to be explicit instead of "elegant", if we think
> > more
> > > >>> > >> people
> > > >>> > >> > >> >>>> will be
> > > >>> > >> > >> >>>>>>>>> confused
> > > >>> > >> > >> >>>>>>>>> than not.
> > > >>> > >> > >> >>>>>>>>>
> > > >>> > >> > >> >>>>>>>>> I really don't think that there's any danger of
> > > >>> "relying
> > > >>> > >> on a
> > > >>> > >> > >> >>>> bug"
> > > >>> > >> > >> >>>>>> here,
> > > >>> > >> > >> >>>>>>>>> although
> > > >>> > >> > >> >>>>>>>>> people certainly could be relying on current
> > > >>> behavior. One
> > > >>> > >> thing
> > > >>> > >> > >> >>>> to
> > > >>> > >> > >> >>>>> be
> > > >>> > >> > >> >>>>>>>>> clear
> > > >>> > >> > >> >>>>>>>>> about (which I just left a more detailed comment
> > in
> > > >>> > >> KAFKA-8159
> > > >>> > >> > >> >>>>> about)
> > > >>> > >> > >> >>>>>> is
> > > >>> > >> > >> >>>>>>>>> that
> > > >>> > >> > >> >>>>>>>>> when we say something like key1 > key2, this
> > > >>> ordering is
> > > >>> > >> defined
> > > >>> > >> > >> >>>> by
> > > >>> > >> > >> >>>>>> the
> > > >>> > >> > >> >>>>>>>>> serde's output and nothing else.
> > > >>> > >> > >> >>>>>>>>>
> > > >>> > >> > >> >>>>>>>>> Currently, thanks to your fix in
> > > >>> > >> > >> >>>>>> https://github.com/apache/kafka/pull/6521
> > > >>> > >> > >> >>>>>>>>> ,
> > > >>> > >> > >> >>>>>>>>> the store contract is that for range scans, if
> > from
> > > >>> > to,
> > > >>> > >> then
> > > >>> > >> > >> >>>> the
> > > >>> > >> > >> >>>>>> store
> > > >>> > >> > >> >>>>>>>>> must
> > > >>> > >> > >> >>>>>>>>> return an empty iterator. There's no possibility
> > that
> > > >>> > >> someone
> > > >>> > >> > >> >>>> could
> > > >>> > >> > >> >>>>> be
> > > >>> > >> > >> >>>>>>>>> relying
> > > >>> > >> > >> >>>>>>>>> on iterating over that range in increasing order,
> > > >>> because
> > > >>> > >> that's
> > > >>> > >> > >> >>>> not
> > > >>> > >> > >> >>>>>> what
> > > >>> > >> > >> >>>>>>>>> happens. However, they could indeed be relying on
> > > >>> getting
> > > >>> > >> an
> > > >>> > >> > >> >>>> empty
> > > >>> > >> > >> >>>>>>>>> iterator.
> > > >>> > >> > >> >>>>>>>>>
> > > >>> > >> > >> >>>>>>>>> My counterproposal was to actually change this
> > > >>> contract to
> > > >>> > >> say
> > > >>> > >> > >> >>>> that
> > > >>> > >> > >> >>>>>> the
> > > >>> > >> > >> >>>>>>>>> store
> > > >>> > >> > >> >>>>>>>>> must return an iterator over the keys in that
> > range,
> > > >>> but
> > > >>> > >> in the
> > > >>> > >> > >> >>>>>> reverse
> > > >>> > >> > >> >>>>>>>>> order.
> > > >>> > >> > >> >>>>>>>>> So, in addition to considering whether this idea
> > is
> > > >>> "too
> > > >>> > >> fancy"
> > > >>> > >> > >> >>>> (aka
> > > >>> > >> > >> >>>>>>>>> confusing),
> > > >>> > >> > >> >>>>>>>>> we should also consider the likelihood of
> > breaking an
> > > >>> > >> existing
> > > >>> > >> > >> >>>>>> program with
> > > >>> > >> > >> >>>>>>>>> this behavior/contract change.
> > > >>> > >> > >> >>>>>>>>>
> > > >>> > >> > >> >>>>>>>>> To echo your clarification, I'm also not
> > advocating
> > > >>> > >> strongly in
> > > >>> > >> > >> >>>>> favor
> > > >>> > >> > >> >>>>>> of my
> > > >>> > >> > >> >>>>>>>>> proposal. I just wanted to present it for
> > > >>> consideration
> > > >>> > >> > >> alongside
> > > >>> > >> > >> >>>>>> Jorge's
> > > >>> > >> > >> >>>>>>>>> original one.
> > > >>> > >> > >> >>>>>>>>>
> > > >>> > >> > >> >>>>>>>>> Thanks for raising these very good points,
> > > >>> > >> > >> >>>>>>>>> -John
> > > >>> > >> > >> >>>>>>>>>
> > > >>> > >> > >> >>>>>>>>> On Tue, May 19, 2020, at 20:49, Sophie
> > Blee-Goldman
> > > >>> wrote:
> > > >>> > >> > >> >>>>>>>>>>> Rather than working around it, I think we
> > should
> > > >>> just
> > > >>> > >> fix it
> > > >>> > >> > >> >>>>>>>>>> Now *that's* a "fancy" idea :P
> > > >>> > >> > >> >>>>>>>>>>
> > > >>> > >> > >> >>>>>>>>>> That was my primary concern, although I do have
> > a
> > > >>> vague
> > > >>> > >> sense
> > > >>> > >> > >> of
> > > >>> > >> > >> >>>>>> worry
> > > >>> > >> > >> >>>>>>>>>> that we might be allowing users to get into
> > trouble
> > > >>> > >> without
> > > >>> > >> > >> >>>>>> realizing it.
> > > >>> > >> > >> >>>>>>>>>> For example if their custom serdes suffer a
> > similar
> > > >>> bug
> > > >>> > >> as the
> > > >>> > >> > >> >>>>> above,
> > > >>> > >> > >> >>>>>>>>>> and/or
> > > >>> > >> > >> >>>>>>>>>> they rely on getting results in increasing order
> > > >>> (of the
> > > >>> > >> keys)
> > > >>> > >> > >> >>>> even
> > > >>> > >> > >> >>>>>> when
> > > >>> > >> > >> >>>>>>>>>> to < from. Maybe they're relying on the fact
> > that
> > > >>> the
> > > >>> > >> range
> > > >>> > >> > >> >>>> query
> > > >>> > >> > >> >>>>>> returns
> > > >>> > >> > >> >>>>>>>>>> nothing in that case.
> > > >>> > >> > >> >>>>>>>>>>
> > > >>> > >> > >> >>>>>>>>>> Not sure if that qualifies as relying on a bug
> > or
> > > >>> not,
> > > >>> > >> but in
> > > >>> > >> > >> >>>> that
> > > >>> > >> > >> >>>>>> past
> > > >>> > >> > >> >>>>>>>>>> we've
> > > >>> > >> > >> >>>>>>>>>> taken the stance that we should not break
> > > >>> compatibility
> > > >>> > >> even if
> > > >>> > >> > >> >>>> the
> > > >>> > >> > >> >>>>>> user
> > > >>> > >> > >> >>>>>>>>>> was relying on bugs or unintentional behavior.
> > > >>> > >> > >> >>>>>>>>>>
> > > >>> > >> > >> >>>>>>>>>> Just to clarify I'm not advocating strongly
> > against
> > > >>> this
> > > >>> > >> > >> >>>> proposal,
> > > >>> > >> > >> >>>>>> just
> > > >>> > >> > >> >>>>>>>>>> laying
> > > >>> > >> > >> >>>>>>>>>> out some considerations we should take into
> > > >>> account. At
> > > >>> > >> the end
> > > >>> > >> > >> >>>> of
> > > >>> > >> > >> >>>>>> the
> > > >>> > >> > >> >>>>>>>>> day
> > > >>> > >> > >> >>>>>>>>>> we should do what's right rather than maintain
> > > >>> > >> compatibility
> > > >>> > >> > >> >>>> with
> > > >>> > >> > >> >>>>>>>>> existing
> > > >>> > >> > >> >>>>>>>>>> bugs, but sometimes there's a reasonable middle
> > > >>> ground.
> > > >>> > >> > >> >>>>>>>>>>
> > > >>> > >> > >> >>>>>>>>>> On Tue, May 19, 2020 at 6:15 PM John Roesler <
> > > >>> > >> > >> >>>> vvcephei@apache.org>
> > > >>> > >> > >> >>>>>>>>> wrote:
> > > >>> > >> > >> >>>>>>>>>>> Thanks Sophie,
> > > >>> > >> > >> >>>>>>>>>>>
> > > >>> > >> > >> >>>>>>>>>>> Woah, that’s a nasty bug. Rather than working
> > > >>> around it,
> > > >>> > >> I
> > > >>> > >> > >> >>>> think
> > > >>> > >> > >> >>>>> we
> > > >>> > >> > >> >>>>>>>>> should
> > > >>> > >> > >> >>>>>>>>>>> just fix it. I’ll leave some comments on the
> > Jira.
> > > >>> > >> > >> >>>>>>>>>>>
> > > >>> > >> > >> >>>>>>>>>>> It doesn’t seem like it should be this KIP’s
> > > >>> concern
> > > >>> > >> that some
> > > >>> > >> > >> >>>>>> serdes
> > > >>> > >> > >> >>>>>>>>>>> might be incorrectly written.
> > > >>> > >> > >> >>>>>>>>>>>
> > > >>> > >> > >> >>>>>>>>>>> Were there other practical concerns that you
> > had
> > > >>> in mind?
> > > >>> > >> > >> >>>>>>>>>>>
> > > >>> > >> > >> >>>>>>>>>>> Thanks,
> > > >>> > >> > >> >>>>>>>>>>> John
> > > >>> > >> > >> >>>>>>>>>>>
> > > >>> > >> > >> >>>>>>>>>>> On Tue, May 19, 2020, at 19:10, Sophie
> > Blee-Goldman
> > > >>> > >> wrote:
> > > >>> > >> > >> >>>>>>>>>>>> I like this "fancy idea" to just flip the
> > to/from
> > > >>> bytes
> > > >>> > >> but I
> > > >>> > >> > >> >>>>> think
> > > >>> > >> > >> >>>>>>>>> there
> > > >>> > >> > >> >>>>>>>>>>>> are some practical limitations to implementing
> > > >>> this. In
> > > >>> > >> > >> >>>>> particular
> > > >>> > >> > >> >>>>>>>>>>>> I'm thinking about this issue
> > > >>> > >> > >> >>>>>>>>>>>> <
> > https://issues.apache.org/jira/browse/KAFKA-8159
> > > >>> >
> > > >>> > >> with the
> > > >>> > >> > >> >>>>>> built-in
> > > >>> > >> > >> >>>>>>>>>>> signed
> > > >>> > >> > >> >>>>>>>>>>>> number serdes.
> > > >>> > >> > >> >>>>>>>>>>>>
> > > >>> > >> > >> >>>>>>>>>>>> This trick would actually fix the problem for
> > > >>> > >> > >> >>>> negative-negative
> > > >>> > >> > >> >>>>>>>>> queries
> > > >>> > >> > >> >>>>>>>>>>>> (ie where to & from are negative) but would
> > cause
> > > >>> > >> > >> undetectable
> > > >>> > >> > >> >>>>>>>>>>>> incorrect results for negative-positive
> > queries.
> > > >>> For
> > > >>> > >> example,
> > > >>> > >> > >> >>>> say
> > > >>> > >> > >> >>>>>> you
> > > >>> > >> > >> >>>>>>>>>>>> call #range with from = -1 and to = 1, using
> > the
> > > >>> Short
> > > >>> > >> > >> serdes.
> > > >>> > >> > >> >>>>> The
> > > >>> > >> > >> >>>>>>>>>>>> serialized bytes for that are
> > > >>> > >> > >> >>>>>>>>>>>>
> > > >>> > >> > >> >>>>>>>>>>>> from = 1111111111111111
> > > >>> > >> > >> >>>>>>>>>>>> to = 0000000000000001
> > > >>> > >> > >> >>>>>>>>>>>>
> > > >>> > >> > >> >>>>>>>>>>>> so we would end up flipping those and
> > iterating
> > > >>> over
> > > >>> > >> all keys
> > > >>> > >> > >> >>>>> from
> > > >>> > >> > >> >>>>>>>>>>>> 0000000000000001 to 1111111111111111.
> > Iterating in
> > > >>> > >> > >> >>>>> lexicographical
> > > >>> > >> > >> >>>>>>>>>>>> order means we would iterate over every key in
> > > >>> the space
> > > >>> > >> > >> >>>> *except*
> > > >>> > >> > >> >>>>>> for
> > > >>> > >> > >> >>>>>>>>>>>> 0, but 0 is actually the *only* other key we
> > > >>> meant to be
> > > >>> > >> > >> >>>> included
> > > >>> > >> > >> >>>>>> in
> > > >>> > >> > >> >>>>>>>>> the
> > > >>> > >> > >> >>>>>>>>>>>> range query.
> > > >>> > >> > >> >>>>>>>>>>>>
> > > >>> > >> > >> >>>>>>>>>>>> Currently we just log a warning and return an
> > > >>> empty
> > > >>> > >> iterator
> > > >>> > >> > >> >>>> when
> > > >>> > >> > >> >>>>>>>>>>>> to < from, which is obviously also incorrect
> > but
> > > >>> feels
> > > >>> > >> > >> >>>> slightly
> > > >>> > >> > >> >>>>>> more
> > > >>> > >> > >> >>>>>>>>>>>> palatable. If we start automatically
> > converting to
> > > >>> > >> reverse
> > > >>> > >> > >> >>>>> queries
> > > >>> > >> > >> >>>>>> we
> > > >>> > >> > >> >>>>>>>>>>>> can't even log a warning in this case unless
> > we
> > > >>> wanted
> > > >>> > >> to log
> > > >>> > >> > >> >>>> a
> > > >>> > >> > >> >>>>>>>>> warning
> > > >>> > >> > >> >>>>>>>>>>>> every time, which would be weird to do for a
> > valid
> > > >>> > >> usage of a
> > > >>> > >> > >> >>>> new
> > > >>> > >> > >> >>>>>>>>>>>> feature.
> > > >>> > >> > >> >>>>>>>>>>>>
> > > >>> > >> > >> >>>>>>>>>>>> All that said, I still like the idea overall.
> > Off
> > > >>> the
> > > >>> > >> top of
> > > >>> > >> > >> >>>> my
> > > >>> > >> > >> >>>>>> head
> > > >>> > >> > >> >>>>>>>>> I
> > > >>> > >> > >> >>>>>>>>>>> guess
> > > >>> > >> > >> >>>>>>>>>>>> we could add a store config to enable/disable
> > > >>> automatic
> > > >>> > >> > >> >>>> reverse
> > > >>> > >> > >> >>>>>>>>>>> iteration,
> > > >>> > >> > >> >>>>>>>>>>>> which is off by default?
> > > >>> > >> > >> >>>>>>>>>>>>
> > > >>> > >> > >> >>>>>>>>>>>> Thanks for the KIP! This will be a nice
> > addition
> > > >>> > >> > >> >>>>>>>>>>>>
> > > >>> > >> > >> >>>>>>>>>>>> Sophie
> > > >>> > >> > >> >>>>>>>>>>>>
> > > >>> > >> > >> >>>>>>>>>>>>
> > > >>> > >> > >> >>>>>>>>>>>> On Tue, May 19, 2020 at 3:21 PM John Roesler <
> > > >>> > >> > >> >>>>> vvcephei@apache.org>
> > > >>> > >> > >> >>>>>>>>>>> wrote:
> > > >>> > >> > >> >>>>>>>>>>>>> Hi there Jorge,
> > > >>> > >> > >> >>>>>>>>>>>>>
> > > >>> > >> > >> >>>>>>>>>>>>> Thanks for the KIP!
> > > >>> > >> > >> >>>>>>>>>>>>>
> > > >>> > >> > >> >>>>>>>>>>>>> I think this feature sounds very reasonable.
> > > >>> > >> > >> >>>>>>>>>>>>>
> > > >>> > >> > >> >>>>>>>>>>>>> I'm not 100% sure if this is "too fancy", but
> > > >>> what do
> > > >>> > >> you
> > > >>> > >> > >> >>>> think
> > > >>> > >> > >> >>>>>>>>>>>>> about avoiding the enum by instead allowing
> > > >>> people to
> > > >>> > >> flip
> > > >>> > >> > >> >>>>>>>>>>>>> the "from" and "to" endpoints? I.e., reading
> > > >>> from "A"
> > > >>> > >> to "Z"
> > > >>> > >> > >> >>>>> would
> > > >>> > >> > >> >>>>>>>>>>>>> be a forward scan, and from "Z" to "A" would
> > be a
> > > >>> > >> backward
> > > >>> > >> > >> >>>> one?
> > > >>> > >> > >> >>>>>>>>>>>>>
> > > >>> > >> > >> >>>>>>>>>>>>> Thanks,
> > > >>> > >> > >> >>>>>>>>>>>>> -John
> > > >>> > >> > >> >>>>>>>>>>>>>
> > > >>> > >> > >> >>>>>>>>>>>>> On Tue, May 19, 2020, at 16:20, Jorge
> > Quilcate
> > > >>> wrote:
> > > >>> > >> > >> >>>>>>>>>>>>>> Hi everyone,
> > > >>> > >> > >> >>>>>>>>>>>>>>
> > > >>> > >> > >> >>>>>>>>>>>>>> I would like to start the discussion for
> > > >>> KIP-617:
> > > >>> > >> > >> >>>>>>>>>>>>>>
> > > >>> > >> > >> >>>>>>>>>
> > > >>> > >> > >> >>>>>>
> > > >>> > >> > >> >>>>>
> > > >>> > >> > >> >>>>
> > > >>> > >> > >>
> > > >>> > >>
> > > >>>
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-617%3A+Allow+Kafka+Streams+State+Stores+to+be+iterated+backwards
> > > >>> > >> > >> >>>>>>>>>>>>>> Looking forward to your feedback.
> > > >>> > >> > >> >>>>>>>>>>>>>>
> > > >>> > >> > >> >>>>>>>>>>>>>> Thanks!
> > > >>> > >> > >> >>>>>>>>>>>>>> Jorge.
> > > >>> > >> > >> >>>>>>>>>>>>>>
> > > >>> > >> > >> >>>>>>>>>>>>>>
> > > >>> > >> > >> >>>>>>>>>>>>>> Attachments:
> > > >>> > >> > >> >>>>>>>>>>>>>> * 0x5F2C6E22064982DF.asc
> > > >>> > >> > >> >>>>>>>
> > > >>> > >> > >> >>>>>>>
> > > >>> > >> > >> >>>>>>> Attachments:
> > > >>> > >> > >> >>>>>>> * 0x5F2C6E22064982DF.asc
> > > >>> > >> > >> >>>>>>
> > > >>> > >> > >> >>>>>
> > > >>> > >> > >> >>>>
> > > >>> > >> > >> >>>
> > > >>> > >> > >> >>
> > > >>> > >> > >>
> > > >>> > >> > >>
> > > >>> > >> >
> > > >>> > >>
> > > >>> > >
> > > >>> >
> > > >>>
> > > >>
> > >
> >
>

Re: [DISCUSS] KIP-617: Allow Kafka Streams State Stores to be iterated backwards

Posted by Jorge Esteban Quilcate Otoya <qu...@gmail.com>.
Hi John,

Thanks for the feedback.

I'd be happy to take the third option and consider moving methods to
ReadOnlySessionStore as part of the KIP.
Docs is updated to reflect these changes.

Cheers,
Jorge.

On Thu, Jul 2, 2020 at 10:06 PM John Roesler <vv...@apache.org> wrote:

> Hey Jorge,
>
> Thanks for the details. That sounds like a mistake to me on both counts.
>
> I don’t think you need to worry about those depreciations. If the
> interface methods aren’t deprecated, then the methods are not deprecated.
> We should remove the annotations, but it doesn’t need to be in the kip.
>
> I think any query methods should have been in the ReadOnly interface. I
> guess it’s up to you whether you want to:
> 1. Add the reverse methods next to the existing methods (what you have in
> the kip right now)
> 2. Partially fix it by adding your new methods to the ReadOnly interface
> 3. Fully fix the problem by moving the existing methods as well as your
> new ones. Since  SessionStore extends ReadOnlySessionStore, it’s ok just to
> move the definitions.
>
> I’m ok with whatever you prefer.
>
> Thanks,
> John
>
> On Thu, Jul 2, 2020, at 11:29, Jorge Esteban Quilcate Otoya wrote:
> > (sorry for the spam)
> >
> > Actually `findSessions` are only deprecated on `InMemorySessionStore`,
> > which seems strange as RocksDB and interfaces haven't marked these
> methods
> > as deprecated.
> >
> > Any hint on how to handle this?
> >
> > Cheers,
> >
> > On Thu, Jul 2, 2020 at 4:57 PM Jorge Esteban Quilcate Otoya <
> > quilcate.jorge@gmail.com> wrote:
> >
> > > @John: I can see there are some deprecations in there as well. Will
> update
> > > the KIP.
> > >
> > > Thanks,
> > > Jorge
> > >
> > >
> > > On Thu, Jul 2, 2020 at 3:29 PM Jorge Esteban Quilcate Otoya <
> > > quilcate.jorge@gmail.com> wrote:
> > >
> > >> Thanks John.
> > >>
> > >> > it looks like there’s a revision error in which two methods are
> > >> proposed for SessionStore, but seem like they should be in
> > >> ReadOnlySessionStore. Do I read that right?
> > >>
> > >> Yes, I've opted to keep the new methods alongside the existing ones.
> In
> > >> the case of SessionStore, `findSessions` are in `SessionStore`, and
> `fetch`
> > >> are in `ReadOnlySessionStore`. If it makes more sense, I can move all
> of
> > >> them to ReadOnlySessionStore.
> > >> Let me know what you think.
> > >>
> > >> Thanks,
> > >> Jorge.
> > >>
> > >> On Thu, Jul 2, 2020 at 2:36 PM John Roesler <vv...@apache.org>
> wrote:
> > >>
> > >>> Hi Jorge,
> > >>>
> > >>> Thanks for the update. I think this is a good plan.
> > >>>
> > >>> I just took a look at the KIP again, and it looks like there’s a
> > >>> revision error in which two methods are proposed for SessionStore,
> but seem
> > >>> like they should be in ReadOnlySessionStore. Do I read that right?
> > >>>
> > >>> Otherwise, I’m happy with your proposal.
> > >>>
> > >>> Thanks,
> > >>> John
> > >>>
> > >>> On Wed, Jul 1, 2020, at 17:01, Jorge Esteban Quilcate Otoya wrote:
> > >>> > Quick update: KIP is updated with latest changes now.
> > >>> > Will leave it open this week while working on the PR.
> > >>> >
> > >>> > Hope to open a new vote thread over the next few days if no
> additional
> > >>> > feedback is provided.
> > >>> >
> > >>> > Cheers,
> > >>> > Jorge.
> > >>> >
> > >>> > On Mon, Jun 29, 2020 at 11:30 AM Jorge Esteban Quilcate Otoya <
> > >>> > quilcate.jorge@gmail.com> wrote:
> > >>> >
> > >>> > > Thanks, John!
> > >>> > >
> > >>> > > Make sense to reconsider the current approach. I was heading in a
> > >>> similar
> > >>> > > direction while drafting the implementation. Metered, Caching,
> and
> > >>> other
> > >>> > > layers will also have to get duplicated to build up new methods
> in
> > >>> `Stores`
> > >>> > > factory, and class casting issues would appear on stores created
> by
> > >>> DSL.
> > >>> > >
> > >>> > > I will draft a proposal with new methods (move methods from
> proposed
> > >>> > > interfaces to existing ones) with default implementation in a KIP
> > >>> update
> > >>> > > and wait for Matthias to chime in to validate this approach.
> > >>> > >
> > >>> > > Jorge.
> > >>> > >
> > >>> > >
> > >>> > > On Sat, Jun 27, 2020 at 4:01 PM John Roesler <
> vvcephei@apache.org>
> > >>> wrote:
> > >>> > >
> > >>> > >> Hi Jorge,
> > >>> > >>
> > >>> > >> Sorry for my silence, I've been absorbed with the 2.6 and 2.5.1
> > >>> releases.
> > >>> > >>
> > >>> > >> The idea to separate the new methods into "mixin" interfaces
> seems
> > >>> > >> like a good one, but as we've discovered in KIP-614, it doesn't
> work
> > >>> > >> out that way in practice. The problem is that the store
> > >>> implementations
> > >>> > >> are just the base layer that get composed with other layers in
> > >>> Streams
> > >>> > >> before they can be accessed in the DSL. This is extremely
> subtle, so
> > >>> > >> I'm going to put everyone to sleep with a detailed explanation:
> > >>> > >>
> > >>> > >> For example, this is the mechanism by which all KeyValueStore
> > >>> > >> implementations get added to Streams:
> > >>> > >>
> org.apache.kafka.streams.state.internals.KeyValueStoreBuilder#build
> > >>> > >> return new MeteredKeyValueStore<>(
> > >>> > >>   maybeWrapCaching(maybeWrapLogging(storeSupplier.get())),
> > >>> > >>   storeSupplier.metricsScope(),
> > >>> > >>   time,
> > >>> > >>   keySerde,
> > >>> > >>   valueSerde
> > >>> > >> );
> > >>> > >>
> > >>> > >> In the DSL, the store that a processor gets from the context
> would
> > >>> be
> > >>> > >> the result of this composition. So even if the
> storeSupplier.get()
> > >>> returns
> > >>> > >> a store that implements the "reverse" interface, when you try to
> > >>> use it
> > >>> > >> from a processor like:
> > >>> > >> org.apache.kafka.streams.kstream.ValueTransformerWithKey#init
> > >>> > >> ReadOnlyBackwardWindowStore<K, V> store =
> > >>> > >>   (ReadOnlyBackwardWindowStore<K, V>) context.getStateStore(..)
> > >>> > >>
> > >>> > >> You'd just get a ClassCastException because it's actually a
> > >>> > >> MeteredKeyValueStore, which doesn't implement
> > >>> > >> ReadOnlyBackwardWindowStore.
> > >>> > >>
> > >>> > >> The only way to make this work would be to make the Metered,
> > >>> > >> Caching, and Logging layers also implement the new interfaces,
> > >>> > >> but this effectively forces implementations to also implement
> > >>> > >> the interface. Otherwise, the intermediate layers would have to
> > >>> > >> cast the store in each method, like this:
> > >>> > >> MeteredWindowStore#backwardFetch {
> > >>> > >>   ((ReadOnlyBackwardWindowStore<K, V>)
> innerStore).backwardFetch(..)
> > >>> > >> }
> > >>> > >>
> > >>> > >> And then if the implementation doesn't "opt in" by implementing
> > >>> > >> the interface, you'd get a ClassCastException, not when you get
> the
> > >>> > >> store, but when you try to use it.
> > >>> > >>
> > >>> > >> The fact that we get ClassCastExceptions no matter which way we
> > >>> > >> turn here indicates that we're really not getting any benefit
> from
> > >>> the
> > >>> > >> type system, which makes the extra interfaces seem not worth
> all the
> > >>> > >> code involved.
> > >>> > >>
> > >>> > >> Where we landed in KIP-614 is that, unless we want to completely
> > >>> > >> revamp the way that StateStores work in the DSL, you might as
> > >>> > >> well just add the new methods to the existing interfaces. To
> prevent
> > >>> > >> compilation errors, we can add default implementations that
> throw
> > >>> > >> UnsupportedOperationException. If a store doesn't opt in by
> > >>> > >> implementing the methods, you'd get an
> > >>> UnsupportedOperationException,
> > >>> > >> which seems no worse, and maybe better, than the
> ClassCastException
> > >>> > >> you'd get if we go with the "mixin interface" approach.
> > >>> > >>
> > >>> > >> A quick note: This entire discussion focuses on the DSL. If
> you're
> > >>> just
> > >>> > >> using the Processor API by directly adding the a custom store
> to the
> > >>> > >> Topology:
> > >>> > >> org.apache.kafka.streams.Topology#addStateStore
> > >>> > >> and then retrieving it in the processor via:
> > >>> > >>
> org.apache.kafka.streams.processor.ProcessorContext#getStateStore
> > >>> > >> in
> > >>> > >> org.apache.kafka.streams.processor.Processor#init
> > >>> > >>
> > >>> > >> Then, you can both register and retrieve _any_ StateStore
> > >>> implementation.
> > >>> > >> There's no need to use KeyValueStore or any other built-in
> > >>> interface.
> > >>> > >> In other words, KeyValueStore and company are only part of the
> DSL,
> > >>> > >> not the PAPI. So, discussions about the build-in store
> interfaces
> > >>> are only
> > >>> > >> really relevant in the context of the DSL, Transformers, and
> > >>> Materialized.
> > >>> > >>
> > >>> > >> So, in conclusion, I'd really recommend just adding any new
> methods
> > >>> to
> > >>> > >> the existing store interfaces. We might be able to revamp the
> API
> > >>> in the
> > >>> > >> future to support mixins, but it's a much larger scope of work
> than
> > >>> this
> > >>> > >> KIP.
> > >>> > >> A more minor comment is that we don't need to add Deprecated
> > >>> variants
> > >>> > >> of new methods.
> > >>> > >>
> > >>> > >> Thanks again, and once again, I'm sorry I tuned out and didn't
> > >>> offer this
> > >>> > >> feedback before you revised the KIP.
> > >>> > >> -John
> > >>> > >>
> > >>> > >>
> > >>> > >>
> > >>> > >>
> > >>> > >> On Mon, Jun 22, 2020, at 06:11, Jorge Esteban Quilcate Otoya
> wrote:
> > >>> > >> > Hi everyone,
> > >>> > >> >
> > >>> > >> > I've updated the KIP, applying Matthias' feedback regarding
> > >>> interface
> > >>> > >> > hierarchy.
> > >>> > >> >
> > >>> > >> > Also, following the last email, I think we can consider
> reverse
> > >>> > >> operations
> > >>> > >> > on KeyValue range as well, as implementation supports
> > >>> lexicographic
> > >>> > >> order.
> > >>> > >> >
> > >>> > >> > I considered different naming between Key-based ranges and
> > >>> Time-based
> > >>> > >> > ranges, and mitigate confusion when fetching keys and time
> ranges
> > >>> as
> > >>> > >> > WindowStore does:
> > >>> > >> >
> > >>> > >> > Key-based ranges: reverseRange(), reverseAll()
> > >>> > >> > Time-based ranges: backwardFetch()
> > >>> > >> >
> > >>> > >> > Then, key-based changes apply to KeyValueStore, and time-based
> > >>> changes
> > >>> > >> to
> > >>> > >> > Window and Session stores.
> > >>> > >> >
> > >>> > >> > Let me know if you have any questions.
> > >>> > >> >
> > >>> > >> > Thanks,
> > >>> > >> > Jorge.
> > >>> > >> >
> > >>> > >> >
> > >>> > >> > On Tue, Jun 16, 2020 at 12:47 AM Jorge Esteban Quilcate Otoya
> <
> > >>> > >> > quilcate.jorge@gmail.com> wrote:
> > >>> > >> >
> > >>> > >> > > Hi everyone, sorry for the late reply.
> > >>> > >> > >
> > >>> > >> > > Thanks Matthias for your feedback. I think it makes sense to
> > >>> > >> reconsider
> > >>> > >> > > the current design based on your input.
> > >>> > >> > >
> > >>> > >> > > After digging deeper into the current implementation, I'd
> like
> > >>> to
> > >>> > >> bring my
> > >>> > >> > > current understanding to be double-checked as it might be
> > >>> redefining
> > >>> > >> the
> > >>> > >> > > KIP's scope:
> > >>> > >> > >
> > >>> > >> > > 1. There are 2 ranges been exposed by different stores:
> > >>> > >> > >
> > >>> > >> > > a. Key Range
> > >>> > >> > > b. Timestamp Range
> > >>> > >> > >
> > >>> > >> > > So far, we have discussed covering both.
> > >>> > >> > >
> > >>> > >> > > 2. Key Range functions do not provide ordering guarantees by
> > >>> design:
> > >>> > >> > >
> > >>> > >> > > ```ReadOnlyKeyValueStore.java
> > >>> > >> > >     /**
> > >>> > >> > >      * Get an iterator over a given range of keys. This
> > >>> iterator must
> > >>> > >> be
> > >>> > >> > > closed after use.
> > >>> > >> > >      * The returned iterator must be safe from {@link
> > >>> > >> > > java.util.ConcurrentModificationException}s
> > >>> > >> > >      * and must not return null values. No ordering
> guarantees
> > >>> are
> > >>> > >> > > provided.
> > >>> > >> > >      * ...
> > >>> > >> > >      */
> > >>> > >> > >      KeyValueIterator<K, V> range(K from, K to);
> > >>> > >> > > ```
> > >>> > >> > >
> > >>> > >> > > Therefore, I'd propose removing Key range operations from
> the
> > >>> scope.
> > >>> > >> > >
> > >>> > >> > > 3. Timestamp Range operations happen at the SegmentsStore
> level
> > >>> > >> (internal)
> > >>> > >> > > API
> > >>> > >> > >
> > >>> > >> > > AFAICT, Segments wrappers handle all Timestamp ranges
> queries.
> > >>> > >> > >
> > >>> > >> > > I'd propose extending `Segments#segments(timeFrom, timeTo,
> > >>> backwards)`
> > >>> > >> > > with a flag for backwards operations.
> > >>> > >> > >
> > >>> > >> > > As segments returned will be processed backwards, I'm not
> > >>> extending
> > >>> > >> > > KeyValueStores to query each segment backwards as previous
> > >>> point 2.
> > >>> > >> > >
> > >>> > >> > > 4. Extend WindowStores implementations with a new
> > >>> > >> > > WindowBackwardStore/ReadOnlyBackwardStore:
> > >>> > >> > >
> > >>> > >> > > ```java
> > >>> > >> > > public interface ReadOnlyWindowBackwardStore<K, V> {
> > >>> > >> > >     WindowStoreIterator<V> backwardFetch(K key, Instant
> from,
> > >>> Instant
> > >>> > >> to)
> > >>> > >> > > throws IllegalArgumentException;
> > >>> > >> > >
> > >>> > >> > >     KeyValueIterator<Windowed<K>, V> backwardFetch(K from,
> K to,
> > >>> > >> Instant
> > >>> > >> > > fromTime, Instant toTime)
> > >>> > >> > >         throws IllegalArgumentException;
> > >>> > >> > >
> > >>> > >> > >     KeyValueIterator<Windowed<K>, V>
> backwardFetchAll(Instant
> > >>> from,
> > >>> > >> > > Instant to) throws IllegalArgumentException;
> > >>> > >> > > ```
> > >>> > >> > >
> > >>> > >> > > 5. SessionStore is a bit different as it has fetch/find
> sessions
> > >>> > >> spread
> > >>> > >> > > between SessionStore and ReadOnlySessionStore.
> > >>> > >> > >
> > >>> > >> > > I'd propose a new interface `SessionBackwardStore` to expose
> > >>> backward
> > >>> > >> find
> > >>> > >> > > operations:
> > >>> > >> > >
> > >>> > >> > > ```java
> > >>> > >> > > public interface SessionBackwardStore<K, AGG> {
> > >>> > >> > >     KeyValueIterator<Windowed<K>, AGG>
> > >>> backwardFindSessions(final K
> > >>> > >> key,
> > >>> > >> > > final long earliestSessionEndTime, final long
> > >>> latestSessionStartTime);
> > >>> > >> > >
> > >>> > >> > >     KeyValueIterator<Windowed<K>, AGG>
> > >>> backwardFindSessions(final K
> > >>> > >> > > keyFrom, final K keyTo, final long earliestSessionEndTime,
> > >>> final long
> > >>> > >> > > latestSessionStartTime);
> > >>> > >> > > }
> > >>> > >> > > ```
> > >>> > >> > >
> > >>> > >> > > If this understanding is correct I'll proceed to update the
> KIP
> > >>> based
> > >>> > >> on
> > >>> > >> > > this.
> > >>> > >> > >
> > >>> > >> > > Looking forward to your feedback.
> > >>> > >> > >
> > >>> > >> > > Thanks,
> > >>> > >> > > Jorge.
> > >>> > >> > >
> > >>> > >> > > On Fri, May 29, 2020 at 3:32 AM Matthias J. Sax <
> > >>> mjsax@apache.org>
> > >>> > >> wrote:
> > >>> > >> > >
> > >>> > >> > >> Hey,
> > >>> > >> > >>
> > >>> > >> > >> Sorry that I am late to the game. I am not 100% convinced
> > >>> about the
> > >>> > >> > >> current proposal. Using a new config as feature flag seems
> to
> > >>> be
> > >>> > >> rather
> > >>> > >> > >> "nasty" to me, and flipping from/to is a little bit too
> fancy
> > >>> for my
> > >>> > >> > >> personal taste.
> > >>> > >> > >>
> > >>> > >> > >> I agree, that the original proposal using a "ReadDirection"
> > >>> enum is
> > >>> > >> not
> > >>> > >> > >> ideal either.
> > >>> > >> > >>
> > >>> > >> > >> Thus, I would like to put out a new idea: We could add a
> new
> > >>> > >> interface
> > >>> > >> > >> that offers new methods that return revers iterators.
> > >>> > >> > >>
> > >>> > >> > >> The KIP already proposes to add `reverseAll()` and it seems
> > >>> backward
> > >>> > >> > >> incompatible to just add this method to
> > >>> `ReadOnlyKeyValueStore` and
> > >>> > >> > >> `ReadOnlyWindowStore`. I don't think we could provide a
> useful
> > >>> > >> default
> > >>> > >> > >> implementation for custom stores and thus either break
> > >>> compatibility
> > >>> > >> or
> > >>> > >> > >> need add a default that just throws an exception. Neither
> > >>> seems to
> > >>> > >> be a
> > >>> > >> > >> good option.
> > >>> > >> > >>
> > >>> > >> > >> Using a new interface avoid this issue and allows users
> > >>> implementing
> > >>> > >> > >> custom stores to opt-in by adding the interface to their
> > >>> stores.
> > >>> > >> > >> Furthermore, we don't need any config. In the end, we
> > >>> encapsulte the
> > >>> > >> > >> change into the store, and our runtime is agnostic to it
> (as it
> > >>> > >> should
> > >>> > >> > >> be).
> > >>> > >> > >>
> > >>> > >> > >> The hierarchy becomes a little complex (but uses would not
> > >>> really see
> > >>> > >> > >> the complexity):
> > >>> > >> > >>
> > >>> > >> > >> // exsiting
> > >>> > >> > >> ReadOnlyKeyValueStore
> > >>> > >> > >> KeyValueStore extend StateStore, ReadOnlyKeyValueStore
> > >>> > >> > >>
> > >>> > >> > >>
> > >>> > >> > >> // helper interface; users don't care
> > >>> > >> > >> // need similar ones for other stores
> > >>> > >> > >> ReverseReadOnlyKeyValueStore {
> > >>> > >> > >>     KeyValueIterator<K, V> reverseRange(K from, K to);
> > >>> > >> > >>     KeyValueIterator<K, V> reverseAll();
> > >>> > >> > >> }
> > >>> > >> > >>
> > >>> > >> > >>
> > >>> > >> > >> // two new user facing interfaces for kv-store
> > >>> > >> > >> // need similar ones for other stores
> > >>> > >> > >> ReadOnlyKeyValueStoreWithReverseIterators extends
> > >>> > >> ReadOnlyKeyValueStore,
> > >>> > >> > >> ReverseReadOnlyKeyValueStore
> > >>> > >> > >>
> > >>> > >> > >> KeyValueStoreWithReverseIterators extends KeyValueStore,
> > >>> > >> > >> ReverseReadOnlyKeyValueStore
> > >>> > >> > >>
> > >>> > >> > >>
> > >>> > >> > >> // updated (also internal)
> > >>> > >> > >> // also need to update other built-in stores
> > >>> > >> > >> RocksDB implements KeyValueStoreWithReverseIterators,
> > >>> > >> BulkLoadingStore
> > >>> > >> > >>
> > >>> > >> > >>
> > >>> > >> > >> In the end, user would only care about the two (for
> kv-case)
> > >>> new
> > >>> > >> > >> interface that offer revers iterator (read only and
> regular)
> > >>> and can
> > >>> > >> > >> cast stores accordingly in their Processors/Transformers or
> > >>> via IQ.
> > >>> > >> > >>
> > >>> > >> > >>
> > >>> > >> > >> Btw: if we add revers iterator for KeyValue and Window
> store,
> > >>> should
> > >>> > >> we
> > >>> > >> > >> do the same for Session store?
> > >>> > >> > >>
> > >>> > >> > >>
> > >>> > >> > >>
> > >>> > >> > >> This might be more code to write, but I believe it
> provides the
> > >>> > >> better
> > >>> > >> > >> user experience. Thoughts?
> > >>> > >> > >>
> > >>> > >> > >>
> > >>> > >> > >>
> > >>> > >> > >> -Matthias
> > >>> > >> > >>
> > >>> > >> > >>
> > >>> > >> > >>
> > >>> > >> > >>
> > >>> > >> > >> On 5/26/20 8:47 PM, John Roesler wrote:
> > >>> > >> > >> > Sorry for my silence, Jorge,
> > >>> > >> > >> >
> > >>> > >> > >> > I've just taken another look, and I'm personally happy
> with
> > >>> the
> > >>> > >> KIP.
> > >>> > >> > >> >
> > >>> > >> > >> > Thanks,
> > >>> > >> > >> > -John
> > >>> > >> > >> >
> > >>> > >> > >> > On Tue, May 26, 2020, at 16:17, Jorge Esteban Quilcate
> Otoya
> > >>> wrote:
> > >>> > >> > >> >> If no additional comments, I will proceed to start the a
> > >>> vote
> > >>> > >> thread.
> > >>> > >> > >> >>
> > >>> > >> > >> >> Thanks a lot for your feedback!
> > >>> > >> > >> >>
> > >>> > >> > >> >> On Fri, May 22, 2020 at 9:25 AM Jorge Esteban Quilcate
> > >>> Otoya <
> > >>> > >> > >> >> quilcate.jorge@gmail.com> wrote:
> > >>> > >> > >> >>
> > >>> > >> > >> >>> Thanks Sophie. I like the `reverseAll()` idea.
> > >>> > >> > >> >>>
> > >>> > >> > >> >>> I updated the KIP with your feedback.
> > >>> > >> > >> >>>
> > >>> > >> > >> >>>
> > >>> > >> > >> >>>
> > >>> > >> > >> >>> On Fri, May 22, 2020 at 4:22 AM Sophie Blee-Goldman <
> > >>> > >> > >> sophie@confluent.io>
> > >>> > >> > >> >>> wrote:
> > >>> > >> > >> >>>
> > >>> > >> > >> >>>> Hm, the case of `all()` does seem to present a
> dilemma in
> > >>> the
> > >>> > >> case of
> > >>> > >> > >> >>>> variable-length keys.
> > >>> > >> > >> >>>>
> > >>> > >> > >> >>>> In the case of fixed-length keys, you can just compute
> > >>> the keys
> > >>> > >> that
> > >>> > >> > >> >>>> correspond
> > >>> > >> > >> >>>> to the maximum and minimum serialized bytes, then
> perform
> > >>> a
> > >>> > >> `range()`
> > >>> > >> > >> >>>> query
> > >>> > >> > >> >>>> instead of an `all()`. If your keys don't have a
> > >>> well-defined
> > >>> > >> > >> ordering
> > >>> > >> > >> >>>> such
> > >>> > >> > >> >>>> that
> > >>> > >> > >> >>>> you can't determine the MAX_KEY, then you probably
> don't
> > >>> care
> > >>> > >> about
> > >>> > >> > >> the
> > >>> > >> > >> >>>> iterator order anyway.
> > >>> > >> > >> >>>>
> > >>> > >> > >> >>>>  But with variable-length keys, there is no MAX_KEY.
> If
> > >>> all your
> > >>> > >> > >> keys were
> > >>> > >> > >> >>>> just
> > >>> > >> > >> >>>> of the form 'a', 'aa', 'aaaaa', 'aaaaaaa' then in fact
> > >>> the only
> > >>> > >> way
> > >>> > >> > >> to
> > >>> > >> > >> >>>> figure out the
> > >>> > >> > >> >>>> maximum key in the store is by using `all()` -- and
> > >>> without a
> > >>> > >> reverse
> > >>> > >> > >> >>>> iterator, you're
> > >>> > >> > >> >>>> doomed to iterate through every single key just to
> answer
> > >>> that
> > >>> > >> simple
> > >>> > >> > >> >>>> question.
> > >>> > >> > >> >>>>
> > >>> > >> > >> >>>> That said, I still think determining the iterator
> order
> > >>> based
> > >>> > >> on the
> > >>> > >> > >> >>>> to/from bytes
> > >>> > >> > >> >>>> makes a lot of intuitive sense and gives the API a
> nice
> > >>> > >> symmetry.
> > >>> > >> > >> What if
> > >>> > >> > >> >>>> we
> > >>> > >> > >> >>>> solved the `all()` problem by just giving `all()` a
> > >>> reverse
> > >>> > >> form to
> > >>> > >> > >> >>>> complement it?
> > >>> > >> > >> >>>> Ie we would have `all()` and `reverseAll()`, or
> something
> > >>> to
> > >>> > >> that
> > >>> > >> > >> effect.
> > >>> > >> > >> >>>>
> > >>> > >> > >> >>>> On Thu, May 21, 2020 at 3:41 PM Jorge Esteban Quilcate
> > >>> Otoya <
> > >>> > >> > >> >>>> quilcate.jorge@gmail.com> wrote:
> > >>> > >> > >> >>>>
> > >>> > >> > >> >>>>> Thanks John.
> > >>> > >> > >> >>>>>
> > >>> > >> > >> >>>>> Agree. I like the first approach as well, with
> > >>> StreamsConfig
> > >>> > >> flag
> > >>> > >> > >> >>>> passing
> > >>> > >> > >> >>>>> by via ProcessorContext.
> > >>> > >> > >> >>>>>
> > >>> > >> > >> >>>>> Another positive effect with "reverse parameters" is
> > >>> that in
> > >>> > >> the
> > >>> > >> > >> case of
> > >>> > >> > >> >>>>> `fetch(keyFrom, keyTo, timeFrom, timeTo)` users can
> > >>> decide
> > >>> > >> _which_
> > >>> > >> > >> pair
> > >>> > >> > >> >>>> to
> > >>> > >> > >> >>>>> flip, whether with `ReadDirection` enum it apply to
> both.
> > >>> > >> > >> >>>>>
> > >>> > >> > >> >>>>> The only issue I've found while reviewing the KIP is
> that
> > >>> > >> `all()`
> > >>> > >> > >> won't
> > >>> > >> > >> >>>> fit
> > >>> > >> > >> >>>>> within this approach.
> > >>> > >> > >> >>>>>
> > >>> > >> > >> >>>>> We could remove it from the KIP and argue that for
> > >>> WindowStore,
> > >>> > >> > >> >>>>> `fetchAll(0, Long.MAX_VALUE)` can be used to get all
> in
> > >>> reverse
> > >>> > >> > >> order,
> > >>> > >> > >> >>>> and
> > >>> > >> > >> >>>>> for KeyValueStore, no ordering guarantees are
> provided.
> > >>> > >> > >> >>>>>
> > >>> > >> > >> >>>>> If there is consensus with this changes, I will go
> and
> > >>> update
> > >>> > >> the
> > >>> > >> > >> KIP.
> > >>> > >> > >> >>>>>
> > >>> > >> > >> >>>>> On Thu, May 21, 2020 at 3:33 PM John Roesler <
> > >>> > >> vvcephei@apache.org>
> > >>> > >> > >> >>>> wrote:
> > >>> > >> > >> >>>>>
> > >>> > >> > >> >>>>>> Hi Jorge,
> > >>> > >> > >> >>>>>>
> > >>> > >> > >> >>>>>> Thanks for that idea. I agree, a feature flag would
> > >>> protect
> > >>> > >> anyone
> > >>> > >> > >> >>>>>> who may be depending on the current behavior.
> > >>> > >> > >> >>>>>>
> > >>> > >> > >> >>>>>> It seems better to locate the feature flag in the
> > >>> > >> initialization
> > >>> > >> > >> >>>> logic of
> > >>> > >> > >> >>>>>> the store, rather than have a method on the "live"
> > >>> store that
> > >>> > >> > >> changes
> > >>> > >> > >> >>>>>> its behavior on the fly.
> > >>> > >> > >> >>>>>>
> > >>> > >> > >> >>>>>> It seems like there are two options here, one is to
> add
> > >>> a new
> > >>> > >> > >> config:
> > >>> > >> > >> >>>>>>
> > >>> > >> > >> >>>>>> StreamsConfig.ENABLE_BACKWARDS_ITERATION =
> > >>> > >> > >> >>>>>>   "enable.backwards.iteration
> > >>> > >> > >> >>>>>>
> > >>> > >> > >> >>>>>> Or we can add a feature flag in Materialized, like
> > >>> > >> > >> >>>>>>
> > >>> > >> > >> >>>>>> Materialized.enableBackwardsIteration()
> > >>> > >> > >> >>>>>>
> > >>> > >> > >> >>>>>> I think I'd personally lean toward the config, for
> the
> > >>> > >> following
> > >>> > >> > >> >>>> reason.
> > >>> > >> > >> >>>>>> The concern that Sophie raised is that someone's
> > >>> program may
> > >>> > >> depend
> > >>> > >> > >> >>>>>> on the existing contract of getting an empty
> iterator.
> > >>> We
> > >>> > >> don't
> > >>> > >> > >> want
> > >>> > >> > >> >>>> to
> > >>> > >> > >> >>>>>> switch behavior when they aren't expecting it, so we
> > >>> provide
> > >>> > >> them a
> > >>> > >> > >> >>>>>> config to assert that they _are_ expecting the new
> > >>> behavior,
> > >>> > >> which
> > >>> > >> > >> >>>>>> means they take responsibility for updating their
> code
> > >>> to
> > >>> > >> expect
> > >>> > >> > >> the
> > >>> > >> > >> >>>> new
> > >>> > >> > >> >>>>>> behavior.
> > >>> > >> > >> >>>>>>
> > >>> > >> > >> >>>>>> There doesn't seem to be a reason to offer a choice
> of
> > >>> > >> behaviors
> > >>> > >> > >> on a
> > >>> > >> > >> >>>>>> per-query, or per-store basis. We just want people
> to
> > >>> be not
> > >>> > >> > >> surprised
> > >>> > >> > >> >>>>>> by this change in general.
> > >>> > >> > >> >>>>>>
> > >>> > >> > >> >>>>>> What do you think?
> > >>> > >> > >> >>>>>> Thanks,
> > >>> > >> > >> >>>>>> -John
> > >>> > >> > >> >>>>>>
> > >>> > >> > >> >>>>>> On Wed, May 20, 2020, at 17:37, Jorge Quilcate
> wrote:
> > >>> > >> > >> >>>>>>> Thank you both for the great feedback.
> > >>> > >> > >> >>>>>>>
> > >>> > >> > >> >>>>>>> I like the "fancy" proposal :), and how it removes
> the
> > >>> need
> > >>> > >> for
> > >>> > >> > >> >>>>>>> additional API methods. And with a feature flag on
> > >>> > >> `StateStore`,
> > >>> > >> > >> >>>>>>> disabled by default, should no break current users.
> > >>> > >> > >> >>>>>>>
> > >>> > >> > >> >>>>>>> The only side-effect I can think of is that: by
> moving
> > >>> the
> > >>> > >> flag
> > >>> > >> > >> >>>>> upwards,
> > >>> > >> > >> >>>>>>> all later operations become affected; which might
> be
> > >>> ok for
> > >>> > >> most
> > >>> > >> > >> >>>> (all?)
> > >>> > >> > >> >>>>>>> cases. I can't think of an scenario where this
> would
> > >>> be an
> > >>> > >> issue,
> > >>> > >> > >> >>>> just
> > >>> > >> > >> >>>>>>> want to point this out.
> > >>> > >> > >> >>>>>>>
> > >>> > >> > >> >>>>>>> If moving to this approach, I'd like to check if I
> got
> > >>> this
> > >>> > >> right
> > >>> > >> > >> >>>>> before
> > >>> > >> > >> >>>>>>> updating the KIP:
> > >>> > >> > >> >>>>>>>
> > >>> > >> > >> >>>>>>> - only `StateStore` will change by having a new
> method:
> > >>> > >> > >> >>>>>>> `backwardIteration()`, `false` by default to keep
> > >>> things
> > >>> > >> > >> compatible.
> > >>> > >> > >> >>>>>>> - then all `*Stores` will have to update their
> > >>> implementation
> > >>> > >> > >> based
> > >>> > >> > >> >>>> on
> > >>> > >> > >> >>>>>>> this flag.
> > >>> > >> > >> >>>>>>>
> > >>> > >> > >> >>>>>>>
> > >>> > >> > >> >>>>>>> On 20/05/2020 21:02, Sophie Blee-Goldman wrote:
> > >>> > >> > >> >>>>>>>>> There's no possibility that someone could be
> relying
> > >>> > >> > >> >>>>>>>>> on iterating over that range in increasing order,
> > >>> because
> > >>> > >> that's
> > >>> > >> > >> >>>> not
> > >>> > >> > >> >>>>>> what
> > >>> > >> > >> >>>>>>>>> happens. However, they could indeed be relying on
> > >>> getting
> > >>> > >> an
> > >>> > >> > >> >>>> empty
> > >>> > >> > >> >>>>>>>> iterator
> > >>> > >> > >> >>>>>>>>
> > >>> > >> > >> >>>>>>>> I just meant that they might be relying on the
> > >>> assumption
> > >>> > >> that
> > >>> > >> > >> the
> > >>> > >> > >> >>>>>> range
> > >>> > >> > >> >>>>>>>> query
> > >>> > >> > >> >>>>>>>> will never return results with decreasing keys.
> The
> > >>> empty
> > >>> > >> > >> iterator
> > >>> > >> > >> >>>>>> wouldn't
> > >>> > >> > >> >>>>>>>> break that contract, but of course a surprise
> reverse
> > >>> > >> iterator
> > >>> > >> > >> >>>> would.
> > >>> > >> > >> >>>>>>>>
> > >>> > >> > >> >>>>>>>> FWIW I actually am in favor of automatically
> > >>> converting to a
> > >>> > >> > >> >>>> reverse
> > >>> > >> > >> >>>>>>>> iterator,
> > >>> > >> > >> >>>>>>>> I just thought we should consider whether this
> should
> > >>> be
> > >>> > >> off by
> > >>> > >> > >> >>>>>> default or
> > >>> > >> > >> >>>>>>>> even possible to disable at all.
> > >>> > >> > >> >>>>>>>>
> > >>> > >> > >> >>>>>>>> On Tue, May 19, 2020 at 7:42 PM John Roesler <
> > >>> > >> > >> vvcephei@apache.org
> > >>> > >> > >> >>>>>
> > >>> > >> > >> >>>>>> wrote:
> > >>> > >> > >> >>>>>>>>
> > >>> > >> > >> >>>>>>>>> Thanks for the response, Sophie,
> > >>> > >> > >> >>>>>>>>>
> > >>> > >> > >> >>>>>>>>> I wholeheartedly agree we should take as much
> into
> > >>> account
> > >>> > >> as
> > >>> > >> > >> >>>>> possible
> > >>> > >> > >> >>>>>>>>> up front, rather than regretting our decisions
> > >>> later. I
> > >>> > >> actually
> > >>> > >> > >> >>>> do
> > >>> > >> > >> >>>>>> share
> > >>> > >> > >> >>>>>>>>> your vague sense of worry, which was what led me
> to
> > >>> say
> > >>> > >> > >> initially
> > >>> > >> > >> >>>>>> that I
> > >>> > >> > >> >>>>>>>>> thought my counterproposal might be "too fancy".
> > >>> > >> Sometimes, it's
> > >>> > >> > >> >>>>>> better
> > >>> > >> > >> >>>>>>>>> to be explicit instead of "elegant", if we think
> more
> > >>> > >> people
> > >>> > >> > >> >>>> will be
> > >>> > >> > >> >>>>>>>>> confused
> > >>> > >> > >> >>>>>>>>> than not.
> > >>> > >> > >> >>>>>>>>>
> > >>> > >> > >> >>>>>>>>> I really don't think that there's any danger of
> > >>> "relying
> > >>> > >> on a
> > >>> > >> > >> >>>> bug"
> > >>> > >> > >> >>>>>> here,
> > >>> > >> > >> >>>>>>>>> although
> > >>> > >> > >> >>>>>>>>> people certainly could be relying on current
> > >>> behavior. One
> > >>> > >> thing
> > >>> > >> > >> >>>> to
> > >>> > >> > >> >>>>> be
> > >>> > >> > >> >>>>>>>>> clear
> > >>> > >> > >> >>>>>>>>> about (which I just left a more detailed comment
> in
> > >>> > >> KAFKA-8159
> > >>> > >> > >> >>>>> about)
> > >>> > >> > >> >>>>>> is
> > >>> > >> > >> >>>>>>>>> that
> > >>> > >> > >> >>>>>>>>> when we say something like key1 > key2, this
> > >>> ordering is
> > >>> > >> defined
> > >>> > >> > >> >>>> by
> > >>> > >> > >> >>>>>> the
> > >>> > >> > >> >>>>>>>>> serde's output and nothing else.
> > >>> > >> > >> >>>>>>>>>
> > >>> > >> > >> >>>>>>>>> Currently, thanks to your fix in
> > >>> > >> > >> >>>>>> https://github.com/apache/kafka/pull/6521
> > >>> > >> > >> >>>>>>>>> ,
> > >>> > >> > >> >>>>>>>>> the store contract is that for range scans, if
> from
> > >>> > to,
> > >>> > >> then
> > >>> > >> > >> >>>> the
> > >>> > >> > >> >>>>>> store
> > >>> > >> > >> >>>>>>>>> must
> > >>> > >> > >> >>>>>>>>> return an empty iterator. There's no possibility
> that
> > >>> > >> someone
> > >>> > >> > >> >>>> could
> > >>> > >> > >> >>>>> be
> > >>> > >> > >> >>>>>>>>> relying
> > >>> > >> > >> >>>>>>>>> on iterating over that range in increasing order,
> > >>> because
> > >>> > >> that's
> > >>> > >> > >> >>>> not
> > >>> > >> > >> >>>>>> what
> > >>> > >> > >> >>>>>>>>> happens. However, they could indeed be relying on
> > >>> getting
> > >>> > >> an
> > >>> > >> > >> >>>> empty
> > >>> > >> > >> >>>>>>>>> iterator.
> > >>> > >> > >> >>>>>>>>>
> > >>> > >> > >> >>>>>>>>> My counterproposal was to actually change this
> > >>> contract to
> > >>> > >> say
> > >>> > >> > >> >>>> that
> > >>> > >> > >> >>>>>> the
> > >>> > >> > >> >>>>>>>>> store
> > >>> > >> > >> >>>>>>>>> must return an iterator over the keys in that
> range,
> > >>> but
> > >>> > >> in the
> > >>> > >> > >> >>>>>> reverse
> > >>> > >> > >> >>>>>>>>> order.
> > >>> > >> > >> >>>>>>>>> So, in addition to considering whether this idea
> is
> > >>> "too
> > >>> > >> fancy"
> > >>> > >> > >> >>>> (aka
> > >>> > >> > >> >>>>>>>>> confusing),
> > >>> > >> > >> >>>>>>>>> we should also consider the likelihood of
> breaking an
> > >>> > >> existing
> > >>> > >> > >> >>>>>> program with
> > >>> > >> > >> >>>>>>>>> this behavior/contract change.
> > >>> > >> > >> >>>>>>>>>
> > >>> > >> > >> >>>>>>>>> To echo your clarification, I'm also not
> advocating
> > >>> > >> strongly in
> > >>> > >> > >> >>>>> favor
> > >>> > >> > >> >>>>>> of my
> > >>> > >> > >> >>>>>>>>> proposal. I just wanted to present it for
> > >>> consideration
> > >>> > >> > >> alongside
> > >>> > >> > >> >>>>>> Jorge's
> > >>> > >> > >> >>>>>>>>> original one.
> > >>> > >> > >> >>>>>>>>>
> > >>> > >> > >> >>>>>>>>> Thanks for raising these very good points,
> > >>> > >> > >> >>>>>>>>> -John
> > >>> > >> > >> >>>>>>>>>
> > >>> > >> > >> >>>>>>>>> On Tue, May 19, 2020, at 20:49, Sophie
> Blee-Goldman
> > >>> wrote:
> > >>> > >> > >> >>>>>>>>>>> Rather than working around it, I think we
> should
> > >>> just
> > >>> > >> fix it
> > >>> > >> > >> >>>>>>>>>> Now *that's* a "fancy" idea :P
> > >>> > >> > >> >>>>>>>>>>
> > >>> > >> > >> >>>>>>>>>> That was my primary concern, although I do have
> a
> > >>> vague
> > >>> > >> sense
> > >>> > >> > >> of
> > >>> > >> > >> >>>>>> worry
> > >>> > >> > >> >>>>>>>>>> that we might be allowing users to get into
> trouble
> > >>> > >> without
> > >>> > >> > >> >>>>>> realizing it.
> > >>> > >> > >> >>>>>>>>>> For example if their custom serdes suffer a
> similar
> > >>> bug
> > >>> > >> as the
> > >>> > >> > >> >>>>> above,
> > >>> > >> > >> >>>>>>>>>> and/or
> > >>> > >> > >> >>>>>>>>>> they rely on getting results in increasing order
> > >>> (of the
> > >>> > >> keys)
> > >>> > >> > >> >>>> even
> > >>> > >> > >> >>>>>> when
> > >>> > >> > >> >>>>>>>>>> to < from. Maybe they're relying on the fact
> that
> > >>> the
> > >>> > >> range
> > >>> > >> > >> >>>> query
> > >>> > >> > >> >>>>>> returns
> > >>> > >> > >> >>>>>>>>>> nothing in that case.
> > >>> > >> > >> >>>>>>>>>>
> > >>> > >> > >> >>>>>>>>>> Not sure if that qualifies as relying on a bug
> or
> > >>> not,
> > >>> > >> but in
> > >>> > >> > >> >>>> that
> > >>> > >> > >> >>>>>> past
> > >>> > >> > >> >>>>>>>>>> we've
> > >>> > >> > >> >>>>>>>>>> taken the stance that we should not break
> > >>> compatibility
> > >>> > >> even if
> > >>> > >> > >> >>>> the
> > >>> > >> > >> >>>>>> user
> > >>> > >> > >> >>>>>>>>>> was relying on bugs or unintentional behavior.
> > >>> > >> > >> >>>>>>>>>>
> > >>> > >> > >> >>>>>>>>>> Just to clarify I'm not advocating strongly
> against
> > >>> this
> > >>> > >> > >> >>>> proposal,
> > >>> > >> > >> >>>>>> just
> > >>> > >> > >> >>>>>>>>>> laying
> > >>> > >> > >> >>>>>>>>>> out some considerations we should take into
> > >>> account. At
> > >>> > >> the end
> > >>> > >> > >> >>>> of
> > >>> > >> > >> >>>>>> the
> > >>> > >> > >> >>>>>>>>> day
> > >>> > >> > >> >>>>>>>>>> we should do what's right rather than maintain
> > >>> > >> compatibility
> > >>> > >> > >> >>>> with
> > >>> > >> > >> >>>>>>>>> existing
> > >>> > >> > >> >>>>>>>>>> bugs, but sometimes there's a reasonable middle
> > >>> ground.
> > >>> > >> > >> >>>>>>>>>>
> > >>> > >> > >> >>>>>>>>>> On Tue, May 19, 2020 at 6:15 PM John Roesler <
> > >>> > >> > >> >>>> vvcephei@apache.org>
> > >>> > >> > >> >>>>>>>>> wrote:
> > >>> > >> > >> >>>>>>>>>>> Thanks Sophie,
> > >>> > >> > >> >>>>>>>>>>>
> > >>> > >> > >> >>>>>>>>>>> Woah, that’s a nasty bug. Rather than working
> > >>> around it,
> > >>> > >> I
> > >>> > >> > >> >>>> think
> > >>> > >> > >> >>>>> we
> > >>> > >> > >> >>>>>>>>> should
> > >>> > >> > >> >>>>>>>>>>> just fix it. I’ll leave some comments on the
> Jira.
> > >>> > >> > >> >>>>>>>>>>>
> > >>> > >> > >> >>>>>>>>>>> It doesn’t seem like it should be this KIP’s
> > >>> concern
> > >>> > >> that some
> > >>> > >> > >> >>>>>> serdes
> > >>> > >> > >> >>>>>>>>>>> might be incorrectly written.
> > >>> > >> > >> >>>>>>>>>>>
> > >>> > >> > >> >>>>>>>>>>> Were there other practical concerns that you
> had
> > >>> in mind?
> > >>> > >> > >> >>>>>>>>>>>
> > >>> > >> > >> >>>>>>>>>>> Thanks,
> > >>> > >> > >> >>>>>>>>>>> John
> > >>> > >> > >> >>>>>>>>>>>
> > >>> > >> > >> >>>>>>>>>>> On Tue, May 19, 2020, at 19:10, Sophie
> Blee-Goldman
> > >>> > >> wrote:
> > >>> > >> > >> >>>>>>>>>>>> I like this "fancy idea" to just flip the
> to/from
> > >>> bytes
> > >>> > >> but I
> > >>> > >> > >> >>>>> think
> > >>> > >> > >> >>>>>>>>> there
> > >>> > >> > >> >>>>>>>>>>>> are some practical limitations to implementing
> > >>> this. In
> > >>> > >> > >> >>>>> particular
> > >>> > >> > >> >>>>>>>>>>>> I'm thinking about this issue
> > >>> > >> > >> >>>>>>>>>>>> <
> https://issues.apache.org/jira/browse/KAFKA-8159
> > >>> >
> > >>> > >> with the
> > >>> > >> > >> >>>>>> built-in
> > >>> > >> > >> >>>>>>>>>>> signed
> > >>> > >> > >> >>>>>>>>>>>> number serdes.
> > >>> > >> > >> >>>>>>>>>>>>
> > >>> > >> > >> >>>>>>>>>>>> This trick would actually fix the problem for
> > >>> > >> > >> >>>> negative-negative
> > >>> > >> > >> >>>>>>>>> queries
> > >>> > >> > >> >>>>>>>>>>>> (ie where to & from are negative) but would
> cause
> > >>> > >> > >> undetectable
> > >>> > >> > >> >>>>>>>>>>>> incorrect results for negative-positive
> queries.
> > >>> For
> > >>> > >> example,
> > >>> > >> > >> >>>> say
> > >>> > >> > >> >>>>>> you
> > >>> > >> > >> >>>>>>>>>>>> call #range with from = -1 and to = 1, using
> the
> > >>> Short
> > >>> > >> > >> serdes.
> > >>> > >> > >> >>>>> The
> > >>> > >> > >> >>>>>>>>>>>> serialized bytes for that are
> > >>> > >> > >> >>>>>>>>>>>>
> > >>> > >> > >> >>>>>>>>>>>> from = 1111111111111111
> > >>> > >> > >> >>>>>>>>>>>> to = 0000000000000001
> > >>> > >> > >> >>>>>>>>>>>>
> > >>> > >> > >> >>>>>>>>>>>> so we would end up flipping those and
> iterating
> > >>> over
> > >>> > >> all keys
> > >>> > >> > >> >>>>> from
> > >>> > >> > >> >>>>>>>>>>>> 0000000000000001 to 1111111111111111.
> Iterating in
> > >>> > >> > >> >>>>> lexicographical
> > >>> > >> > >> >>>>>>>>>>>> order means we would iterate over every key in
> > >>> the space
> > >>> > >> > >> >>>> *except*
> > >>> > >> > >> >>>>>> for
> > >>> > >> > >> >>>>>>>>>>>> 0, but 0 is actually the *only* other key we
> > >>> meant to be
> > >>> > >> > >> >>>> included
> > >>> > >> > >> >>>>>> in
> > >>> > >> > >> >>>>>>>>> the
> > >>> > >> > >> >>>>>>>>>>>> range query.
> > >>> > >> > >> >>>>>>>>>>>>
> > >>> > >> > >> >>>>>>>>>>>> Currently we just log a warning and return an
> > >>> empty
> > >>> > >> iterator
> > >>> > >> > >> >>>> when
> > >>> > >> > >> >>>>>>>>>>>> to < from, which is obviously also incorrect
> but
> > >>> feels
> > >>> > >> > >> >>>> slightly
> > >>> > >> > >> >>>>>> more
> > >>> > >> > >> >>>>>>>>>>>> palatable. If we start automatically
> converting to
> > >>> > >> reverse
> > >>> > >> > >> >>>>> queries
> > >>> > >> > >> >>>>>> we
> > >>> > >> > >> >>>>>>>>>>>> can't even log a warning in this case unless
> we
> > >>> wanted
> > >>> > >> to log
> > >>> > >> > >> >>>> a
> > >>> > >> > >> >>>>>>>>> warning
> > >>> > >> > >> >>>>>>>>>>>> every time, which would be weird to do for a
> valid
> > >>> > >> usage of a
> > >>> > >> > >> >>>> new
> > >>> > >> > >> >>>>>>>>>>>> feature.
> > >>> > >> > >> >>>>>>>>>>>>
> > >>> > >> > >> >>>>>>>>>>>> All that said, I still like the idea overall.
> Off
> > >>> the
> > >>> > >> top of
> > >>> > >> > >> >>>> my
> > >>> > >> > >> >>>>>> head
> > >>> > >> > >> >>>>>>>>> I
> > >>> > >> > >> >>>>>>>>>>> guess
> > >>> > >> > >> >>>>>>>>>>>> we could add a store config to enable/disable
> > >>> automatic
> > >>> > >> > >> >>>> reverse
> > >>> > >> > >> >>>>>>>>>>> iteration,
> > >>> > >> > >> >>>>>>>>>>>> which is off by default?
> > >>> > >> > >> >>>>>>>>>>>>
> > >>> > >> > >> >>>>>>>>>>>> Thanks for the KIP! This will be a nice
> addition
> > >>> > >> > >> >>>>>>>>>>>>
> > >>> > >> > >> >>>>>>>>>>>> Sophie
> > >>> > >> > >> >>>>>>>>>>>>
> > >>> > >> > >> >>>>>>>>>>>>
> > >>> > >> > >> >>>>>>>>>>>> On Tue, May 19, 2020 at 3:21 PM John Roesler <
> > >>> > >> > >> >>>>> vvcephei@apache.org>
> > >>> > >> > >> >>>>>>>>>>> wrote:
> > >>> > >> > >> >>>>>>>>>>>>> Hi there Jorge,
> > >>> > >> > >> >>>>>>>>>>>>>
> > >>> > >> > >> >>>>>>>>>>>>> Thanks for the KIP!
> > >>> > >> > >> >>>>>>>>>>>>>
> > >>> > >> > >> >>>>>>>>>>>>> I think this feature sounds very reasonable.
> > >>> > >> > >> >>>>>>>>>>>>>
> > >>> > >> > >> >>>>>>>>>>>>> I'm not 100% sure if this is "too fancy", but
> > >>> what do
> > >>> > >> you
> > >>> > >> > >> >>>> think
> > >>> > >> > >> >>>>>>>>>>>>> about avoiding the enum by instead allowing
> > >>> people to
> > >>> > >> flip
> > >>> > >> > >> >>>>>>>>>>>>> the "from" and "to" endpoints? I.e., reading
> > >>> from "A"
> > >>> > >> to "Z"
> > >>> > >> > >> >>>>> would
> > >>> > >> > >> >>>>>>>>>>>>> be a forward scan, and from "Z" to "A" would
> be a
> > >>> > >> backward
> > >>> > >> > >> >>>> one?
> > >>> > >> > >> >>>>>>>>>>>>>
> > >>> > >> > >> >>>>>>>>>>>>> Thanks,
> > >>> > >> > >> >>>>>>>>>>>>> -John
> > >>> > >> > >> >>>>>>>>>>>>>
> > >>> > >> > >> >>>>>>>>>>>>> On Tue, May 19, 2020, at 16:20, Jorge
> Quilcate
> > >>> wrote:
> > >>> > >> > >> >>>>>>>>>>>>>> Hi everyone,
> > >>> > >> > >> >>>>>>>>>>>>>>
> > >>> > >> > >> >>>>>>>>>>>>>> I would like to start the discussion for
> > >>> KIP-617:
> > >>> > >> > >> >>>>>>>>>>>>>>
> > >>> > >> > >> >>>>>>>>>
> > >>> > >> > >> >>>>>>
> > >>> > >> > >> >>>>>
> > >>> > >> > >> >>>>
> > >>> > >> > >>
> > >>> > >>
> > >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-617%3A+Allow+Kafka+Streams+State+Stores+to+be+iterated+backwards
> > >>> > >> > >> >>>>>>>>>>>>>> Looking forward to your feedback.
> > >>> > >> > >> >>>>>>>>>>>>>>
> > >>> > >> > >> >>>>>>>>>>>>>> Thanks!
> > >>> > >> > >> >>>>>>>>>>>>>> Jorge.
> > >>> > >> > >> >>>>>>>>>>>>>>
> > >>> > >> > >> >>>>>>>>>>>>>>
> > >>> > >> > >> >>>>>>>>>>>>>> Attachments:
> > >>> > >> > >> >>>>>>>>>>>>>> * 0x5F2C6E22064982DF.asc
> > >>> > >> > >> >>>>>>>
> > >>> > >> > >> >>>>>>>
> > >>> > >> > >> >>>>>>> Attachments:
> > >>> > >> > >> >>>>>>> * 0x5F2C6E22064982DF.asc
> > >>> > >> > >> >>>>>>
> > >>> > >> > >> >>>>>
> > >>> > >> > >> >>>>
> > >>> > >> > >> >>>
> > >>> > >> > >> >>
> > >>> > >> > >>
> > >>> > >> > >>
> > >>> > >> >
> > >>> > >>
> > >>> > >
> > >>> >
> > >>>
> > >>
> >
>

Re: [DISCUSS] KIP-617: Allow Kafka Streams State Stores to be iterated backwards

Posted by John Roesler <vv...@apache.org>.
Hey Jorge,

Thanks for the details. That sounds like a mistake to me on both counts.

I don’t think you need to worry about those depreciations. If the interface methods aren’t deprecated, then the methods are not deprecated. We should remove the annotations, but it doesn’t need to be in the kip.

I think any query methods should have been in the ReadOnly interface. I guess it’s up to you whether you want to:
1. Add the reverse methods next to the existing methods (what you have in the kip right now)
2. Partially fix it by adding your new methods to the ReadOnly interface 
3. Fully fix the problem by moving the existing methods as well as your new ones. Since  SessionStore extends ReadOnlySessionStore, it’s ok just to move the definitions. 

I’m ok with whatever you prefer. 

Thanks,
John

On Thu, Jul 2, 2020, at 11:29, Jorge Esteban Quilcate Otoya wrote:
> (sorry for the spam)
> 
> Actually `findSessions` are only deprecated on `InMemorySessionStore`,
> which seems strange as RocksDB and interfaces haven't marked these methods
> as deprecated.
> 
> Any hint on how to handle this?
> 
> Cheers,
> 
> On Thu, Jul 2, 2020 at 4:57 PM Jorge Esteban Quilcate Otoya <
> quilcate.jorge@gmail.com> wrote:
> 
> > @John: I can see there are some deprecations in there as well. Will update
> > the KIP.
> >
> > Thanks,
> > Jorge
> >
> >
> > On Thu, Jul 2, 2020 at 3:29 PM Jorge Esteban Quilcate Otoya <
> > quilcate.jorge@gmail.com> wrote:
> >
> >> Thanks John.
> >>
> >> > it looks like there’s a revision error in which two methods are
> >> proposed for SessionStore, but seem like they should be in
> >> ReadOnlySessionStore. Do I read that right?
> >>
> >> Yes, I've opted to keep the new methods alongside the existing ones. In
> >> the case of SessionStore, `findSessions` are in `SessionStore`, and `fetch`
> >> are in `ReadOnlySessionStore`. If it makes more sense, I can move all of
> >> them to ReadOnlySessionStore.
> >> Let me know what you think.
> >>
> >> Thanks,
> >> Jorge.
> >>
> >> On Thu, Jul 2, 2020 at 2:36 PM John Roesler <vv...@apache.org> wrote:
> >>
> >>> Hi Jorge,
> >>>
> >>> Thanks for the update. I think this is a good plan.
> >>>
> >>> I just took a look at the KIP again, and it looks like there’s a
> >>> revision error in which two methods are proposed for SessionStore, but seem
> >>> like they should be in ReadOnlySessionStore. Do I read that right?
> >>>
> >>> Otherwise, I’m happy with your proposal.
> >>>
> >>> Thanks,
> >>> John
> >>>
> >>> On Wed, Jul 1, 2020, at 17:01, Jorge Esteban Quilcate Otoya wrote:
> >>> > Quick update: KIP is updated with latest changes now.
> >>> > Will leave it open this week while working on the PR.
> >>> >
> >>> > Hope to open a new vote thread over the next few days if no additional
> >>> > feedback is provided.
> >>> >
> >>> > Cheers,
> >>> > Jorge.
> >>> >
> >>> > On Mon, Jun 29, 2020 at 11:30 AM Jorge Esteban Quilcate Otoya <
> >>> > quilcate.jorge@gmail.com> wrote:
> >>> >
> >>> > > Thanks, John!
> >>> > >
> >>> > > Make sense to reconsider the current approach. I was heading in a
> >>> similar
> >>> > > direction while drafting the implementation. Metered, Caching, and
> >>> other
> >>> > > layers will also have to get duplicated to build up new methods in
> >>> `Stores`
> >>> > > factory, and class casting issues would appear on stores created by
> >>> DSL.
> >>> > >
> >>> > > I will draft a proposal with new methods (move methods from proposed
> >>> > > interfaces to existing ones) with default implementation in a KIP
> >>> update
> >>> > > and wait for Matthias to chime in to validate this approach.
> >>> > >
> >>> > > Jorge.
> >>> > >
> >>> > >
> >>> > > On Sat, Jun 27, 2020 at 4:01 PM John Roesler <vv...@apache.org>
> >>> wrote:
> >>> > >
> >>> > >> Hi Jorge,
> >>> > >>
> >>> > >> Sorry for my silence, I've been absorbed with the 2.6 and 2.5.1
> >>> releases.
> >>> > >>
> >>> > >> The idea to separate the new methods into "mixin" interfaces seems
> >>> > >> like a good one, but as we've discovered in KIP-614, it doesn't work
> >>> > >> out that way in practice. The problem is that the store
> >>> implementations
> >>> > >> are just the base layer that get composed with other layers in
> >>> Streams
> >>> > >> before they can be accessed in the DSL. This is extremely subtle, so
> >>> > >> I'm going to put everyone to sleep with a detailed explanation:
> >>> > >>
> >>> > >> For example, this is the mechanism by which all KeyValueStore
> >>> > >> implementations get added to Streams:
> >>> > >> org.apache.kafka.streams.state.internals.KeyValueStoreBuilder#build
> >>> > >> return new MeteredKeyValueStore<>(
> >>> > >>   maybeWrapCaching(maybeWrapLogging(storeSupplier.get())),
> >>> > >>   storeSupplier.metricsScope(),
> >>> > >>   time,
> >>> > >>   keySerde,
> >>> > >>   valueSerde
> >>> > >> );
> >>> > >>
> >>> > >> In the DSL, the store that a processor gets from the context would
> >>> be
> >>> > >> the result of this composition. So even if the storeSupplier.get()
> >>> returns
> >>> > >> a store that implements the "reverse" interface, when you try to
> >>> use it
> >>> > >> from a processor like:
> >>> > >> org.apache.kafka.streams.kstream.ValueTransformerWithKey#init
> >>> > >> ReadOnlyBackwardWindowStore<K, V> store =
> >>> > >>   (ReadOnlyBackwardWindowStore<K, V>) context.getStateStore(..)
> >>> > >>
> >>> > >> You'd just get a ClassCastException because it's actually a
> >>> > >> MeteredKeyValueStore, which doesn't implement
> >>> > >> ReadOnlyBackwardWindowStore.
> >>> > >>
> >>> > >> The only way to make this work would be to make the Metered,
> >>> > >> Caching, and Logging layers also implement the new interfaces,
> >>> > >> but this effectively forces implementations to also implement
> >>> > >> the interface. Otherwise, the intermediate layers would have to
> >>> > >> cast the store in each method, like this:
> >>> > >> MeteredWindowStore#backwardFetch {
> >>> > >>   ((ReadOnlyBackwardWindowStore<K, V>) innerStore).backwardFetch(..)
> >>> > >> }
> >>> > >>
> >>> > >> And then if the implementation doesn't "opt in" by implementing
> >>> > >> the interface, you'd get a ClassCastException, not when you get the
> >>> > >> store, but when you try to use it.
> >>> > >>
> >>> > >> The fact that we get ClassCastExceptions no matter which way we
> >>> > >> turn here indicates that we're really not getting any benefit from
> >>> the
> >>> > >> type system, which makes the extra interfaces seem not worth all the
> >>> > >> code involved.
> >>> > >>
> >>> > >> Where we landed in KIP-614 is that, unless we want to completely
> >>> > >> revamp the way that StateStores work in the DSL, you might as
> >>> > >> well just add the new methods to the existing interfaces. To prevent
> >>> > >> compilation errors, we can add default implementations that throw
> >>> > >> UnsupportedOperationException. If a store doesn't opt in by
> >>> > >> implementing the methods, you'd get an
> >>> UnsupportedOperationException,
> >>> > >> which seems no worse, and maybe better, than the ClassCastException
> >>> > >> you'd get if we go with the "mixin interface" approach.
> >>> > >>
> >>> > >> A quick note: This entire discussion focuses on the DSL. If you're
> >>> just
> >>> > >> using the Processor API by directly adding the a custom store to the
> >>> > >> Topology:
> >>> > >> org.apache.kafka.streams.Topology#addStateStore
> >>> > >> and then retrieving it in the processor via:
> >>> > >> org.apache.kafka.streams.processor.ProcessorContext#getStateStore
> >>> > >> in
> >>> > >> org.apache.kafka.streams.processor.Processor#init
> >>> > >>
> >>> > >> Then, you can both register and retrieve _any_ StateStore
> >>> implementation.
> >>> > >> There's no need to use KeyValueStore or any other built-in
> >>> interface.
> >>> > >> In other words, KeyValueStore and company are only part of the DSL,
> >>> > >> not the PAPI. So, discussions about the build-in store interfaces
> >>> are only
> >>> > >> really relevant in the context of the DSL, Transformers, and
> >>> Materialized.
> >>> > >>
> >>> > >> So, in conclusion, I'd really recommend just adding any new methods
> >>> to
> >>> > >> the existing store interfaces. We might be able to revamp the API
> >>> in the
> >>> > >> future to support mixins, but it's a much larger scope of work than
> >>> this
> >>> > >> KIP.
> >>> > >> A more minor comment is that we don't need to add Deprecated
> >>> variants
> >>> > >> of new methods.
> >>> > >>
> >>> > >> Thanks again, and once again, I'm sorry I tuned out and didn't
> >>> offer this
> >>> > >> feedback before you revised the KIP.
> >>> > >> -John
> >>> > >>
> >>> > >>
> >>> > >>
> >>> > >>
> >>> > >> On Mon, Jun 22, 2020, at 06:11, Jorge Esteban Quilcate Otoya wrote:
> >>> > >> > Hi everyone,
> >>> > >> >
> >>> > >> > I've updated the KIP, applying Matthias' feedback regarding
> >>> interface
> >>> > >> > hierarchy.
> >>> > >> >
> >>> > >> > Also, following the last email, I think we can consider reverse
> >>> > >> operations
> >>> > >> > on KeyValue range as well, as implementation supports
> >>> lexicographic
> >>> > >> order.
> >>> > >> >
> >>> > >> > I considered different naming between Key-based ranges and
> >>> Time-based
> >>> > >> > ranges, and mitigate confusion when fetching keys and time ranges
> >>> as
> >>> > >> > WindowStore does:
> >>> > >> >
> >>> > >> > Key-based ranges: reverseRange(), reverseAll()
> >>> > >> > Time-based ranges: backwardFetch()
> >>> > >> >
> >>> > >> > Then, key-based changes apply to KeyValueStore, and time-based
> >>> changes
> >>> > >> to
> >>> > >> > Window and Session stores.
> >>> > >> >
> >>> > >> > Let me know if you have any questions.
> >>> > >> >
> >>> > >> > Thanks,
> >>> > >> > Jorge.
> >>> > >> >
> >>> > >> >
> >>> > >> > On Tue, Jun 16, 2020 at 12:47 AM Jorge Esteban Quilcate Otoya <
> >>> > >> > quilcate.jorge@gmail.com> wrote:
> >>> > >> >
> >>> > >> > > Hi everyone, sorry for the late reply.
> >>> > >> > >
> >>> > >> > > Thanks Matthias for your feedback. I think it makes sense to
> >>> > >> reconsider
> >>> > >> > > the current design based on your input.
> >>> > >> > >
> >>> > >> > > After digging deeper into the current implementation, I'd like
> >>> to
> >>> > >> bring my
> >>> > >> > > current understanding to be double-checked as it might be
> >>> redefining
> >>> > >> the
> >>> > >> > > KIP's scope:
> >>> > >> > >
> >>> > >> > > 1. There are 2 ranges been exposed by different stores:
> >>> > >> > >
> >>> > >> > > a. Key Range
> >>> > >> > > b. Timestamp Range
> >>> > >> > >
> >>> > >> > > So far, we have discussed covering both.
> >>> > >> > >
> >>> > >> > > 2. Key Range functions do not provide ordering guarantees by
> >>> design:
> >>> > >> > >
> >>> > >> > > ```ReadOnlyKeyValueStore.java
> >>> > >> > >     /**
> >>> > >> > >      * Get an iterator over a given range of keys. This
> >>> iterator must
> >>> > >> be
> >>> > >> > > closed after use.
> >>> > >> > >      * The returned iterator must be safe from {@link
> >>> > >> > > java.util.ConcurrentModificationException}s
> >>> > >> > >      * and must not return null values. No ordering guarantees
> >>> are
> >>> > >> > > provided.
> >>> > >> > >      * ...
> >>> > >> > >      */
> >>> > >> > >      KeyValueIterator<K, V> range(K from, K to);
> >>> > >> > > ```
> >>> > >> > >
> >>> > >> > > Therefore, I'd propose removing Key range operations from the
> >>> scope.
> >>> > >> > >
> >>> > >> > > 3. Timestamp Range operations happen at the SegmentsStore level
> >>> > >> (internal)
> >>> > >> > > API
> >>> > >> > >
> >>> > >> > > AFAICT, Segments wrappers handle all Timestamp ranges queries.
> >>> > >> > >
> >>> > >> > > I'd propose extending `Segments#segments(timeFrom, timeTo,
> >>> backwards)`
> >>> > >> > > with a flag for backwards operations.
> >>> > >> > >
> >>> > >> > > As segments returned will be processed backwards, I'm not
> >>> extending
> >>> > >> > > KeyValueStores to query each segment backwards as previous
> >>> point 2.
> >>> > >> > >
> >>> > >> > > 4. Extend WindowStores implementations with a new
> >>> > >> > > WindowBackwardStore/ReadOnlyBackwardStore:
> >>> > >> > >
> >>> > >> > > ```java
> >>> > >> > > public interface ReadOnlyWindowBackwardStore<K, V> {
> >>> > >> > >     WindowStoreIterator<V> backwardFetch(K key, Instant from,
> >>> Instant
> >>> > >> to)
> >>> > >> > > throws IllegalArgumentException;
> >>> > >> > >
> >>> > >> > >     KeyValueIterator<Windowed<K>, V> backwardFetch(K from, K to,
> >>> > >> Instant
> >>> > >> > > fromTime, Instant toTime)
> >>> > >> > >         throws IllegalArgumentException;
> >>> > >> > >
> >>> > >> > >     KeyValueIterator<Windowed<K>, V> backwardFetchAll(Instant
> >>> from,
> >>> > >> > > Instant to) throws IllegalArgumentException;
> >>> > >> > > ```
> >>> > >> > >
> >>> > >> > > 5. SessionStore is a bit different as it has fetch/find sessions
> >>> > >> spread
> >>> > >> > > between SessionStore and ReadOnlySessionStore.
> >>> > >> > >
> >>> > >> > > I'd propose a new interface `SessionBackwardStore` to expose
> >>> backward
> >>> > >> find
> >>> > >> > > operations:
> >>> > >> > >
> >>> > >> > > ```java
> >>> > >> > > public interface SessionBackwardStore<K, AGG> {
> >>> > >> > >     KeyValueIterator<Windowed<K>, AGG>
> >>> backwardFindSessions(final K
> >>> > >> key,
> >>> > >> > > final long earliestSessionEndTime, final long
> >>> latestSessionStartTime);
> >>> > >> > >
> >>> > >> > >     KeyValueIterator<Windowed<K>, AGG>
> >>> backwardFindSessions(final K
> >>> > >> > > keyFrom, final K keyTo, final long earliestSessionEndTime,
> >>> final long
> >>> > >> > > latestSessionStartTime);
> >>> > >> > > }
> >>> > >> > > ```
> >>> > >> > >
> >>> > >> > > If this understanding is correct I'll proceed to update the KIP
> >>> based
> >>> > >> on
> >>> > >> > > this.
> >>> > >> > >
> >>> > >> > > Looking forward to your feedback.
> >>> > >> > >
> >>> > >> > > Thanks,
> >>> > >> > > Jorge.
> >>> > >> > >
> >>> > >> > > On Fri, May 29, 2020 at 3:32 AM Matthias J. Sax <
> >>> mjsax@apache.org>
> >>> > >> wrote:
> >>> > >> > >
> >>> > >> > >> Hey,
> >>> > >> > >>
> >>> > >> > >> Sorry that I am late to the game. I am not 100% convinced
> >>> about the
> >>> > >> > >> current proposal. Using a new config as feature flag seems to
> >>> be
> >>> > >> rather
> >>> > >> > >> "nasty" to me, and flipping from/to is a little bit too fancy
> >>> for my
> >>> > >> > >> personal taste.
> >>> > >> > >>
> >>> > >> > >> I agree, that the original proposal using a "ReadDirection"
> >>> enum is
> >>> > >> not
> >>> > >> > >> ideal either.
> >>> > >> > >>
> >>> > >> > >> Thus, I would like to put out a new idea: We could add a new
> >>> > >> interface
> >>> > >> > >> that offers new methods that return revers iterators.
> >>> > >> > >>
> >>> > >> > >> The KIP already proposes to add `reverseAll()` and it seems
> >>> backward
> >>> > >> > >> incompatible to just add this method to
> >>> `ReadOnlyKeyValueStore` and
> >>> > >> > >> `ReadOnlyWindowStore`. I don't think we could provide a useful
> >>> > >> default
> >>> > >> > >> implementation for custom stores and thus either break
> >>> compatibility
> >>> > >> or
> >>> > >> > >> need add a default that just throws an exception. Neither
> >>> seems to
> >>> > >> be a
> >>> > >> > >> good option.
> >>> > >> > >>
> >>> > >> > >> Using a new interface avoid this issue and allows users
> >>> implementing
> >>> > >> > >> custom stores to opt-in by adding the interface to their
> >>> stores.
> >>> > >> > >> Furthermore, we don't need any config. In the end, we
> >>> encapsulte the
> >>> > >> > >> change into the store, and our runtime is agnostic to it (as it
> >>> > >> should
> >>> > >> > >> be).
> >>> > >> > >>
> >>> > >> > >> The hierarchy becomes a little complex (but uses would not
> >>> really see
> >>> > >> > >> the complexity):
> >>> > >> > >>
> >>> > >> > >> // exsiting
> >>> > >> > >> ReadOnlyKeyValueStore
> >>> > >> > >> KeyValueStore extend StateStore, ReadOnlyKeyValueStore
> >>> > >> > >>
> >>> > >> > >>
> >>> > >> > >> // helper interface; users don't care
> >>> > >> > >> // need similar ones for other stores
> >>> > >> > >> ReverseReadOnlyKeyValueStore {
> >>> > >> > >>     KeyValueIterator<K, V> reverseRange(K from, K to);
> >>> > >> > >>     KeyValueIterator<K, V> reverseAll();
> >>> > >> > >> }
> >>> > >> > >>
> >>> > >> > >>
> >>> > >> > >> // two new user facing interfaces for kv-store
> >>> > >> > >> // need similar ones for other stores
> >>> > >> > >> ReadOnlyKeyValueStoreWithReverseIterators extends
> >>> > >> ReadOnlyKeyValueStore,
> >>> > >> > >> ReverseReadOnlyKeyValueStore
> >>> > >> > >>
> >>> > >> > >> KeyValueStoreWithReverseIterators extends KeyValueStore,
> >>> > >> > >> ReverseReadOnlyKeyValueStore
> >>> > >> > >>
> >>> > >> > >>
> >>> > >> > >> // updated (also internal)
> >>> > >> > >> // also need to update other built-in stores
> >>> > >> > >> RocksDB implements KeyValueStoreWithReverseIterators,
> >>> > >> BulkLoadingStore
> >>> > >> > >>
> >>> > >> > >>
> >>> > >> > >> In the end, user would only care about the two (for kv-case)
> >>> new
> >>> > >> > >> interface that offer revers iterator (read only and regular)
> >>> and can
> >>> > >> > >> cast stores accordingly in their Processors/Transformers or
> >>> via IQ.
> >>> > >> > >>
> >>> > >> > >>
> >>> > >> > >> Btw: if we add revers iterator for KeyValue and Window store,
> >>> should
> >>> > >> we
> >>> > >> > >> do the same for Session store?
> >>> > >> > >>
> >>> > >> > >>
> >>> > >> > >>
> >>> > >> > >> This might be more code to write, but I believe it provides the
> >>> > >> better
> >>> > >> > >> user experience. Thoughts?
> >>> > >> > >>
> >>> > >> > >>
> >>> > >> > >>
> >>> > >> > >> -Matthias
> >>> > >> > >>
> >>> > >> > >>
> >>> > >> > >>
> >>> > >> > >>
> >>> > >> > >> On 5/26/20 8:47 PM, John Roesler wrote:
> >>> > >> > >> > Sorry for my silence, Jorge,
> >>> > >> > >> >
> >>> > >> > >> > I've just taken another look, and I'm personally happy with
> >>> the
> >>> > >> KIP.
> >>> > >> > >> >
> >>> > >> > >> > Thanks,
> >>> > >> > >> > -John
> >>> > >> > >> >
> >>> > >> > >> > On Tue, May 26, 2020, at 16:17, Jorge Esteban Quilcate Otoya
> >>> wrote:
> >>> > >> > >> >> If no additional comments, I will proceed to start the a
> >>> vote
> >>> > >> thread.
> >>> > >> > >> >>
> >>> > >> > >> >> Thanks a lot for your feedback!
> >>> > >> > >> >>
> >>> > >> > >> >> On Fri, May 22, 2020 at 9:25 AM Jorge Esteban Quilcate
> >>> Otoya <
> >>> > >> > >> >> quilcate.jorge@gmail.com> wrote:
> >>> > >> > >> >>
> >>> > >> > >> >>> Thanks Sophie. I like the `reverseAll()` idea.
> >>> > >> > >> >>>
> >>> > >> > >> >>> I updated the KIP with your feedback.
> >>> > >> > >> >>>
> >>> > >> > >> >>>
> >>> > >> > >> >>>
> >>> > >> > >> >>> On Fri, May 22, 2020 at 4:22 AM Sophie Blee-Goldman <
> >>> > >> > >> sophie@confluent.io>
> >>> > >> > >> >>> wrote:
> >>> > >> > >> >>>
> >>> > >> > >> >>>> Hm, the case of `all()` does seem to present a dilemma in
> >>> the
> >>> > >> case of
> >>> > >> > >> >>>> variable-length keys.
> >>> > >> > >> >>>>
> >>> > >> > >> >>>> In the case of fixed-length keys, you can just compute
> >>> the keys
> >>> > >> that
> >>> > >> > >> >>>> correspond
> >>> > >> > >> >>>> to the maximum and minimum serialized bytes, then perform
> >>> a
> >>> > >> `range()`
> >>> > >> > >> >>>> query
> >>> > >> > >> >>>> instead of an `all()`. If your keys don't have a
> >>> well-defined
> >>> > >> > >> ordering
> >>> > >> > >> >>>> such
> >>> > >> > >> >>>> that
> >>> > >> > >> >>>> you can't determine the MAX_KEY, then you probably don't
> >>> care
> >>> > >> about
> >>> > >> > >> the
> >>> > >> > >> >>>> iterator order anyway.
> >>> > >> > >> >>>>
> >>> > >> > >> >>>>  But with variable-length keys, there is no MAX_KEY. If
> >>> all your
> >>> > >> > >> keys were
> >>> > >> > >> >>>> just
> >>> > >> > >> >>>> of the form 'a', 'aa', 'aaaaa', 'aaaaaaa' then in fact
> >>> the only
> >>> > >> way
> >>> > >> > >> to
> >>> > >> > >> >>>> figure out the
> >>> > >> > >> >>>> maximum key in the store is by using `all()` -- and
> >>> without a
> >>> > >> reverse
> >>> > >> > >> >>>> iterator, you're
> >>> > >> > >> >>>> doomed to iterate through every single key just to answer
> >>> that
> >>> > >> simple
> >>> > >> > >> >>>> question.
> >>> > >> > >> >>>>
> >>> > >> > >> >>>> That said, I still think determining the iterator order
> >>> based
> >>> > >> on the
> >>> > >> > >> >>>> to/from bytes
> >>> > >> > >> >>>> makes a lot of intuitive sense and gives the API a nice
> >>> > >> symmetry.
> >>> > >> > >> What if
> >>> > >> > >> >>>> we
> >>> > >> > >> >>>> solved the `all()` problem by just giving `all()` a
> >>> reverse
> >>> > >> form to
> >>> > >> > >> >>>> complement it?
> >>> > >> > >> >>>> Ie we would have `all()` and `reverseAll()`, or something
> >>> to
> >>> > >> that
> >>> > >> > >> effect.
> >>> > >> > >> >>>>
> >>> > >> > >> >>>> On Thu, May 21, 2020 at 3:41 PM Jorge Esteban Quilcate
> >>> Otoya <
> >>> > >> > >> >>>> quilcate.jorge@gmail.com> wrote:
> >>> > >> > >> >>>>
> >>> > >> > >> >>>>> Thanks John.
> >>> > >> > >> >>>>>
> >>> > >> > >> >>>>> Agree. I like the first approach as well, with
> >>> StreamsConfig
> >>> > >> flag
> >>> > >> > >> >>>> passing
> >>> > >> > >> >>>>> by via ProcessorContext.
> >>> > >> > >> >>>>>
> >>> > >> > >> >>>>> Another positive effect with "reverse parameters" is
> >>> that in
> >>> > >> the
> >>> > >> > >> case of
> >>> > >> > >> >>>>> `fetch(keyFrom, keyTo, timeFrom, timeTo)` users can
> >>> decide
> >>> > >> _which_
> >>> > >> > >> pair
> >>> > >> > >> >>>> to
> >>> > >> > >> >>>>> flip, whether with `ReadDirection` enum it apply to both.
> >>> > >> > >> >>>>>
> >>> > >> > >> >>>>> The only issue I've found while reviewing the KIP is that
> >>> > >> `all()`
> >>> > >> > >> won't
> >>> > >> > >> >>>> fit
> >>> > >> > >> >>>>> within this approach.
> >>> > >> > >> >>>>>
> >>> > >> > >> >>>>> We could remove it from the KIP and argue that for
> >>> WindowStore,
> >>> > >> > >> >>>>> `fetchAll(0, Long.MAX_VALUE)` can be used to get all in
> >>> reverse
> >>> > >> > >> order,
> >>> > >> > >> >>>> and
> >>> > >> > >> >>>>> for KeyValueStore, no ordering guarantees are provided.
> >>> > >> > >> >>>>>
> >>> > >> > >> >>>>> If there is consensus with this changes, I will go and
> >>> update
> >>> > >> the
> >>> > >> > >> KIP.
> >>> > >> > >> >>>>>
> >>> > >> > >> >>>>> On Thu, May 21, 2020 at 3:33 PM John Roesler <
> >>> > >> vvcephei@apache.org>
> >>> > >> > >> >>>> wrote:
> >>> > >> > >> >>>>>
> >>> > >> > >> >>>>>> Hi Jorge,
> >>> > >> > >> >>>>>>
> >>> > >> > >> >>>>>> Thanks for that idea. I agree, a feature flag would
> >>> protect
> >>> > >> anyone
> >>> > >> > >> >>>>>> who may be depending on the current behavior.
> >>> > >> > >> >>>>>>
> >>> > >> > >> >>>>>> It seems better to locate the feature flag in the
> >>> > >> initialization
> >>> > >> > >> >>>> logic of
> >>> > >> > >> >>>>>> the store, rather than have a method on the "live"
> >>> store that
> >>> > >> > >> changes
> >>> > >> > >> >>>>>> its behavior on the fly.
> >>> > >> > >> >>>>>>
> >>> > >> > >> >>>>>> It seems like there are two options here, one is to add
> >>> a new
> >>> > >> > >> config:
> >>> > >> > >> >>>>>>
> >>> > >> > >> >>>>>> StreamsConfig.ENABLE_BACKWARDS_ITERATION =
> >>> > >> > >> >>>>>>   "enable.backwards.iteration
> >>> > >> > >> >>>>>>
> >>> > >> > >> >>>>>> Or we can add a feature flag in Materialized, like
> >>> > >> > >> >>>>>>
> >>> > >> > >> >>>>>> Materialized.enableBackwardsIteration()
> >>> > >> > >> >>>>>>
> >>> > >> > >> >>>>>> I think I'd personally lean toward the config, for the
> >>> > >> following
> >>> > >> > >> >>>> reason.
> >>> > >> > >> >>>>>> The concern that Sophie raised is that someone's
> >>> program may
> >>> > >> depend
> >>> > >> > >> >>>>>> on the existing contract of getting an empty iterator.
> >>> We
> >>> > >> don't
> >>> > >> > >> want
> >>> > >> > >> >>>> to
> >>> > >> > >> >>>>>> switch behavior when they aren't expecting it, so we
> >>> provide
> >>> > >> them a
> >>> > >> > >> >>>>>> config to assert that they _are_ expecting the new
> >>> behavior,
> >>> > >> which
> >>> > >> > >> >>>>>> means they take responsibility for updating their code
> >>> to
> >>> > >> expect
> >>> > >> > >> the
> >>> > >> > >> >>>> new
> >>> > >> > >> >>>>>> behavior.
> >>> > >> > >> >>>>>>
> >>> > >> > >> >>>>>> There doesn't seem to be a reason to offer a choice of
> >>> > >> behaviors
> >>> > >> > >> on a
> >>> > >> > >> >>>>>> per-query, or per-store basis. We just want people to
> >>> be not
> >>> > >> > >> surprised
> >>> > >> > >> >>>>>> by this change in general.
> >>> > >> > >> >>>>>>
> >>> > >> > >> >>>>>> What do you think?
> >>> > >> > >> >>>>>> Thanks,
> >>> > >> > >> >>>>>> -John
> >>> > >> > >> >>>>>>
> >>> > >> > >> >>>>>> On Wed, May 20, 2020, at 17:37, Jorge Quilcate wrote:
> >>> > >> > >> >>>>>>> Thank you both for the great feedback.
> >>> > >> > >> >>>>>>>
> >>> > >> > >> >>>>>>> I like the "fancy" proposal :), and how it removes the
> >>> need
> >>> > >> for
> >>> > >> > >> >>>>>>> additional API methods. And with a feature flag on
> >>> > >> `StateStore`,
> >>> > >> > >> >>>>>>> disabled by default, should no break current users.
> >>> > >> > >> >>>>>>>
> >>> > >> > >> >>>>>>> The only side-effect I can think of is that: by moving
> >>> the
> >>> > >> flag
> >>> > >> > >> >>>>> upwards,
> >>> > >> > >> >>>>>>> all later operations become affected; which might be
> >>> ok for
> >>> > >> most
> >>> > >> > >> >>>> (all?)
> >>> > >> > >> >>>>>>> cases. I can't think of an scenario where this would
> >>> be an
> >>> > >> issue,
> >>> > >> > >> >>>> just
> >>> > >> > >> >>>>>>> want to point this out.
> >>> > >> > >> >>>>>>>
> >>> > >> > >> >>>>>>> If moving to this approach, I'd like to check if I got
> >>> this
> >>> > >> right
> >>> > >> > >> >>>>> before
> >>> > >> > >> >>>>>>> updating the KIP:
> >>> > >> > >> >>>>>>>
> >>> > >> > >> >>>>>>> - only `StateStore` will change by having a new method:
> >>> > >> > >> >>>>>>> `backwardIteration()`, `false` by default to keep
> >>> things
> >>> > >> > >> compatible.
> >>> > >> > >> >>>>>>> - then all `*Stores` will have to update their
> >>> implementation
> >>> > >> > >> based
> >>> > >> > >> >>>> on
> >>> > >> > >> >>>>>>> this flag.
> >>> > >> > >> >>>>>>>
> >>> > >> > >> >>>>>>>
> >>> > >> > >> >>>>>>> On 20/05/2020 21:02, Sophie Blee-Goldman wrote:
> >>> > >> > >> >>>>>>>>> There's no possibility that someone could be relying
> >>> > >> > >> >>>>>>>>> on iterating over that range in increasing order,
> >>> because
> >>> > >> that's
> >>> > >> > >> >>>> not
> >>> > >> > >> >>>>>> what
> >>> > >> > >> >>>>>>>>> happens. However, they could indeed be relying on
> >>> getting
> >>> > >> an
> >>> > >> > >> >>>> empty
> >>> > >> > >> >>>>>>>> iterator
> >>> > >> > >> >>>>>>>>
> >>> > >> > >> >>>>>>>> I just meant that they might be relying on the
> >>> assumption
> >>> > >> that
> >>> > >> > >> the
> >>> > >> > >> >>>>>> range
> >>> > >> > >> >>>>>>>> query
> >>> > >> > >> >>>>>>>> will never return results with decreasing keys. The
> >>> empty
> >>> > >> > >> iterator
> >>> > >> > >> >>>>>> wouldn't
> >>> > >> > >> >>>>>>>> break that contract, but of course a surprise reverse
> >>> > >> iterator
> >>> > >> > >> >>>> would.
> >>> > >> > >> >>>>>>>>
> >>> > >> > >> >>>>>>>> FWIW I actually am in favor of automatically
> >>> converting to a
> >>> > >> > >> >>>> reverse
> >>> > >> > >> >>>>>>>> iterator,
> >>> > >> > >> >>>>>>>> I just thought we should consider whether this should
> >>> be
> >>> > >> off by
> >>> > >> > >> >>>>>> default or
> >>> > >> > >> >>>>>>>> even possible to disable at all.
> >>> > >> > >> >>>>>>>>
> >>> > >> > >> >>>>>>>> On Tue, May 19, 2020 at 7:42 PM John Roesler <
> >>> > >> > >> vvcephei@apache.org
> >>> > >> > >> >>>>>
> >>> > >> > >> >>>>>> wrote:
> >>> > >> > >> >>>>>>>>
> >>> > >> > >> >>>>>>>>> Thanks for the response, Sophie,
> >>> > >> > >> >>>>>>>>>
> >>> > >> > >> >>>>>>>>> I wholeheartedly agree we should take as much into
> >>> account
> >>> > >> as
> >>> > >> > >> >>>>> possible
> >>> > >> > >> >>>>>>>>> up front, rather than regretting our decisions
> >>> later. I
> >>> > >> actually
> >>> > >> > >> >>>> do
> >>> > >> > >> >>>>>> share
> >>> > >> > >> >>>>>>>>> your vague sense of worry, which was what led me to
> >>> say
> >>> > >> > >> initially
> >>> > >> > >> >>>>>> that I
> >>> > >> > >> >>>>>>>>> thought my counterproposal might be "too fancy".
> >>> > >> Sometimes, it's
> >>> > >> > >> >>>>>> better
> >>> > >> > >> >>>>>>>>> to be explicit instead of "elegant", if we think more
> >>> > >> people
> >>> > >> > >> >>>> will be
> >>> > >> > >> >>>>>>>>> confused
> >>> > >> > >> >>>>>>>>> than not.
> >>> > >> > >> >>>>>>>>>
> >>> > >> > >> >>>>>>>>> I really don't think that there's any danger of
> >>> "relying
> >>> > >> on a
> >>> > >> > >> >>>> bug"
> >>> > >> > >> >>>>>> here,
> >>> > >> > >> >>>>>>>>> although
> >>> > >> > >> >>>>>>>>> people certainly could be relying on current
> >>> behavior. One
> >>> > >> thing
> >>> > >> > >> >>>> to
> >>> > >> > >> >>>>> be
> >>> > >> > >> >>>>>>>>> clear
> >>> > >> > >> >>>>>>>>> about (which I just left a more detailed comment in
> >>> > >> KAFKA-8159
> >>> > >> > >> >>>>> about)
> >>> > >> > >> >>>>>> is
> >>> > >> > >> >>>>>>>>> that
> >>> > >> > >> >>>>>>>>> when we say something like key1 > key2, this
> >>> ordering is
> >>> > >> defined
> >>> > >> > >> >>>> by
> >>> > >> > >> >>>>>> the
> >>> > >> > >> >>>>>>>>> serde's output and nothing else.
> >>> > >> > >> >>>>>>>>>
> >>> > >> > >> >>>>>>>>> Currently, thanks to your fix in
> >>> > >> > >> >>>>>> https://github.com/apache/kafka/pull/6521
> >>> > >> > >> >>>>>>>>> ,
> >>> > >> > >> >>>>>>>>> the store contract is that for range scans, if from
> >>> > to,
> >>> > >> then
> >>> > >> > >> >>>> the
> >>> > >> > >> >>>>>> store
> >>> > >> > >> >>>>>>>>> must
> >>> > >> > >> >>>>>>>>> return an empty iterator. There's no possibility that
> >>> > >> someone
> >>> > >> > >> >>>> could
> >>> > >> > >> >>>>> be
> >>> > >> > >> >>>>>>>>> relying
> >>> > >> > >> >>>>>>>>> on iterating over that range in increasing order,
> >>> because
> >>> > >> that's
> >>> > >> > >> >>>> not
> >>> > >> > >> >>>>>> what
> >>> > >> > >> >>>>>>>>> happens. However, they could indeed be relying on
> >>> getting
> >>> > >> an
> >>> > >> > >> >>>> empty
> >>> > >> > >> >>>>>>>>> iterator.
> >>> > >> > >> >>>>>>>>>
> >>> > >> > >> >>>>>>>>> My counterproposal was to actually change this
> >>> contract to
> >>> > >> say
> >>> > >> > >> >>>> that
> >>> > >> > >> >>>>>> the
> >>> > >> > >> >>>>>>>>> store
> >>> > >> > >> >>>>>>>>> must return an iterator over the keys in that range,
> >>> but
> >>> > >> in the
> >>> > >> > >> >>>>>> reverse
> >>> > >> > >> >>>>>>>>> order.
> >>> > >> > >> >>>>>>>>> So, in addition to considering whether this idea is
> >>> "too
> >>> > >> fancy"
> >>> > >> > >> >>>> (aka
> >>> > >> > >> >>>>>>>>> confusing),
> >>> > >> > >> >>>>>>>>> we should also consider the likelihood of breaking an
> >>> > >> existing
> >>> > >> > >> >>>>>> program with
> >>> > >> > >> >>>>>>>>> this behavior/contract change.
> >>> > >> > >> >>>>>>>>>
> >>> > >> > >> >>>>>>>>> To echo your clarification, I'm also not advocating
> >>> > >> strongly in
> >>> > >> > >> >>>>> favor
> >>> > >> > >> >>>>>> of my
> >>> > >> > >> >>>>>>>>> proposal. I just wanted to present it for
> >>> consideration
> >>> > >> > >> alongside
> >>> > >> > >> >>>>>> Jorge's
> >>> > >> > >> >>>>>>>>> original one.
> >>> > >> > >> >>>>>>>>>
> >>> > >> > >> >>>>>>>>> Thanks for raising these very good points,
> >>> > >> > >> >>>>>>>>> -John
> >>> > >> > >> >>>>>>>>>
> >>> > >> > >> >>>>>>>>> On Tue, May 19, 2020, at 20:49, Sophie Blee-Goldman
> >>> wrote:
> >>> > >> > >> >>>>>>>>>>> Rather than working around it, I think we should
> >>> just
> >>> > >> fix it
> >>> > >> > >> >>>>>>>>>> Now *that's* a "fancy" idea :P
> >>> > >> > >> >>>>>>>>>>
> >>> > >> > >> >>>>>>>>>> That was my primary concern, although I do have a
> >>> vague
> >>> > >> sense
> >>> > >> > >> of
> >>> > >> > >> >>>>>> worry
> >>> > >> > >> >>>>>>>>>> that we might be allowing users to get into trouble
> >>> > >> without
> >>> > >> > >> >>>>>> realizing it.
> >>> > >> > >> >>>>>>>>>> For example if their custom serdes suffer a similar
> >>> bug
> >>> > >> as the
> >>> > >> > >> >>>>> above,
> >>> > >> > >> >>>>>>>>>> and/or
> >>> > >> > >> >>>>>>>>>> they rely on getting results in increasing order
> >>> (of the
> >>> > >> keys)
> >>> > >> > >> >>>> even
> >>> > >> > >> >>>>>> when
> >>> > >> > >> >>>>>>>>>> to < from. Maybe they're relying on the fact that
> >>> the
> >>> > >> range
> >>> > >> > >> >>>> query
> >>> > >> > >> >>>>>> returns
> >>> > >> > >> >>>>>>>>>> nothing in that case.
> >>> > >> > >> >>>>>>>>>>
> >>> > >> > >> >>>>>>>>>> Not sure if that qualifies as relying on a bug or
> >>> not,
> >>> > >> but in
> >>> > >> > >> >>>> that
> >>> > >> > >> >>>>>> past
> >>> > >> > >> >>>>>>>>>> we've
> >>> > >> > >> >>>>>>>>>> taken the stance that we should not break
> >>> compatibility
> >>> > >> even if
> >>> > >> > >> >>>> the
> >>> > >> > >> >>>>>> user
> >>> > >> > >> >>>>>>>>>> was relying on bugs or unintentional behavior.
> >>> > >> > >> >>>>>>>>>>
> >>> > >> > >> >>>>>>>>>> Just to clarify I'm not advocating strongly against
> >>> this
> >>> > >> > >> >>>> proposal,
> >>> > >> > >> >>>>>> just
> >>> > >> > >> >>>>>>>>>> laying
> >>> > >> > >> >>>>>>>>>> out some considerations we should take into
> >>> account. At
> >>> > >> the end
> >>> > >> > >> >>>> of
> >>> > >> > >> >>>>>> the
> >>> > >> > >> >>>>>>>>> day
> >>> > >> > >> >>>>>>>>>> we should do what's right rather than maintain
> >>> > >> compatibility
> >>> > >> > >> >>>> with
> >>> > >> > >> >>>>>>>>> existing
> >>> > >> > >> >>>>>>>>>> bugs, but sometimes there's a reasonable middle
> >>> ground.
> >>> > >> > >> >>>>>>>>>>
> >>> > >> > >> >>>>>>>>>> On Tue, May 19, 2020 at 6:15 PM John Roesler <
> >>> > >> > >> >>>> vvcephei@apache.org>
> >>> > >> > >> >>>>>>>>> wrote:
> >>> > >> > >> >>>>>>>>>>> Thanks Sophie,
> >>> > >> > >> >>>>>>>>>>>
> >>> > >> > >> >>>>>>>>>>> Woah, that’s a nasty bug. Rather than working
> >>> around it,
> >>> > >> I
> >>> > >> > >> >>>> think
> >>> > >> > >> >>>>> we
> >>> > >> > >> >>>>>>>>> should
> >>> > >> > >> >>>>>>>>>>> just fix it. I’ll leave some comments on the Jira.
> >>> > >> > >> >>>>>>>>>>>
> >>> > >> > >> >>>>>>>>>>> It doesn’t seem like it should be this KIP’s
> >>> concern
> >>> > >> that some
> >>> > >> > >> >>>>>> serdes
> >>> > >> > >> >>>>>>>>>>> might be incorrectly written.
> >>> > >> > >> >>>>>>>>>>>
> >>> > >> > >> >>>>>>>>>>> Were there other practical concerns that you had
> >>> in mind?
> >>> > >> > >> >>>>>>>>>>>
> >>> > >> > >> >>>>>>>>>>> Thanks,
> >>> > >> > >> >>>>>>>>>>> John
> >>> > >> > >> >>>>>>>>>>>
> >>> > >> > >> >>>>>>>>>>> On Tue, May 19, 2020, at 19:10, Sophie Blee-Goldman
> >>> > >> wrote:
> >>> > >> > >> >>>>>>>>>>>> I like this "fancy idea" to just flip the to/from
> >>> bytes
> >>> > >> but I
> >>> > >> > >> >>>>> think
> >>> > >> > >> >>>>>>>>> there
> >>> > >> > >> >>>>>>>>>>>> are some practical limitations to implementing
> >>> this. In
> >>> > >> > >> >>>>> particular
> >>> > >> > >> >>>>>>>>>>>> I'm thinking about this issue
> >>> > >> > >> >>>>>>>>>>>> <https://issues.apache.org/jira/browse/KAFKA-8159
> >>> >
> >>> > >> with the
> >>> > >> > >> >>>>>> built-in
> >>> > >> > >> >>>>>>>>>>> signed
> >>> > >> > >> >>>>>>>>>>>> number serdes.
> >>> > >> > >> >>>>>>>>>>>>
> >>> > >> > >> >>>>>>>>>>>> This trick would actually fix the problem for
> >>> > >> > >> >>>> negative-negative
> >>> > >> > >> >>>>>>>>> queries
> >>> > >> > >> >>>>>>>>>>>> (ie where to & from are negative) but would cause
> >>> > >> > >> undetectable
> >>> > >> > >> >>>>>>>>>>>> incorrect results for negative-positive queries.
> >>> For
> >>> > >> example,
> >>> > >> > >> >>>> say
> >>> > >> > >> >>>>>> you
> >>> > >> > >> >>>>>>>>>>>> call #range with from = -1 and to = 1, using the
> >>> Short
> >>> > >> > >> serdes.
> >>> > >> > >> >>>>> The
> >>> > >> > >> >>>>>>>>>>>> serialized bytes for that are
> >>> > >> > >> >>>>>>>>>>>>
> >>> > >> > >> >>>>>>>>>>>> from = 1111111111111111
> >>> > >> > >> >>>>>>>>>>>> to = 0000000000000001
> >>> > >> > >> >>>>>>>>>>>>
> >>> > >> > >> >>>>>>>>>>>> so we would end up flipping those and iterating
> >>> over
> >>> > >> all keys
> >>> > >> > >> >>>>> from
> >>> > >> > >> >>>>>>>>>>>> 0000000000000001 to 1111111111111111. Iterating in
> >>> > >> > >> >>>>> lexicographical
> >>> > >> > >> >>>>>>>>>>>> order means we would iterate over every key in
> >>> the space
> >>> > >> > >> >>>> *except*
> >>> > >> > >> >>>>>> for
> >>> > >> > >> >>>>>>>>>>>> 0, but 0 is actually the *only* other key we
> >>> meant to be
> >>> > >> > >> >>>> included
> >>> > >> > >> >>>>>> in
> >>> > >> > >> >>>>>>>>> the
> >>> > >> > >> >>>>>>>>>>>> range query.
> >>> > >> > >> >>>>>>>>>>>>
> >>> > >> > >> >>>>>>>>>>>> Currently we just log a warning and return an
> >>> empty
> >>> > >> iterator
> >>> > >> > >> >>>> when
> >>> > >> > >> >>>>>>>>>>>> to < from, which is obviously also incorrect but
> >>> feels
> >>> > >> > >> >>>> slightly
> >>> > >> > >> >>>>>> more
> >>> > >> > >> >>>>>>>>>>>> palatable. If we start automatically converting to
> >>> > >> reverse
> >>> > >> > >> >>>>> queries
> >>> > >> > >> >>>>>> we
> >>> > >> > >> >>>>>>>>>>>> can't even log a warning in this case unless we
> >>> wanted
> >>> > >> to log
> >>> > >> > >> >>>> a
> >>> > >> > >> >>>>>>>>> warning
> >>> > >> > >> >>>>>>>>>>>> every time, which would be weird to do for a valid
> >>> > >> usage of a
> >>> > >> > >> >>>> new
> >>> > >> > >> >>>>>>>>>>>> feature.
> >>> > >> > >> >>>>>>>>>>>>
> >>> > >> > >> >>>>>>>>>>>> All that said, I still like the idea overall. Off
> >>> the
> >>> > >> top of
> >>> > >> > >> >>>> my
> >>> > >> > >> >>>>>> head
> >>> > >> > >> >>>>>>>>> I
> >>> > >> > >> >>>>>>>>>>> guess
> >>> > >> > >> >>>>>>>>>>>> we could add a store config to enable/disable
> >>> automatic
> >>> > >> > >> >>>> reverse
> >>> > >> > >> >>>>>>>>>>> iteration,
> >>> > >> > >> >>>>>>>>>>>> which is off by default?
> >>> > >> > >> >>>>>>>>>>>>
> >>> > >> > >> >>>>>>>>>>>> Thanks for the KIP! This will be a nice addition
> >>> > >> > >> >>>>>>>>>>>>
> >>> > >> > >> >>>>>>>>>>>> Sophie
> >>> > >> > >> >>>>>>>>>>>>
> >>> > >> > >> >>>>>>>>>>>>
> >>> > >> > >> >>>>>>>>>>>> On Tue, May 19, 2020 at 3:21 PM John Roesler <
> >>> > >> > >> >>>>> vvcephei@apache.org>
> >>> > >> > >> >>>>>>>>>>> wrote:
> >>> > >> > >> >>>>>>>>>>>>> Hi there Jorge,
> >>> > >> > >> >>>>>>>>>>>>>
> >>> > >> > >> >>>>>>>>>>>>> Thanks for the KIP!
> >>> > >> > >> >>>>>>>>>>>>>
> >>> > >> > >> >>>>>>>>>>>>> I think this feature sounds very reasonable.
> >>> > >> > >> >>>>>>>>>>>>>
> >>> > >> > >> >>>>>>>>>>>>> I'm not 100% sure if this is "too fancy", but
> >>> what do
> >>> > >> you
> >>> > >> > >> >>>> think
> >>> > >> > >> >>>>>>>>>>>>> about avoiding the enum by instead allowing
> >>> people to
> >>> > >> flip
> >>> > >> > >> >>>>>>>>>>>>> the "from" and "to" endpoints? I.e., reading
> >>> from "A"
> >>> > >> to "Z"
> >>> > >> > >> >>>>> would
> >>> > >> > >> >>>>>>>>>>>>> be a forward scan, and from "Z" to "A" would be a
> >>> > >> backward
> >>> > >> > >> >>>> one?
> >>> > >> > >> >>>>>>>>>>>>>
> >>> > >> > >> >>>>>>>>>>>>> Thanks,
> >>> > >> > >> >>>>>>>>>>>>> -John
> >>> > >> > >> >>>>>>>>>>>>>
> >>> > >> > >> >>>>>>>>>>>>> On Tue, May 19, 2020, at 16:20, Jorge Quilcate
> >>> wrote:
> >>> > >> > >> >>>>>>>>>>>>>> Hi everyone,
> >>> > >> > >> >>>>>>>>>>>>>>
> >>> > >> > >> >>>>>>>>>>>>>> I would like to start the discussion for
> >>> KIP-617:
> >>> > >> > >> >>>>>>>>>>>>>>
> >>> > >> > >> >>>>>>>>>
> >>> > >> > >> >>>>>>
> >>> > >> > >> >>>>>
> >>> > >> > >> >>>>
> >>> > >> > >>
> >>> > >>
> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-617%3A+Allow+Kafka+Streams+State+Stores+to+be+iterated+backwards
> >>> > >> > >> >>>>>>>>>>>>>> Looking forward to your feedback.
> >>> > >> > >> >>>>>>>>>>>>>>
> >>> > >> > >> >>>>>>>>>>>>>> Thanks!
> >>> > >> > >> >>>>>>>>>>>>>> Jorge.
> >>> > >> > >> >>>>>>>>>>>>>>
> >>> > >> > >> >>>>>>>>>>>>>>
> >>> > >> > >> >>>>>>>>>>>>>> Attachments:
> >>> > >> > >> >>>>>>>>>>>>>> * 0x5F2C6E22064982DF.asc
> >>> > >> > >> >>>>>>>
> >>> > >> > >> >>>>>>>
> >>> > >> > >> >>>>>>> Attachments:
> >>> > >> > >> >>>>>>> * 0x5F2C6E22064982DF.asc
> >>> > >> > >> >>>>>>
> >>> > >> > >> >>>>>
> >>> > >> > >> >>>>
> >>> > >> > >> >>>
> >>> > >> > >> >>
> >>> > >> > >>
> >>> > >> > >>
> >>> > >> >
> >>> > >>
> >>> > >
> >>> >
> >>>
> >>
>

Re: [DISCUSS] KIP-617: Allow Kafka Streams State Stores to be iterated backwards

Posted by Jorge Esteban Quilcate Otoya <qu...@gmail.com>.
(sorry for the spam)

Actually `findSessions` are only deprecated on `InMemorySessionStore`,
which seems strange as RocksDB and interfaces haven't marked these methods
as deprecated.

Any hint on how to handle this?

Cheers,

On Thu, Jul 2, 2020 at 4:57 PM Jorge Esteban Quilcate Otoya <
quilcate.jorge@gmail.com> wrote:

> @John: I can see there are some deprecations in there as well. Will update
> the KIP.
>
> Thanks,
> Jorge
>
>
> On Thu, Jul 2, 2020 at 3:29 PM Jorge Esteban Quilcate Otoya <
> quilcate.jorge@gmail.com> wrote:
>
>> Thanks John.
>>
>> > it looks like there’s a revision error in which two methods are
>> proposed for SessionStore, but seem like they should be in
>> ReadOnlySessionStore. Do I read that right?
>>
>> Yes, I've opted to keep the new methods alongside the existing ones. In
>> the case of SessionStore, `findSessions` are in `SessionStore`, and `fetch`
>> are in `ReadOnlySessionStore`. If it makes more sense, I can move all of
>> them to ReadOnlySessionStore.
>> Let me know what you think.
>>
>> Thanks,
>> Jorge.
>>
>> On Thu, Jul 2, 2020 at 2:36 PM John Roesler <vv...@apache.org> wrote:
>>
>>> Hi Jorge,
>>>
>>> Thanks for the update. I think this is a good plan.
>>>
>>> I just took a look at the KIP again, and it looks like there’s a
>>> revision error in which two methods are proposed for SessionStore, but seem
>>> like they should be in ReadOnlySessionStore. Do I read that right?
>>>
>>> Otherwise, I’m happy with your proposal.
>>>
>>> Thanks,
>>> John
>>>
>>> On Wed, Jul 1, 2020, at 17:01, Jorge Esteban Quilcate Otoya wrote:
>>> > Quick update: KIP is updated with latest changes now.
>>> > Will leave it open this week while working on the PR.
>>> >
>>> > Hope to open a new vote thread over the next few days if no additional
>>> > feedback is provided.
>>> >
>>> > Cheers,
>>> > Jorge.
>>> >
>>> > On Mon, Jun 29, 2020 at 11:30 AM Jorge Esteban Quilcate Otoya <
>>> > quilcate.jorge@gmail.com> wrote:
>>> >
>>> > > Thanks, John!
>>> > >
>>> > > Make sense to reconsider the current approach. I was heading in a
>>> similar
>>> > > direction while drafting the implementation. Metered, Caching, and
>>> other
>>> > > layers will also have to get duplicated to build up new methods in
>>> `Stores`
>>> > > factory, and class casting issues would appear on stores created by
>>> DSL.
>>> > >
>>> > > I will draft a proposal with new methods (move methods from proposed
>>> > > interfaces to existing ones) with default implementation in a KIP
>>> update
>>> > > and wait for Matthias to chime in to validate this approach.
>>> > >
>>> > > Jorge.
>>> > >
>>> > >
>>> > > On Sat, Jun 27, 2020 at 4:01 PM John Roesler <vv...@apache.org>
>>> wrote:
>>> > >
>>> > >> Hi Jorge,
>>> > >>
>>> > >> Sorry for my silence, I've been absorbed with the 2.6 and 2.5.1
>>> releases.
>>> > >>
>>> > >> The idea to separate the new methods into "mixin" interfaces seems
>>> > >> like a good one, but as we've discovered in KIP-614, it doesn't work
>>> > >> out that way in practice. The problem is that the store
>>> implementations
>>> > >> are just the base layer that get composed with other layers in
>>> Streams
>>> > >> before they can be accessed in the DSL. This is extremely subtle, so
>>> > >> I'm going to put everyone to sleep with a detailed explanation:
>>> > >>
>>> > >> For example, this is the mechanism by which all KeyValueStore
>>> > >> implementations get added to Streams:
>>> > >> org.apache.kafka.streams.state.internals.KeyValueStoreBuilder#build
>>> > >> return new MeteredKeyValueStore<>(
>>> > >>   maybeWrapCaching(maybeWrapLogging(storeSupplier.get())),
>>> > >>   storeSupplier.metricsScope(),
>>> > >>   time,
>>> > >>   keySerde,
>>> > >>   valueSerde
>>> > >> );
>>> > >>
>>> > >> In the DSL, the store that a processor gets from the context would
>>> be
>>> > >> the result of this composition. So even if the storeSupplier.get()
>>> returns
>>> > >> a store that implements the "reverse" interface, when you try to
>>> use it
>>> > >> from a processor like:
>>> > >> org.apache.kafka.streams.kstream.ValueTransformerWithKey#init
>>> > >> ReadOnlyBackwardWindowStore<K, V> store =
>>> > >>   (ReadOnlyBackwardWindowStore<K, V>) context.getStateStore(..)
>>> > >>
>>> > >> You'd just get a ClassCastException because it's actually a
>>> > >> MeteredKeyValueStore, which doesn't implement
>>> > >> ReadOnlyBackwardWindowStore.
>>> > >>
>>> > >> The only way to make this work would be to make the Metered,
>>> > >> Caching, and Logging layers also implement the new interfaces,
>>> > >> but this effectively forces implementations to also implement
>>> > >> the interface. Otherwise, the intermediate layers would have to
>>> > >> cast the store in each method, like this:
>>> > >> MeteredWindowStore#backwardFetch {
>>> > >>   ((ReadOnlyBackwardWindowStore<K, V>) innerStore).backwardFetch(..)
>>> > >> }
>>> > >>
>>> > >> And then if the implementation doesn't "opt in" by implementing
>>> > >> the interface, you'd get a ClassCastException, not when you get the
>>> > >> store, but when you try to use it.
>>> > >>
>>> > >> The fact that we get ClassCastExceptions no matter which way we
>>> > >> turn here indicates that we're really not getting any benefit from
>>> the
>>> > >> type system, which makes the extra interfaces seem not worth all the
>>> > >> code involved.
>>> > >>
>>> > >> Where we landed in KIP-614 is that, unless we want to completely
>>> > >> revamp the way that StateStores work in the DSL, you might as
>>> > >> well just add the new methods to the existing interfaces. To prevent
>>> > >> compilation errors, we can add default implementations that throw
>>> > >> UnsupportedOperationException. If a store doesn't opt in by
>>> > >> implementing the methods, you'd get an
>>> UnsupportedOperationException,
>>> > >> which seems no worse, and maybe better, than the ClassCastException
>>> > >> you'd get if we go with the "mixin interface" approach.
>>> > >>
>>> > >> A quick note: This entire discussion focuses on the DSL. If you're
>>> just
>>> > >> using the Processor API by directly adding the a custom store to the
>>> > >> Topology:
>>> > >> org.apache.kafka.streams.Topology#addStateStore
>>> > >> and then retrieving it in the processor via:
>>> > >> org.apache.kafka.streams.processor.ProcessorContext#getStateStore
>>> > >> in
>>> > >> org.apache.kafka.streams.processor.Processor#init
>>> > >>
>>> > >> Then, you can both register and retrieve _any_ StateStore
>>> implementation.
>>> > >> There's no need to use KeyValueStore or any other built-in
>>> interface.
>>> > >> In other words, KeyValueStore and company are only part of the DSL,
>>> > >> not the PAPI. So, discussions about the build-in store interfaces
>>> are only
>>> > >> really relevant in the context of the DSL, Transformers, and
>>> Materialized.
>>> > >>
>>> > >> So, in conclusion, I'd really recommend just adding any new methods
>>> to
>>> > >> the existing store interfaces. We might be able to revamp the API
>>> in the
>>> > >> future to support mixins, but it's a much larger scope of work than
>>> this
>>> > >> KIP.
>>> > >> A more minor comment is that we don't need to add Deprecated
>>> variants
>>> > >> of new methods.
>>> > >>
>>> > >> Thanks again, and once again, I'm sorry I tuned out and didn't
>>> offer this
>>> > >> feedback before you revised the KIP.
>>> > >> -John
>>> > >>
>>> > >>
>>> > >>
>>> > >>
>>> > >> On Mon, Jun 22, 2020, at 06:11, Jorge Esteban Quilcate Otoya wrote:
>>> > >> > Hi everyone,
>>> > >> >
>>> > >> > I've updated the KIP, applying Matthias' feedback regarding
>>> interface
>>> > >> > hierarchy.
>>> > >> >
>>> > >> > Also, following the last email, I think we can consider reverse
>>> > >> operations
>>> > >> > on KeyValue range as well, as implementation supports
>>> lexicographic
>>> > >> order.
>>> > >> >
>>> > >> > I considered different naming between Key-based ranges and
>>> Time-based
>>> > >> > ranges, and mitigate confusion when fetching keys and time ranges
>>> as
>>> > >> > WindowStore does:
>>> > >> >
>>> > >> > Key-based ranges: reverseRange(), reverseAll()
>>> > >> > Time-based ranges: backwardFetch()
>>> > >> >
>>> > >> > Then, key-based changes apply to KeyValueStore, and time-based
>>> changes
>>> > >> to
>>> > >> > Window and Session stores.
>>> > >> >
>>> > >> > Let me know if you have any questions.
>>> > >> >
>>> > >> > Thanks,
>>> > >> > Jorge.
>>> > >> >
>>> > >> >
>>> > >> > On Tue, Jun 16, 2020 at 12:47 AM Jorge Esteban Quilcate Otoya <
>>> > >> > quilcate.jorge@gmail.com> wrote:
>>> > >> >
>>> > >> > > Hi everyone, sorry for the late reply.
>>> > >> > >
>>> > >> > > Thanks Matthias for your feedback. I think it makes sense to
>>> > >> reconsider
>>> > >> > > the current design based on your input.
>>> > >> > >
>>> > >> > > After digging deeper into the current implementation, I'd like
>>> to
>>> > >> bring my
>>> > >> > > current understanding to be double-checked as it might be
>>> redefining
>>> > >> the
>>> > >> > > KIP's scope:
>>> > >> > >
>>> > >> > > 1. There are 2 ranges been exposed by different stores:
>>> > >> > >
>>> > >> > > a. Key Range
>>> > >> > > b. Timestamp Range
>>> > >> > >
>>> > >> > > So far, we have discussed covering both.
>>> > >> > >
>>> > >> > > 2. Key Range functions do not provide ordering guarantees by
>>> design:
>>> > >> > >
>>> > >> > > ```ReadOnlyKeyValueStore.java
>>> > >> > >     /**
>>> > >> > >      * Get an iterator over a given range of keys. This
>>> iterator must
>>> > >> be
>>> > >> > > closed after use.
>>> > >> > >      * The returned iterator must be safe from {@link
>>> > >> > > java.util.ConcurrentModificationException}s
>>> > >> > >      * and must not return null values. No ordering guarantees
>>> are
>>> > >> > > provided.
>>> > >> > >      * ...
>>> > >> > >      */
>>> > >> > >      KeyValueIterator<K, V> range(K from, K to);
>>> > >> > > ```
>>> > >> > >
>>> > >> > > Therefore, I'd propose removing Key range operations from the
>>> scope.
>>> > >> > >
>>> > >> > > 3. Timestamp Range operations happen at the SegmentsStore level
>>> > >> (internal)
>>> > >> > > API
>>> > >> > >
>>> > >> > > AFAICT, Segments wrappers handle all Timestamp ranges queries.
>>> > >> > >
>>> > >> > > I'd propose extending `Segments#segments(timeFrom, timeTo,
>>> backwards)`
>>> > >> > > with a flag for backwards operations.
>>> > >> > >
>>> > >> > > As segments returned will be processed backwards, I'm not
>>> extending
>>> > >> > > KeyValueStores to query each segment backwards as previous
>>> point 2.
>>> > >> > >
>>> > >> > > 4. Extend WindowStores implementations with a new
>>> > >> > > WindowBackwardStore/ReadOnlyBackwardStore:
>>> > >> > >
>>> > >> > > ```java
>>> > >> > > public interface ReadOnlyWindowBackwardStore<K, V> {
>>> > >> > >     WindowStoreIterator<V> backwardFetch(K key, Instant from,
>>> Instant
>>> > >> to)
>>> > >> > > throws IllegalArgumentException;
>>> > >> > >
>>> > >> > >     KeyValueIterator<Windowed<K>, V> backwardFetch(K from, K to,
>>> > >> Instant
>>> > >> > > fromTime, Instant toTime)
>>> > >> > >         throws IllegalArgumentException;
>>> > >> > >
>>> > >> > >     KeyValueIterator<Windowed<K>, V> backwardFetchAll(Instant
>>> from,
>>> > >> > > Instant to) throws IllegalArgumentException;
>>> > >> > > ```
>>> > >> > >
>>> > >> > > 5. SessionStore is a bit different as it has fetch/find sessions
>>> > >> spread
>>> > >> > > between SessionStore and ReadOnlySessionStore.
>>> > >> > >
>>> > >> > > I'd propose a new interface `SessionBackwardStore` to expose
>>> backward
>>> > >> find
>>> > >> > > operations:
>>> > >> > >
>>> > >> > > ```java
>>> > >> > > public interface SessionBackwardStore<K, AGG> {
>>> > >> > >     KeyValueIterator<Windowed<K>, AGG>
>>> backwardFindSessions(final K
>>> > >> key,
>>> > >> > > final long earliestSessionEndTime, final long
>>> latestSessionStartTime);
>>> > >> > >
>>> > >> > >     KeyValueIterator<Windowed<K>, AGG>
>>> backwardFindSessions(final K
>>> > >> > > keyFrom, final K keyTo, final long earliestSessionEndTime,
>>> final long
>>> > >> > > latestSessionStartTime);
>>> > >> > > }
>>> > >> > > ```
>>> > >> > >
>>> > >> > > If this understanding is correct I'll proceed to update the KIP
>>> based
>>> > >> on
>>> > >> > > this.
>>> > >> > >
>>> > >> > > Looking forward to your feedback.
>>> > >> > >
>>> > >> > > Thanks,
>>> > >> > > Jorge.
>>> > >> > >
>>> > >> > > On Fri, May 29, 2020 at 3:32 AM Matthias J. Sax <
>>> mjsax@apache.org>
>>> > >> wrote:
>>> > >> > >
>>> > >> > >> Hey,
>>> > >> > >>
>>> > >> > >> Sorry that I am late to the game. I am not 100% convinced
>>> about the
>>> > >> > >> current proposal. Using a new config as feature flag seems to
>>> be
>>> > >> rather
>>> > >> > >> "nasty" to me, and flipping from/to is a little bit too fancy
>>> for my
>>> > >> > >> personal taste.
>>> > >> > >>
>>> > >> > >> I agree, that the original proposal using a "ReadDirection"
>>> enum is
>>> > >> not
>>> > >> > >> ideal either.
>>> > >> > >>
>>> > >> > >> Thus, I would like to put out a new idea: We could add a new
>>> > >> interface
>>> > >> > >> that offers new methods that return revers iterators.
>>> > >> > >>
>>> > >> > >> The KIP already proposes to add `reverseAll()` and it seems
>>> backward
>>> > >> > >> incompatible to just add this method to
>>> `ReadOnlyKeyValueStore` and
>>> > >> > >> `ReadOnlyWindowStore`. I don't think we could provide a useful
>>> > >> default
>>> > >> > >> implementation for custom stores and thus either break
>>> compatibility
>>> > >> or
>>> > >> > >> need add a default that just throws an exception. Neither
>>> seems to
>>> > >> be a
>>> > >> > >> good option.
>>> > >> > >>
>>> > >> > >> Using a new interface avoid this issue and allows users
>>> implementing
>>> > >> > >> custom stores to opt-in by adding the interface to their
>>> stores.
>>> > >> > >> Furthermore, we don't need any config. In the end, we
>>> encapsulte the
>>> > >> > >> change into the store, and our runtime is agnostic to it (as it
>>> > >> should
>>> > >> > >> be).
>>> > >> > >>
>>> > >> > >> The hierarchy becomes a little complex (but uses would not
>>> really see
>>> > >> > >> the complexity):
>>> > >> > >>
>>> > >> > >> // exsiting
>>> > >> > >> ReadOnlyKeyValueStore
>>> > >> > >> KeyValueStore extend StateStore, ReadOnlyKeyValueStore
>>> > >> > >>
>>> > >> > >>
>>> > >> > >> // helper interface; users don't care
>>> > >> > >> // need similar ones for other stores
>>> > >> > >> ReverseReadOnlyKeyValueStore {
>>> > >> > >>     KeyValueIterator<K, V> reverseRange(K from, K to);
>>> > >> > >>     KeyValueIterator<K, V> reverseAll();
>>> > >> > >> }
>>> > >> > >>
>>> > >> > >>
>>> > >> > >> // two new user facing interfaces for kv-store
>>> > >> > >> // need similar ones for other stores
>>> > >> > >> ReadOnlyKeyValueStoreWithReverseIterators extends
>>> > >> ReadOnlyKeyValueStore,
>>> > >> > >> ReverseReadOnlyKeyValueStore
>>> > >> > >>
>>> > >> > >> KeyValueStoreWithReverseIterators extends KeyValueStore,
>>> > >> > >> ReverseReadOnlyKeyValueStore
>>> > >> > >>
>>> > >> > >>
>>> > >> > >> // updated (also internal)
>>> > >> > >> // also need to update other built-in stores
>>> > >> > >> RocksDB implements KeyValueStoreWithReverseIterators,
>>> > >> BulkLoadingStore
>>> > >> > >>
>>> > >> > >>
>>> > >> > >> In the end, user would only care about the two (for kv-case)
>>> new
>>> > >> > >> interface that offer revers iterator (read only and regular)
>>> and can
>>> > >> > >> cast stores accordingly in their Processors/Transformers or
>>> via IQ.
>>> > >> > >>
>>> > >> > >>
>>> > >> > >> Btw: if we add revers iterator for KeyValue and Window store,
>>> should
>>> > >> we
>>> > >> > >> do the same for Session store?
>>> > >> > >>
>>> > >> > >>
>>> > >> > >>
>>> > >> > >> This might be more code to write, but I believe it provides the
>>> > >> better
>>> > >> > >> user experience. Thoughts?
>>> > >> > >>
>>> > >> > >>
>>> > >> > >>
>>> > >> > >> -Matthias
>>> > >> > >>
>>> > >> > >>
>>> > >> > >>
>>> > >> > >>
>>> > >> > >> On 5/26/20 8:47 PM, John Roesler wrote:
>>> > >> > >> > Sorry for my silence, Jorge,
>>> > >> > >> >
>>> > >> > >> > I've just taken another look, and I'm personally happy with
>>> the
>>> > >> KIP.
>>> > >> > >> >
>>> > >> > >> > Thanks,
>>> > >> > >> > -John
>>> > >> > >> >
>>> > >> > >> > On Tue, May 26, 2020, at 16:17, Jorge Esteban Quilcate Otoya
>>> wrote:
>>> > >> > >> >> If no additional comments, I will proceed to start the a
>>> vote
>>> > >> thread.
>>> > >> > >> >>
>>> > >> > >> >> Thanks a lot for your feedback!
>>> > >> > >> >>
>>> > >> > >> >> On Fri, May 22, 2020 at 9:25 AM Jorge Esteban Quilcate
>>> Otoya <
>>> > >> > >> >> quilcate.jorge@gmail.com> wrote:
>>> > >> > >> >>
>>> > >> > >> >>> Thanks Sophie. I like the `reverseAll()` idea.
>>> > >> > >> >>>
>>> > >> > >> >>> I updated the KIP with your feedback.
>>> > >> > >> >>>
>>> > >> > >> >>>
>>> > >> > >> >>>
>>> > >> > >> >>> On Fri, May 22, 2020 at 4:22 AM Sophie Blee-Goldman <
>>> > >> > >> sophie@confluent.io>
>>> > >> > >> >>> wrote:
>>> > >> > >> >>>
>>> > >> > >> >>>> Hm, the case of `all()` does seem to present a dilemma in
>>> the
>>> > >> case of
>>> > >> > >> >>>> variable-length keys.
>>> > >> > >> >>>>
>>> > >> > >> >>>> In the case of fixed-length keys, you can just compute
>>> the keys
>>> > >> that
>>> > >> > >> >>>> correspond
>>> > >> > >> >>>> to the maximum and minimum serialized bytes, then perform
>>> a
>>> > >> `range()`
>>> > >> > >> >>>> query
>>> > >> > >> >>>> instead of an `all()`. If your keys don't have a
>>> well-defined
>>> > >> > >> ordering
>>> > >> > >> >>>> such
>>> > >> > >> >>>> that
>>> > >> > >> >>>> you can't determine the MAX_KEY, then you probably don't
>>> care
>>> > >> about
>>> > >> > >> the
>>> > >> > >> >>>> iterator order anyway.
>>> > >> > >> >>>>
>>> > >> > >> >>>>  But with variable-length keys, there is no MAX_KEY. If
>>> all your
>>> > >> > >> keys were
>>> > >> > >> >>>> just
>>> > >> > >> >>>> of the form 'a', 'aa', 'aaaaa', 'aaaaaaa' then in fact
>>> the only
>>> > >> way
>>> > >> > >> to
>>> > >> > >> >>>> figure out the
>>> > >> > >> >>>> maximum key in the store is by using `all()` -- and
>>> without a
>>> > >> reverse
>>> > >> > >> >>>> iterator, you're
>>> > >> > >> >>>> doomed to iterate through every single key just to answer
>>> that
>>> > >> simple
>>> > >> > >> >>>> question.
>>> > >> > >> >>>>
>>> > >> > >> >>>> That said, I still think determining the iterator order
>>> based
>>> > >> on the
>>> > >> > >> >>>> to/from bytes
>>> > >> > >> >>>> makes a lot of intuitive sense and gives the API a nice
>>> > >> symmetry.
>>> > >> > >> What if
>>> > >> > >> >>>> we
>>> > >> > >> >>>> solved the `all()` problem by just giving `all()` a
>>> reverse
>>> > >> form to
>>> > >> > >> >>>> complement it?
>>> > >> > >> >>>> Ie we would have `all()` and `reverseAll()`, or something
>>> to
>>> > >> that
>>> > >> > >> effect.
>>> > >> > >> >>>>
>>> > >> > >> >>>> On Thu, May 21, 2020 at 3:41 PM Jorge Esteban Quilcate
>>> Otoya <
>>> > >> > >> >>>> quilcate.jorge@gmail.com> wrote:
>>> > >> > >> >>>>
>>> > >> > >> >>>>> Thanks John.
>>> > >> > >> >>>>>
>>> > >> > >> >>>>> Agree. I like the first approach as well, with
>>> StreamsConfig
>>> > >> flag
>>> > >> > >> >>>> passing
>>> > >> > >> >>>>> by via ProcessorContext.
>>> > >> > >> >>>>>
>>> > >> > >> >>>>> Another positive effect with "reverse parameters" is
>>> that in
>>> > >> the
>>> > >> > >> case of
>>> > >> > >> >>>>> `fetch(keyFrom, keyTo, timeFrom, timeTo)` users can
>>> decide
>>> > >> _which_
>>> > >> > >> pair
>>> > >> > >> >>>> to
>>> > >> > >> >>>>> flip, whether with `ReadDirection` enum it apply to both.
>>> > >> > >> >>>>>
>>> > >> > >> >>>>> The only issue I've found while reviewing the KIP is that
>>> > >> `all()`
>>> > >> > >> won't
>>> > >> > >> >>>> fit
>>> > >> > >> >>>>> within this approach.
>>> > >> > >> >>>>>
>>> > >> > >> >>>>> We could remove it from the KIP and argue that for
>>> WindowStore,
>>> > >> > >> >>>>> `fetchAll(0, Long.MAX_VALUE)` can be used to get all in
>>> reverse
>>> > >> > >> order,
>>> > >> > >> >>>> and
>>> > >> > >> >>>>> for KeyValueStore, no ordering guarantees are provided.
>>> > >> > >> >>>>>
>>> > >> > >> >>>>> If there is consensus with this changes, I will go and
>>> update
>>> > >> the
>>> > >> > >> KIP.
>>> > >> > >> >>>>>
>>> > >> > >> >>>>> On Thu, May 21, 2020 at 3:33 PM John Roesler <
>>> > >> vvcephei@apache.org>
>>> > >> > >> >>>> wrote:
>>> > >> > >> >>>>>
>>> > >> > >> >>>>>> Hi Jorge,
>>> > >> > >> >>>>>>
>>> > >> > >> >>>>>> Thanks for that idea. I agree, a feature flag would
>>> protect
>>> > >> anyone
>>> > >> > >> >>>>>> who may be depending on the current behavior.
>>> > >> > >> >>>>>>
>>> > >> > >> >>>>>> It seems better to locate the feature flag in the
>>> > >> initialization
>>> > >> > >> >>>> logic of
>>> > >> > >> >>>>>> the store, rather than have a method on the "live"
>>> store that
>>> > >> > >> changes
>>> > >> > >> >>>>>> its behavior on the fly.
>>> > >> > >> >>>>>>
>>> > >> > >> >>>>>> It seems like there are two options here, one is to add
>>> a new
>>> > >> > >> config:
>>> > >> > >> >>>>>>
>>> > >> > >> >>>>>> StreamsConfig.ENABLE_BACKWARDS_ITERATION =
>>> > >> > >> >>>>>>   "enable.backwards.iteration
>>> > >> > >> >>>>>>
>>> > >> > >> >>>>>> Or we can add a feature flag in Materialized, like
>>> > >> > >> >>>>>>
>>> > >> > >> >>>>>> Materialized.enableBackwardsIteration()
>>> > >> > >> >>>>>>
>>> > >> > >> >>>>>> I think I'd personally lean toward the config, for the
>>> > >> following
>>> > >> > >> >>>> reason.
>>> > >> > >> >>>>>> The concern that Sophie raised is that someone's
>>> program may
>>> > >> depend
>>> > >> > >> >>>>>> on the existing contract of getting an empty iterator.
>>> We
>>> > >> don't
>>> > >> > >> want
>>> > >> > >> >>>> to
>>> > >> > >> >>>>>> switch behavior when they aren't expecting it, so we
>>> provide
>>> > >> them a
>>> > >> > >> >>>>>> config to assert that they _are_ expecting the new
>>> behavior,
>>> > >> which
>>> > >> > >> >>>>>> means they take responsibility for updating their code
>>> to
>>> > >> expect
>>> > >> > >> the
>>> > >> > >> >>>> new
>>> > >> > >> >>>>>> behavior.
>>> > >> > >> >>>>>>
>>> > >> > >> >>>>>> There doesn't seem to be a reason to offer a choice of
>>> > >> behaviors
>>> > >> > >> on a
>>> > >> > >> >>>>>> per-query, or per-store basis. We just want people to
>>> be not
>>> > >> > >> surprised
>>> > >> > >> >>>>>> by this change in general.
>>> > >> > >> >>>>>>
>>> > >> > >> >>>>>> What do you think?
>>> > >> > >> >>>>>> Thanks,
>>> > >> > >> >>>>>> -John
>>> > >> > >> >>>>>>
>>> > >> > >> >>>>>> On Wed, May 20, 2020, at 17:37, Jorge Quilcate wrote:
>>> > >> > >> >>>>>>> Thank you both for the great feedback.
>>> > >> > >> >>>>>>>
>>> > >> > >> >>>>>>> I like the "fancy" proposal :), and how it removes the
>>> need
>>> > >> for
>>> > >> > >> >>>>>>> additional API methods. And with a feature flag on
>>> > >> `StateStore`,
>>> > >> > >> >>>>>>> disabled by default, should no break current users.
>>> > >> > >> >>>>>>>
>>> > >> > >> >>>>>>> The only side-effect I can think of is that: by moving
>>> the
>>> > >> flag
>>> > >> > >> >>>>> upwards,
>>> > >> > >> >>>>>>> all later operations become affected; which might be
>>> ok for
>>> > >> most
>>> > >> > >> >>>> (all?)
>>> > >> > >> >>>>>>> cases. I can't think of an scenario where this would
>>> be an
>>> > >> issue,
>>> > >> > >> >>>> just
>>> > >> > >> >>>>>>> want to point this out.
>>> > >> > >> >>>>>>>
>>> > >> > >> >>>>>>> If moving to this approach, I'd like to check if I got
>>> this
>>> > >> right
>>> > >> > >> >>>>> before
>>> > >> > >> >>>>>>> updating the KIP:
>>> > >> > >> >>>>>>>
>>> > >> > >> >>>>>>> - only `StateStore` will change by having a new method:
>>> > >> > >> >>>>>>> `backwardIteration()`, `false` by default to keep
>>> things
>>> > >> > >> compatible.
>>> > >> > >> >>>>>>> - then all `*Stores` will have to update their
>>> implementation
>>> > >> > >> based
>>> > >> > >> >>>> on
>>> > >> > >> >>>>>>> this flag.
>>> > >> > >> >>>>>>>
>>> > >> > >> >>>>>>>
>>> > >> > >> >>>>>>> On 20/05/2020 21:02, Sophie Blee-Goldman wrote:
>>> > >> > >> >>>>>>>>> There's no possibility that someone could be relying
>>> > >> > >> >>>>>>>>> on iterating over that range in increasing order,
>>> because
>>> > >> that's
>>> > >> > >> >>>> not
>>> > >> > >> >>>>>> what
>>> > >> > >> >>>>>>>>> happens. However, they could indeed be relying on
>>> getting
>>> > >> an
>>> > >> > >> >>>> empty
>>> > >> > >> >>>>>>>> iterator
>>> > >> > >> >>>>>>>>
>>> > >> > >> >>>>>>>> I just meant that they might be relying on the
>>> assumption
>>> > >> that
>>> > >> > >> the
>>> > >> > >> >>>>>> range
>>> > >> > >> >>>>>>>> query
>>> > >> > >> >>>>>>>> will never return results with decreasing keys. The
>>> empty
>>> > >> > >> iterator
>>> > >> > >> >>>>>> wouldn't
>>> > >> > >> >>>>>>>> break that contract, but of course a surprise reverse
>>> > >> iterator
>>> > >> > >> >>>> would.
>>> > >> > >> >>>>>>>>
>>> > >> > >> >>>>>>>> FWIW I actually am in favor of automatically
>>> converting to a
>>> > >> > >> >>>> reverse
>>> > >> > >> >>>>>>>> iterator,
>>> > >> > >> >>>>>>>> I just thought we should consider whether this should
>>> be
>>> > >> off by
>>> > >> > >> >>>>>> default or
>>> > >> > >> >>>>>>>> even possible to disable at all.
>>> > >> > >> >>>>>>>>
>>> > >> > >> >>>>>>>> On Tue, May 19, 2020 at 7:42 PM John Roesler <
>>> > >> > >> vvcephei@apache.org
>>> > >> > >> >>>>>
>>> > >> > >> >>>>>> wrote:
>>> > >> > >> >>>>>>>>
>>> > >> > >> >>>>>>>>> Thanks for the response, Sophie,
>>> > >> > >> >>>>>>>>>
>>> > >> > >> >>>>>>>>> I wholeheartedly agree we should take as much into
>>> account
>>> > >> as
>>> > >> > >> >>>>> possible
>>> > >> > >> >>>>>>>>> up front, rather than regretting our decisions
>>> later. I
>>> > >> actually
>>> > >> > >> >>>> do
>>> > >> > >> >>>>>> share
>>> > >> > >> >>>>>>>>> your vague sense of worry, which was what led me to
>>> say
>>> > >> > >> initially
>>> > >> > >> >>>>>> that I
>>> > >> > >> >>>>>>>>> thought my counterproposal might be "too fancy".
>>> > >> Sometimes, it's
>>> > >> > >> >>>>>> better
>>> > >> > >> >>>>>>>>> to be explicit instead of "elegant", if we think more
>>> > >> people
>>> > >> > >> >>>> will be
>>> > >> > >> >>>>>>>>> confused
>>> > >> > >> >>>>>>>>> than not.
>>> > >> > >> >>>>>>>>>
>>> > >> > >> >>>>>>>>> I really don't think that there's any danger of
>>> "relying
>>> > >> on a
>>> > >> > >> >>>> bug"
>>> > >> > >> >>>>>> here,
>>> > >> > >> >>>>>>>>> although
>>> > >> > >> >>>>>>>>> people certainly could be relying on current
>>> behavior. One
>>> > >> thing
>>> > >> > >> >>>> to
>>> > >> > >> >>>>> be
>>> > >> > >> >>>>>>>>> clear
>>> > >> > >> >>>>>>>>> about (which I just left a more detailed comment in
>>> > >> KAFKA-8159
>>> > >> > >> >>>>> about)
>>> > >> > >> >>>>>> is
>>> > >> > >> >>>>>>>>> that
>>> > >> > >> >>>>>>>>> when we say something like key1 > key2, this
>>> ordering is
>>> > >> defined
>>> > >> > >> >>>> by
>>> > >> > >> >>>>>> the
>>> > >> > >> >>>>>>>>> serde's output and nothing else.
>>> > >> > >> >>>>>>>>>
>>> > >> > >> >>>>>>>>> Currently, thanks to your fix in
>>> > >> > >> >>>>>> https://github.com/apache/kafka/pull/6521
>>> > >> > >> >>>>>>>>> ,
>>> > >> > >> >>>>>>>>> the store contract is that for range scans, if from
>>> > to,
>>> > >> then
>>> > >> > >> >>>> the
>>> > >> > >> >>>>>> store
>>> > >> > >> >>>>>>>>> must
>>> > >> > >> >>>>>>>>> return an empty iterator. There's no possibility that
>>> > >> someone
>>> > >> > >> >>>> could
>>> > >> > >> >>>>> be
>>> > >> > >> >>>>>>>>> relying
>>> > >> > >> >>>>>>>>> on iterating over that range in increasing order,
>>> because
>>> > >> that's
>>> > >> > >> >>>> not
>>> > >> > >> >>>>>> what
>>> > >> > >> >>>>>>>>> happens. However, they could indeed be relying on
>>> getting
>>> > >> an
>>> > >> > >> >>>> empty
>>> > >> > >> >>>>>>>>> iterator.
>>> > >> > >> >>>>>>>>>
>>> > >> > >> >>>>>>>>> My counterproposal was to actually change this
>>> contract to
>>> > >> say
>>> > >> > >> >>>> that
>>> > >> > >> >>>>>> the
>>> > >> > >> >>>>>>>>> store
>>> > >> > >> >>>>>>>>> must return an iterator over the keys in that range,
>>> but
>>> > >> in the
>>> > >> > >> >>>>>> reverse
>>> > >> > >> >>>>>>>>> order.
>>> > >> > >> >>>>>>>>> So, in addition to considering whether this idea is
>>> "too
>>> > >> fancy"
>>> > >> > >> >>>> (aka
>>> > >> > >> >>>>>>>>> confusing),
>>> > >> > >> >>>>>>>>> we should also consider the likelihood of breaking an
>>> > >> existing
>>> > >> > >> >>>>>> program with
>>> > >> > >> >>>>>>>>> this behavior/contract change.
>>> > >> > >> >>>>>>>>>
>>> > >> > >> >>>>>>>>> To echo your clarification, I'm also not advocating
>>> > >> strongly in
>>> > >> > >> >>>>> favor
>>> > >> > >> >>>>>> of my
>>> > >> > >> >>>>>>>>> proposal. I just wanted to present it for
>>> consideration
>>> > >> > >> alongside
>>> > >> > >> >>>>>> Jorge's
>>> > >> > >> >>>>>>>>> original one.
>>> > >> > >> >>>>>>>>>
>>> > >> > >> >>>>>>>>> Thanks for raising these very good points,
>>> > >> > >> >>>>>>>>> -John
>>> > >> > >> >>>>>>>>>
>>> > >> > >> >>>>>>>>> On Tue, May 19, 2020, at 20:49, Sophie Blee-Goldman
>>> wrote:
>>> > >> > >> >>>>>>>>>>> Rather than working around it, I think we should
>>> just
>>> > >> fix it
>>> > >> > >> >>>>>>>>>> Now *that's* a "fancy" idea :P
>>> > >> > >> >>>>>>>>>>
>>> > >> > >> >>>>>>>>>> That was my primary concern, although I do have a
>>> vague
>>> > >> sense
>>> > >> > >> of
>>> > >> > >> >>>>>> worry
>>> > >> > >> >>>>>>>>>> that we might be allowing users to get into trouble
>>> > >> without
>>> > >> > >> >>>>>> realizing it.
>>> > >> > >> >>>>>>>>>> For example if their custom serdes suffer a similar
>>> bug
>>> > >> as the
>>> > >> > >> >>>>> above,
>>> > >> > >> >>>>>>>>>> and/or
>>> > >> > >> >>>>>>>>>> they rely on getting results in increasing order
>>> (of the
>>> > >> keys)
>>> > >> > >> >>>> even
>>> > >> > >> >>>>>> when
>>> > >> > >> >>>>>>>>>> to < from. Maybe they're relying on the fact that
>>> the
>>> > >> range
>>> > >> > >> >>>> query
>>> > >> > >> >>>>>> returns
>>> > >> > >> >>>>>>>>>> nothing in that case.
>>> > >> > >> >>>>>>>>>>
>>> > >> > >> >>>>>>>>>> Not sure if that qualifies as relying on a bug or
>>> not,
>>> > >> but in
>>> > >> > >> >>>> that
>>> > >> > >> >>>>>> past
>>> > >> > >> >>>>>>>>>> we've
>>> > >> > >> >>>>>>>>>> taken the stance that we should not break
>>> compatibility
>>> > >> even if
>>> > >> > >> >>>> the
>>> > >> > >> >>>>>> user
>>> > >> > >> >>>>>>>>>> was relying on bugs or unintentional behavior.
>>> > >> > >> >>>>>>>>>>
>>> > >> > >> >>>>>>>>>> Just to clarify I'm not advocating strongly against
>>> this
>>> > >> > >> >>>> proposal,
>>> > >> > >> >>>>>> just
>>> > >> > >> >>>>>>>>>> laying
>>> > >> > >> >>>>>>>>>> out some considerations we should take into
>>> account. At
>>> > >> the end
>>> > >> > >> >>>> of
>>> > >> > >> >>>>>> the
>>> > >> > >> >>>>>>>>> day
>>> > >> > >> >>>>>>>>>> we should do what's right rather than maintain
>>> > >> compatibility
>>> > >> > >> >>>> with
>>> > >> > >> >>>>>>>>> existing
>>> > >> > >> >>>>>>>>>> bugs, but sometimes there's a reasonable middle
>>> ground.
>>> > >> > >> >>>>>>>>>>
>>> > >> > >> >>>>>>>>>> On Tue, May 19, 2020 at 6:15 PM John Roesler <
>>> > >> > >> >>>> vvcephei@apache.org>
>>> > >> > >> >>>>>>>>> wrote:
>>> > >> > >> >>>>>>>>>>> Thanks Sophie,
>>> > >> > >> >>>>>>>>>>>
>>> > >> > >> >>>>>>>>>>> Woah, that’s a nasty bug. Rather than working
>>> around it,
>>> > >> I
>>> > >> > >> >>>> think
>>> > >> > >> >>>>> we
>>> > >> > >> >>>>>>>>> should
>>> > >> > >> >>>>>>>>>>> just fix it. I’ll leave some comments on the Jira.
>>> > >> > >> >>>>>>>>>>>
>>> > >> > >> >>>>>>>>>>> It doesn’t seem like it should be this KIP’s
>>> concern
>>> > >> that some
>>> > >> > >> >>>>>> serdes
>>> > >> > >> >>>>>>>>>>> might be incorrectly written.
>>> > >> > >> >>>>>>>>>>>
>>> > >> > >> >>>>>>>>>>> Were there other practical concerns that you had
>>> in mind?
>>> > >> > >> >>>>>>>>>>>
>>> > >> > >> >>>>>>>>>>> Thanks,
>>> > >> > >> >>>>>>>>>>> John
>>> > >> > >> >>>>>>>>>>>
>>> > >> > >> >>>>>>>>>>> On Tue, May 19, 2020, at 19:10, Sophie Blee-Goldman
>>> > >> wrote:
>>> > >> > >> >>>>>>>>>>>> I like this "fancy idea" to just flip the to/from
>>> bytes
>>> > >> but I
>>> > >> > >> >>>>> think
>>> > >> > >> >>>>>>>>> there
>>> > >> > >> >>>>>>>>>>>> are some practical limitations to implementing
>>> this. In
>>> > >> > >> >>>>> particular
>>> > >> > >> >>>>>>>>>>>> I'm thinking about this issue
>>> > >> > >> >>>>>>>>>>>> <https://issues.apache.org/jira/browse/KAFKA-8159
>>> >
>>> > >> with the
>>> > >> > >> >>>>>> built-in
>>> > >> > >> >>>>>>>>>>> signed
>>> > >> > >> >>>>>>>>>>>> number serdes.
>>> > >> > >> >>>>>>>>>>>>
>>> > >> > >> >>>>>>>>>>>> This trick would actually fix the problem for
>>> > >> > >> >>>> negative-negative
>>> > >> > >> >>>>>>>>> queries
>>> > >> > >> >>>>>>>>>>>> (ie where to & from are negative) but would cause
>>> > >> > >> undetectable
>>> > >> > >> >>>>>>>>>>>> incorrect results for negative-positive queries.
>>> For
>>> > >> example,
>>> > >> > >> >>>> say
>>> > >> > >> >>>>>> you
>>> > >> > >> >>>>>>>>>>>> call #range with from = -1 and to = 1, using the
>>> Short
>>> > >> > >> serdes.
>>> > >> > >> >>>>> The
>>> > >> > >> >>>>>>>>>>>> serialized bytes for that are
>>> > >> > >> >>>>>>>>>>>>
>>> > >> > >> >>>>>>>>>>>> from = 1111111111111111
>>> > >> > >> >>>>>>>>>>>> to = 0000000000000001
>>> > >> > >> >>>>>>>>>>>>
>>> > >> > >> >>>>>>>>>>>> so we would end up flipping those and iterating
>>> over
>>> > >> all keys
>>> > >> > >> >>>>> from
>>> > >> > >> >>>>>>>>>>>> 0000000000000001 to 1111111111111111. Iterating in
>>> > >> > >> >>>>> lexicographical
>>> > >> > >> >>>>>>>>>>>> order means we would iterate over every key in
>>> the space
>>> > >> > >> >>>> *except*
>>> > >> > >> >>>>>> for
>>> > >> > >> >>>>>>>>>>>> 0, but 0 is actually the *only* other key we
>>> meant to be
>>> > >> > >> >>>> included
>>> > >> > >> >>>>>> in
>>> > >> > >> >>>>>>>>> the
>>> > >> > >> >>>>>>>>>>>> range query.
>>> > >> > >> >>>>>>>>>>>>
>>> > >> > >> >>>>>>>>>>>> Currently we just log a warning and return an
>>> empty
>>> > >> iterator
>>> > >> > >> >>>> when
>>> > >> > >> >>>>>>>>>>>> to < from, which is obviously also incorrect but
>>> feels
>>> > >> > >> >>>> slightly
>>> > >> > >> >>>>>> more
>>> > >> > >> >>>>>>>>>>>> palatable. If we start automatically converting to
>>> > >> reverse
>>> > >> > >> >>>>> queries
>>> > >> > >> >>>>>> we
>>> > >> > >> >>>>>>>>>>>> can't even log a warning in this case unless we
>>> wanted
>>> > >> to log
>>> > >> > >> >>>> a
>>> > >> > >> >>>>>>>>> warning
>>> > >> > >> >>>>>>>>>>>> every time, which would be weird to do for a valid
>>> > >> usage of a
>>> > >> > >> >>>> new
>>> > >> > >> >>>>>>>>>>>> feature.
>>> > >> > >> >>>>>>>>>>>>
>>> > >> > >> >>>>>>>>>>>> All that said, I still like the idea overall. Off
>>> the
>>> > >> top of
>>> > >> > >> >>>> my
>>> > >> > >> >>>>>> head
>>> > >> > >> >>>>>>>>> I
>>> > >> > >> >>>>>>>>>>> guess
>>> > >> > >> >>>>>>>>>>>> we could add a store config to enable/disable
>>> automatic
>>> > >> > >> >>>> reverse
>>> > >> > >> >>>>>>>>>>> iteration,
>>> > >> > >> >>>>>>>>>>>> which is off by default?
>>> > >> > >> >>>>>>>>>>>>
>>> > >> > >> >>>>>>>>>>>> Thanks for the KIP! This will be a nice addition
>>> > >> > >> >>>>>>>>>>>>
>>> > >> > >> >>>>>>>>>>>> Sophie
>>> > >> > >> >>>>>>>>>>>>
>>> > >> > >> >>>>>>>>>>>>
>>> > >> > >> >>>>>>>>>>>> On Tue, May 19, 2020 at 3:21 PM John Roesler <
>>> > >> > >> >>>>> vvcephei@apache.org>
>>> > >> > >> >>>>>>>>>>> wrote:
>>> > >> > >> >>>>>>>>>>>>> Hi there Jorge,
>>> > >> > >> >>>>>>>>>>>>>
>>> > >> > >> >>>>>>>>>>>>> Thanks for the KIP!
>>> > >> > >> >>>>>>>>>>>>>
>>> > >> > >> >>>>>>>>>>>>> I think this feature sounds very reasonable.
>>> > >> > >> >>>>>>>>>>>>>
>>> > >> > >> >>>>>>>>>>>>> I'm not 100% sure if this is "too fancy", but
>>> what do
>>> > >> you
>>> > >> > >> >>>> think
>>> > >> > >> >>>>>>>>>>>>> about avoiding the enum by instead allowing
>>> people to
>>> > >> flip
>>> > >> > >> >>>>>>>>>>>>> the "from" and "to" endpoints? I.e., reading
>>> from "A"
>>> > >> to "Z"
>>> > >> > >> >>>>> would
>>> > >> > >> >>>>>>>>>>>>> be a forward scan, and from "Z" to "A" would be a
>>> > >> backward
>>> > >> > >> >>>> one?
>>> > >> > >> >>>>>>>>>>>>>
>>> > >> > >> >>>>>>>>>>>>> Thanks,
>>> > >> > >> >>>>>>>>>>>>> -John
>>> > >> > >> >>>>>>>>>>>>>
>>> > >> > >> >>>>>>>>>>>>> On Tue, May 19, 2020, at 16:20, Jorge Quilcate
>>> wrote:
>>> > >> > >> >>>>>>>>>>>>>> Hi everyone,
>>> > >> > >> >>>>>>>>>>>>>>
>>> > >> > >> >>>>>>>>>>>>>> I would like to start the discussion for
>>> KIP-617:
>>> > >> > >> >>>>>>>>>>>>>>
>>> > >> > >> >>>>>>>>>
>>> > >> > >> >>>>>>
>>> > >> > >> >>>>>
>>> > >> > >> >>>>
>>> > >> > >>
>>> > >>
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-617%3A+Allow+Kafka+Streams+State+Stores+to+be+iterated+backwards
>>> > >> > >> >>>>>>>>>>>>>> Looking forward to your feedback.
>>> > >> > >> >>>>>>>>>>>>>>
>>> > >> > >> >>>>>>>>>>>>>> Thanks!
>>> > >> > >> >>>>>>>>>>>>>> Jorge.
>>> > >> > >> >>>>>>>>>>>>>>
>>> > >> > >> >>>>>>>>>>>>>>
>>> > >> > >> >>>>>>>>>>>>>> Attachments:
>>> > >> > >> >>>>>>>>>>>>>> * 0x5F2C6E22064982DF.asc
>>> > >> > >> >>>>>>>
>>> > >> > >> >>>>>>>
>>> > >> > >> >>>>>>> Attachments:
>>> > >> > >> >>>>>>> * 0x5F2C6E22064982DF.asc
>>> > >> > >> >>>>>>
>>> > >> > >> >>>>>
>>> > >> > >> >>>>
>>> > >> > >> >>>
>>> > >> > >> >>
>>> > >> > >>
>>> > >> > >>
>>> > >> >
>>> > >>
>>> > >
>>> >
>>>
>>

Re: [DISCUSS] KIP-617: Allow Kafka Streams State Stores to be iterated backwards

Posted by Jorge Esteban Quilcate Otoya <qu...@gmail.com>.
@John: I can see there are some deprecations in there as well. Will update
the KIP.

Thanks,
Jorge


On Thu, Jul 2, 2020 at 3:29 PM Jorge Esteban Quilcate Otoya <
quilcate.jorge@gmail.com> wrote:

> Thanks John.
>
> > it looks like there’s a revision error in which two methods are proposed
> for SessionStore, but seem like they should be in ReadOnlySessionStore. Do
> I read that right?
>
> Yes, I've opted to keep the new methods alongside the existing ones. In
> the case of SessionStore, `findSessions` are in `SessionStore`, and `fetch`
> are in `ReadOnlySessionStore`. If it makes more sense, I can move all of
> them to ReadOnlySessionStore.
> Let me know what you think.
>
> Thanks,
> Jorge.
>
> On Thu, Jul 2, 2020 at 2:36 PM John Roesler <vv...@apache.org> wrote:
>
>> Hi Jorge,
>>
>> Thanks for the update. I think this is a good plan.
>>
>> I just took a look at the KIP again, and it looks like there’s a revision
>> error in which two methods are proposed for SessionStore, but seem like
>> they should be in ReadOnlySessionStore. Do I read that right?
>>
>> Otherwise, I’m happy with your proposal.
>>
>> Thanks,
>> John
>>
>> On Wed, Jul 1, 2020, at 17:01, Jorge Esteban Quilcate Otoya wrote:
>> > Quick update: KIP is updated with latest changes now.
>> > Will leave it open this week while working on the PR.
>> >
>> > Hope to open a new vote thread over the next few days if no additional
>> > feedback is provided.
>> >
>> > Cheers,
>> > Jorge.
>> >
>> > On Mon, Jun 29, 2020 at 11:30 AM Jorge Esteban Quilcate Otoya <
>> > quilcate.jorge@gmail.com> wrote:
>> >
>> > > Thanks, John!
>> > >
>> > > Make sense to reconsider the current approach. I was heading in a
>> similar
>> > > direction while drafting the implementation. Metered, Caching, and
>> other
>> > > layers will also have to get duplicated to build up new methods in
>> `Stores`
>> > > factory, and class casting issues would appear on stores created by
>> DSL.
>> > >
>> > > I will draft a proposal with new methods (move methods from proposed
>> > > interfaces to existing ones) with default implementation in a KIP
>> update
>> > > and wait for Matthias to chime in to validate this approach.
>> > >
>> > > Jorge.
>> > >
>> > >
>> > > On Sat, Jun 27, 2020 at 4:01 PM John Roesler <vv...@apache.org>
>> wrote:
>> > >
>> > >> Hi Jorge,
>> > >>
>> > >> Sorry for my silence, I've been absorbed with the 2.6 and 2.5.1
>> releases.
>> > >>
>> > >> The idea to separate the new methods into "mixin" interfaces seems
>> > >> like a good one, but as we've discovered in KIP-614, it doesn't work
>> > >> out that way in practice. The problem is that the store
>> implementations
>> > >> are just the base layer that get composed with other layers in
>> Streams
>> > >> before they can be accessed in the DSL. This is extremely subtle, so
>> > >> I'm going to put everyone to sleep with a detailed explanation:
>> > >>
>> > >> For example, this is the mechanism by which all KeyValueStore
>> > >> implementations get added to Streams:
>> > >> org.apache.kafka.streams.state.internals.KeyValueStoreBuilder#build
>> > >> return new MeteredKeyValueStore<>(
>> > >>   maybeWrapCaching(maybeWrapLogging(storeSupplier.get())),
>> > >>   storeSupplier.metricsScope(),
>> > >>   time,
>> > >>   keySerde,
>> > >>   valueSerde
>> > >> );
>> > >>
>> > >> In the DSL, the store that a processor gets from the context would be
>> > >> the result of this composition. So even if the storeSupplier.get()
>> returns
>> > >> a store that implements the "reverse" interface, when you try to use
>> it
>> > >> from a processor like:
>> > >> org.apache.kafka.streams.kstream.ValueTransformerWithKey#init
>> > >> ReadOnlyBackwardWindowStore<K, V> store =
>> > >>   (ReadOnlyBackwardWindowStore<K, V>) context.getStateStore(..)
>> > >>
>> > >> You'd just get a ClassCastException because it's actually a
>> > >> MeteredKeyValueStore, which doesn't implement
>> > >> ReadOnlyBackwardWindowStore.
>> > >>
>> > >> The only way to make this work would be to make the Metered,
>> > >> Caching, and Logging layers also implement the new interfaces,
>> > >> but this effectively forces implementations to also implement
>> > >> the interface. Otherwise, the intermediate layers would have to
>> > >> cast the store in each method, like this:
>> > >> MeteredWindowStore#backwardFetch {
>> > >>   ((ReadOnlyBackwardWindowStore<K, V>) innerStore).backwardFetch(..)
>> > >> }
>> > >>
>> > >> And then if the implementation doesn't "opt in" by implementing
>> > >> the interface, you'd get a ClassCastException, not when you get the
>> > >> store, but when you try to use it.
>> > >>
>> > >> The fact that we get ClassCastExceptions no matter which way we
>> > >> turn here indicates that we're really not getting any benefit from
>> the
>> > >> type system, which makes the extra interfaces seem not worth all the
>> > >> code involved.
>> > >>
>> > >> Where we landed in KIP-614 is that, unless we want to completely
>> > >> revamp the way that StateStores work in the DSL, you might as
>> > >> well just add the new methods to the existing interfaces. To prevent
>> > >> compilation errors, we can add default implementations that throw
>> > >> UnsupportedOperationException. If a store doesn't opt in by
>> > >> implementing the methods, you'd get an UnsupportedOperationException,
>> > >> which seems no worse, and maybe better, than the ClassCastException
>> > >> you'd get if we go with the "mixin interface" approach.
>> > >>
>> > >> A quick note: This entire discussion focuses on the DSL. If you're
>> just
>> > >> using the Processor API by directly adding the a custom store to the
>> > >> Topology:
>> > >> org.apache.kafka.streams.Topology#addStateStore
>> > >> and then retrieving it in the processor via:
>> > >> org.apache.kafka.streams.processor.ProcessorContext#getStateStore
>> > >> in
>> > >> org.apache.kafka.streams.processor.Processor#init
>> > >>
>> > >> Then, you can both register and retrieve _any_ StateStore
>> implementation.
>> > >> There's no need to use KeyValueStore or any other built-in interface.
>> > >> In other words, KeyValueStore and company are only part of the DSL,
>> > >> not the PAPI. So, discussions about the build-in store interfaces
>> are only
>> > >> really relevant in the context of the DSL, Transformers, and
>> Materialized.
>> > >>
>> > >> So, in conclusion, I'd really recommend just adding any new methods
>> to
>> > >> the existing store interfaces. We might be able to revamp the API in
>> the
>> > >> future to support mixins, but it's a much larger scope of work than
>> this
>> > >> KIP.
>> > >> A more minor comment is that we don't need to add Deprecated variants
>> > >> of new methods.
>> > >>
>> > >> Thanks again, and once again, I'm sorry I tuned out and didn't offer
>> this
>> > >> feedback before you revised the KIP.
>> > >> -John
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> On Mon, Jun 22, 2020, at 06:11, Jorge Esteban Quilcate Otoya wrote:
>> > >> > Hi everyone,
>> > >> >
>> > >> > I've updated the KIP, applying Matthias' feedback regarding
>> interface
>> > >> > hierarchy.
>> > >> >
>> > >> > Also, following the last email, I think we can consider reverse
>> > >> operations
>> > >> > on KeyValue range as well, as implementation supports lexicographic
>> > >> order.
>> > >> >
>> > >> > I considered different naming between Key-based ranges and
>> Time-based
>> > >> > ranges, and mitigate confusion when fetching keys and time ranges
>> as
>> > >> > WindowStore does:
>> > >> >
>> > >> > Key-based ranges: reverseRange(), reverseAll()
>> > >> > Time-based ranges: backwardFetch()
>> > >> >
>> > >> > Then, key-based changes apply to KeyValueStore, and time-based
>> changes
>> > >> to
>> > >> > Window and Session stores.
>> > >> >
>> > >> > Let me know if you have any questions.
>> > >> >
>> > >> > Thanks,
>> > >> > Jorge.
>> > >> >
>> > >> >
>> > >> > On Tue, Jun 16, 2020 at 12:47 AM Jorge Esteban Quilcate Otoya <
>> > >> > quilcate.jorge@gmail.com> wrote:
>> > >> >
>> > >> > > Hi everyone, sorry for the late reply.
>> > >> > >
>> > >> > > Thanks Matthias for your feedback. I think it makes sense to
>> > >> reconsider
>> > >> > > the current design based on your input.
>> > >> > >
>> > >> > > After digging deeper into the current implementation, I'd like to
>> > >> bring my
>> > >> > > current understanding to be double-checked as it might be
>> redefining
>> > >> the
>> > >> > > KIP's scope:
>> > >> > >
>> > >> > > 1. There are 2 ranges been exposed by different stores:
>> > >> > >
>> > >> > > a. Key Range
>> > >> > > b. Timestamp Range
>> > >> > >
>> > >> > > So far, we have discussed covering both.
>> > >> > >
>> > >> > > 2. Key Range functions do not provide ordering guarantees by
>> design:
>> > >> > >
>> > >> > > ```ReadOnlyKeyValueStore.java
>> > >> > >     /**
>> > >> > >      * Get an iterator over a given range of keys. This iterator
>> must
>> > >> be
>> > >> > > closed after use.
>> > >> > >      * The returned iterator must be safe from {@link
>> > >> > > java.util.ConcurrentModificationException}s
>> > >> > >      * and must not return null values. No ordering guarantees
>> are
>> > >> > > provided.
>> > >> > >      * ...
>> > >> > >      */
>> > >> > >      KeyValueIterator<K, V> range(K from, K to);
>> > >> > > ```
>> > >> > >
>> > >> > > Therefore, I'd propose removing Key range operations from the
>> scope.
>> > >> > >
>> > >> > > 3. Timestamp Range operations happen at the SegmentsStore level
>> > >> (internal)
>> > >> > > API
>> > >> > >
>> > >> > > AFAICT, Segments wrappers handle all Timestamp ranges queries.
>> > >> > >
>> > >> > > I'd propose extending `Segments#segments(timeFrom, timeTo,
>> backwards)`
>> > >> > > with a flag for backwards operations.
>> > >> > >
>> > >> > > As segments returned will be processed backwards, I'm not
>> extending
>> > >> > > KeyValueStores to query each segment backwards as previous point
>> 2.
>> > >> > >
>> > >> > > 4. Extend WindowStores implementations with a new
>> > >> > > WindowBackwardStore/ReadOnlyBackwardStore:
>> > >> > >
>> > >> > > ```java
>> > >> > > public interface ReadOnlyWindowBackwardStore<K, V> {
>> > >> > >     WindowStoreIterator<V> backwardFetch(K key, Instant from,
>> Instant
>> > >> to)
>> > >> > > throws IllegalArgumentException;
>> > >> > >
>> > >> > >     KeyValueIterator<Windowed<K>, V> backwardFetch(K from, K to,
>> > >> Instant
>> > >> > > fromTime, Instant toTime)
>> > >> > >         throws IllegalArgumentException;
>> > >> > >
>> > >> > >     KeyValueIterator<Windowed<K>, V> backwardFetchAll(Instant
>> from,
>> > >> > > Instant to) throws IllegalArgumentException;
>> > >> > > ```
>> > >> > >
>> > >> > > 5. SessionStore is a bit different as it has fetch/find sessions
>> > >> spread
>> > >> > > between SessionStore and ReadOnlySessionStore.
>> > >> > >
>> > >> > > I'd propose a new interface `SessionBackwardStore` to expose
>> backward
>> > >> find
>> > >> > > operations:
>> > >> > >
>> > >> > > ```java
>> > >> > > public interface SessionBackwardStore<K, AGG> {
>> > >> > >     KeyValueIterator<Windowed<K>, AGG>
>> backwardFindSessions(final K
>> > >> key,
>> > >> > > final long earliestSessionEndTime, final long
>> latestSessionStartTime);
>> > >> > >
>> > >> > >     KeyValueIterator<Windowed<K>, AGG>
>> backwardFindSessions(final K
>> > >> > > keyFrom, final K keyTo, final long earliestSessionEndTime, final
>> long
>> > >> > > latestSessionStartTime);
>> > >> > > }
>> > >> > > ```
>> > >> > >
>> > >> > > If this understanding is correct I'll proceed to update the KIP
>> based
>> > >> on
>> > >> > > this.
>> > >> > >
>> > >> > > Looking forward to your feedback.
>> > >> > >
>> > >> > > Thanks,
>> > >> > > Jorge.
>> > >> > >
>> > >> > > On Fri, May 29, 2020 at 3:32 AM Matthias J. Sax <
>> mjsax@apache.org>
>> > >> wrote:
>> > >> > >
>> > >> > >> Hey,
>> > >> > >>
>> > >> > >> Sorry that I am late to the game. I am not 100% convinced about
>> the
>> > >> > >> current proposal. Using a new config as feature flag seems to be
>> > >> rather
>> > >> > >> "nasty" to me, and flipping from/to is a little bit too fancy
>> for my
>> > >> > >> personal taste.
>> > >> > >>
>> > >> > >> I agree, that the original proposal using a "ReadDirection"
>> enum is
>> > >> not
>> > >> > >> ideal either.
>> > >> > >>
>> > >> > >> Thus, I would like to put out a new idea: We could add a new
>> > >> interface
>> > >> > >> that offers new methods that return revers iterators.
>> > >> > >>
>> > >> > >> The KIP already proposes to add `reverseAll()` and it seems
>> backward
>> > >> > >> incompatible to just add this method to `ReadOnlyKeyValueStore`
>> and
>> > >> > >> `ReadOnlyWindowStore`. I don't think we could provide a useful
>> > >> default
>> > >> > >> implementation for custom stores and thus either break
>> compatibility
>> > >> or
>> > >> > >> need add a default that just throws an exception. Neither seems
>> to
>> > >> be a
>> > >> > >> good option.
>> > >> > >>
>> > >> > >> Using a new interface avoid this issue and allows users
>> implementing
>> > >> > >> custom stores to opt-in by adding the interface to their stores.
>> > >> > >> Furthermore, we don't need any config. In the end, we
>> encapsulte the
>> > >> > >> change into the store, and our runtime is agnostic to it (as it
>> > >> should
>> > >> > >> be).
>> > >> > >>
>> > >> > >> The hierarchy becomes a little complex (but uses would not
>> really see
>> > >> > >> the complexity):
>> > >> > >>
>> > >> > >> // exsiting
>> > >> > >> ReadOnlyKeyValueStore
>> > >> > >> KeyValueStore extend StateStore, ReadOnlyKeyValueStore
>> > >> > >>
>> > >> > >>
>> > >> > >> // helper interface; users don't care
>> > >> > >> // need similar ones for other stores
>> > >> > >> ReverseReadOnlyKeyValueStore {
>> > >> > >>     KeyValueIterator<K, V> reverseRange(K from, K to);
>> > >> > >>     KeyValueIterator<K, V> reverseAll();
>> > >> > >> }
>> > >> > >>
>> > >> > >>
>> > >> > >> // two new user facing interfaces for kv-store
>> > >> > >> // need similar ones for other stores
>> > >> > >> ReadOnlyKeyValueStoreWithReverseIterators extends
>> > >> ReadOnlyKeyValueStore,
>> > >> > >> ReverseReadOnlyKeyValueStore
>> > >> > >>
>> > >> > >> KeyValueStoreWithReverseIterators extends KeyValueStore,
>> > >> > >> ReverseReadOnlyKeyValueStore
>> > >> > >>
>> > >> > >>
>> > >> > >> // updated (also internal)
>> > >> > >> // also need to update other built-in stores
>> > >> > >> RocksDB implements KeyValueStoreWithReverseIterators,
>> > >> BulkLoadingStore
>> > >> > >>
>> > >> > >>
>> > >> > >> In the end, user would only care about the two (for kv-case) new
>> > >> > >> interface that offer revers iterator (read only and regular)
>> and can
>> > >> > >> cast stores accordingly in their Processors/Transformers or via
>> IQ.
>> > >> > >>
>> > >> > >>
>> > >> > >> Btw: if we add revers iterator for KeyValue and Window store,
>> should
>> > >> we
>> > >> > >> do the same for Session store?
>> > >> > >>
>> > >> > >>
>> > >> > >>
>> > >> > >> This might be more code to write, but I believe it provides the
>> > >> better
>> > >> > >> user experience. Thoughts?
>> > >> > >>
>> > >> > >>
>> > >> > >>
>> > >> > >> -Matthias
>> > >> > >>
>> > >> > >>
>> > >> > >>
>> > >> > >>
>> > >> > >> On 5/26/20 8:47 PM, John Roesler wrote:
>> > >> > >> > Sorry for my silence, Jorge,
>> > >> > >> >
>> > >> > >> > I've just taken another look, and I'm personally happy with
>> the
>> > >> KIP.
>> > >> > >> >
>> > >> > >> > Thanks,
>> > >> > >> > -John
>> > >> > >> >
>> > >> > >> > On Tue, May 26, 2020, at 16:17, Jorge Esteban Quilcate Otoya
>> wrote:
>> > >> > >> >> If no additional comments, I will proceed to start the a vote
>> > >> thread.
>> > >> > >> >>
>> > >> > >> >> Thanks a lot for your feedback!
>> > >> > >> >>
>> > >> > >> >> On Fri, May 22, 2020 at 9:25 AM Jorge Esteban Quilcate Otoya
>> <
>> > >> > >> >> quilcate.jorge@gmail.com> wrote:
>> > >> > >> >>
>> > >> > >> >>> Thanks Sophie. I like the `reverseAll()` idea.
>> > >> > >> >>>
>> > >> > >> >>> I updated the KIP with your feedback.
>> > >> > >> >>>
>> > >> > >> >>>
>> > >> > >> >>>
>> > >> > >> >>> On Fri, May 22, 2020 at 4:22 AM Sophie Blee-Goldman <
>> > >> > >> sophie@confluent.io>
>> > >> > >> >>> wrote:
>> > >> > >> >>>
>> > >> > >> >>>> Hm, the case of `all()` does seem to present a dilemma in
>> the
>> > >> case of
>> > >> > >> >>>> variable-length keys.
>> > >> > >> >>>>
>> > >> > >> >>>> In the case of fixed-length keys, you can just compute the
>> keys
>> > >> that
>> > >> > >> >>>> correspond
>> > >> > >> >>>> to the maximum and minimum serialized bytes, then perform a
>> > >> `range()`
>> > >> > >> >>>> query
>> > >> > >> >>>> instead of an `all()`. If your keys don't have a
>> well-defined
>> > >> > >> ordering
>> > >> > >> >>>> such
>> > >> > >> >>>> that
>> > >> > >> >>>> you can't determine the MAX_KEY, then you probably don't
>> care
>> > >> about
>> > >> > >> the
>> > >> > >> >>>> iterator order anyway.
>> > >> > >> >>>>
>> > >> > >> >>>>  But with variable-length keys, there is no MAX_KEY. If
>> all your
>> > >> > >> keys were
>> > >> > >> >>>> just
>> > >> > >> >>>> of the form 'a', 'aa', 'aaaaa', 'aaaaaaa' then in fact the
>> only
>> > >> way
>> > >> > >> to
>> > >> > >> >>>> figure out the
>> > >> > >> >>>> maximum key in the store is by using `all()` -- and
>> without a
>> > >> reverse
>> > >> > >> >>>> iterator, you're
>> > >> > >> >>>> doomed to iterate through every single key just to answer
>> that
>> > >> simple
>> > >> > >> >>>> question.
>> > >> > >> >>>>
>> > >> > >> >>>> That said, I still think determining the iterator order
>> based
>> > >> on the
>> > >> > >> >>>> to/from bytes
>> > >> > >> >>>> makes a lot of intuitive sense and gives the API a nice
>> > >> symmetry.
>> > >> > >> What if
>> > >> > >> >>>> we
>> > >> > >> >>>> solved the `all()` problem by just giving `all()` a reverse
>> > >> form to
>> > >> > >> >>>> complement it?
>> > >> > >> >>>> Ie we would have `all()` and `reverseAll()`, or something
>> to
>> > >> that
>> > >> > >> effect.
>> > >> > >> >>>>
>> > >> > >> >>>> On Thu, May 21, 2020 at 3:41 PM Jorge Esteban Quilcate
>> Otoya <
>> > >> > >> >>>> quilcate.jorge@gmail.com> wrote:
>> > >> > >> >>>>
>> > >> > >> >>>>> Thanks John.
>> > >> > >> >>>>>
>> > >> > >> >>>>> Agree. I like the first approach as well, with
>> StreamsConfig
>> > >> flag
>> > >> > >> >>>> passing
>> > >> > >> >>>>> by via ProcessorContext.
>> > >> > >> >>>>>
>> > >> > >> >>>>> Another positive effect with "reverse parameters" is that
>> in
>> > >> the
>> > >> > >> case of
>> > >> > >> >>>>> `fetch(keyFrom, keyTo, timeFrom, timeTo)` users can decide
>> > >> _which_
>> > >> > >> pair
>> > >> > >> >>>> to
>> > >> > >> >>>>> flip, whether with `ReadDirection` enum it apply to both.
>> > >> > >> >>>>>
>> > >> > >> >>>>> The only issue I've found while reviewing the KIP is that
>> > >> `all()`
>> > >> > >> won't
>> > >> > >> >>>> fit
>> > >> > >> >>>>> within this approach.
>> > >> > >> >>>>>
>> > >> > >> >>>>> We could remove it from the KIP and argue that for
>> WindowStore,
>> > >> > >> >>>>> `fetchAll(0, Long.MAX_VALUE)` can be used to get all in
>> reverse
>> > >> > >> order,
>> > >> > >> >>>> and
>> > >> > >> >>>>> for KeyValueStore, no ordering guarantees are provided.
>> > >> > >> >>>>>
>> > >> > >> >>>>> If there is consensus with this changes, I will go and
>> update
>> > >> the
>> > >> > >> KIP.
>> > >> > >> >>>>>
>> > >> > >> >>>>> On Thu, May 21, 2020 at 3:33 PM John Roesler <
>> > >> vvcephei@apache.org>
>> > >> > >> >>>> wrote:
>> > >> > >> >>>>>
>> > >> > >> >>>>>> Hi Jorge,
>> > >> > >> >>>>>>
>> > >> > >> >>>>>> Thanks for that idea. I agree, a feature flag would
>> protect
>> > >> anyone
>> > >> > >> >>>>>> who may be depending on the current behavior.
>> > >> > >> >>>>>>
>> > >> > >> >>>>>> It seems better to locate the feature flag in the
>> > >> initialization
>> > >> > >> >>>> logic of
>> > >> > >> >>>>>> the store, rather than have a method on the "live" store
>> that
>> > >> > >> changes
>> > >> > >> >>>>>> its behavior on the fly.
>> > >> > >> >>>>>>
>> > >> > >> >>>>>> It seems like there are two options here, one is to add
>> a new
>> > >> > >> config:
>> > >> > >> >>>>>>
>> > >> > >> >>>>>> StreamsConfig.ENABLE_BACKWARDS_ITERATION =
>> > >> > >> >>>>>>   "enable.backwards.iteration
>> > >> > >> >>>>>>
>> > >> > >> >>>>>> Or we can add a feature flag in Materialized, like
>> > >> > >> >>>>>>
>> > >> > >> >>>>>> Materialized.enableBackwardsIteration()
>> > >> > >> >>>>>>
>> > >> > >> >>>>>> I think I'd personally lean toward the config, for the
>> > >> following
>> > >> > >> >>>> reason.
>> > >> > >> >>>>>> The concern that Sophie raised is that someone's program
>> may
>> > >> depend
>> > >> > >> >>>>>> on the existing contract of getting an empty iterator. We
>> > >> don't
>> > >> > >> want
>> > >> > >> >>>> to
>> > >> > >> >>>>>> switch behavior when they aren't expecting it, so we
>> provide
>> > >> them a
>> > >> > >> >>>>>> config to assert that they _are_ expecting the new
>> behavior,
>> > >> which
>> > >> > >> >>>>>> means they take responsibility for updating their code to
>> > >> expect
>> > >> > >> the
>> > >> > >> >>>> new
>> > >> > >> >>>>>> behavior.
>> > >> > >> >>>>>>
>> > >> > >> >>>>>> There doesn't seem to be a reason to offer a choice of
>> > >> behaviors
>> > >> > >> on a
>> > >> > >> >>>>>> per-query, or per-store basis. We just want people to be
>> not
>> > >> > >> surprised
>> > >> > >> >>>>>> by this change in general.
>> > >> > >> >>>>>>
>> > >> > >> >>>>>> What do you think?
>> > >> > >> >>>>>> Thanks,
>> > >> > >> >>>>>> -John
>> > >> > >> >>>>>>
>> > >> > >> >>>>>> On Wed, May 20, 2020, at 17:37, Jorge Quilcate wrote:
>> > >> > >> >>>>>>> Thank you both for the great feedback.
>> > >> > >> >>>>>>>
>> > >> > >> >>>>>>> I like the "fancy" proposal :), and how it removes the
>> need
>> > >> for
>> > >> > >> >>>>>>> additional API methods. And with a feature flag on
>> > >> `StateStore`,
>> > >> > >> >>>>>>> disabled by default, should no break current users.
>> > >> > >> >>>>>>>
>> > >> > >> >>>>>>> The only side-effect I can think of is that: by moving
>> the
>> > >> flag
>> > >> > >> >>>>> upwards,
>> > >> > >> >>>>>>> all later operations become affected; which might be ok
>> for
>> > >> most
>> > >> > >> >>>> (all?)
>> > >> > >> >>>>>>> cases. I can't think of an scenario where this would be
>> an
>> > >> issue,
>> > >> > >> >>>> just
>> > >> > >> >>>>>>> want to point this out.
>> > >> > >> >>>>>>>
>> > >> > >> >>>>>>> If moving to this approach, I'd like to check if I got
>> this
>> > >> right
>> > >> > >> >>>>> before
>> > >> > >> >>>>>>> updating the KIP:
>> > >> > >> >>>>>>>
>> > >> > >> >>>>>>> - only `StateStore` will change by having a new method:
>> > >> > >> >>>>>>> `backwardIteration()`, `false` by default to keep things
>> > >> > >> compatible.
>> > >> > >> >>>>>>> - then all `*Stores` will have to update their
>> implementation
>> > >> > >> based
>> > >> > >> >>>> on
>> > >> > >> >>>>>>> this flag.
>> > >> > >> >>>>>>>
>> > >> > >> >>>>>>>
>> > >> > >> >>>>>>> On 20/05/2020 21:02, Sophie Blee-Goldman wrote:
>> > >> > >> >>>>>>>>> There's no possibility that someone could be relying
>> > >> > >> >>>>>>>>> on iterating over that range in increasing order,
>> because
>> > >> that's
>> > >> > >> >>>> not
>> > >> > >> >>>>>> what
>> > >> > >> >>>>>>>>> happens. However, they could indeed be relying on
>> getting
>> > >> an
>> > >> > >> >>>> empty
>> > >> > >> >>>>>>>> iterator
>> > >> > >> >>>>>>>>
>> > >> > >> >>>>>>>> I just meant that they might be relying on the
>> assumption
>> > >> that
>> > >> > >> the
>> > >> > >> >>>>>> range
>> > >> > >> >>>>>>>> query
>> > >> > >> >>>>>>>> will never return results with decreasing keys. The
>> empty
>> > >> > >> iterator
>> > >> > >> >>>>>> wouldn't
>> > >> > >> >>>>>>>> break that contract, but of course a surprise reverse
>> > >> iterator
>> > >> > >> >>>> would.
>> > >> > >> >>>>>>>>
>> > >> > >> >>>>>>>> FWIW I actually am in favor of automatically
>> converting to a
>> > >> > >> >>>> reverse
>> > >> > >> >>>>>>>> iterator,
>> > >> > >> >>>>>>>> I just thought we should consider whether this should
>> be
>> > >> off by
>> > >> > >> >>>>>> default or
>> > >> > >> >>>>>>>> even possible to disable at all.
>> > >> > >> >>>>>>>>
>> > >> > >> >>>>>>>> On Tue, May 19, 2020 at 7:42 PM John Roesler <
>> > >> > >> vvcephei@apache.org
>> > >> > >> >>>>>
>> > >> > >> >>>>>> wrote:
>> > >> > >> >>>>>>>>
>> > >> > >> >>>>>>>>> Thanks for the response, Sophie,
>> > >> > >> >>>>>>>>>
>> > >> > >> >>>>>>>>> I wholeheartedly agree we should take as much into
>> account
>> > >> as
>> > >> > >> >>>>> possible
>> > >> > >> >>>>>>>>> up front, rather than regretting our decisions later.
>> I
>> > >> actually
>> > >> > >> >>>> do
>> > >> > >> >>>>>> share
>> > >> > >> >>>>>>>>> your vague sense of worry, which was what led me to
>> say
>> > >> > >> initially
>> > >> > >> >>>>>> that I
>> > >> > >> >>>>>>>>> thought my counterproposal might be "too fancy".
>> > >> Sometimes, it's
>> > >> > >> >>>>>> better
>> > >> > >> >>>>>>>>> to be explicit instead of "elegant", if we think more
>> > >> people
>> > >> > >> >>>> will be
>> > >> > >> >>>>>>>>> confused
>> > >> > >> >>>>>>>>> than not.
>> > >> > >> >>>>>>>>>
>> > >> > >> >>>>>>>>> I really don't think that there's any danger of
>> "relying
>> > >> on a
>> > >> > >> >>>> bug"
>> > >> > >> >>>>>> here,
>> > >> > >> >>>>>>>>> although
>> > >> > >> >>>>>>>>> people certainly could be relying on current
>> behavior. One
>> > >> thing
>> > >> > >> >>>> to
>> > >> > >> >>>>> be
>> > >> > >> >>>>>>>>> clear
>> > >> > >> >>>>>>>>> about (which I just left a more detailed comment in
>> > >> KAFKA-8159
>> > >> > >> >>>>> about)
>> > >> > >> >>>>>> is
>> > >> > >> >>>>>>>>> that
>> > >> > >> >>>>>>>>> when we say something like key1 > key2, this ordering
>> is
>> > >> defined
>> > >> > >> >>>> by
>> > >> > >> >>>>>> the
>> > >> > >> >>>>>>>>> serde's output and nothing else.
>> > >> > >> >>>>>>>>>
>> > >> > >> >>>>>>>>> Currently, thanks to your fix in
>> > >> > >> >>>>>> https://github.com/apache/kafka/pull/6521
>> > >> > >> >>>>>>>>> ,
>> > >> > >> >>>>>>>>> the store contract is that for range scans, if from >
>> to,
>> > >> then
>> > >> > >> >>>> the
>> > >> > >> >>>>>> store
>> > >> > >> >>>>>>>>> must
>> > >> > >> >>>>>>>>> return an empty iterator. There's no possibility that
>> > >> someone
>> > >> > >> >>>> could
>> > >> > >> >>>>> be
>> > >> > >> >>>>>>>>> relying
>> > >> > >> >>>>>>>>> on iterating over that range in increasing order,
>> because
>> > >> that's
>> > >> > >> >>>> not
>> > >> > >> >>>>>> what
>> > >> > >> >>>>>>>>> happens. However, they could indeed be relying on
>> getting
>> > >> an
>> > >> > >> >>>> empty
>> > >> > >> >>>>>>>>> iterator.
>> > >> > >> >>>>>>>>>
>> > >> > >> >>>>>>>>> My counterproposal was to actually change this
>> contract to
>> > >> say
>> > >> > >> >>>> that
>> > >> > >> >>>>>> the
>> > >> > >> >>>>>>>>> store
>> > >> > >> >>>>>>>>> must return an iterator over the keys in that range,
>> but
>> > >> in the
>> > >> > >> >>>>>> reverse
>> > >> > >> >>>>>>>>> order.
>> > >> > >> >>>>>>>>> So, in addition to considering whether this idea is
>> "too
>> > >> fancy"
>> > >> > >> >>>> (aka
>> > >> > >> >>>>>>>>> confusing),
>> > >> > >> >>>>>>>>> we should also consider the likelihood of breaking an
>> > >> existing
>> > >> > >> >>>>>> program with
>> > >> > >> >>>>>>>>> this behavior/contract change.
>> > >> > >> >>>>>>>>>
>> > >> > >> >>>>>>>>> To echo your clarification, I'm also not advocating
>> > >> strongly in
>> > >> > >> >>>>> favor
>> > >> > >> >>>>>> of my
>> > >> > >> >>>>>>>>> proposal. I just wanted to present it for
>> consideration
>> > >> > >> alongside
>> > >> > >> >>>>>> Jorge's
>> > >> > >> >>>>>>>>> original one.
>> > >> > >> >>>>>>>>>
>> > >> > >> >>>>>>>>> Thanks for raising these very good points,
>> > >> > >> >>>>>>>>> -John
>> > >> > >> >>>>>>>>>
>> > >> > >> >>>>>>>>> On Tue, May 19, 2020, at 20:49, Sophie Blee-Goldman
>> wrote:
>> > >> > >> >>>>>>>>>>> Rather than working around it, I think we should
>> just
>> > >> fix it
>> > >> > >> >>>>>>>>>> Now *that's* a "fancy" idea :P
>> > >> > >> >>>>>>>>>>
>> > >> > >> >>>>>>>>>> That was my primary concern, although I do have a
>> vague
>> > >> sense
>> > >> > >> of
>> > >> > >> >>>>>> worry
>> > >> > >> >>>>>>>>>> that we might be allowing users to get into trouble
>> > >> without
>> > >> > >> >>>>>> realizing it.
>> > >> > >> >>>>>>>>>> For example if their custom serdes suffer a similar
>> bug
>> > >> as the
>> > >> > >> >>>>> above,
>> > >> > >> >>>>>>>>>> and/or
>> > >> > >> >>>>>>>>>> they rely on getting results in increasing order (of
>> the
>> > >> keys)
>> > >> > >> >>>> even
>> > >> > >> >>>>>> when
>> > >> > >> >>>>>>>>>> to < from. Maybe they're relying on the fact that the
>> > >> range
>> > >> > >> >>>> query
>> > >> > >> >>>>>> returns
>> > >> > >> >>>>>>>>>> nothing in that case.
>> > >> > >> >>>>>>>>>>
>> > >> > >> >>>>>>>>>> Not sure if that qualifies as relying on a bug or
>> not,
>> > >> but in
>> > >> > >> >>>> that
>> > >> > >> >>>>>> past
>> > >> > >> >>>>>>>>>> we've
>> > >> > >> >>>>>>>>>> taken the stance that we should not break
>> compatibility
>> > >> even if
>> > >> > >> >>>> the
>> > >> > >> >>>>>> user
>> > >> > >> >>>>>>>>>> was relying on bugs or unintentional behavior.
>> > >> > >> >>>>>>>>>>
>> > >> > >> >>>>>>>>>> Just to clarify I'm not advocating strongly against
>> this
>> > >> > >> >>>> proposal,
>> > >> > >> >>>>>> just
>> > >> > >> >>>>>>>>>> laying
>> > >> > >> >>>>>>>>>> out some considerations we should take into account.
>> At
>> > >> the end
>> > >> > >> >>>> of
>> > >> > >> >>>>>> the
>> > >> > >> >>>>>>>>> day
>> > >> > >> >>>>>>>>>> we should do what's right rather than maintain
>> > >> compatibility
>> > >> > >> >>>> with
>> > >> > >> >>>>>>>>> existing
>> > >> > >> >>>>>>>>>> bugs, but sometimes there's a reasonable middle
>> ground.
>> > >> > >> >>>>>>>>>>
>> > >> > >> >>>>>>>>>> On Tue, May 19, 2020 at 6:15 PM John Roesler <
>> > >> > >> >>>> vvcephei@apache.org>
>> > >> > >> >>>>>>>>> wrote:
>> > >> > >> >>>>>>>>>>> Thanks Sophie,
>> > >> > >> >>>>>>>>>>>
>> > >> > >> >>>>>>>>>>> Woah, that’s a nasty bug. Rather than working
>> around it,
>> > >> I
>> > >> > >> >>>> think
>> > >> > >> >>>>> we
>> > >> > >> >>>>>>>>> should
>> > >> > >> >>>>>>>>>>> just fix it. I’ll leave some comments on the Jira.
>> > >> > >> >>>>>>>>>>>
>> > >> > >> >>>>>>>>>>> It doesn’t seem like it should be this KIP’s concern
>> > >> that some
>> > >> > >> >>>>>> serdes
>> > >> > >> >>>>>>>>>>> might be incorrectly written.
>> > >> > >> >>>>>>>>>>>
>> > >> > >> >>>>>>>>>>> Were there other practical concerns that you had in
>> mind?
>> > >> > >> >>>>>>>>>>>
>> > >> > >> >>>>>>>>>>> Thanks,
>> > >> > >> >>>>>>>>>>> John
>> > >> > >> >>>>>>>>>>>
>> > >> > >> >>>>>>>>>>> On Tue, May 19, 2020, at 19:10, Sophie Blee-Goldman
>> > >> wrote:
>> > >> > >> >>>>>>>>>>>> I like this "fancy idea" to just flip the to/from
>> bytes
>> > >> but I
>> > >> > >> >>>>> think
>> > >> > >> >>>>>>>>> there
>> > >> > >> >>>>>>>>>>>> are some practical limitations to implementing
>> this. In
>> > >> > >> >>>>> particular
>> > >> > >> >>>>>>>>>>>> I'm thinking about this issue
>> > >> > >> >>>>>>>>>>>> <https://issues.apache.org/jira/browse/KAFKA-8159>
>> > >> with the
>> > >> > >> >>>>>> built-in
>> > >> > >> >>>>>>>>>>> signed
>> > >> > >> >>>>>>>>>>>> number serdes.
>> > >> > >> >>>>>>>>>>>>
>> > >> > >> >>>>>>>>>>>> This trick would actually fix the problem for
>> > >> > >> >>>> negative-negative
>> > >> > >> >>>>>>>>> queries
>> > >> > >> >>>>>>>>>>>> (ie where to & from are negative) but would cause
>> > >> > >> undetectable
>> > >> > >> >>>>>>>>>>>> incorrect results for negative-positive queries.
>> For
>> > >> example,
>> > >> > >> >>>> say
>> > >> > >> >>>>>> you
>> > >> > >> >>>>>>>>>>>> call #range with from = -1 and to = 1, using the
>> Short
>> > >> > >> serdes.
>> > >> > >> >>>>> The
>> > >> > >> >>>>>>>>>>>> serialized bytes for that are
>> > >> > >> >>>>>>>>>>>>
>> > >> > >> >>>>>>>>>>>> from = 1111111111111111
>> > >> > >> >>>>>>>>>>>> to = 0000000000000001
>> > >> > >> >>>>>>>>>>>>
>> > >> > >> >>>>>>>>>>>> so we would end up flipping those and iterating
>> over
>> > >> all keys
>> > >> > >> >>>>> from
>> > >> > >> >>>>>>>>>>>> 0000000000000001 to 1111111111111111. Iterating in
>> > >> > >> >>>>> lexicographical
>> > >> > >> >>>>>>>>>>>> order means we would iterate over every key in the
>> space
>> > >> > >> >>>> *except*
>> > >> > >> >>>>>> for
>> > >> > >> >>>>>>>>>>>> 0, but 0 is actually the *only* other key we meant
>> to be
>> > >> > >> >>>> included
>> > >> > >> >>>>>> in
>> > >> > >> >>>>>>>>> the
>> > >> > >> >>>>>>>>>>>> range query.
>> > >> > >> >>>>>>>>>>>>
>> > >> > >> >>>>>>>>>>>> Currently we just log a warning and return an empty
>> > >> iterator
>> > >> > >> >>>> when
>> > >> > >> >>>>>>>>>>>> to < from, which is obviously also incorrect but
>> feels
>> > >> > >> >>>> slightly
>> > >> > >> >>>>>> more
>> > >> > >> >>>>>>>>>>>> palatable. If we start automatically converting to
>> > >> reverse
>> > >> > >> >>>>> queries
>> > >> > >> >>>>>> we
>> > >> > >> >>>>>>>>>>>> can't even log a warning in this case unless we
>> wanted
>> > >> to log
>> > >> > >> >>>> a
>> > >> > >> >>>>>>>>> warning
>> > >> > >> >>>>>>>>>>>> every time, which would be weird to do for a valid
>> > >> usage of a
>> > >> > >> >>>> new
>> > >> > >> >>>>>>>>>>>> feature.
>> > >> > >> >>>>>>>>>>>>
>> > >> > >> >>>>>>>>>>>> All that said, I still like the idea overall. Off
>> the
>> > >> top of
>> > >> > >> >>>> my
>> > >> > >> >>>>>> head
>> > >> > >> >>>>>>>>> I
>> > >> > >> >>>>>>>>>>> guess
>> > >> > >> >>>>>>>>>>>> we could add a store config to enable/disable
>> automatic
>> > >> > >> >>>> reverse
>> > >> > >> >>>>>>>>>>> iteration,
>> > >> > >> >>>>>>>>>>>> which is off by default?
>> > >> > >> >>>>>>>>>>>>
>> > >> > >> >>>>>>>>>>>> Thanks for the KIP! This will be a nice addition
>> > >> > >> >>>>>>>>>>>>
>> > >> > >> >>>>>>>>>>>> Sophie
>> > >> > >> >>>>>>>>>>>>
>> > >> > >> >>>>>>>>>>>>
>> > >> > >> >>>>>>>>>>>> On Tue, May 19, 2020 at 3:21 PM John Roesler <
>> > >> > >> >>>>> vvcephei@apache.org>
>> > >> > >> >>>>>>>>>>> wrote:
>> > >> > >> >>>>>>>>>>>>> Hi there Jorge,
>> > >> > >> >>>>>>>>>>>>>
>> > >> > >> >>>>>>>>>>>>> Thanks for the KIP!
>> > >> > >> >>>>>>>>>>>>>
>> > >> > >> >>>>>>>>>>>>> I think this feature sounds very reasonable.
>> > >> > >> >>>>>>>>>>>>>
>> > >> > >> >>>>>>>>>>>>> I'm not 100% sure if this is "too fancy", but
>> what do
>> > >> you
>> > >> > >> >>>> think
>> > >> > >> >>>>>>>>>>>>> about avoiding the enum by instead allowing
>> people to
>> > >> flip
>> > >> > >> >>>>>>>>>>>>> the "from" and "to" endpoints? I.e., reading from
>> "A"
>> > >> to "Z"
>> > >> > >> >>>>> would
>> > >> > >> >>>>>>>>>>>>> be a forward scan, and from "Z" to "A" would be a
>> > >> backward
>> > >> > >> >>>> one?
>> > >> > >> >>>>>>>>>>>>>
>> > >> > >> >>>>>>>>>>>>> Thanks,
>> > >> > >> >>>>>>>>>>>>> -John
>> > >> > >> >>>>>>>>>>>>>
>> > >> > >> >>>>>>>>>>>>> On Tue, May 19, 2020, at 16:20, Jorge Quilcate
>> wrote:
>> > >> > >> >>>>>>>>>>>>>> Hi everyone,
>> > >> > >> >>>>>>>>>>>>>>
>> > >> > >> >>>>>>>>>>>>>> I would like to start the discussion for KIP-617:
>> > >> > >> >>>>>>>>>>>>>>
>> > >> > >> >>>>>>>>>
>> > >> > >> >>>>>>
>> > >> > >> >>>>>
>> > >> > >> >>>>
>> > >> > >>
>> > >>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-617%3A+Allow+Kafka+Streams+State+Stores+to+be+iterated+backwards
>> > >> > >> >>>>>>>>>>>>>> Looking forward to your feedback.
>> > >> > >> >>>>>>>>>>>>>>
>> > >> > >> >>>>>>>>>>>>>> Thanks!
>> > >> > >> >>>>>>>>>>>>>> Jorge.
>> > >> > >> >>>>>>>>>>>>>>
>> > >> > >> >>>>>>>>>>>>>>
>> > >> > >> >>>>>>>>>>>>>> Attachments:
>> > >> > >> >>>>>>>>>>>>>> * 0x5F2C6E22064982DF.asc
>> > >> > >> >>>>>>>
>> > >> > >> >>>>>>>
>> > >> > >> >>>>>>> Attachments:
>> > >> > >> >>>>>>> * 0x5F2C6E22064982DF.asc
>> > >> > >> >>>>>>
>> > >> > >> >>>>>
>> > >> > >> >>>>
>> > >> > >> >>>
>> > >> > >> >>
>> > >> > >>
>> > >> > >>
>> > >> >
>> > >>
>> > >
>> >
>>
>

Re: [DISCUSS] KIP-617: Allow Kafka Streams State Stores to be iterated backwards

Posted by Jorge Esteban Quilcate Otoya <qu...@gmail.com>.
Thanks John.

> it looks like there’s a revision error in which two methods are proposed
for SessionStore, but seem like they should be in ReadOnlySessionStore. Do
I read that right?

Yes, I've opted to keep the new methods alongside the existing ones. In the
case of SessionStore, `findSessions` are in `SessionStore`, and `fetch` are
in `ReadOnlySessionStore`. If it makes more sense, I can move all of them
to ReadOnlySessionStore.
Let me know what you think.

Thanks,
Jorge.

On Thu, Jul 2, 2020 at 2:36 PM John Roesler <vv...@apache.org> wrote:

> Hi Jorge,
>
> Thanks for the update. I think this is a good plan.
>
> I just took a look at the KIP again, and it looks like there’s a revision
> error in which two methods are proposed for SessionStore, but seem like
> they should be in ReadOnlySessionStore. Do I read that right?
>
> Otherwise, I’m happy with your proposal.
>
> Thanks,
> John
>
> On Wed, Jul 1, 2020, at 17:01, Jorge Esteban Quilcate Otoya wrote:
> > Quick update: KIP is updated with latest changes now.
> > Will leave it open this week while working on the PR.
> >
> > Hope to open a new vote thread over the next few days if no additional
> > feedback is provided.
> >
> > Cheers,
> > Jorge.
> >
> > On Mon, Jun 29, 2020 at 11:30 AM Jorge Esteban Quilcate Otoya <
> > quilcate.jorge@gmail.com> wrote:
> >
> > > Thanks, John!
> > >
> > > Make sense to reconsider the current approach. I was heading in a
> similar
> > > direction while drafting the implementation. Metered, Caching, and
> other
> > > layers will also have to get duplicated to build up new methods in
> `Stores`
> > > factory, and class casting issues would appear on stores created by
> DSL.
> > >
> > > I will draft a proposal with new methods (move methods from proposed
> > > interfaces to existing ones) with default implementation in a KIP
> update
> > > and wait for Matthias to chime in to validate this approach.
> > >
> > > Jorge.
> > >
> > >
> > > On Sat, Jun 27, 2020 at 4:01 PM John Roesler <vv...@apache.org>
> wrote:
> > >
> > >> Hi Jorge,
> > >>
> > >> Sorry for my silence, I've been absorbed with the 2.6 and 2.5.1
> releases.
> > >>
> > >> The idea to separate the new methods into "mixin" interfaces seems
> > >> like a good one, but as we've discovered in KIP-614, it doesn't work
> > >> out that way in practice. The problem is that the store
> implementations
> > >> are just the base layer that get composed with other layers in Streams
> > >> before they can be accessed in the DSL. This is extremely subtle, so
> > >> I'm going to put everyone to sleep with a detailed explanation:
> > >>
> > >> For example, this is the mechanism by which all KeyValueStore
> > >> implementations get added to Streams:
> > >> org.apache.kafka.streams.state.internals.KeyValueStoreBuilder#build
> > >> return new MeteredKeyValueStore<>(
> > >>   maybeWrapCaching(maybeWrapLogging(storeSupplier.get())),
> > >>   storeSupplier.metricsScope(),
> > >>   time,
> > >>   keySerde,
> > >>   valueSerde
> > >> );
> > >>
> > >> In the DSL, the store that a processor gets from the context would be
> > >> the result of this composition. So even if the storeSupplier.get()
> returns
> > >> a store that implements the "reverse" interface, when you try to use
> it
> > >> from a processor like:
> > >> org.apache.kafka.streams.kstream.ValueTransformerWithKey#init
> > >> ReadOnlyBackwardWindowStore<K, V> store =
> > >>   (ReadOnlyBackwardWindowStore<K, V>) context.getStateStore(..)
> > >>
> > >> You'd just get a ClassCastException because it's actually a
> > >> MeteredKeyValueStore, which doesn't implement
> > >> ReadOnlyBackwardWindowStore.
> > >>
> > >> The only way to make this work would be to make the Metered,
> > >> Caching, and Logging layers also implement the new interfaces,
> > >> but this effectively forces implementations to also implement
> > >> the interface. Otherwise, the intermediate layers would have to
> > >> cast the store in each method, like this:
> > >> MeteredWindowStore#backwardFetch {
> > >>   ((ReadOnlyBackwardWindowStore<K, V>) innerStore).backwardFetch(..)
> > >> }
> > >>
> > >> And then if the implementation doesn't "opt in" by implementing
> > >> the interface, you'd get a ClassCastException, not when you get the
> > >> store, but when you try to use it.
> > >>
> > >> The fact that we get ClassCastExceptions no matter which way we
> > >> turn here indicates that we're really not getting any benefit from the
> > >> type system, which makes the extra interfaces seem not worth all the
> > >> code involved.
> > >>
> > >> Where we landed in KIP-614 is that, unless we want to completely
> > >> revamp the way that StateStores work in the DSL, you might as
> > >> well just add the new methods to the existing interfaces. To prevent
> > >> compilation errors, we can add default implementations that throw
> > >> UnsupportedOperationException. If a store doesn't opt in by
> > >> implementing the methods, you'd get an UnsupportedOperationException,
> > >> which seems no worse, and maybe better, than the ClassCastException
> > >> you'd get if we go with the "mixin interface" approach.
> > >>
> > >> A quick note: This entire discussion focuses on the DSL. If you're
> just
> > >> using the Processor API by directly adding the a custom store to the
> > >> Topology:
> > >> org.apache.kafka.streams.Topology#addStateStore
> > >> and then retrieving it in the processor via:
> > >> org.apache.kafka.streams.processor.ProcessorContext#getStateStore
> > >> in
> > >> org.apache.kafka.streams.processor.Processor#init
> > >>
> > >> Then, you can both register and retrieve _any_ StateStore
> implementation.
> > >> There's no need to use KeyValueStore or any other built-in interface.
> > >> In other words, KeyValueStore and company are only part of the DSL,
> > >> not the PAPI. So, discussions about the build-in store interfaces are
> only
> > >> really relevant in the context of the DSL, Transformers, and
> Materialized.
> > >>
> > >> So, in conclusion, I'd really recommend just adding any new methods to
> > >> the existing store interfaces. We might be able to revamp the API in
> the
> > >> future to support mixins, but it's a much larger scope of work than
> this
> > >> KIP.
> > >> A more minor comment is that we don't need to add Deprecated variants
> > >> of new methods.
> > >>
> > >> Thanks again, and once again, I'm sorry I tuned out and didn't offer
> this
> > >> feedback before you revised the KIP.
> > >> -John
> > >>
> > >>
> > >>
> > >>
> > >> On Mon, Jun 22, 2020, at 06:11, Jorge Esteban Quilcate Otoya wrote:
> > >> > Hi everyone,
> > >> >
> > >> > I've updated the KIP, applying Matthias' feedback regarding
> interface
> > >> > hierarchy.
> > >> >
> > >> > Also, following the last email, I think we can consider reverse
> > >> operations
> > >> > on KeyValue range as well, as implementation supports lexicographic
> > >> order.
> > >> >
> > >> > I considered different naming between Key-based ranges and
> Time-based
> > >> > ranges, and mitigate confusion when fetching keys and time ranges as
> > >> > WindowStore does:
> > >> >
> > >> > Key-based ranges: reverseRange(), reverseAll()
> > >> > Time-based ranges: backwardFetch()
> > >> >
> > >> > Then, key-based changes apply to KeyValueStore, and time-based
> changes
> > >> to
> > >> > Window and Session stores.
> > >> >
> > >> > Let me know if you have any questions.
> > >> >
> > >> > Thanks,
> > >> > Jorge.
> > >> >
> > >> >
> > >> > On Tue, Jun 16, 2020 at 12:47 AM Jorge Esteban Quilcate Otoya <
> > >> > quilcate.jorge@gmail.com> wrote:
> > >> >
> > >> > > Hi everyone, sorry for the late reply.
> > >> > >
> > >> > > Thanks Matthias for your feedback. I think it makes sense to
> > >> reconsider
> > >> > > the current design based on your input.
> > >> > >
> > >> > > After digging deeper into the current implementation, I'd like to
> > >> bring my
> > >> > > current understanding to be double-checked as it might be
> redefining
> > >> the
> > >> > > KIP's scope:
> > >> > >
> > >> > > 1. There are 2 ranges been exposed by different stores:
> > >> > >
> > >> > > a. Key Range
> > >> > > b. Timestamp Range
> > >> > >
> > >> > > So far, we have discussed covering both.
> > >> > >
> > >> > > 2. Key Range functions do not provide ordering guarantees by
> design:
> > >> > >
> > >> > > ```ReadOnlyKeyValueStore.java
> > >> > >     /**
> > >> > >      * Get an iterator over a given range of keys. This iterator
> must
> > >> be
> > >> > > closed after use.
> > >> > >      * The returned iterator must be safe from {@link
> > >> > > java.util.ConcurrentModificationException}s
> > >> > >      * and must not return null values. No ordering guarantees are
> > >> > > provided.
> > >> > >      * ...
> > >> > >      */
> > >> > >      KeyValueIterator<K, V> range(K from, K to);
> > >> > > ```
> > >> > >
> > >> > > Therefore, I'd propose removing Key range operations from the
> scope.
> > >> > >
> > >> > > 3. Timestamp Range operations happen at the SegmentsStore level
> > >> (internal)
> > >> > > API
> > >> > >
> > >> > > AFAICT, Segments wrappers handle all Timestamp ranges queries.
> > >> > >
> > >> > > I'd propose extending `Segments#segments(timeFrom, timeTo,
> backwards)`
> > >> > > with a flag for backwards operations.
> > >> > >
> > >> > > As segments returned will be processed backwards, I'm not
> extending
> > >> > > KeyValueStores to query each segment backwards as previous point
> 2.
> > >> > >
> > >> > > 4. Extend WindowStores implementations with a new
> > >> > > WindowBackwardStore/ReadOnlyBackwardStore:
> > >> > >
> > >> > > ```java
> > >> > > public interface ReadOnlyWindowBackwardStore<K, V> {
> > >> > >     WindowStoreIterator<V> backwardFetch(K key, Instant from,
> Instant
> > >> to)
> > >> > > throws IllegalArgumentException;
> > >> > >
> > >> > >     KeyValueIterator<Windowed<K>, V> backwardFetch(K from, K to,
> > >> Instant
> > >> > > fromTime, Instant toTime)
> > >> > >         throws IllegalArgumentException;
> > >> > >
> > >> > >     KeyValueIterator<Windowed<K>, V> backwardFetchAll(Instant
> from,
> > >> > > Instant to) throws IllegalArgumentException;
> > >> > > ```
> > >> > >
> > >> > > 5. SessionStore is a bit different as it has fetch/find sessions
> > >> spread
> > >> > > between SessionStore and ReadOnlySessionStore.
> > >> > >
> > >> > > I'd propose a new interface `SessionBackwardStore` to expose
> backward
> > >> find
> > >> > > operations:
> > >> > >
> > >> > > ```java
> > >> > > public interface SessionBackwardStore<K, AGG> {
> > >> > >     KeyValueIterator<Windowed<K>, AGG> backwardFindSessions(final
> K
> > >> key,
> > >> > > final long earliestSessionEndTime, final long
> latestSessionStartTime);
> > >> > >
> > >> > >     KeyValueIterator<Windowed<K>, AGG> backwardFindSessions(final
> K
> > >> > > keyFrom, final K keyTo, final long earliestSessionEndTime, final
> long
> > >> > > latestSessionStartTime);
> > >> > > }
> > >> > > ```
> > >> > >
> > >> > > If this understanding is correct I'll proceed to update the KIP
> based
> > >> on
> > >> > > this.
> > >> > >
> > >> > > Looking forward to your feedback.
> > >> > >
> > >> > > Thanks,
> > >> > > Jorge.
> > >> > >
> > >> > > On Fri, May 29, 2020 at 3:32 AM Matthias J. Sax <mjsax@apache.org
> >
> > >> wrote:
> > >> > >
> > >> > >> Hey,
> > >> > >>
> > >> > >> Sorry that I am late to the game. I am not 100% convinced about
> the
> > >> > >> current proposal. Using a new config as feature flag seems to be
> > >> rather
> > >> > >> "nasty" to me, and flipping from/to is a little bit too fancy
> for my
> > >> > >> personal taste.
> > >> > >>
> > >> > >> I agree, that the original proposal using a "ReadDirection" enum
> is
> > >> not
> > >> > >> ideal either.
> > >> > >>
> > >> > >> Thus, I would like to put out a new idea: We could add a new
> > >> interface
> > >> > >> that offers new methods that return revers iterators.
> > >> > >>
> > >> > >> The KIP already proposes to add `reverseAll()` and it seems
> backward
> > >> > >> incompatible to just add this method to `ReadOnlyKeyValueStore`
> and
> > >> > >> `ReadOnlyWindowStore`. I don't think we could provide a useful
> > >> default
> > >> > >> implementation for custom stores and thus either break
> compatibility
> > >> or
> > >> > >> need add a default that just throws an exception. Neither seems
> to
> > >> be a
> > >> > >> good option.
> > >> > >>
> > >> > >> Using a new interface avoid this issue and allows users
> implementing
> > >> > >> custom stores to opt-in by adding the interface to their stores.
> > >> > >> Furthermore, we don't need any config. In the end, we encapsulte
> the
> > >> > >> change into the store, and our runtime is agnostic to it (as it
> > >> should
> > >> > >> be).
> > >> > >>
> > >> > >> The hierarchy becomes a little complex (but uses would not
> really see
> > >> > >> the complexity):
> > >> > >>
> > >> > >> // exsiting
> > >> > >> ReadOnlyKeyValueStore
> > >> > >> KeyValueStore extend StateStore, ReadOnlyKeyValueStore
> > >> > >>
> > >> > >>
> > >> > >> // helper interface; users don't care
> > >> > >> // need similar ones for other stores
> > >> > >> ReverseReadOnlyKeyValueStore {
> > >> > >>     KeyValueIterator<K, V> reverseRange(K from, K to);
> > >> > >>     KeyValueIterator<K, V> reverseAll();
> > >> > >> }
> > >> > >>
> > >> > >>
> > >> > >> // two new user facing interfaces for kv-store
> > >> > >> // need similar ones for other stores
> > >> > >> ReadOnlyKeyValueStoreWithReverseIterators extends
> > >> ReadOnlyKeyValueStore,
> > >> > >> ReverseReadOnlyKeyValueStore
> > >> > >>
> > >> > >> KeyValueStoreWithReverseIterators extends KeyValueStore,
> > >> > >> ReverseReadOnlyKeyValueStore
> > >> > >>
> > >> > >>
> > >> > >> // updated (also internal)
> > >> > >> // also need to update other built-in stores
> > >> > >> RocksDB implements KeyValueStoreWithReverseIterators,
> > >> BulkLoadingStore
> > >> > >>
> > >> > >>
> > >> > >> In the end, user would only care about the two (for kv-case) new
> > >> > >> interface that offer revers iterator (read only and regular) and
> can
> > >> > >> cast stores accordingly in their Processors/Transformers or via
> IQ.
> > >> > >>
> > >> > >>
> > >> > >> Btw: if we add revers iterator for KeyValue and Window store,
> should
> > >> we
> > >> > >> do the same for Session store?
> > >> > >>
> > >> > >>
> > >> > >>
> > >> > >> This might be more code to write, but I believe it provides the
> > >> better
> > >> > >> user experience. Thoughts?
> > >> > >>
> > >> > >>
> > >> > >>
> > >> > >> -Matthias
> > >> > >>
> > >> > >>
> > >> > >>
> > >> > >>
> > >> > >> On 5/26/20 8:47 PM, John Roesler wrote:
> > >> > >> > Sorry for my silence, Jorge,
> > >> > >> >
> > >> > >> > I've just taken another look, and I'm personally happy with the
> > >> KIP.
> > >> > >> >
> > >> > >> > Thanks,
> > >> > >> > -John
> > >> > >> >
> > >> > >> > On Tue, May 26, 2020, at 16:17, Jorge Esteban Quilcate Otoya
> wrote:
> > >> > >> >> If no additional comments, I will proceed to start the a vote
> > >> thread.
> > >> > >> >>
> > >> > >> >> Thanks a lot for your feedback!
> > >> > >> >>
> > >> > >> >> On Fri, May 22, 2020 at 9:25 AM Jorge Esteban Quilcate Otoya <
> > >> > >> >> quilcate.jorge@gmail.com> wrote:
> > >> > >> >>
> > >> > >> >>> Thanks Sophie. I like the `reverseAll()` idea.
> > >> > >> >>>
> > >> > >> >>> I updated the KIP with your feedback.
> > >> > >> >>>
> > >> > >> >>>
> > >> > >> >>>
> > >> > >> >>> On Fri, May 22, 2020 at 4:22 AM Sophie Blee-Goldman <
> > >> > >> sophie@confluent.io>
> > >> > >> >>> wrote:
> > >> > >> >>>
> > >> > >> >>>> Hm, the case of `all()` does seem to present a dilemma in
> the
> > >> case of
> > >> > >> >>>> variable-length keys.
> > >> > >> >>>>
> > >> > >> >>>> In the case of fixed-length keys, you can just compute the
> keys
> > >> that
> > >> > >> >>>> correspond
> > >> > >> >>>> to the maximum and minimum serialized bytes, then perform a
> > >> `range()`
> > >> > >> >>>> query
> > >> > >> >>>> instead of an `all()`. If your keys don't have a
> well-defined
> > >> > >> ordering
> > >> > >> >>>> such
> > >> > >> >>>> that
> > >> > >> >>>> you can't determine the MAX_KEY, then you probably don't
> care
> > >> about
> > >> > >> the
> > >> > >> >>>> iterator order anyway.
> > >> > >> >>>>
> > >> > >> >>>>  But with variable-length keys, there is no MAX_KEY. If all
> your
> > >> > >> keys were
> > >> > >> >>>> just
> > >> > >> >>>> of the form 'a', 'aa', 'aaaaa', 'aaaaaaa' then in fact the
> only
> > >> way
> > >> > >> to
> > >> > >> >>>> figure out the
> > >> > >> >>>> maximum key in the store is by using `all()` -- and without
> a
> > >> reverse
> > >> > >> >>>> iterator, you're
> > >> > >> >>>> doomed to iterate through every single key just to answer
> that
> > >> simple
> > >> > >> >>>> question.
> > >> > >> >>>>
> > >> > >> >>>> That said, I still think determining the iterator order
> based
> > >> on the
> > >> > >> >>>> to/from bytes
> > >> > >> >>>> makes a lot of intuitive sense and gives the API a nice
> > >> symmetry.
> > >> > >> What if
> > >> > >> >>>> we
> > >> > >> >>>> solved the `all()` problem by just giving `all()` a reverse
> > >> form to
> > >> > >> >>>> complement it?
> > >> > >> >>>> Ie we would have `all()` and `reverseAll()`, or something to
> > >> that
> > >> > >> effect.
> > >> > >> >>>>
> > >> > >> >>>> On Thu, May 21, 2020 at 3:41 PM Jorge Esteban Quilcate
> Otoya <
> > >> > >> >>>> quilcate.jorge@gmail.com> wrote:
> > >> > >> >>>>
> > >> > >> >>>>> Thanks John.
> > >> > >> >>>>>
> > >> > >> >>>>> Agree. I like the first approach as well, with
> StreamsConfig
> > >> flag
> > >> > >> >>>> passing
> > >> > >> >>>>> by via ProcessorContext.
> > >> > >> >>>>>
> > >> > >> >>>>> Another positive effect with "reverse parameters" is that
> in
> > >> the
> > >> > >> case of
> > >> > >> >>>>> `fetch(keyFrom, keyTo, timeFrom, timeTo)` users can decide
> > >> _which_
> > >> > >> pair
> > >> > >> >>>> to
> > >> > >> >>>>> flip, whether with `ReadDirection` enum it apply to both.
> > >> > >> >>>>>
> > >> > >> >>>>> The only issue I've found while reviewing the KIP is that
> > >> `all()`
> > >> > >> won't
> > >> > >> >>>> fit
> > >> > >> >>>>> within this approach.
> > >> > >> >>>>>
> > >> > >> >>>>> We could remove it from the KIP and argue that for
> WindowStore,
> > >> > >> >>>>> `fetchAll(0, Long.MAX_VALUE)` can be used to get all in
> reverse
> > >> > >> order,
> > >> > >> >>>> and
> > >> > >> >>>>> for KeyValueStore, no ordering guarantees are provided.
> > >> > >> >>>>>
> > >> > >> >>>>> If there is consensus with this changes, I will go and
> update
> > >> the
> > >> > >> KIP.
> > >> > >> >>>>>
> > >> > >> >>>>> On Thu, May 21, 2020 at 3:33 PM John Roesler <
> > >> vvcephei@apache.org>
> > >> > >> >>>> wrote:
> > >> > >> >>>>>
> > >> > >> >>>>>> Hi Jorge,
> > >> > >> >>>>>>
> > >> > >> >>>>>> Thanks for that idea. I agree, a feature flag would
> protect
> > >> anyone
> > >> > >> >>>>>> who may be depending on the current behavior.
> > >> > >> >>>>>>
> > >> > >> >>>>>> It seems better to locate the feature flag in the
> > >> initialization
> > >> > >> >>>> logic of
> > >> > >> >>>>>> the store, rather than have a method on the "live" store
> that
> > >> > >> changes
> > >> > >> >>>>>> its behavior on the fly.
> > >> > >> >>>>>>
> > >> > >> >>>>>> It seems like there are two options here, one is to add a
> new
> > >> > >> config:
> > >> > >> >>>>>>
> > >> > >> >>>>>> StreamsConfig.ENABLE_BACKWARDS_ITERATION =
> > >> > >> >>>>>>   "enable.backwards.iteration
> > >> > >> >>>>>>
> > >> > >> >>>>>> Or we can add a feature flag in Materialized, like
> > >> > >> >>>>>>
> > >> > >> >>>>>> Materialized.enableBackwardsIteration()
> > >> > >> >>>>>>
> > >> > >> >>>>>> I think I'd personally lean toward the config, for the
> > >> following
> > >> > >> >>>> reason.
> > >> > >> >>>>>> The concern that Sophie raised is that someone's program
> may
> > >> depend
> > >> > >> >>>>>> on the existing contract of getting an empty iterator. We
> > >> don't
> > >> > >> want
> > >> > >> >>>> to
> > >> > >> >>>>>> switch behavior when they aren't expecting it, so we
> provide
> > >> them a
> > >> > >> >>>>>> config to assert that they _are_ expecting the new
> behavior,
> > >> which
> > >> > >> >>>>>> means they take responsibility for updating their code to
> > >> expect
> > >> > >> the
> > >> > >> >>>> new
> > >> > >> >>>>>> behavior.
> > >> > >> >>>>>>
> > >> > >> >>>>>> There doesn't seem to be a reason to offer a choice of
> > >> behaviors
> > >> > >> on a
> > >> > >> >>>>>> per-query, or per-store basis. We just want people to be
> not
> > >> > >> surprised
> > >> > >> >>>>>> by this change in general.
> > >> > >> >>>>>>
> > >> > >> >>>>>> What do you think?
> > >> > >> >>>>>> Thanks,
> > >> > >> >>>>>> -John
> > >> > >> >>>>>>
> > >> > >> >>>>>> On Wed, May 20, 2020, at 17:37, Jorge Quilcate wrote:
> > >> > >> >>>>>>> Thank you both for the great feedback.
> > >> > >> >>>>>>>
> > >> > >> >>>>>>> I like the "fancy" proposal :), and how it removes the
> need
> > >> for
> > >> > >> >>>>>>> additional API methods. And with a feature flag on
> > >> `StateStore`,
> > >> > >> >>>>>>> disabled by default, should no break current users.
> > >> > >> >>>>>>>
> > >> > >> >>>>>>> The only side-effect I can think of is that: by moving
> the
> > >> flag
> > >> > >> >>>>> upwards,
> > >> > >> >>>>>>> all later operations become affected; which might be ok
> for
> > >> most
> > >> > >> >>>> (all?)
> > >> > >> >>>>>>> cases. I can't think of an scenario where this would be
> an
> > >> issue,
> > >> > >> >>>> just
> > >> > >> >>>>>>> want to point this out.
> > >> > >> >>>>>>>
> > >> > >> >>>>>>> If moving to this approach, I'd like to check if I got
> this
> > >> right
> > >> > >> >>>>> before
> > >> > >> >>>>>>> updating the KIP:
> > >> > >> >>>>>>>
> > >> > >> >>>>>>> - only `StateStore` will change by having a new method:
> > >> > >> >>>>>>> `backwardIteration()`, `false` by default to keep things
> > >> > >> compatible.
> > >> > >> >>>>>>> - then all `*Stores` will have to update their
> implementation
> > >> > >> based
> > >> > >> >>>> on
> > >> > >> >>>>>>> this flag.
> > >> > >> >>>>>>>
> > >> > >> >>>>>>>
> > >> > >> >>>>>>> On 20/05/2020 21:02, Sophie Blee-Goldman wrote:
> > >> > >> >>>>>>>>> There's no possibility that someone could be relying
> > >> > >> >>>>>>>>> on iterating over that range in increasing order,
> because
> > >> that's
> > >> > >> >>>> not
> > >> > >> >>>>>> what
> > >> > >> >>>>>>>>> happens. However, they could indeed be relying on
> getting
> > >> an
> > >> > >> >>>> empty
> > >> > >> >>>>>>>> iterator
> > >> > >> >>>>>>>>
> > >> > >> >>>>>>>> I just meant that they might be relying on the
> assumption
> > >> that
> > >> > >> the
> > >> > >> >>>>>> range
> > >> > >> >>>>>>>> query
> > >> > >> >>>>>>>> will never return results with decreasing keys. The
> empty
> > >> > >> iterator
> > >> > >> >>>>>> wouldn't
> > >> > >> >>>>>>>> break that contract, but of course a surprise reverse
> > >> iterator
> > >> > >> >>>> would.
> > >> > >> >>>>>>>>
> > >> > >> >>>>>>>> FWIW I actually am in favor of automatically converting
> to a
> > >> > >> >>>> reverse
> > >> > >> >>>>>>>> iterator,
> > >> > >> >>>>>>>> I just thought we should consider whether this should be
> > >> off by
> > >> > >> >>>>>> default or
> > >> > >> >>>>>>>> even possible to disable at all.
> > >> > >> >>>>>>>>
> > >> > >> >>>>>>>> On Tue, May 19, 2020 at 7:42 PM John Roesler <
> > >> > >> vvcephei@apache.org
> > >> > >> >>>>>
> > >> > >> >>>>>> wrote:
> > >> > >> >>>>>>>>
> > >> > >> >>>>>>>>> Thanks for the response, Sophie,
> > >> > >> >>>>>>>>>
> > >> > >> >>>>>>>>> I wholeheartedly agree we should take as much into
> account
> > >> as
> > >> > >> >>>>> possible
> > >> > >> >>>>>>>>> up front, rather than regretting our decisions later. I
> > >> actually
> > >> > >> >>>> do
> > >> > >> >>>>>> share
> > >> > >> >>>>>>>>> your vague sense of worry, which was what led me to say
> > >> > >> initially
> > >> > >> >>>>>> that I
> > >> > >> >>>>>>>>> thought my counterproposal might be "too fancy".
> > >> Sometimes, it's
> > >> > >> >>>>>> better
> > >> > >> >>>>>>>>> to be explicit instead of "elegant", if we think more
> > >> people
> > >> > >> >>>> will be
> > >> > >> >>>>>>>>> confused
> > >> > >> >>>>>>>>> than not.
> > >> > >> >>>>>>>>>
> > >> > >> >>>>>>>>> I really don't think that there's any danger of
> "relying
> > >> on a
> > >> > >> >>>> bug"
> > >> > >> >>>>>> here,
> > >> > >> >>>>>>>>> although
> > >> > >> >>>>>>>>> people certainly could be relying on current behavior.
> One
> > >> thing
> > >> > >> >>>> to
> > >> > >> >>>>> be
> > >> > >> >>>>>>>>> clear
> > >> > >> >>>>>>>>> about (which I just left a more detailed comment in
> > >> KAFKA-8159
> > >> > >> >>>>> about)
> > >> > >> >>>>>> is
> > >> > >> >>>>>>>>> that
> > >> > >> >>>>>>>>> when we say something like key1 > key2, this ordering
> is
> > >> defined
> > >> > >> >>>> by
> > >> > >> >>>>>> the
> > >> > >> >>>>>>>>> serde's output and nothing else.
> > >> > >> >>>>>>>>>
> > >> > >> >>>>>>>>> Currently, thanks to your fix in
> > >> > >> >>>>>> https://github.com/apache/kafka/pull/6521
> > >> > >> >>>>>>>>> ,
> > >> > >> >>>>>>>>> the store contract is that for range scans, if from >
> to,
> > >> then
> > >> > >> >>>> the
> > >> > >> >>>>>> store
> > >> > >> >>>>>>>>> must
> > >> > >> >>>>>>>>> return an empty iterator. There's no possibility that
> > >> someone
> > >> > >> >>>> could
> > >> > >> >>>>> be
> > >> > >> >>>>>>>>> relying
> > >> > >> >>>>>>>>> on iterating over that range in increasing order,
> because
> > >> that's
> > >> > >> >>>> not
> > >> > >> >>>>>> what
> > >> > >> >>>>>>>>> happens. However, they could indeed be relying on
> getting
> > >> an
> > >> > >> >>>> empty
> > >> > >> >>>>>>>>> iterator.
> > >> > >> >>>>>>>>>
> > >> > >> >>>>>>>>> My counterproposal was to actually change this
> contract to
> > >> say
> > >> > >> >>>> that
> > >> > >> >>>>>> the
> > >> > >> >>>>>>>>> store
> > >> > >> >>>>>>>>> must return an iterator over the keys in that range,
> but
> > >> in the
> > >> > >> >>>>>> reverse
> > >> > >> >>>>>>>>> order.
> > >> > >> >>>>>>>>> So, in addition to considering whether this idea is
> "too
> > >> fancy"
> > >> > >> >>>> (aka
> > >> > >> >>>>>>>>> confusing),
> > >> > >> >>>>>>>>> we should also consider the likelihood of breaking an
> > >> existing
> > >> > >> >>>>>> program with
> > >> > >> >>>>>>>>> this behavior/contract change.
> > >> > >> >>>>>>>>>
> > >> > >> >>>>>>>>> To echo your clarification, I'm also not advocating
> > >> strongly in
> > >> > >> >>>>> favor
> > >> > >> >>>>>> of my
> > >> > >> >>>>>>>>> proposal. I just wanted to present it for consideration
> > >> > >> alongside
> > >> > >> >>>>>> Jorge's
> > >> > >> >>>>>>>>> original one.
> > >> > >> >>>>>>>>>
> > >> > >> >>>>>>>>> Thanks for raising these very good points,
> > >> > >> >>>>>>>>> -John
> > >> > >> >>>>>>>>>
> > >> > >> >>>>>>>>> On Tue, May 19, 2020, at 20:49, Sophie Blee-Goldman
> wrote:
> > >> > >> >>>>>>>>>>> Rather than working around it, I think we should just
> > >> fix it
> > >> > >> >>>>>>>>>> Now *that's* a "fancy" idea :P
> > >> > >> >>>>>>>>>>
> > >> > >> >>>>>>>>>> That was my primary concern, although I do have a
> vague
> > >> sense
> > >> > >> of
> > >> > >> >>>>>> worry
> > >> > >> >>>>>>>>>> that we might be allowing users to get into trouble
> > >> without
> > >> > >> >>>>>> realizing it.
> > >> > >> >>>>>>>>>> For example if their custom serdes suffer a similar
> bug
> > >> as the
> > >> > >> >>>>> above,
> > >> > >> >>>>>>>>>> and/or
> > >> > >> >>>>>>>>>> they rely on getting results in increasing order (of
> the
> > >> keys)
> > >> > >> >>>> even
> > >> > >> >>>>>> when
> > >> > >> >>>>>>>>>> to < from. Maybe they're relying on the fact that the
> > >> range
> > >> > >> >>>> query
> > >> > >> >>>>>> returns
> > >> > >> >>>>>>>>>> nothing in that case.
> > >> > >> >>>>>>>>>>
> > >> > >> >>>>>>>>>> Not sure if that qualifies as relying on a bug or not,
> > >> but in
> > >> > >> >>>> that
> > >> > >> >>>>>> past
> > >> > >> >>>>>>>>>> we've
> > >> > >> >>>>>>>>>> taken the stance that we should not break
> compatibility
> > >> even if
> > >> > >> >>>> the
> > >> > >> >>>>>> user
> > >> > >> >>>>>>>>>> was relying on bugs or unintentional behavior.
> > >> > >> >>>>>>>>>>
> > >> > >> >>>>>>>>>> Just to clarify I'm not advocating strongly against
> this
> > >> > >> >>>> proposal,
> > >> > >> >>>>>> just
> > >> > >> >>>>>>>>>> laying
> > >> > >> >>>>>>>>>> out some considerations we should take into account.
> At
> > >> the end
> > >> > >> >>>> of
> > >> > >> >>>>>> the
> > >> > >> >>>>>>>>> day
> > >> > >> >>>>>>>>>> we should do what's right rather than maintain
> > >> compatibility
> > >> > >> >>>> with
> > >> > >> >>>>>>>>> existing
> > >> > >> >>>>>>>>>> bugs, but sometimes there's a reasonable middle
> ground.
> > >> > >> >>>>>>>>>>
> > >> > >> >>>>>>>>>> On Tue, May 19, 2020 at 6:15 PM John Roesler <
> > >> > >> >>>> vvcephei@apache.org>
> > >> > >> >>>>>>>>> wrote:
> > >> > >> >>>>>>>>>>> Thanks Sophie,
> > >> > >> >>>>>>>>>>>
> > >> > >> >>>>>>>>>>> Woah, that’s a nasty bug. Rather than working around
> it,
> > >> I
> > >> > >> >>>> think
> > >> > >> >>>>> we
> > >> > >> >>>>>>>>> should
> > >> > >> >>>>>>>>>>> just fix it. I’ll leave some comments on the Jira.
> > >> > >> >>>>>>>>>>>
> > >> > >> >>>>>>>>>>> It doesn’t seem like it should be this KIP’s concern
> > >> that some
> > >> > >> >>>>>> serdes
> > >> > >> >>>>>>>>>>> might be incorrectly written.
> > >> > >> >>>>>>>>>>>
> > >> > >> >>>>>>>>>>> Were there other practical concerns that you had in
> mind?
> > >> > >> >>>>>>>>>>>
> > >> > >> >>>>>>>>>>> Thanks,
> > >> > >> >>>>>>>>>>> John
> > >> > >> >>>>>>>>>>>
> > >> > >> >>>>>>>>>>> On Tue, May 19, 2020, at 19:10, Sophie Blee-Goldman
> > >> wrote:
> > >> > >> >>>>>>>>>>>> I like this "fancy idea" to just flip the to/from
> bytes
> > >> but I
> > >> > >> >>>>> think
> > >> > >> >>>>>>>>> there
> > >> > >> >>>>>>>>>>>> are some practical limitations to implementing
> this. In
> > >> > >> >>>>> particular
> > >> > >> >>>>>>>>>>>> I'm thinking about this issue
> > >> > >> >>>>>>>>>>>> <https://issues.apache.org/jira/browse/KAFKA-8159>
> > >> with the
> > >> > >> >>>>>> built-in
> > >> > >> >>>>>>>>>>> signed
> > >> > >> >>>>>>>>>>>> number serdes.
> > >> > >> >>>>>>>>>>>>
> > >> > >> >>>>>>>>>>>> This trick would actually fix the problem for
> > >> > >> >>>> negative-negative
> > >> > >> >>>>>>>>> queries
> > >> > >> >>>>>>>>>>>> (ie where to & from are negative) but would cause
> > >> > >> undetectable
> > >> > >> >>>>>>>>>>>> incorrect results for negative-positive queries. For
> > >> example,
> > >> > >> >>>> say
> > >> > >> >>>>>> you
> > >> > >> >>>>>>>>>>>> call #range with from = -1 and to = 1, using the
> Short
> > >> > >> serdes.
> > >> > >> >>>>> The
> > >> > >> >>>>>>>>>>>> serialized bytes for that are
> > >> > >> >>>>>>>>>>>>
> > >> > >> >>>>>>>>>>>> from = 1111111111111111
> > >> > >> >>>>>>>>>>>> to = 0000000000000001
> > >> > >> >>>>>>>>>>>>
> > >> > >> >>>>>>>>>>>> so we would end up flipping those and iterating over
> > >> all keys
> > >> > >> >>>>> from
> > >> > >> >>>>>>>>>>>> 0000000000000001 to 1111111111111111. Iterating in
> > >> > >> >>>>> lexicographical
> > >> > >> >>>>>>>>>>>> order means we would iterate over every key in the
> space
> > >> > >> >>>> *except*
> > >> > >> >>>>>> for
> > >> > >> >>>>>>>>>>>> 0, but 0 is actually the *only* other key we meant
> to be
> > >> > >> >>>> included
> > >> > >> >>>>>> in
> > >> > >> >>>>>>>>> the
> > >> > >> >>>>>>>>>>>> range query.
> > >> > >> >>>>>>>>>>>>
> > >> > >> >>>>>>>>>>>> Currently we just log a warning and return an empty
> > >> iterator
> > >> > >> >>>> when
> > >> > >> >>>>>>>>>>>> to < from, which is obviously also incorrect but
> feels
> > >> > >> >>>> slightly
> > >> > >> >>>>>> more
> > >> > >> >>>>>>>>>>>> palatable. If we start automatically converting to
> > >> reverse
> > >> > >> >>>>> queries
> > >> > >> >>>>>> we
> > >> > >> >>>>>>>>>>>> can't even log a warning in this case unless we
> wanted
> > >> to log
> > >> > >> >>>> a
> > >> > >> >>>>>>>>> warning
> > >> > >> >>>>>>>>>>>> every time, which would be weird to do for a valid
> > >> usage of a
> > >> > >> >>>> new
> > >> > >> >>>>>>>>>>>> feature.
> > >> > >> >>>>>>>>>>>>
> > >> > >> >>>>>>>>>>>> All that said, I still like the idea overall. Off
> the
> > >> top of
> > >> > >> >>>> my
> > >> > >> >>>>>> head
> > >> > >> >>>>>>>>> I
> > >> > >> >>>>>>>>>>> guess
> > >> > >> >>>>>>>>>>>> we could add a store config to enable/disable
> automatic
> > >> > >> >>>> reverse
> > >> > >> >>>>>>>>>>> iteration,
> > >> > >> >>>>>>>>>>>> which is off by default?
> > >> > >> >>>>>>>>>>>>
> > >> > >> >>>>>>>>>>>> Thanks for the KIP! This will be a nice addition
> > >> > >> >>>>>>>>>>>>
> > >> > >> >>>>>>>>>>>> Sophie
> > >> > >> >>>>>>>>>>>>
> > >> > >> >>>>>>>>>>>>
> > >> > >> >>>>>>>>>>>> On Tue, May 19, 2020 at 3:21 PM John Roesler <
> > >> > >> >>>>> vvcephei@apache.org>
> > >> > >> >>>>>>>>>>> wrote:
> > >> > >> >>>>>>>>>>>>> Hi there Jorge,
> > >> > >> >>>>>>>>>>>>>
> > >> > >> >>>>>>>>>>>>> Thanks for the KIP!
> > >> > >> >>>>>>>>>>>>>
> > >> > >> >>>>>>>>>>>>> I think this feature sounds very reasonable.
> > >> > >> >>>>>>>>>>>>>
> > >> > >> >>>>>>>>>>>>> I'm not 100% sure if this is "too fancy", but what
> do
> > >> you
> > >> > >> >>>> think
> > >> > >> >>>>>>>>>>>>> about avoiding the enum by instead allowing people
> to
> > >> flip
> > >> > >> >>>>>>>>>>>>> the "from" and "to" endpoints? I.e., reading from
> "A"
> > >> to "Z"
> > >> > >> >>>>> would
> > >> > >> >>>>>>>>>>>>> be a forward scan, and from "Z" to "A" would be a
> > >> backward
> > >> > >> >>>> one?
> > >> > >> >>>>>>>>>>>>>
> > >> > >> >>>>>>>>>>>>> Thanks,
> > >> > >> >>>>>>>>>>>>> -John
> > >> > >> >>>>>>>>>>>>>
> > >> > >> >>>>>>>>>>>>> On Tue, May 19, 2020, at 16:20, Jorge Quilcate
> wrote:
> > >> > >> >>>>>>>>>>>>>> Hi everyone,
> > >> > >> >>>>>>>>>>>>>>
> > >> > >> >>>>>>>>>>>>>> I would like to start the discussion for KIP-617:
> > >> > >> >>>>>>>>>>>>>>
> > >> > >> >>>>>>>>>
> > >> > >> >>>>>>
> > >> > >> >>>>>
> > >> > >> >>>>
> > >> > >>
> > >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-617%3A+Allow+Kafka+Streams+State+Stores+to+be+iterated+backwards
> > >> > >> >>>>>>>>>>>>>> Looking forward to your feedback.
> > >> > >> >>>>>>>>>>>>>>
> > >> > >> >>>>>>>>>>>>>> Thanks!
> > >> > >> >>>>>>>>>>>>>> Jorge.
> > >> > >> >>>>>>>>>>>>>>
> > >> > >> >>>>>>>>>>>>>>
> > >> > >> >>>>>>>>>>>>>> Attachments:
> > >> > >> >>>>>>>>>>>>>> * 0x5F2C6E22064982DF.asc
> > >> > >> >>>>>>>
> > >> > >> >>>>>>>
> > >> > >> >>>>>>> Attachments:
> > >> > >> >>>>>>> * 0x5F2C6E22064982DF.asc
> > >> > >> >>>>>>
> > >> > >> >>>>>
> > >> > >> >>>>
> > >> > >> >>>
> > >> > >> >>
> > >> > >>
> > >> > >>
> > >> >
> > >>
> > >
> >
>

Re: [DISCUSS] KIP-617: Allow Kafka Streams State Stores to be iterated backwards

Posted by John Roesler <vv...@apache.org>.
Hi Jorge,

Thanks for the update. I think this is a good plan. 

I just took a look at the KIP again, and it looks like there’s a revision error in which two methods are proposed for SessionStore, but seem like they should be in ReadOnlySessionStore. Do I read that right?

Otherwise, I’m happy with your proposal. 

Thanks,
John

On Wed, Jul 1, 2020, at 17:01, Jorge Esteban Quilcate Otoya wrote:
> Quick update: KIP is updated with latest changes now.
> Will leave it open this week while working on the PR.
> 
> Hope to open a new vote thread over the next few days if no additional
> feedback is provided.
> 
> Cheers,
> Jorge.
> 
> On Mon, Jun 29, 2020 at 11:30 AM Jorge Esteban Quilcate Otoya <
> quilcate.jorge@gmail.com> wrote:
> 
> > Thanks, John!
> >
> > Make sense to reconsider the current approach. I was heading in a similar
> > direction while drafting the implementation. Metered, Caching, and other
> > layers will also have to get duplicated to build up new methods in `Stores`
> > factory, and class casting issues would appear on stores created by DSL.
> >
> > I will draft a proposal with new methods (move methods from proposed
> > interfaces to existing ones) with default implementation in a KIP update
> > and wait for Matthias to chime in to validate this approach.
> >
> > Jorge.
> >
> >
> > On Sat, Jun 27, 2020 at 4:01 PM John Roesler <vv...@apache.org> wrote:
> >
> >> Hi Jorge,
> >>
> >> Sorry for my silence, I've been absorbed with the 2.6 and 2.5.1 releases.
> >>
> >> The idea to separate the new methods into "mixin" interfaces seems
> >> like a good one, but as we've discovered in KIP-614, it doesn't work
> >> out that way in practice. The problem is that the store implementations
> >> are just the base layer that get composed with other layers in Streams
> >> before they can be accessed in the DSL. This is extremely subtle, so
> >> I'm going to put everyone to sleep with a detailed explanation:
> >>
> >> For example, this is the mechanism by which all KeyValueStore
> >> implementations get added to Streams:
> >> org.apache.kafka.streams.state.internals.KeyValueStoreBuilder#build
> >> return new MeteredKeyValueStore<>(
> >>   maybeWrapCaching(maybeWrapLogging(storeSupplier.get())),
> >>   storeSupplier.metricsScope(),
> >>   time,
> >>   keySerde,
> >>   valueSerde
> >> );
> >>
> >> In the DSL, the store that a processor gets from the context would be
> >> the result of this composition. So even if the storeSupplier.get() returns
> >> a store that implements the "reverse" interface, when you try to use it
> >> from a processor like:
> >> org.apache.kafka.streams.kstream.ValueTransformerWithKey#init
> >> ReadOnlyBackwardWindowStore<K, V> store =
> >>   (ReadOnlyBackwardWindowStore<K, V>) context.getStateStore(..)
> >>
> >> You'd just get a ClassCastException because it's actually a
> >> MeteredKeyValueStore, which doesn't implement
> >> ReadOnlyBackwardWindowStore.
> >>
> >> The only way to make this work would be to make the Metered,
> >> Caching, and Logging layers also implement the new interfaces,
> >> but this effectively forces implementations to also implement
> >> the interface. Otherwise, the intermediate layers would have to
> >> cast the store in each method, like this:
> >> MeteredWindowStore#backwardFetch {
> >>   ((ReadOnlyBackwardWindowStore<K, V>) innerStore).backwardFetch(..)
> >> }
> >>
> >> And then if the implementation doesn't "opt in" by implementing
> >> the interface, you'd get a ClassCastException, not when you get the
> >> store, but when you try to use it.
> >>
> >> The fact that we get ClassCastExceptions no matter which way we
> >> turn here indicates that we're really not getting any benefit from the
> >> type system, which makes the extra interfaces seem not worth all the
> >> code involved.
> >>
> >> Where we landed in KIP-614 is that, unless we want to completely
> >> revamp the way that StateStores work in the DSL, you might as
> >> well just add the new methods to the existing interfaces. To prevent
> >> compilation errors, we can add default implementations that throw
> >> UnsupportedOperationException. If a store doesn't opt in by
> >> implementing the methods, you'd get an UnsupportedOperationException,
> >> which seems no worse, and maybe better, than the ClassCastException
> >> you'd get if we go with the "mixin interface" approach.
> >>
> >> A quick note: This entire discussion focuses on the DSL. If you're just
> >> using the Processor API by directly adding the a custom store to the
> >> Topology:
> >> org.apache.kafka.streams.Topology#addStateStore
> >> and then retrieving it in the processor via:
> >> org.apache.kafka.streams.processor.ProcessorContext#getStateStore
> >> in
> >> org.apache.kafka.streams.processor.Processor#init
> >>
> >> Then, you can both register and retrieve _any_ StateStore implementation.
> >> There's no need to use KeyValueStore or any other built-in interface.
> >> In other words, KeyValueStore and company are only part of the DSL,
> >> not the PAPI. So, discussions about the build-in store interfaces are only
> >> really relevant in the context of the DSL, Transformers, and Materialized.
> >>
> >> So, in conclusion, I'd really recommend just adding any new methods to
> >> the existing store interfaces. We might be able to revamp the API in the
> >> future to support mixins, but it's a much larger scope of work than this
> >> KIP.
> >> A more minor comment is that we don't need to add Deprecated variants
> >> of new methods.
> >>
> >> Thanks again, and once again, I'm sorry I tuned out and didn't offer this
> >> feedback before you revised the KIP.
> >> -John
> >>
> >>
> >>
> >>
> >> On Mon, Jun 22, 2020, at 06:11, Jorge Esteban Quilcate Otoya wrote:
> >> > Hi everyone,
> >> >
> >> > I've updated the KIP, applying Matthias' feedback regarding interface
> >> > hierarchy.
> >> >
> >> > Also, following the last email, I think we can consider reverse
> >> operations
> >> > on KeyValue range as well, as implementation supports lexicographic
> >> order.
> >> >
> >> > I considered different naming between Key-based ranges and Time-based
> >> > ranges, and mitigate confusion when fetching keys and time ranges as
> >> > WindowStore does:
> >> >
> >> > Key-based ranges: reverseRange(), reverseAll()
> >> > Time-based ranges: backwardFetch()
> >> >
> >> > Then, key-based changes apply to KeyValueStore, and time-based changes
> >> to
> >> > Window and Session stores.
> >> >
> >> > Let me know if you have any questions.
> >> >
> >> > Thanks,
> >> > Jorge.
> >> >
> >> >
> >> > On Tue, Jun 16, 2020 at 12:47 AM Jorge Esteban Quilcate Otoya <
> >> > quilcate.jorge@gmail.com> wrote:
> >> >
> >> > > Hi everyone, sorry for the late reply.
> >> > >
> >> > > Thanks Matthias for your feedback. I think it makes sense to
> >> reconsider
> >> > > the current design based on your input.
> >> > >
> >> > > After digging deeper into the current implementation, I'd like to
> >> bring my
> >> > > current understanding to be double-checked as it might be redefining
> >> the
> >> > > KIP's scope:
> >> > >
> >> > > 1. There are 2 ranges been exposed by different stores:
> >> > >
> >> > > a. Key Range
> >> > > b. Timestamp Range
> >> > >
> >> > > So far, we have discussed covering both.
> >> > >
> >> > > 2. Key Range functions do not provide ordering guarantees by design:
> >> > >
> >> > > ```ReadOnlyKeyValueStore.java
> >> > >     /**
> >> > >      * Get an iterator over a given range of keys. This iterator must
> >> be
> >> > > closed after use.
> >> > >      * The returned iterator must be safe from {@link
> >> > > java.util.ConcurrentModificationException}s
> >> > >      * and must not return null values. No ordering guarantees are
> >> > > provided.
> >> > >      * ...
> >> > >      */
> >> > >      KeyValueIterator<K, V> range(K from, K to);
> >> > > ```
> >> > >
> >> > > Therefore, I'd propose removing Key range operations from the scope.
> >> > >
> >> > > 3. Timestamp Range operations happen at the SegmentsStore level
> >> (internal)
> >> > > API
> >> > >
> >> > > AFAICT, Segments wrappers handle all Timestamp ranges queries.
> >> > >
> >> > > I'd propose extending `Segments#segments(timeFrom, timeTo, backwards)`
> >> > > with a flag for backwards operations.
> >> > >
> >> > > As segments returned will be processed backwards, I'm not extending
> >> > > KeyValueStores to query each segment backwards as previous point 2.
> >> > >
> >> > > 4. Extend WindowStores implementations with a new
> >> > > WindowBackwardStore/ReadOnlyBackwardStore:
> >> > >
> >> > > ```java
> >> > > public interface ReadOnlyWindowBackwardStore<K, V> {
> >> > >     WindowStoreIterator<V> backwardFetch(K key, Instant from, Instant
> >> to)
> >> > > throws IllegalArgumentException;
> >> > >
> >> > >     KeyValueIterator<Windowed<K>, V> backwardFetch(K from, K to,
> >> Instant
> >> > > fromTime, Instant toTime)
> >> > >         throws IllegalArgumentException;
> >> > >
> >> > >     KeyValueIterator<Windowed<K>, V> backwardFetchAll(Instant from,
> >> > > Instant to) throws IllegalArgumentException;
> >> > > ```
> >> > >
> >> > > 5. SessionStore is a bit different as it has fetch/find sessions
> >> spread
> >> > > between SessionStore and ReadOnlySessionStore.
> >> > >
> >> > > I'd propose a new interface `SessionBackwardStore` to expose backward
> >> find
> >> > > operations:
> >> > >
> >> > > ```java
> >> > > public interface SessionBackwardStore<K, AGG> {
> >> > >     KeyValueIterator<Windowed<K>, AGG> backwardFindSessions(final K
> >> key,
> >> > > final long earliestSessionEndTime, final long latestSessionStartTime);
> >> > >
> >> > >     KeyValueIterator<Windowed<K>, AGG> backwardFindSessions(final K
> >> > > keyFrom, final K keyTo, final long earliestSessionEndTime, final long
> >> > > latestSessionStartTime);
> >> > > }
> >> > > ```
> >> > >
> >> > > If this understanding is correct I'll proceed to update the KIP based
> >> on
> >> > > this.
> >> > >
> >> > > Looking forward to your feedback.
> >> > >
> >> > > Thanks,
> >> > > Jorge.
> >> > >
> >> > > On Fri, May 29, 2020 at 3:32 AM Matthias J. Sax <mj...@apache.org>
> >> wrote:
> >> > >
> >> > >> Hey,
> >> > >>
> >> > >> Sorry that I am late to the game. I am not 100% convinced about the
> >> > >> current proposal. Using a new config as feature flag seems to be
> >> rather
> >> > >> "nasty" to me, and flipping from/to is a little bit too fancy for my
> >> > >> personal taste.
> >> > >>
> >> > >> I agree, that the original proposal using a "ReadDirection" enum is
> >> not
> >> > >> ideal either.
> >> > >>
> >> > >> Thus, I would like to put out a new idea: We could add a new
> >> interface
> >> > >> that offers new methods that return revers iterators.
> >> > >>
> >> > >> The KIP already proposes to add `reverseAll()` and it seems backward
> >> > >> incompatible to just add this method to `ReadOnlyKeyValueStore` and
> >> > >> `ReadOnlyWindowStore`. I don't think we could provide a useful
> >> default
> >> > >> implementation for custom stores and thus either break compatibility
> >> or
> >> > >> need add a default that just throws an exception. Neither seems to
> >> be a
> >> > >> good option.
> >> > >>
> >> > >> Using a new interface avoid this issue and allows users implementing
> >> > >> custom stores to opt-in by adding the interface to their stores.
> >> > >> Furthermore, we don't need any config. In the end, we encapsulte the
> >> > >> change into the store, and our runtime is agnostic to it (as it
> >> should
> >> > >> be).
> >> > >>
> >> > >> The hierarchy becomes a little complex (but uses would not really see
> >> > >> the complexity):
> >> > >>
> >> > >> // exsiting
> >> > >> ReadOnlyKeyValueStore
> >> > >> KeyValueStore extend StateStore, ReadOnlyKeyValueStore
> >> > >>
> >> > >>
> >> > >> // helper interface; users don't care
> >> > >> // need similar ones for other stores
> >> > >> ReverseReadOnlyKeyValueStore {
> >> > >>     KeyValueIterator<K, V> reverseRange(K from, K to);
> >> > >>     KeyValueIterator<K, V> reverseAll();
> >> > >> }
> >> > >>
> >> > >>
> >> > >> // two new user facing interfaces for kv-store
> >> > >> // need similar ones for other stores
> >> > >> ReadOnlyKeyValueStoreWithReverseIterators extends
> >> ReadOnlyKeyValueStore,
> >> > >> ReverseReadOnlyKeyValueStore
> >> > >>
> >> > >> KeyValueStoreWithReverseIterators extends KeyValueStore,
> >> > >> ReverseReadOnlyKeyValueStore
> >> > >>
> >> > >>
> >> > >> // updated (also internal)
> >> > >> // also need to update other built-in stores
> >> > >> RocksDB implements KeyValueStoreWithReverseIterators,
> >> BulkLoadingStore
> >> > >>
> >> > >>
> >> > >> In the end, user would only care about the two (for kv-case) new
> >> > >> interface that offer revers iterator (read only and regular) and can
> >> > >> cast stores accordingly in their Processors/Transformers or via IQ.
> >> > >>
> >> > >>
> >> > >> Btw: if we add revers iterator for KeyValue and Window store, should
> >> we
> >> > >> do the same for Session store?
> >> > >>
> >> > >>
> >> > >>
> >> > >> This might be more code to write, but I believe it provides the
> >> better
> >> > >> user experience. Thoughts?
> >> > >>
> >> > >>
> >> > >>
> >> > >> -Matthias
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >> On 5/26/20 8:47 PM, John Roesler wrote:
> >> > >> > Sorry for my silence, Jorge,
> >> > >> >
> >> > >> > I've just taken another look, and I'm personally happy with the
> >> KIP.
> >> > >> >
> >> > >> > Thanks,
> >> > >> > -John
> >> > >> >
> >> > >> > On Tue, May 26, 2020, at 16:17, Jorge Esteban Quilcate Otoya wrote:
> >> > >> >> If no additional comments, I will proceed to start the a vote
> >> thread.
> >> > >> >>
> >> > >> >> Thanks a lot for your feedback!
> >> > >> >>
> >> > >> >> On Fri, May 22, 2020 at 9:25 AM Jorge Esteban Quilcate Otoya <
> >> > >> >> quilcate.jorge@gmail.com> wrote:
> >> > >> >>
> >> > >> >>> Thanks Sophie. I like the `reverseAll()` idea.
> >> > >> >>>
> >> > >> >>> I updated the KIP with your feedback.
> >> > >> >>>
> >> > >> >>>
> >> > >> >>>
> >> > >> >>> On Fri, May 22, 2020 at 4:22 AM Sophie Blee-Goldman <
> >> > >> sophie@confluent.io>
> >> > >> >>> wrote:
> >> > >> >>>
> >> > >> >>>> Hm, the case of `all()` does seem to present a dilemma in the
> >> case of
> >> > >> >>>> variable-length keys.
> >> > >> >>>>
> >> > >> >>>> In the case of fixed-length keys, you can just compute the keys
> >> that
> >> > >> >>>> correspond
> >> > >> >>>> to the maximum and minimum serialized bytes, then perform a
> >> `range()`
> >> > >> >>>> query
> >> > >> >>>> instead of an `all()`. If your keys don't have a well-defined
> >> > >> ordering
> >> > >> >>>> such
> >> > >> >>>> that
> >> > >> >>>> you can't determine the MAX_KEY, then you probably don't care
> >> about
> >> > >> the
> >> > >> >>>> iterator order anyway.
> >> > >> >>>>
> >> > >> >>>>  But with variable-length keys, there is no MAX_KEY. If all your
> >> > >> keys were
> >> > >> >>>> just
> >> > >> >>>> of the form 'a', 'aa', 'aaaaa', 'aaaaaaa' then in fact the only
> >> way
> >> > >> to
> >> > >> >>>> figure out the
> >> > >> >>>> maximum key in the store is by using `all()` -- and without a
> >> reverse
> >> > >> >>>> iterator, you're
> >> > >> >>>> doomed to iterate through every single key just to answer that
> >> simple
> >> > >> >>>> question.
> >> > >> >>>>
> >> > >> >>>> That said, I still think determining the iterator order based
> >> on the
> >> > >> >>>> to/from bytes
> >> > >> >>>> makes a lot of intuitive sense and gives the API a nice
> >> symmetry.
> >> > >> What if
> >> > >> >>>> we
> >> > >> >>>> solved the `all()` problem by just giving `all()` a reverse
> >> form to
> >> > >> >>>> complement it?
> >> > >> >>>> Ie we would have `all()` and `reverseAll()`, or something to
> >> that
> >> > >> effect.
> >> > >> >>>>
> >> > >> >>>> On Thu, May 21, 2020 at 3:41 PM Jorge Esteban Quilcate Otoya <
> >> > >> >>>> quilcate.jorge@gmail.com> wrote:
> >> > >> >>>>
> >> > >> >>>>> Thanks John.
> >> > >> >>>>>
> >> > >> >>>>> Agree. I like the first approach as well, with StreamsConfig
> >> flag
> >> > >> >>>> passing
> >> > >> >>>>> by via ProcessorContext.
> >> > >> >>>>>
> >> > >> >>>>> Another positive effect with "reverse parameters" is that in
> >> the
> >> > >> case of
> >> > >> >>>>> `fetch(keyFrom, keyTo, timeFrom, timeTo)` users can decide
> >> _which_
> >> > >> pair
> >> > >> >>>> to
> >> > >> >>>>> flip, whether with `ReadDirection` enum it apply to both.
> >> > >> >>>>>
> >> > >> >>>>> The only issue I've found while reviewing the KIP is that
> >> `all()`
> >> > >> won't
> >> > >> >>>> fit
> >> > >> >>>>> within this approach.
> >> > >> >>>>>
> >> > >> >>>>> We could remove it from the KIP and argue that for WindowStore,
> >> > >> >>>>> `fetchAll(0, Long.MAX_VALUE)` can be used to get all in reverse
> >> > >> order,
> >> > >> >>>> and
> >> > >> >>>>> for KeyValueStore, no ordering guarantees are provided.
> >> > >> >>>>>
> >> > >> >>>>> If there is consensus with this changes, I will go and update
> >> the
> >> > >> KIP.
> >> > >> >>>>>
> >> > >> >>>>> On Thu, May 21, 2020 at 3:33 PM John Roesler <
> >> vvcephei@apache.org>
> >> > >> >>>> wrote:
> >> > >> >>>>>
> >> > >> >>>>>> Hi Jorge,
> >> > >> >>>>>>
> >> > >> >>>>>> Thanks for that idea. I agree, a feature flag would protect
> >> anyone
> >> > >> >>>>>> who may be depending on the current behavior.
> >> > >> >>>>>>
> >> > >> >>>>>> It seems better to locate the feature flag in the
> >> initialization
> >> > >> >>>> logic of
> >> > >> >>>>>> the store, rather than have a method on the "live" store that
> >> > >> changes
> >> > >> >>>>>> its behavior on the fly.
> >> > >> >>>>>>
> >> > >> >>>>>> It seems like there are two options here, one is to add a new
> >> > >> config:
> >> > >> >>>>>>
> >> > >> >>>>>> StreamsConfig.ENABLE_BACKWARDS_ITERATION =
> >> > >> >>>>>>   "enable.backwards.iteration
> >> > >> >>>>>>
> >> > >> >>>>>> Or we can add a feature flag in Materialized, like
> >> > >> >>>>>>
> >> > >> >>>>>> Materialized.enableBackwardsIteration()
> >> > >> >>>>>>
> >> > >> >>>>>> I think I'd personally lean toward the config, for the
> >> following
> >> > >> >>>> reason.
> >> > >> >>>>>> The concern that Sophie raised is that someone's program may
> >> depend
> >> > >> >>>>>> on the existing contract of getting an empty iterator. We
> >> don't
> >> > >> want
> >> > >> >>>> to
> >> > >> >>>>>> switch behavior when they aren't expecting it, so we provide
> >> them a
> >> > >> >>>>>> config to assert that they _are_ expecting the new behavior,
> >> which
> >> > >> >>>>>> means they take responsibility for updating their code to
> >> expect
> >> > >> the
> >> > >> >>>> new
> >> > >> >>>>>> behavior.
> >> > >> >>>>>>
> >> > >> >>>>>> There doesn't seem to be a reason to offer a choice of
> >> behaviors
> >> > >> on a
> >> > >> >>>>>> per-query, or per-store basis. We just want people to be not
> >> > >> surprised
> >> > >> >>>>>> by this change in general.
> >> > >> >>>>>>
> >> > >> >>>>>> What do you think?
> >> > >> >>>>>> Thanks,
> >> > >> >>>>>> -John
> >> > >> >>>>>>
> >> > >> >>>>>> On Wed, May 20, 2020, at 17:37, Jorge Quilcate wrote:
> >> > >> >>>>>>> Thank you both for the great feedback.
> >> > >> >>>>>>>
> >> > >> >>>>>>> I like the "fancy" proposal :), and how it removes the need
> >> for
> >> > >> >>>>>>> additional API methods. And with a feature flag on
> >> `StateStore`,
> >> > >> >>>>>>> disabled by default, should no break current users.
> >> > >> >>>>>>>
> >> > >> >>>>>>> The only side-effect I can think of is that: by moving the
> >> flag
> >> > >> >>>>> upwards,
> >> > >> >>>>>>> all later operations become affected; which might be ok for
> >> most
> >> > >> >>>> (all?)
> >> > >> >>>>>>> cases. I can't think of an scenario where this would be an
> >> issue,
> >> > >> >>>> just
> >> > >> >>>>>>> want to point this out.
> >> > >> >>>>>>>
> >> > >> >>>>>>> If moving to this approach, I'd like to check if I got this
> >> right
> >> > >> >>>>> before
> >> > >> >>>>>>> updating the KIP:
> >> > >> >>>>>>>
> >> > >> >>>>>>> - only `StateStore` will change by having a new method:
> >> > >> >>>>>>> `backwardIteration()`, `false` by default to keep things
> >> > >> compatible.
> >> > >> >>>>>>> - then all `*Stores` will have to update their implementation
> >> > >> based
> >> > >> >>>> on
> >> > >> >>>>>>> this flag.
> >> > >> >>>>>>>
> >> > >> >>>>>>>
> >> > >> >>>>>>> On 20/05/2020 21:02, Sophie Blee-Goldman wrote:
> >> > >> >>>>>>>>> There's no possibility that someone could be relying
> >> > >> >>>>>>>>> on iterating over that range in increasing order, because
> >> that's
> >> > >> >>>> not
> >> > >> >>>>>> what
> >> > >> >>>>>>>>> happens. However, they could indeed be relying on getting
> >> an
> >> > >> >>>> empty
> >> > >> >>>>>>>> iterator
> >> > >> >>>>>>>>
> >> > >> >>>>>>>> I just meant that they might be relying on the assumption
> >> that
> >> > >> the
> >> > >> >>>>>> range
> >> > >> >>>>>>>> query
> >> > >> >>>>>>>> will never return results with decreasing keys. The empty
> >> > >> iterator
> >> > >> >>>>>> wouldn't
> >> > >> >>>>>>>> break that contract, but of course a surprise reverse
> >> iterator
> >> > >> >>>> would.
> >> > >> >>>>>>>>
> >> > >> >>>>>>>> FWIW I actually am in favor of automatically converting to a
> >> > >> >>>> reverse
> >> > >> >>>>>>>> iterator,
> >> > >> >>>>>>>> I just thought we should consider whether this should be
> >> off by
> >> > >> >>>>>> default or
> >> > >> >>>>>>>> even possible to disable at all.
> >> > >> >>>>>>>>
> >> > >> >>>>>>>> On Tue, May 19, 2020 at 7:42 PM John Roesler <
> >> > >> vvcephei@apache.org
> >> > >> >>>>>
> >> > >> >>>>>> wrote:
> >> > >> >>>>>>>>
> >> > >> >>>>>>>>> Thanks for the response, Sophie,
> >> > >> >>>>>>>>>
> >> > >> >>>>>>>>> I wholeheartedly agree we should take as much into account
> >> as
> >> > >> >>>>> possible
> >> > >> >>>>>>>>> up front, rather than regretting our decisions later. I
> >> actually
> >> > >> >>>> do
> >> > >> >>>>>> share
> >> > >> >>>>>>>>> your vague sense of worry, which was what led me to say
> >> > >> initially
> >> > >> >>>>>> that I
> >> > >> >>>>>>>>> thought my counterproposal might be "too fancy".
> >> Sometimes, it's
> >> > >> >>>>>> better
> >> > >> >>>>>>>>> to be explicit instead of "elegant", if we think more
> >> people
> >> > >> >>>> will be
> >> > >> >>>>>>>>> confused
> >> > >> >>>>>>>>> than not.
> >> > >> >>>>>>>>>
> >> > >> >>>>>>>>> I really don't think that there's any danger of "relying
> >> on a
> >> > >> >>>> bug"
> >> > >> >>>>>> here,
> >> > >> >>>>>>>>> although
> >> > >> >>>>>>>>> people certainly could be relying on current behavior. One
> >> thing
> >> > >> >>>> to
> >> > >> >>>>> be
> >> > >> >>>>>>>>> clear
> >> > >> >>>>>>>>> about (which I just left a more detailed comment in
> >> KAFKA-8159
> >> > >> >>>>> about)
> >> > >> >>>>>> is
> >> > >> >>>>>>>>> that
> >> > >> >>>>>>>>> when we say something like key1 > key2, this ordering is
> >> defined
> >> > >> >>>> by
> >> > >> >>>>>> the
> >> > >> >>>>>>>>> serde's output and nothing else.
> >> > >> >>>>>>>>>
> >> > >> >>>>>>>>> Currently, thanks to your fix in
> >> > >> >>>>>> https://github.com/apache/kafka/pull/6521
> >> > >> >>>>>>>>> ,
> >> > >> >>>>>>>>> the store contract is that for range scans, if from > to,
> >> then
> >> > >> >>>> the
> >> > >> >>>>>> store
> >> > >> >>>>>>>>> must
> >> > >> >>>>>>>>> return an empty iterator. There's no possibility that
> >> someone
> >> > >> >>>> could
> >> > >> >>>>> be
> >> > >> >>>>>>>>> relying
> >> > >> >>>>>>>>> on iterating over that range in increasing order, because
> >> that's
> >> > >> >>>> not
> >> > >> >>>>>> what
> >> > >> >>>>>>>>> happens. However, they could indeed be relying on getting
> >> an
> >> > >> >>>> empty
> >> > >> >>>>>>>>> iterator.
> >> > >> >>>>>>>>>
> >> > >> >>>>>>>>> My counterproposal was to actually change this contract to
> >> say
> >> > >> >>>> that
> >> > >> >>>>>> the
> >> > >> >>>>>>>>> store
> >> > >> >>>>>>>>> must return an iterator over the keys in that range, but
> >> in the
> >> > >> >>>>>> reverse
> >> > >> >>>>>>>>> order.
> >> > >> >>>>>>>>> So, in addition to considering whether this idea is "too
> >> fancy"
> >> > >> >>>> (aka
> >> > >> >>>>>>>>> confusing),
> >> > >> >>>>>>>>> we should also consider the likelihood of breaking an
> >> existing
> >> > >> >>>>>> program with
> >> > >> >>>>>>>>> this behavior/contract change.
> >> > >> >>>>>>>>>
> >> > >> >>>>>>>>> To echo your clarification, I'm also not advocating
> >> strongly in
> >> > >> >>>>> favor
> >> > >> >>>>>> of my
> >> > >> >>>>>>>>> proposal. I just wanted to present it for consideration
> >> > >> alongside
> >> > >> >>>>>> Jorge's
> >> > >> >>>>>>>>> original one.
> >> > >> >>>>>>>>>
> >> > >> >>>>>>>>> Thanks for raising these very good points,
> >> > >> >>>>>>>>> -John
> >> > >> >>>>>>>>>
> >> > >> >>>>>>>>> On Tue, May 19, 2020, at 20:49, Sophie Blee-Goldman wrote:
> >> > >> >>>>>>>>>>> Rather than working around it, I think we should just
> >> fix it
> >> > >> >>>>>>>>>> Now *that's* a "fancy" idea :P
> >> > >> >>>>>>>>>>
> >> > >> >>>>>>>>>> That was my primary concern, although I do have a vague
> >> sense
> >> > >> of
> >> > >> >>>>>> worry
> >> > >> >>>>>>>>>> that we might be allowing users to get into trouble
> >> without
> >> > >> >>>>>> realizing it.
> >> > >> >>>>>>>>>> For example if their custom serdes suffer a similar bug
> >> as the
> >> > >> >>>>> above,
> >> > >> >>>>>>>>>> and/or
> >> > >> >>>>>>>>>> they rely on getting results in increasing order (of the
> >> keys)
> >> > >> >>>> even
> >> > >> >>>>>> when
> >> > >> >>>>>>>>>> to < from. Maybe they're relying on the fact that the
> >> range
> >> > >> >>>> query
> >> > >> >>>>>> returns
> >> > >> >>>>>>>>>> nothing in that case.
> >> > >> >>>>>>>>>>
> >> > >> >>>>>>>>>> Not sure if that qualifies as relying on a bug or not,
> >> but in
> >> > >> >>>> that
> >> > >> >>>>>> past
> >> > >> >>>>>>>>>> we've
> >> > >> >>>>>>>>>> taken the stance that we should not break compatibility
> >> even if
> >> > >> >>>> the
> >> > >> >>>>>> user
> >> > >> >>>>>>>>>> was relying on bugs or unintentional behavior.
> >> > >> >>>>>>>>>>
> >> > >> >>>>>>>>>> Just to clarify I'm not advocating strongly against this
> >> > >> >>>> proposal,
> >> > >> >>>>>> just
> >> > >> >>>>>>>>>> laying
> >> > >> >>>>>>>>>> out some considerations we should take into account. At
> >> the end
> >> > >> >>>> of
> >> > >> >>>>>> the
> >> > >> >>>>>>>>> day
> >> > >> >>>>>>>>>> we should do what's right rather than maintain
> >> compatibility
> >> > >> >>>> with
> >> > >> >>>>>>>>> existing
> >> > >> >>>>>>>>>> bugs, but sometimes there's a reasonable middle ground.
> >> > >> >>>>>>>>>>
> >> > >> >>>>>>>>>> On Tue, May 19, 2020 at 6:15 PM John Roesler <
> >> > >> >>>> vvcephei@apache.org>
> >> > >> >>>>>>>>> wrote:
> >> > >> >>>>>>>>>>> Thanks Sophie,
> >> > >> >>>>>>>>>>>
> >> > >> >>>>>>>>>>> Woah, that’s a nasty bug. Rather than working around it,
> >> I
> >> > >> >>>> think
> >> > >> >>>>> we
> >> > >> >>>>>>>>> should
> >> > >> >>>>>>>>>>> just fix it. I’ll leave some comments on the Jira.
> >> > >> >>>>>>>>>>>
> >> > >> >>>>>>>>>>> It doesn’t seem like it should be this KIP’s concern
> >> that some
> >> > >> >>>>>> serdes
> >> > >> >>>>>>>>>>> might be incorrectly written.
> >> > >> >>>>>>>>>>>
> >> > >> >>>>>>>>>>> Were there other practical concerns that you had in mind?
> >> > >> >>>>>>>>>>>
> >> > >> >>>>>>>>>>> Thanks,
> >> > >> >>>>>>>>>>> John
> >> > >> >>>>>>>>>>>
> >> > >> >>>>>>>>>>> On Tue, May 19, 2020, at 19:10, Sophie Blee-Goldman
> >> wrote:
> >> > >> >>>>>>>>>>>> I like this "fancy idea" to just flip the to/from bytes
> >> but I
> >> > >> >>>>> think
> >> > >> >>>>>>>>> there
> >> > >> >>>>>>>>>>>> are some practical limitations to implementing this. In
> >> > >> >>>>> particular
> >> > >> >>>>>>>>>>>> I'm thinking about this issue
> >> > >> >>>>>>>>>>>> <https://issues.apache.org/jira/browse/KAFKA-8159>
> >> with the
> >> > >> >>>>>> built-in
> >> > >> >>>>>>>>>>> signed
> >> > >> >>>>>>>>>>>> number serdes.
> >> > >> >>>>>>>>>>>>
> >> > >> >>>>>>>>>>>> This trick would actually fix the problem for
> >> > >> >>>> negative-negative
> >> > >> >>>>>>>>> queries
> >> > >> >>>>>>>>>>>> (ie where to & from are negative) but would cause
> >> > >> undetectable
> >> > >> >>>>>>>>>>>> incorrect results for negative-positive queries. For
> >> example,
> >> > >> >>>> say
> >> > >> >>>>>> you
> >> > >> >>>>>>>>>>>> call #range with from = -1 and to = 1, using the Short
> >> > >> serdes.
> >> > >> >>>>> The
> >> > >> >>>>>>>>>>>> serialized bytes for that are
> >> > >> >>>>>>>>>>>>
> >> > >> >>>>>>>>>>>> from = 1111111111111111
> >> > >> >>>>>>>>>>>> to = 0000000000000001
> >> > >> >>>>>>>>>>>>
> >> > >> >>>>>>>>>>>> so we would end up flipping those and iterating over
> >> all keys
> >> > >> >>>>> from
> >> > >> >>>>>>>>>>>> 0000000000000001 to 1111111111111111. Iterating in
> >> > >> >>>>> lexicographical
> >> > >> >>>>>>>>>>>> order means we would iterate over every key in the space
> >> > >> >>>> *except*
> >> > >> >>>>>> for
> >> > >> >>>>>>>>>>>> 0, but 0 is actually the *only* other key we meant to be
> >> > >> >>>> included
> >> > >> >>>>>> in
> >> > >> >>>>>>>>> the
> >> > >> >>>>>>>>>>>> range query.
> >> > >> >>>>>>>>>>>>
> >> > >> >>>>>>>>>>>> Currently we just log a warning and return an empty
> >> iterator
> >> > >> >>>> when
> >> > >> >>>>>>>>>>>> to < from, which is obviously also incorrect but feels
> >> > >> >>>> slightly
> >> > >> >>>>>> more
> >> > >> >>>>>>>>>>>> palatable. If we start automatically converting to
> >> reverse
> >> > >> >>>>> queries
> >> > >> >>>>>> we
> >> > >> >>>>>>>>>>>> can't even log a warning in this case unless we wanted
> >> to log
> >> > >> >>>> a
> >> > >> >>>>>>>>> warning
> >> > >> >>>>>>>>>>>> every time, which would be weird to do for a valid
> >> usage of a
> >> > >> >>>> new
> >> > >> >>>>>>>>>>>> feature.
> >> > >> >>>>>>>>>>>>
> >> > >> >>>>>>>>>>>> All that said, I still like the idea overall. Off the
> >> top of
> >> > >> >>>> my
> >> > >> >>>>>> head
> >> > >> >>>>>>>>> I
> >> > >> >>>>>>>>>>> guess
> >> > >> >>>>>>>>>>>> we could add a store config to enable/disable automatic
> >> > >> >>>> reverse
> >> > >> >>>>>>>>>>> iteration,
> >> > >> >>>>>>>>>>>> which is off by default?
> >> > >> >>>>>>>>>>>>
> >> > >> >>>>>>>>>>>> Thanks for the KIP! This will be a nice addition
> >> > >> >>>>>>>>>>>>
> >> > >> >>>>>>>>>>>> Sophie
> >> > >> >>>>>>>>>>>>
> >> > >> >>>>>>>>>>>>
> >> > >> >>>>>>>>>>>> On Tue, May 19, 2020 at 3:21 PM John Roesler <
> >> > >> >>>>> vvcephei@apache.org>
> >> > >> >>>>>>>>>>> wrote:
> >> > >> >>>>>>>>>>>>> Hi there Jorge,
> >> > >> >>>>>>>>>>>>>
> >> > >> >>>>>>>>>>>>> Thanks for the KIP!
> >> > >> >>>>>>>>>>>>>
> >> > >> >>>>>>>>>>>>> I think this feature sounds very reasonable.
> >> > >> >>>>>>>>>>>>>
> >> > >> >>>>>>>>>>>>> I'm not 100% sure if this is "too fancy", but what do
> >> you
> >> > >> >>>> think
> >> > >> >>>>>>>>>>>>> about avoiding the enum by instead allowing people to
> >> flip
> >> > >> >>>>>>>>>>>>> the "from" and "to" endpoints? I.e., reading from "A"
> >> to "Z"
> >> > >> >>>>> would
> >> > >> >>>>>>>>>>>>> be a forward scan, and from "Z" to "A" would be a
> >> backward
> >> > >> >>>> one?
> >> > >> >>>>>>>>>>>>>
> >> > >> >>>>>>>>>>>>> Thanks,
> >> > >> >>>>>>>>>>>>> -John
> >> > >> >>>>>>>>>>>>>
> >> > >> >>>>>>>>>>>>> On Tue, May 19, 2020, at 16:20, Jorge Quilcate wrote:
> >> > >> >>>>>>>>>>>>>> Hi everyone,
> >> > >> >>>>>>>>>>>>>>
> >> > >> >>>>>>>>>>>>>> I would like to start the discussion for KIP-617:
> >> > >> >>>>>>>>>>>>>>
> >> > >> >>>>>>>>>
> >> > >> >>>>>>
> >> > >> >>>>>
> >> > >> >>>>
> >> > >>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-617%3A+Allow+Kafka+Streams+State+Stores+to+be+iterated+backwards
> >> > >> >>>>>>>>>>>>>> Looking forward to your feedback.
> >> > >> >>>>>>>>>>>>>>
> >> > >> >>>>>>>>>>>>>> Thanks!
> >> > >> >>>>>>>>>>>>>> Jorge.
> >> > >> >>>>>>>>>>>>>>
> >> > >> >>>>>>>>>>>>>>
> >> > >> >>>>>>>>>>>>>> Attachments:
> >> > >> >>>>>>>>>>>>>> * 0x5F2C6E22064982DF.asc
> >> > >> >>>>>>>
> >> > >> >>>>>>>
> >> > >> >>>>>>> Attachments:
> >> > >> >>>>>>> * 0x5F2C6E22064982DF.asc
> >> > >> >>>>>>
> >> > >> >>>>>
> >> > >> >>>>
> >> > >> >>>
> >> > >> >>
> >> > >>
> >> > >>
> >> >
> >>
> >
>