You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Bruno Cadonna <ca...@apache.org> on 2023/09/06 12:25:08 UTC

Re: [DISCUSS] KIP-969: Support range interactive queries for versioned state stores

Hi Alieh,

Thanks for the KIP!

One high level comment/question:

I assume you separated single key queries into two classes because 
versioned key queries return a single value and multi version key 
queries return iterators. Although, range queries always return 
iterators, it would make sense to also separate range queries for 
versioned state stores into range queries that return one single version 
of the keys within a range and range queries that return multiple 
version of the keys within a range, IMO. That would reduce the 
meaningless combinations.
WDYT?

Best,
Bruno

On 8/16/23 8:01 PM, Alieh Saeedi wrote:
> Hi all,
> 
> I splitted KIP-960
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-960%3A+Support+single-key_single-timestamp+interactive+queries+%28IQv2%29+for+versioned+state+stores>
> into
> three separate KIPs. Therefore, please continue discussions about range
> interactive queries here. You can see all the addressed reviews on the
> following page. Thanks in advance.
> 
> KIP-969: Support range interactive queries (IQv2) for versioned state stores
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-969%3A+Support+range+interactive+queries+%28IQv2%29+for+versioned+state+stores>
> 
> I look forward to your feedback!
> 
> Cheers,
> Alieh
> 

Re: [DISCUSS] KIP-969: Support range interactive queries for versioned state stores

Posted by Bruno Cadonna <ca...@apache.org>.
Hi Alieh,

I just realized that your KIP uses allKeys() and KIP-997 uses 
withAllKeys() and the current RangeQuery uses withNoBounds(). We should 
agree on one of those.

I am in favor of withAllKeys() but I am also fine with 
withNoKeyBounds(). I just prefer the former because it is more concise.

Best,
Bruno

On 12/12/23 9:08 AM, Bruno Cadonna wrote:
> Hi Alieh,
> 
> I think using TimestampedRangeQuery to query the latest versions is 
> totally fine. If it is not, users will report it and we can add it then.
> 
> Best,
> Bruno
> 
> On 12/11/23 6:22 PM, Alieh Saeedi wrote:
>> Thank you all.
>> I decided to remove the ordering from the KIP and maybe move it to the
>> subsequent KIPs (based on user demand).
>> I skimmed over the discussion thread, but we still had an open question
>> about how a user can retrieve the `latest()` values. I think what 
>> Matthias
>> suggested (using `TimestampedRangeQuery`) can be the solution. What do 
>> you
>> think? Bests,
>> Alieh
>>
>> On Wed, Dec 6, 2023 at 1:57 PM Lucas Brutschy
>> <lb...@confluent.io.invalid> wrote:
>>
>>> Hi Alieh,
>>>
>>> I think we do not have to restrict ourselves too much for the future
>>> and complicate the implementation. The user can always store away and
>>> sort, so we should only provide the ordering guarantee we can provide
>>> efficiently, and we shouldn't restrict our future evolution too much
>>> by this. I think a global ordering by timestamp is sufficient for this
>>> KIP, so I vote for option 2.
>>>
>>> Cheers,
>>> Lucas
>>>
>>> On Fri, Dec 1, 2023 at 8:45 PM Alieh Saeedi
>>> <as...@confluent.io.invalid> wrote:
>>>>
>>>> Hi all,
>>>> I updated the KIP based on the changes made in the former KIP 
>>>> (KIP-968).
>>> So
>>>> with the `ResultOrder` enum, the class `MultiVersionedRangeQuery` had
>>> some
>>>> changes both in the defined fields and defined methods.
>>>>
>>>> Based on the PoC PR, what we currently promise in the KIP about 
>>>> ordering
>>>> seems like a dream. I intended to enable the user to have a global
>>> ordering
>>>> based on the key or timestamp (using `orderByKey()` or
>>>> `orderByTimestamp()`) and then even have a partial ordering based on 
>>>> the
>>>> other parameter.  For example, if the user specifies
>>>> `orderByKey().withDescendingKey().withAscendingTimestamps()`, then the
>>>> global ordering is based on keys in a descending order, and then all 
>>>> the
>>>> records with the same key are ordered ascendingly based on ts. The 
>>>> result
>>>> will be something like (k3,v1,t1), (k3,v2,t2), (k2,v2,t3), (k1.v1.t1)
>>>> (assuming that k1<k2<k3 and t1<t2<t3).
>>>>
>>>> But in reality, we have limitations for having a global ordering 
>>>> based on
>>>> keys since we are iterating over the segments in a lazy manner.
>>> Therefore,
>>>> when we are processing the current segment, we have no knowledge of the
>>>> keys in the next segment.
>>>>
>>>> Now I have two suggestions:
>>>> 1. Changing the `MultiVersionedRangeQuery` class as follows:
>>>>
>>>> private final ResultOrder *segmentOrder*;
>>>> private final contentOrder *segmentContentOrder*; // can be KEY_WISE or
>>>> TIMESTAMP_WISE
>>>> private final ResultOrder  *keyOrder*;
>>>> private final ResultOrder *timestampOrder*;
>>>>
>>>> This way, the global ordering is specified by the `segmentOrder`. It
>>> means
>>>> we either show the results from the oldest to the latest segment
>>>> (ASCENDING) or from the latest to the oldest segment (DESCENDING).
>>>> Then, inside each segment, we guarantee a `segmentContentOrder` 
>>>> which can
>>>> be `KEY_WISE` or `TIMESTAMP_WISE`. The key order and timestamp order 
>>>> are
>>>> specified by the `keyOrder` and `timestampOrder` properties,
>>> respectively.
>>>> If the content of a segment must be ordered key-wise and then we 
>>>> have two
>>>> records with the same key (it happens in older segments), then the
>>>> `timestampOrder` determines the order between them.
>>>>
>>>> 2. We define that global ordering can only be based on timestamps (the
>>>> `timestampOrder` property), and if two records have the same timestamp,
>>> the
>>>> `keyOrder` determines the order between them.
>>>>
>>>> I think the first suggestion gives more flexibility to the user, but it
>>> is
>>>> more complicated. I mean, it needs good Javadocs.
>>>>
>>>> I look forward to your ideas.
>>>>
>>>> Cheers,
>>>> Alieh
>>>>
>>>>
>>>> On Mon, Nov 6, 2023 at 3:08 PM Alieh Saeedi <as...@confluent.io>
>>> wrote:
>>>>
>>>>> Thank you, Bruno and Matthias.
>>>>> I updated the KIP as follows:
>>>>> 1. The one remaining `asOf` word in the KIP is removed.
>>>>> 2. Example 2 is updated. Thanks, Bruno for the correction.
>>>>>
>>>>> Discussions and open questions
>>>>> 1. Yes, Bruno. We need `orderByKey()` and `orderByTimestamp()` as 
>>>>> well.
>>>>> Because the results must have a global ordering. Either based on 
>>>>> key or
>>>>> based on ts. For example, we can have
>>>>> `orderByKey().withDescendingKey().withAscendingTimestamps()`. Then the
>>>>> global ordering is based on keys in a descending order, and then all
>>> the
>>>>> records with the same key are ordered ascendingly based on ts. The
>>> result
>>>>> will be something like (k3,v1,t1), (k3,v2,t2), (k3,v1,t1), (k2,v2,t2),
>>>>> (k1.v1.t1) (assuming that k1<k2<k3 and t1<t2<t3)
>>>>> 2. About having the `latest()` method: it seems like we are undecided
>>> yet.
>>>>> Adding a new class or ignoring `latest()` for VersionedRangeQuery and
>>>>> instead using the `TimestampedRangeQuery` as Matthias suggested.
>>>>>
>>>>> Cheers,
>>>>> Alieh
>>>>>
>>>>> On Sat, Nov 4, 2023 at 1:38 AM Matthias J. Sax <mj...@apache.org>
>>> wrote:
>>>>>
>>>>>> Great discussion. Seems we are making good progress.
>>>>>>
>>>>>> I see advantages and disadvantages in splitting out a "single-ts
>>>>>> key-range" query type. I guess, the main question might be, what
>>>>>> use-cases do we anticipate and how common we believe they are?
>>>>>>
>>>>>> We should also take KIP-992 (adding `TimestampedRangeQuery`) into
>>> account.
>>>>>>
>>>>>> (1) The most common use case seems to be, a key-range over latest. 
>>>>>> For
>>>>>> this one, we could use `TimestampedRangeQuery` -- it would return a
>>>>>> `ValueAndTimestamp<V>` instead of a `VersionedRecord<V>` but the 
>>>>>> won't
>>>>>> be any information loss, because "toTimestamp" would be `null` 
>>>>>> anyway.
>>>>>>
>>>>>>
>>>>>> (2) The open question is, how common is a key-range in a point in the
>>>>>> past? For this case, using
>>>>>> `MultiVersionedRangeQuery.withKeyRange().from(myTs).to(myTs)` seems
>>>>>> actually not to be a bad UX, and also does not really need to be
>>>>>> explained how to do this (compared to "latest" that required to pass
>>> in
>>>>>> MAX_INT).
>>>>>>
>>>>>>
>>>>>> If we add a new query type, we avoid both issues (are they actually
>>>>>> issues?) and add some nice syntactic sugar to the API. The question
>>> is,
>>>>>> if it's worth the effort and expanded API surface area?
>>>>>>
>>>>>> To summarize:
>>>>>>
>>>>>> Add new query type:
>>>>>>
>>>>>>> // queries latest; returns VersionedRecords
>>>>>>> VersionedRangeQuery.withKeyRange(...);
>>>>>>>
>>>>>>> VersionedRangeQuery.withKeyRange(...).asOf(ts);
>>>>>>
>>>>>> vs
>>>>>>
>>>>>> No new query type:
>>>>>>
>>>>>>> // queries latest; returns ValueAndTimestamps
>>>>>>> TimestampedRangeQuery.withRange(...);
>>>>>>>
>>>>>>> MultiVersionedRangeQuery.withKeyRange(...).from(myTs).to(myTs)
>>>>>>
>>>>>>
>>>>>>
>>>>>> I guess, bottom line, I would be ok with either one and I am actually
>>>>>> not even sure which one I prefer personally. Just wanted to lay out
>>> the
>>>>>> tradeoffs I see. Not sure if three are other considerations that 
>>>>>> would
>>>>>> tip the scale into either direction?
>>>>>>
>>>>>>
>>>>>>
>>>>>> -Matthias
>>>>>>
>>>>>> On 11/3/23 3:43 AM, Bruno Cadonna wrote:
>>>>>>> Hi Alieh,
>>>>>>>
>>>>>>> I like the examples!
>>>>>>>
>>>>>>>
>>>>>>> 1.
>>>>>>> Some terms like `asOf` in the descriptions still need to be
>>> replaced in
>>>>>>> the KIP.
>>>>>>>
>>>>>>>
>>>>>>> 2.
>>>>>>> In your last e-mail you state:
>>>>>>>
>>>>>>> "How can a user retrieve the latest value? We have the same issue
>>> with
>>>>>>> kip-968 as well."
>>>>>>>
>>>>>>> Why do we have the same issue in KIP-968?
>>>>>>> If users need to retrieve the latest value for a specific key, they
>>>>>>> should use KIP-960.
>>>>>>>
>>>>>>>
>>>>>>> 3.
>>>>>>> Regarding querying the latest version (or an asOf version) of
>>> records
>>>>>> in
>>>>>>> a given key range, that is exactly why I proposed to split the query
>>>>>>> class. One class would return the latest and the asOf versions
>>> (i.e. a
>>>>>>> single version) of records in a key range and the other class would
>>>>>>> return all versions in a given time range (i.e. multiple versions)
>>> of
>>>>>>> the records in a given key range. The splitting in two classes
>>> avoids
>>>>>> to
>>>>>>> specify a time range and latest or a time range and asOf on a given
>>> key
>>>>>>> range.
>>>>>>>
>>>>>>> Alternatively, you could keep one class and you could specify that
>>> the
>>>>>>> last call wins as you specified for fromTime() and toTime(). For
>>>>>>> example, for
>>>>>>>
>>>>>>>
>>>>>>
>>> MultiVersionedRangeQuery.withLowerKeyBound(k1).fromTime(t1).toTime(t2).latest()
>>>>>>>
>>>>>>> latest() wins. However, how would you interpret
>>>>>>>
>>>>>>>
>>>>>>
>>> MultiVersionedRangeQuery.withLowerKeyBound(k1).fromTime(t1).latest().toTime(t2)
>>>>>>>
>>>>>>> Is it [t1, t2] or [-INF, t2]?
>>>>>>> (I would say the latter, but somebody else would say differently)
>>>>>>>
>>>>>>> The two class solution seems cleaner to me since we do not need to
>>>>>>> consider those edge cases.
>>>>>>> You could propose both classes in this KIP.
>>>>>>>
>>>>>>>
>>>>>>> 4.
>>>>>>> Why do we need orderByKey() and orderByTimestamp()?
>>>>>>> Aren't withAscendingKeys(), withDescendingKeys(),
>>>>>>> withAscendingTimestamps(), and withDescendingTimestamps()
>>> sufficient?
>>>>>>>
>>>>>>>
>>>>>>> 5.
>>>>>>> In example 2, why is
>>>>>>>
>>>>>>> key,value: 2,20, timestamp: 2023-01-10T10:00:00.00Z, valid till:
>>>>>>> 2023-01-25T10:00:00.00Z
>>>>>>>
>>>>>>> not part of the result?
>>>>>>> It is valid from 2023-01-10T10:00:00.00Z to 2023-01-25T10:00:00.00Z
>>>>>>> which overlaps with the time range [2023-01-17T10:00:00.00Z,
>>>>>>> 2023-01-30T10:00:00.00Z] of the query.
>>>>>>>
>>>>>>> (BTW, in the second example, you forgot to add "key" to the output.)
>>>>>>>
>>>>>>>
>>>>>>> Best,
>>>>>>> Bruno
>>>>>>>
>>>>>>>
>>>>>>> On 10/25/23 4:01 PM, Alieh Saeedi wrote:
>>>>>>>> Thanks, Matthias and Bruno.
>>>>>>>> Here is a list of updates:
>>>>>>>>
>>>>>>>>      1. I changed the variable and method names as I did for
>>> KIP-968, as
>>>>>>>>      follows:
>>>>>>>>         - "fromTimestamp" -> fromTime
>>>>>>>>         - "asOfTimestamp"-> toTime
>>>>>>>>         - "from(instant)" -> fromTime(instant)"
>>>>>>>>         - asOf(instant)"->toTime(instant)
>>>>>>>>      2. As Bruno suggested for KIP-968, I added `orderByKey()`,
>>>>>>>>      `withAscendingKeys()`, and `withAscendingTimestamps()` methods
>>> for
>>>>>>>> user
>>>>>>>>      readability.
>>>>>>>>      3. I updated the "Example" section as well.
>>>>>>>>
>>>>>>>> Some points:
>>>>>>>>
>>>>>>>>      1. Even though the kip suggests adding the `get(k
>>> lowerkeybound, k
>>>>>>>>      upperkeybound, long fromtime, long totime)` method to the
>>>>>>>> interface, I
>>>>>>>>      added this method to the `rocksdbversionedstore` class for 
>>>>>>>> now.
>>>>>>>>      2. Matthias, you mentioned a very important point. How can a
>>> user
>>>>>>>>      retrieve the latest value? We have the same issue with kip-968
>>> as
>>>>>>>> well.
>>>>>>>>      Asking a user to call `toTime(max)` violates the API design
>>> rules,
>>>>>>>> as you
>>>>>>>>      mentioned. So I think we must have a `latest()` method for 
>>>>>>>> both
>>>>>>>> KIP-968 and
>>>>>>>>      KIP-969. What do you think about that?
>>>>>>>>
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Alieh
>>>>>>>>
>>>>>>>> On Thu, Oct 12, 2023 at 6:33 AM Matthias J. Sax <mj...@apache.org>
>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks for the update.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> To retrieve
>>>>>>>>>>>       the latest value(s), the user must call just the asOf
>>> method
>>>>>> with
>>>>>>>>> the MAX
>>>>>>>>>>>       value (asOf(MAX)). The same applies to KIP-968. Do you
>>> think
>>>>>>>>>>> it is
>>>>>>>>> clumsy,
>>>>>>>>>>>       Matthias?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Well, in KIP-968 calling `asOf` and passing in a timestamp is
>>>>>> optional,
>>>>>>>>> and default is "latest", right? So while `asOf(MAX)` does the same
>>>>>>>>> thing, practically users would never call `asOf` for a "latest"
>>> query?
>>>>>>>>>
>>>>>>>>> In this KIP, we enforce that users give us a key range (we have
>>> the 4
>>>>>>>>> static entry point methods to define a query for this), and we
>>> say we
>>>>>>>>> default to "no bounds" for time range by default.
>>>>>>>>>
>>>>>>>>> The existing `RangeQuery` allows to query a range of keys for
>>> existing
>>>>>>>>> stores. It seems to be a common pattern to query a key-range on
>>>>>> latest.
>>>>>>>>> -- in the current proposal, users would need to do:
>>>>>>>>>
>>>>>>>>> MultiVersionedRangeQuery.withKeyRange(startKey, endKey).asOf(MAX);
>>>>>>>>>
>>>>>>>>> Would like to hear from others if we think that's good user
>>>>>> experience?
>>>>>>>>> If we agree to accept this, I think we should explain how to do
>>> this
>>>>>> in
>>>>>>>>> the JavaDocs (and also regular docs... --- otherwise, I can
>>> already
>>>>>>>>> anticipate user question on all question-asking-channels how to
>>> do a
>>>>>>>>> "normal key range query". IMHO, the problem is not that the code
>>>>>> itself
>>>>>>>>> it too clumsy, but that it's totally not obvious to uses how to
>>>>>> express
>>>>>>>>> it without actually explaining it to them. It basically violated
>>> the
>>>>>> API
>>>>>>>>> design rule "make it easy to use / simple things should be easy".
>>>>>>>>>
>>>>>>>>> Btw: We could also re-use `RangeQuery` and add am implementation
>>> to
>>>>>>>>> `VersionedStateStore` to just accept this query type, with "key
>>> range
>>>>>>>>> over latest" semantics. -- The issue is of course, that uses need
>>> to
>>>>>>>>> know that the query would return `ValueAndTimestamp` and not
>>> plain `V`
>>>>>>>>> (or we add a translation step to unwrap the value, but we would
>>> lose
>>>>>> the
>>>>>>>>> "validFrom" timestamp -- validTo would be `null`). Because type
>>> safety
>>>>>>>>> is a general issue in IQv2 it would not make it worse (in the
>>> strict
>>>>>>>>> sense), but I am also not sure if we want to dig an even deeper
>>>>>> hole...
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Matthias
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 10/10/23 11:55 AM, Alieh Saeedi wrote:
>>>>>>>>>> Thanks, Matthias and Bruno, for the feedback on KIP-969. Here is
>>> a
>>>>>>>>> summary
>>>>>>>>>> of the updates I made to the KIP:
>>>>>>>>>>
>>>>>>>>>>       1.  I liked the idea of renaming methods as Matthias
>>> suggested.
>>>>>>>>>>       2. I removed the allversions() method as I did in KIP-968.
>>> To
>>>>>>>>> retrieve
>>>>>>>>>>       the latest value(s), the user must call just the asOf 
>>>>>>>>>> method
>>>>>> with
>>>>>>>>> the MAX
>>>>>>>>>>       value (asOf(MAX)). The same applies to KIP-968. Do you
>>> think it
>>>>>> is
>>>>>>>>> clumsy,
>>>>>>>>>>       Matthias?
>>>>>>>>>>       3. I added a method to the *VersionedKeyValueStore
>>> *interface,
>>>>>>>>>> as I
>>>>>>>>> did
>>>>>>>>>>       for KIP-968.
>>>>>>>>>>       4. Matthias: I do not get what you mean by your second
>>> comment.
>>>>>>>>>> Isn't
>>>>>>>>>>       the KIP already explicit about that?
>>>>>>>>>>
>>>>>>>>>>       > I assume, results are returned by timestamp for each key.
>>> The
>>>>>>>>>> KIP
>>>>>>>>>>       should be explicit about it.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Alieh
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 3, 2023 at 6:07 AM Matthias J. Sax <mjsax@apache.org
>>>>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks for updating the KIP.
>>>>>>>>>>>
>>>>>>>>>>> Not sure if I agree or not with Bruno's idea to split the query
>>>>>> types
>>>>>>>>>>> further? In the end, we split them only because there is three
>>>>>>>>>>> different
>>>>>>>>>>> return types: single value, value-iterator, key-value-iterator.
>>>>>>>>>>>
>>>>>>>>>>> What do we gain by splitting out single-ts-range-key? In the
>>> end,
>>>>>> for
>>>>>>>>>>> range-ts-range-key the proposed class is necessary and is a
>>> superset
>>>>>>>>>>> (one can set both timestamps to the same value, for single-ts
>>>>>> lookup).
>>>>>>>>>>>
>>>>>>>>>>> The mentioned simplification might apply to
>>> "single-ts-range-key"
>>>>>>>>>>> but I
>>>>>>>>>>> don't see a simplification for the proposed (and necessary)
>>> query
>>>>>>>>>>> type?
>>>>>>>>>>>
>>>>>>>>>>> On the other hand, I see an advantage of a single-ts-range-key
>>> for
>>>>>>>>>>> querying over the "latest version" with a range of keys. For a
>>>>>>>>>>> single-ts-range-key query, this it would be the default
>>> (similar to
>>>>>>>>>>> VersionedKeyQuery with not asOf-timestamped defined).
>>>>>>>>>>>
>>>>>>>>>>> In the current version of the KIP, (if we agree that default
>>> should
>>>>>>>>>>> actually return "all versions" not "latest" -- this default was
>>>>>>>>>>> suggested by Bruno on KIP-968 and makes sense to me, so we would
>>>>>>>>>>> need to
>>>>>>>>>>> have the same default here to stay consistent), users would
>>> need to
>>>>>>>>>>> pass
>>>>>>>>>>> in `from(Long.MAX).to(Long.MAX)` (if I got this right) to query
>>> the
>>>>>>>>>>> latest point in time only, what seems to be clumsy? Or we could
>>> add
>>>>>> a
>>>>>>>>>>> `lastestKeyOnly` option to `MultiVersionedRangeQuery`, but it
>>> does
>>>>>>>>>>> seems
>>>>>>>>>>> a little clumsy, too.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> The overall order of the returned records is by Key
>>>>>>>>>>>
>>>>>>>>>>> I assume, results are returned by timestamp for each key. The
>>> KIP
>>>>>>>>>>> should
>>>>>>>>>>> be explicit about it.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> To be very explicit, should we rename the methods to specify
>>> the key
>>>>>>>>> bound?
>>>>>>>>>>>
>>>>>>>>>>>      - withRange -> withKeyRange
>>>>>>>>>>>      - withLowerBound -> withLowerKeyBound
>>>>>>>>>>>      - withUpperBound -> withUpperKeyBound
>>>>>>>>>>>      - withNoBounds -> allKeys (or withNoKeyBounds, but we use
>>>>>>>>>>> `allVersions` and not `noTimeBound` and should align the
>>> naming?)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -Matthias
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 9/6/23 5:25 AM, Bruno Cadonna wrote:
>>>>>>>>>>>> Hi Alieh,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the KIP!
>>>>>>>>>>>>
>>>>>>>>>>>> One high level comment/question:
>>>>>>>>>>>>
>>>>>>>>>>>> I assume you separated single key queries into two classes
>>> because
>>>>>>>>>>>> versioned key queries return a single value and multi version
>>> key
>>>>>>>>>>>> queries return iterators. Although, range queries always return
>>>>>>>>>>>> iterators, it would make sense to also separate range queries
>>> for
>>>>>>>>>>>> versioned state stores into range queries that return one
>>> single
>>>>>>>>> version
>>>>>>>>>>>> of the keys within a range and range queries that return
>>> multiple
>>>>>>>>>>>> version of the keys within a range, IMO. That would reduce the
>>>>>>>>>>>> meaningless combinations.
>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Bruno
>>>>>>>>>>>>
>>>>>>>>>>>> On 8/16/23 8:01 PM, Alieh Saeedi wrote:
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I splitted KIP-960
>>>>>>>>>>>>> <
>>>>>>>>>>>
>>>>>>>>>
>>>>>>
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-960%3A+Support+single-key_single-timestamp+interactive+queries+%28IQv2%29+for+versioned+state+stores
>>>>>>>>>>>>
>>>>>>>>>>>>> into
>>>>>>>>>>>>> three separate KIPs. Therefore, please continue discussions
>>> about
>>>>>>>>> range
>>>>>>>>>>>>> interactive queries here. You can see all the addressed
>>> reviews
>>>>>>>>>>>>> on the
>>>>>>>>>>>>> following page. Thanks in advance.
>>>>>>>>>>>>>
>>>>>>>>>>>>> KIP-969: Support range interactive queries (IQv2) for
>>> versioned
>>>>>>>>>>>>> state
>>>>>>>>>>>>> stores
>>>>>>>>>>>>> <
>>>>>>>>>>>
>>>>>>>>>
>>>>>>
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-969%3A+Support+range+interactive+queries+%28IQv2%29+for+versioned+state+stores
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I look forward to your feedback!
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Alieh
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>
>>

Re: [DISCUSS] KIP-969: Support range interactive queries for versioned state stores

Posted by Bruno Cadonna <ca...@apache.org>.
Hi Alieh,

I think using TimestampedRangeQuery to query the latest versions is 
totally fine. If it is not, users will report it and we can add it then.

Best,
Bruno

On 12/11/23 6:22 PM, Alieh Saeedi wrote:
> Thank you all.
> I decided to remove the ordering from the KIP and maybe move it to the
> subsequent KIPs (based on user demand).
> I skimmed over the discussion thread, but we still had an open question
> about how a user can retrieve the `latest()` values. I think what Matthias
> suggested (using `TimestampedRangeQuery`) can be the solution. What do you
> think? Bests,
> Alieh
> 
> On Wed, Dec 6, 2023 at 1:57 PM Lucas Brutschy
> <lb...@confluent.io.invalid> wrote:
> 
>> Hi Alieh,
>>
>> I think we do not have to restrict ourselves too much for the future
>> and complicate the implementation. The user can always store away and
>> sort, so we should only provide the ordering guarantee we can provide
>> efficiently, and we shouldn't restrict our future evolution too much
>> by this. I think a global ordering by timestamp is sufficient for this
>> KIP, so I vote for option 2.
>>
>> Cheers,
>> Lucas
>>
>> On Fri, Dec 1, 2023 at 8:45 PM Alieh Saeedi
>> <as...@confluent.io.invalid> wrote:
>>>
>>> Hi all,
>>> I updated the KIP based on the changes made in the former KIP (KIP-968).
>> So
>>> with the `ResultOrder` enum, the class `MultiVersionedRangeQuery` had
>> some
>>> changes both in the defined fields and defined methods.
>>>
>>> Based on the PoC PR, what we currently promise in the KIP about ordering
>>> seems like a dream. I intended to enable the user to have a global
>> ordering
>>> based on the key or timestamp (using `orderByKey()` or
>>> `orderByTimestamp()`) and then even have a partial ordering based on the
>>> other parameter.  For example, if the user specifies
>>> `orderByKey().withDescendingKey().withAscendingTimestamps()`, then the
>>> global ordering is based on keys in a descending order, and then all the
>>> records with the same key are ordered ascendingly based on ts. The result
>>> will be something like (k3,v1,t1), (k3,v2,t2), (k2,v2,t3), (k1.v1.t1)
>>> (assuming that k1<k2<k3 and t1<t2<t3).
>>>
>>> But in reality, we have limitations for having a global ordering based on
>>> keys since we are iterating over the segments in a lazy manner.
>> Therefore,
>>> when we are processing the current segment, we have no knowledge of the
>>> keys in the next segment.
>>>
>>> Now I have two suggestions:
>>> 1. Changing the `MultiVersionedRangeQuery` class as follows:
>>>
>>> private final ResultOrder *segmentOrder*;
>>> private final contentOrder *segmentContentOrder*; // can be KEY_WISE or
>>> TIMESTAMP_WISE
>>> private final ResultOrder  *keyOrder*;
>>> private final ResultOrder *timestampOrder*;
>>>
>>> This way, the global ordering is specified by the `segmentOrder`. It
>> means
>>> we either show the results from the oldest to the latest segment
>>> (ASCENDING) or from the latest to the oldest segment (DESCENDING).
>>> Then, inside each segment, we guarantee a `segmentContentOrder` which can
>>> be `KEY_WISE` or `TIMESTAMP_WISE`. The key order and timestamp order are
>>> specified by the `keyOrder` and `timestampOrder` properties,
>> respectively.
>>> If the content of a segment must be ordered key-wise and then we have two
>>> records with the same key (it happens in older segments), then the
>>> `timestampOrder` determines the order between them.
>>>
>>> 2. We define that global ordering can only be based on timestamps (the
>>> `timestampOrder` property), and if two records have the same timestamp,
>> the
>>> `keyOrder` determines the order between them.
>>>
>>> I think the first suggestion gives more flexibility to the user, but it
>> is
>>> more complicated. I mean, it needs good Javadocs.
>>>
>>> I look forward to your ideas.
>>>
>>> Cheers,
>>> Alieh
>>>
>>>
>>> On Mon, Nov 6, 2023 at 3:08 PM Alieh Saeedi <as...@confluent.io>
>> wrote:
>>>
>>>> Thank you, Bruno and Matthias.
>>>> I updated the KIP as follows:
>>>> 1. The one remaining `asOf` word in the KIP is removed.
>>>> 2. Example 2 is updated. Thanks, Bruno for the correction.
>>>>
>>>> Discussions and open questions
>>>> 1. Yes, Bruno. We need `orderByKey()` and `orderByTimestamp()` as well.
>>>> Because the results must have a global ordering. Either based on key or
>>>> based on ts. For example, we can have
>>>> `orderByKey().withDescendingKey().withAscendingTimestamps()`. Then the
>>>> global ordering is based on keys in a descending order, and then all
>> the
>>>> records with the same key are ordered ascendingly based on ts. The
>> result
>>>> will be something like (k3,v1,t1), (k3,v2,t2), (k3,v1,t1), (k2,v2,t2),
>>>> (k1.v1.t1) (assuming that k1<k2<k3 and t1<t2<t3)
>>>> 2. About having the `latest()` method: it seems like we are undecided
>> yet.
>>>> Adding a new class or ignoring `latest()` for VersionedRangeQuery and
>>>> instead using the `TimestampedRangeQuery` as Matthias suggested.
>>>>
>>>> Cheers,
>>>> Alieh
>>>>
>>>> On Sat, Nov 4, 2023 at 1:38 AM Matthias J. Sax <mj...@apache.org>
>> wrote:
>>>>
>>>>> Great discussion. Seems we are making good progress.
>>>>>
>>>>> I see advantages and disadvantages in splitting out a "single-ts
>>>>> key-range" query type. I guess, the main question might be, what
>>>>> use-cases do we anticipate and how common we believe they are?
>>>>>
>>>>> We should also take KIP-992 (adding `TimestampedRangeQuery`) into
>> account.
>>>>>
>>>>> (1) The most common use case seems to be, a key-range over latest. For
>>>>> this one, we could use `TimestampedRangeQuery` -- it would return a
>>>>> `ValueAndTimestamp<V>` instead of a `VersionedRecord<V>` but the won't
>>>>> be any information loss, because "toTimestamp" would be `null` anyway.
>>>>>
>>>>>
>>>>> (2) The open question is, how common is a key-range in a point in the
>>>>> past? For this case, using
>>>>> `MultiVersionedRangeQuery.withKeyRange().from(myTs).to(myTs)` seems
>>>>> actually not to be a bad UX, and also does not really need to be
>>>>> explained how to do this (compared to "latest" that required to pass
>> in
>>>>> MAX_INT).
>>>>>
>>>>>
>>>>> If we add a new query type, we avoid both issues (are they actually
>>>>> issues?) and add some nice syntactic sugar to the API. The question
>> is,
>>>>> if it's worth the effort and expanded API surface area?
>>>>>
>>>>> To summarize:
>>>>>
>>>>> Add new query type:
>>>>>
>>>>>> // queries latest; returns VersionedRecords
>>>>>> VersionedRangeQuery.withKeyRange(...);
>>>>>>
>>>>>> VersionedRangeQuery.withKeyRange(...).asOf(ts);
>>>>>
>>>>> vs
>>>>>
>>>>> No new query type:
>>>>>
>>>>>> // queries latest; returns ValueAndTimestamps
>>>>>> TimestampedRangeQuery.withRange(...);
>>>>>>
>>>>>> MultiVersionedRangeQuery.withKeyRange(...).from(myTs).to(myTs)
>>>>>
>>>>>
>>>>>
>>>>> I guess, bottom line, I would be ok with either one and I am actually
>>>>> not even sure which one I prefer personally. Just wanted to lay out
>> the
>>>>> tradeoffs I see. Not sure if three are other considerations that would
>>>>> tip the scale into either direction?
>>>>>
>>>>>
>>>>>
>>>>> -Matthias
>>>>>
>>>>> On 11/3/23 3:43 AM, Bruno Cadonna wrote:
>>>>>> Hi Alieh,
>>>>>>
>>>>>> I like the examples!
>>>>>>
>>>>>>
>>>>>> 1.
>>>>>> Some terms like `asOf` in the descriptions still need to be
>> replaced in
>>>>>> the KIP.
>>>>>>
>>>>>>
>>>>>> 2.
>>>>>> In your last e-mail you state:
>>>>>>
>>>>>> "How can a user retrieve the latest value? We have the same issue
>> with
>>>>>> kip-968 as well."
>>>>>>
>>>>>> Why do we have the same issue in KIP-968?
>>>>>> If users need to retrieve the latest value for a specific key, they
>>>>>> should use KIP-960.
>>>>>>
>>>>>>
>>>>>> 3.
>>>>>> Regarding querying the latest version (or an asOf version) of
>> records
>>>>> in
>>>>>> a given key range, that is exactly why I proposed to split the query
>>>>>> class. One class would return the latest and the asOf versions
>> (i.e. a
>>>>>> single version) of records in a key range and the other class would
>>>>>> return all versions in a given time range (i.e. multiple versions)
>> of
>>>>>> the records in a given key range. The splitting in two classes
>> avoids
>>>>> to
>>>>>> specify a time range and latest or a time range and asOf on a given
>> key
>>>>>> range.
>>>>>>
>>>>>> Alternatively, you could keep one class and you could specify that
>> the
>>>>>> last call wins as you specified for fromTime() and toTime(). For
>>>>>> example, for
>>>>>>
>>>>>>
>>>>>
>> MultiVersionedRangeQuery.withLowerKeyBound(k1).fromTime(t1).toTime(t2).latest()
>>>>>>
>>>>>> latest() wins. However, how would you interpret
>>>>>>
>>>>>>
>>>>>
>> MultiVersionedRangeQuery.withLowerKeyBound(k1).fromTime(t1).latest().toTime(t2)
>>>>>>
>>>>>> Is it [t1, t2] or [-INF, t2]?
>>>>>> (I would say the latter, but somebody else would say differently)
>>>>>>
>>>>>> The two class solution seems cleaner to me since we do not need to
>>>>>> consider those edge cases.
>>>>>> You could propose both classes in this KIP.
>>>>>>
>>>>>>
>>>>>> 4.
>>>>>> Why do we need orderByKey() and orderByTimestamp()?
>>>>>> Aren't withAscendingKeys(), withDescendingKeys(),
>>>>>> withAscendingTimestamps(), and withDescendingTimestamps()
>> sufficient?
>>>>>>
>>>>>>
>>>>>> 5.
>>>>>> In example 2, why is
>>>>>>
>>>>>> key,value: 2,20, timestamp: 2023-01-10T10:00:00.00Z, valid till:
>>>>>> 2023-01-25T10:00:00.00Z
>>>>>>
>>>>>> not part of the result?
>>>>>> It is valid from 2023-01-10T10:00:00.00Z to 2023-01-25T10:00:00.00Z
>>>>>> which overlaps with the time range [2023-01-17T10:00:00.00Z,
>>>>>> 2023-01-30T10:00:00.00Z] of the query.
>>>>>>
>>>>>> (BTW, in the second example, you forgot to add "key" to the output.)
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>> Bruno
>>>>>>
>>>>>>
>>>>>> On 10/25/23 4:01 PM, Alieh Saeedi wrote:
>>>>>>> Thanks, Matthias and Bruno.
>>>>>>> Here is a list of updates:
>>>>>>>
>>>>>>>      1. I changed the variable and method names as I did for
>> KIP-968, as
>>>>>>>      follows:
>>>>>>>         - "fromTimestamp" -> fromTime
>>>>>>>         - "asOfTimestamp"-> toTime
>>>>>>>         - "from(instant)" -> fromTime(instant)"
>>>>>>>         - asOf(instant)"->toTime(instant)
>>>>>>>      2. As Bruno suggested for KIP-968, I added `orderByKey()`,
>>>>>>>      `withAscendingKeys()`, and `withAscendingTimestamps()` methods
>> for
>>>>>>> user
>>>>>>>      readability.
>>>>>>>      3. I updated the "Example" section as well.
>>>>>>>
>>>>>>> Some points:
>>>>>>>
>>>>>>>      1. Even though the kip suggests adding the `get(k
>> lowerkeybound, k
>>>>>>>      upperkeybound, long fromtime, long totime)` method to the
>>>>>>> interface, I
>>>>>>>      added this method to the `rocksdbversionedstore` class for now.
>>>>>>>      2. Matthias, you mentioned a very important point. How can a
>> user
>>>>>>>      retrieve the latest value? We have the same issue with kip-968
>> as
>>>>>>> well.
>>>>>>>      Asking a user to call `toTime(max)` violates the API design
>> rules,
>>>>>>> as you
>>>>>>>      mentioned. So I think we must have a `latest()` method for both
>>>>>>> KIP-968 and
>>>>>>>      KIP-969. What do you think about that?
>>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Alieh
>>>>>>>
>>>>>>> On Thu, Oct 12, 2023 at 6:33 AM Matthias J. Sax <mj...@apache.org>
>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks for the update.
>>>>>>>>
>>>>>>>>
>>>>>>>>> To retrieve
>>>>>>>>>>       the latest value(s), the user must call just the asOf
>> method
>>>>> with
>>>>>>>> the MAX
>>>>>>>>>>       value (asOf(MAX)). The same applies to KIP-968. Do you
>> think
>>>>>>>>>> it is
>>>>>>>> clumsy,
>>>>>>>>>>       Matthias?
>>>>>>>>
>>>>>>>>
>>>>>>>> Well, in KIP-968 calling `asOf` and passing in a timestamp is
>>>>> optional,
>>>>>>>> and default is "latest", right? So while `asOf(MAX)` does the same
>>>>>>>> thing, practically users would never call `asOf` for a "latest"
>> query?
>>>>>>>>
>>>>>>>> In this KIP, we enforce that users give us a key range (we have
>> the 4
>>>>>>>> static entry point methods to define a query for this), and we
>> say we
>>>>>>>> default to "no bounds" for time range by default.
>>>>>>>>
>>>>>>>> The existing `RangeQuery` allows to query a range of keys for
>> existing
>>>>>>>> stores. It seems to be a common pattern to query a key-range on
>>>>> latest.
>>>>>>>> -- in the current proposal, users would need to do:
>>>>>>>>
>>>>>>>> MultiVersionedRangeQuery.withKeyRange(startKey, endKey).asOf(MAX);
>>>>>>>>
>>>>>>>> Would like to hear from others if we think that's good user
>>>>> experience?
>>>>>>>> If we agree to accept this, I think we should explain how to do
>> this
>>>>> in
>>>>>>>> the JavaDocs (and also regular docs... --- otherwise, I can
>> already
>>>>>>>> anticipate user question on all question-asking-channels how to
>> do a
>>>>>>>> "normal key range query". IMHO, the problem is not that the code
>>>>> itself
>>>>>>>> it too clumsy, but that it's totally not obvious to uses how to
>>>>> express
>>>>>>>> it without actually explaining it to them. It basically violated
>> the
>>>>> API
>>>>>>>> design rule "make it easy to use / simple things should be easy".
>>>>>>>>
>>>>>>>> Btw: We could also re-use `RangeQuery` and add am implementation
>> to
>>>>>>>> `VersionedStateStore` to just accept this query type, with "key
>> range
>>>>>>>> over latest" semantics. -- The issue is of course, that uses need
>> to
>>>>>>>> know that the query would return `ValueAndTimestamp` and not
>> plain `V`
>>>>>>>> (or we add a translation step to unwrap the value, but we would
>> lose
>>>>> the
>>>>>>>> "validFrom" timestamp -- validTo would be `null`). Because type
>> safety
>>>>>>>> is a general issue in IQv2 it would not make it worse (in the
>> strict
>>>>>>>> sense), but I am also not sure if we want to dig an even deeper
>>>>> hole...
>>>>>>>>
>>>>>>>>
>>>>>>>> -Matthias
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10/10/23 11:55 AM, Alieh Saeedi wrote:
>>>>>>>>> Thanks, Matthias and Bruno, for the feedback on KIP-969. Here is
>> a
>>>>>>>> summary
>>>>>>>>> of the updates I made to the KIP:
>>>>>>>>>
>>>>>>>>>       1.  I liked the idea of renaming methods as Matthias
>> suggested.
>>>>>>>>>       2. I removed the allversions() method as I did in KIP-968.
>> To
>>>>>>>> retrieve
>>>>>>>>>       the latest value(s), the user must call just the asOf method
>>>>> with
>>>>>>>> the MAX
>>>>>>>>>       value (asOf(MAX)). The same applies to KIP-968. Do you
>> think it
>>>>> is
>>>>>>>> clumsy,
>>>>>>>>>       Matthias?
>>>>>>>>>       3. I added a method to the *VersionedKeyValueStore
>> *interface,
>>>>>>>>> as I
>>>>>>>> did
>>>>>>>>>       for KIP-968.
>>>>>>>>>       4. Matthias: I do not get what you mean by your second
>> comment.
>>>>>>>>> Isn't
>>>>>>>>>       the KIP already explicit about that?
>>>>>>>>>
>>>>>>>>>       > I assume, results are returned by timestamp for each key.
>> The
>>>>>>>>> KIP
>>>>>>>>>       should be explicit about it.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Alieh
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Oct 3, 2023 at 6:07 AM Matthias J. Sax <mjsax@apache.org
>>>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for updating the KIP.
>>>>>>>>>>
>>>>>>>>>> Not sure if I agree or not with Bruno's idea to split the query
>>>>> types
>>>>>>>>>> further? In the end, we split them only because there is three
>>>>>>>>>> different
>>>>>>>>>> return types: single value, value-iterator, key-value-iterator.
>>>>>>>>>>
>>>>>>>>>> What do we gain by splitting out single-ts-range-key? In the
>> end,
>>>>> for
>>>>>>>>>> range-ts-range-key the proposed class is necessary and is a
>> superset
>>>>>>>>>> (one can set both timestamps to the same value, for single-ts
>>>>> lookup).
>>>>>>>>>>
>>>>>>>>>> The mentioned simplification might apply to
>> "single-ts-range-key"
>>>>>>>>>> but I
>>>>>>>>>> don't see a simplification for the proposed (and necessary)
>> query
>>>>>>>>>> type?
>>>>>>>>>>
>>>>>>>>>> On the other hand, I see an advantage of a single-ts-range-key
>> for
>>>>>>>>>> querying over the "latest version" with a range of keys. For a
>>>>>>>>>> single-ts-range-key query, this it would be the default
>> (similar to
>>>>>>>>>> VersionedKeyQuery with not asOf-timestamped defined).
>>>>>>>>>>
>>>>>>>>>> In the current version of the KIP, (if we agree that default
>> should
>>>>>>>>>> actually return "all versions" not "latest" -- this default was
>>>>>>>>>> suggested by Bruno on KIP-968 and makes sense to me, so we would
>>>>>>>>>> need to
>>>>>>>>>> have the same default here to stay consistent), users would
>> need to
>>>>>>>>>> pass
>>>>>>>>>> in `from(Long.MAX).to(Long.MAX)` (if I got this right) to query
>> the
>>>>>>>>>> latest point in time only, what seems to be clumsy? Or we could
>> add
>>>>> a
>>>>>>>>>> `lastestKeyOnly` option to `MultiVersionedRangeQuery`, but it
>> does
>>>>>>>>>> seems
>>>>>>>>>> a little clumsy, too.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> The overall order of the returned records is by Key
>>>>>>>>>>
>>>>>>>>>> I assume, results are returned by timestamp for each key. The
>> KIP
>>>>>>>>>> should
>>>>>>>>>> be explicit about it.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> To be very explicit, should we rename the methods to specify
>> the key
>>>>>>>> bound?
>>>>>>>>>>
>>>>>>>>>>      - withRange -> withKeyRange
>>>>>>>>>>      - withLowerBound -> withLowerKeyBound
>>>>>>>>>>      - withUpperBound -> withUpperKeyBound
>>>>>>>>>>      - withNoBounds -> allKeys (or withNoKeyBounds, but we use
>>>>>>>>>> `allVersions` and not `noTimeBound` and should align the
>> naming?)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -Matthias
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 9/6/23 5:25 AM, Bruno Cadonna wrote:
>>>>>>>>>>> Hi Alieh,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the KIP!
>>>>>>>>>>>
>>>>>>>>>>> One high level comment/question:
>>>>>>>>>>>
>>>>>>>>>>> I assume you separated single key queries into two classes
>> because
>>>>>>>>>>> versioned key queries return a single value and multi version
>> key
>>>>>>>>>>> queries return iterators. Although, range queries always return
>>>>>>>>>>> iterators, it would make sense to also separate range queries
>> for
>>>>>>>>>>> versioned state stores into range queries that return one
>> single
>>>>>>>> version
>>>>>>>>>>> of the keys within a range and range queries that return
>> multiple
>>>>>>>>>>> version of the keys within a range, IMO. That would reduce the
>>>>>>>>>>> meaningless combinations.
>>>>>>>>>>> WDYT?
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Bruno
>>>>>>>>>>>
>>>>>>>>>>> On 8/16/23 8:01 PM, Alieh Saeedi wrote:
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> I splitted KIP-960
>>>>>>>>>>>> <
>>>>>>>>>>
>>>>>>>>
>>>>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-960%3A+Support+single-key_single-timestamp+interactive+queries+%28IQv2%29+for+versioned+state+stores
>>>>>>>>>>>
>>>>>>>>>>>> into
>>>>>>>>>>>> three separate KIPs. Therefore, please continue discussions
>> about
>>>>>>>> range
>>>>>>>>>>>> interactive queries here. You can see all the addressed
>> reviews
>>>>>>>>>>>> on the
>>>>>>>>>>>> following page. Thanks in advance.
>>>>>>>>>>>>
>>>>>>>>>>>> KIP-969: Support range interactive queries (IQv2) for
>> versioned
>>>>>>>>>>>> state
>>>>>>>>>>>> stores
>>>>>>>>>>>> <
>>>>>>>>>>
>>>>>>>>
>>>>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-969%3A+Support+range+interactive+queries+%28IQv2%29+for+versioned+state+stores
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I look forward to your feedback!
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Alieh
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>
> 

Re: [DISCUSS] KIP-969: Support range interactive queries for versioned state stores

Posted by Alieh Saeedi <as...@confluent.io.INVALID>.
Thank you all.
I decided to remove the ordering from the KIP and maybe move it to the
subsequent KIPs (based on user demand).
I skimmed over the discussion thread, but we still had an open question
about how a user can retrieve the `latest()` values. I think what Matthias
suggested (using `TimestampedRangeQuery`) can be the solution. What do you
think? Bests,
Alieh

On Wed, Dec 6, 2023 at 1:57 PM Lucas Brutschy
<lb...@confluent.io.invalid> wrote:

> Hi Alieh,
>
> I think we do not have to restrict ourselves too much for the future
> and complicate the implementation. The user can always store away and
> sort, so we should only provide the ordering guarantee we can provide
> efficiently, and we shouldn't restrict our future evolution too much
> by this. I think a global ordering by timestamp is sufficient for this
> KIP, so I vote for option 2.
>
> Cheers,
> Lucas
>
> On Fri, Dec 1, 2023 at 8:45 PM Alieh Saeedi
> <as...@confluent.io.invalid> wrote:
> >
> > Hi all,
> > I updated the KIP based on the changes made in the former KIP (KIP-968).
> So
> > with the `ResultOrder` enum, the class `MultiVersionedRangeQuery` had
> some
> > changes both in the defined fields and defined methods.
> >
> > Based on the PoC PR, what we currently promise in the KIP about ordering
> > seems like a dream. I intended to enable the user to have a global
> ordering
> > based on the key or timestamp (using `orderByKey()` or
> > `orderByTimestamp()`) and then even have a partial ordering based on the
> > other parameter.  For example, if the user specifies
> > `orderByKey().withDescendingKey().withAscendingTimestamps()`, then the
> > global ordering is based on keys in a descending order, and then all the
> > records with the same key are ordered ascendingly based on ts. The result
> > will be something like (k3,v1,t1), (k3,v2,t2), (k2,v2,t3), (k1.v1.t1)
> > (assuming that k1<k2<k3 and t1<t2<t3).
> >
> > But in reality, we have limitations for having a global ordering based on
> > keys since we are iterating over the segments in a lazy manner.
> Therefore,
> > when we are processing the current segment, we have no knowledge of the
> > keys in the next segment.
> >
> > Now I have two suggestions:
> > 1. Changing the `MultiVersionedRangeQuery` class as follows:
> >
> > private final ResultOrder *segmentOrder*;
> > private final contentOrder *segmentContentOrder*; // can be KEY_WISE or
> > TIMESTAMP_WISE
> > private final ResultOrder  *keyOrder*;
> > private final ResultOrder *timestampOrder*;
> >
> > This way, the global ordering is specified by the `segmentOrder`. It
> means
> > we either show the results from the oldest to the latest segment
> > (ASCENDING) or from the latest to the oldest segment (DESCENDING).
> > Then, inside each segment, we guarantee a `segmentContentOrder` which can
> > be `KEY_WISE` or `TIMESTAMP_WISE`. The key order and timestamp order are
> > specified by the `keyOrder` and `timestampOrder` properties,
> respectively.
> > If the content of a segment must be ordered key-wise and then we have two
> > records with the same key (it happens in older segments), then the
> > `timestampOrder` determines the order between them.
> >
> > 2. We define that global ordering can only be based on timestamps (the
> > `timestampOrder` property), and if two records have the same timestamp,
> the
> > `keyOrder` determines the order between them.
> >
> > I think the first suggestion gives more flexibility to the user, but it
> is
> > more complicated. I mean, it needs good Javadocs.
> >
> > I look forward to your ideas.
> >
> > Cheers,
> > Alieh
> >
> >
> > On Mon, Nov 6, 2023 at 3:08 PM Alieh Saeedi <as...@confluent.io>
> wrote:
> >
> > > Thank you, Bruno and Matthias.
> > > I updated the KIP as follows:
> > > 1. The one remaining `asOf` word in the KIP is removed.
> > > 2. Example 2 is updated. Thanks, Bruno for the correction.
> > >
> > > Discussions and open questions
> > > 1. Yes, Bruno. We need `orderByKey()` and `orderByTimestamp()` as well.
> > > Because the results must have a global ordering. Either based on key or
> > > based on ts. For example, we can have
> > > `orderByKey().withDescendingKey().withAscendingTimestamps()`. Then the
> > > global ordering is based on keys in a descending order, and then all
> the
> > > records with the same key are ordered ascendingly based on ts. The
> result
> > > will be something like (k3,v1,t1), (k3,v2,t2), (k3,v1,t1), (k2,v2,t2),
> > > (k1.v1.t1) (assuming that k1<k2<k3 and t1<t2<t3)
> > > 2. About having the `latest()` method: it seems like we are undecided
> yet.
> > > Adding a new class or ignoring `latest()` for VersionedRangeQuery and
> > > instead using the `TimestampedRangeQuery` as Matthias suggested.
> > >
> > > Cheers,
> > > Alieh
> > >
> > > On Sat, Nov 4, 2023 at 1:38 AM Matthias J. Sax <mj...@apache.org>
> wrote:
> > >
> > >> Great discussion. Seems we are making good progress.
> > >>
> > >> I see advantages and disadvantages in splitting out a "single-ts
> > >> key-range" query type. I guess, the main question might be, what
> > >> use-cases do we anticipate and how common we believe they are?
> > >>
> > >> We should also take KIP-992 (adding `TimestampedRangeQuery`) into
> account.
> > >>
> > >> (1) The most common use case seems to be, a key-range over latest. For
> > >> this one, we could use `TimestampedRangeQuery` -- it would return a
> > >> `ValueAndTimestamp<V>` instead of a `VersionedRecord<V>` but the won't
> > >> be any information loss, because "toTimestamp" would be `null` anyway.
> > >>
> > >>
> > >> (2) The open question is, how common is a key-range in a point in the
> > >> past? For this case, using
> > >> `MultiVersionedRangeQuery.withKeyRange().from(myTs).to(myTs)` seems
> > >> actually not to be a bad UX, and also does not really need to be
> > >> explained how to do this (compared to "latest" that required to pass
> in
> > >> MAX_INT).
> > >>
> > >>
> > >> If we add a new query type, we avoid both issues (are they actually
> > >> issues?) and add some nice syntactic sugar to the API. The question
> is,
> > >> if it's worth the effort and expanded API surface area?
> > >>
> > >> To summarize:
> > >>
> > >> Add new query type:
> > >>
> > >> > // queries latest; returns VersionedRecords
> > >> > VersionedRangeQuery.withKeyRange(...);
> > >> >
> > >> > VersionedRangeQuery.withKeyRange(...).asOf(ts);
> > >>
> > >> vs
> > >>
> > >> No new query type:
> > >>
> > >> > // queries latest; returns ValueAndTimestamps
> > >> > TimestampedRangeQuery.withRange(...);
> > >> >
> > >> > MultiVersionedRangeQuery.withKeyRange(...).from(myTs).to(myTs)
> > >>
> > >>
> > >>
> > >> I guess, bottom line, I would be ok with either one and I am actually
> > >> not even sure which one I prefer personally. Just wanted to lay out
> the
> > >> tradeoffs I see. Not sure if three are other considerations that would
> > >> tip the scale into either direction?
> > >>
> > >>
> > >>
> > >> -Matthias
> > >>
> > >> On 11/3/23 3:43 AM, Bruno Cadonna wrote:
> > >> > Hi Alieh,
> > >> >
> > >> > I like the examples!
> > >> >
> > >> >
> > >> > 1.
> > >> > Some terms like `asOf` in the descriptions still need to be
> replaced in
> > >> > the KIP.
> > >> >
> > >> >
> > >> > 2.
> > >> > In your last e-mail you state:
> > >> >
> > >> > "How can a user retrieve the latest value? We have the same issue
> with
> > >> > kip-968 as well."
> > >> >
> > >> > Why do we have the same issue in KIP-968?
> > >> > If users need to retrieve the latest value for a specific key, they
> > >> > should use KIP-960.
> > >> >
> > >> >
> > >> > 3.
> > >> > Regarding querying the latest version (or an asOf version) of
> records
> > >> in
> > >> > a given key range, that is exactly why I proposed to split the query
> > >> > class. One class would return the latest and the asOf versions
> (i.e. a
> > >> > single version) of records in a key range and the other class would
> > >> > return all versions in a given time range (i.e. multiple versions)
> of
> > >> > the records in a given key range. The splitting in two classes
> avoids
> > >> to
> > >> > specify a time range and latest or a time range and asOf on a given
> key
> > >> > range.
> > >> >
> > >> > Alternatively, you could keep one class and you could specify that
> the
> > >> > last call wins as you specified for fromTime() and toTime(). For
> > >> > example, for
> > >> >
> > >> >
> > >>
> MultiVersionedRangeQuery.withLowerKeyBound(k1).fromTime(t1).toTime(t2).latest()
> > >> >
> > >> > latest() wins. However, how would you interpret
> > >> >
> > >> >
> > >>
> MultiVersionedRangeQuery.withLowerKeyBound(k1).fromTime(t1).latest().toTime(t2)
> > >> >
> > >> > Is it [t1, t2] or [-INF, t2]?
> > >> > (I would say the latter, but somebody else would say differently)
> > >> >
> > >> > The two class solution seems cleaner to me since we do not need to
> > >> > consider those edge cases.
> > >> > You could propose both classes in this KIP.
> > >> >
> > >> >
> > >> > 4.
> > >> > Why do we need orderByKey() and orderByTimestamp()?
> > >> > Aren't withAscendingKeys(), withDescendingKeys(),
> > >> > withAscendingTimestamps(), and withDescendingTimestamps()
> sufficient?
> > >> >
> > >> >
> > >> > 5.
> > >> > In example 2, why is
> > >> >
> > >> > key,value: 2,20, timestamp: 2023-01-10T10:00:00.00Z, valid till:
> > >> > 2023-01-25T10:00:00.00Z
> > >> >
> > >> > not part of the result?
> > >> > It is valid from 2023-01-10T10:00:00.00Z to 2023-01-25T10:00:00.00Z
> > >> > which overlaps with the time range [2023-01-17T10:00:00.00Z,
> > >> > 2023-01-30T10:00:00.00Z] of the query.
> > >> >
> > >> > (BTW, in the second example, you forgot to add "key" to the output.)
> > >> >
> > >> >
> > >> > Best,
> > >> > Bruno
> > >> >
> > >> >
> > >> > On 10/25/23 4:01 PM, Alieh Saeedi wrote:
> > >> >> Thanks, Matthias and Bruno.
> > >> >> Here is a list of updates:
> > >> >>
> > >> >>     1. I changed the variable and method names as I did for
> KIP-968, as
> > >> >>     follows:
> > >> >>        - "fromTimestamp" -> fromTime
> > >> >>        - "asOfTimestamp"-> toTime
> > >> >>        - "from(instant)" -> fromTime(instant)"
> > >> >>        - asOf(instant)"->toTime(instant)
> > >> >>     2. As Bruno suggested for KIP-968, I added `orderByKey()`,
> > >> >>     `withAscendingKeys()`, and `withAscendingTimestamps()` methods
> for
> > >> >> user
> > >> >>     readability.
> > >> >>     3. I updated the "Example" section as well.
> > >> >>
> > >> >> Some points:
> > >> >>
> > >> >>     1. Even though the kip suggests adding the `get(k
> lowerkeybound, k
> > >> >>     upperkeybound, long fromtime, long totime)` method to the
> > >> >> interface, I
> > >> >>     added this method to the `rocksdbversionedstore` class for now.
> > >> >>     2. Matthias, you mentioned a very important point. How can a
> user
> > >> >>     retrieve the latest value? We have the same issue with kip-968
> as
> > >> >> well.
> > >> >>     Asking a user to call `toTime(max)` violates the API design
> rules,
> > >> >> as you
> > >> >>     mentioned. So I think we must have a `latest()` method for both
> > >> >> KIP-968 and
> > >> >>     KIP-969. What do you think about that?
> > >> >>
> > >> >>
> > >> >> Cheers,
> > >> >> Alieh
> > >> >>
> > >> >> On Thu, Oct 12, 2023 at 6:33 AM Matthias J. Sax <mj...@apache.org>
> > >> wrote:
> > >> >>
> > >> >>> Thanks for the update.
> > >> >>>
> > >> >>>
> > >> >>>> To retrieve
> > >> >>>>>      the latest value(s), the user must call just the asOf
> method
> > >> with
> > >> >>> the MAX
> > >> >>>>>      value (asOf(MAX)). The same applies to KIP-968. Do you
> think
> > >> >>>>> it is
> > >> >>> clumsy,
> > >> >>>>>      Matthias?
> > >> >>>
> > >> >>>
> > >> >>> Well, in KIP-968 calling `asOf` and passing in a timestamp is
> > >> optional,
> > >> >>> and default is "latest", right? So while `asOf(MAX)` does the same
> > >> >>> thing, practically users would never call `asOf` for a "latest"
> query?
> > >> >>>
> > >> >>> In this KIP, we enforce that users give us a key range (we have
> the 4
> > >> >>> static entry point methods to define a query for this), and we
> say we
> > >> >>> default to "no bounds" for time range by default.
> > >> >>>
> > >> >>> The existing `RangeQuery` allows to query a range of keys for
> existing
> > >> >>> stores. It seems to be a common pattern to query a key-range on
> > >> latest.
> > >> >>> -- in the current proposal, users would need to do:
> > >> >>>
> > >> >>> MultiVersionedRangeQuery.withKeyRange(startKey, endKey).asOf(MAX);
> > >> >>>
> > >> >>> Would like to hear from others if we think that's good user
> > >> experience?
> > >> >>> If we agree to accept this, I think we should explain how to do
> this
> > >> in
> > >> >>> the JavaDocs (and also regular docs... --- otherwise, I can
> already
> > >> >>> anticipate user question on all question-asking-channels how to
> do a
> > >> >>> "normal key range query". IMHO, the problem is not that the code
> > >> itself
> > >> >>> it too clumsy, but that it's totally not obvious to uses how to
> > >> express
> > >> >>> it without actually explaining it to them. It basically violated
> the
> > >> API
> > >> >>> design rule "make it easy to use / simple things should be easy".
> > >> >>>
> > >> >>> Btw: We could also re-use `RangeQuery` and add am implementation
> to
> > >> >>> `VersionedStateStore` to just accept this query type, with "key
> range
> > >> >>> over latest" semantics. -- The issue is of course, that uses need
> to
> > >> >>> know that the query would return `ValueAndTimestamp` and not
> plain `V`
> > >> >>> (or we add a translation step to unwrap the value, but we would
> lose
> > >> the
> > >> >>> "validFrom" timestamp -- validTo would be `null`). Because type
> safety
> > >> >>> is a general issue in IQv2 it would not make it worse (in the
> strict
> > >> >>> sense), but I am also not sure if we want to dig an even deeper
> > >> hole...
> > >> >>>
> > >> >>>
> > >> >>> -Matthias
> > >> >>>
> > >> >>>
> > >> >>> On 10/10/23 11:55 AM, Alieh Saeedi wrote:
> > >> >>>> Thanks, Matthias and Bruno, for the feedback on KIP-969. Here is
> a
> > >> >>> summary
> > >> >>>> of the updates I made to the KIP:
> > >> >>>>
> > >> >>>>      1.  I liked the idea of renaming methods as Matthias
> suggested.
> > >> >>>>      2. I removed the allversions() method as I did in KIP-968.
> To
> > >> >>> retrieve
> > >> >>>>      the latest value(s), the user must call just the asOf method
> > >> with
> > >> >>> the MAX
> > >> >>>>      value (asOf(MAX)). The same applies to KIP-968. Do you
> think it
> > >> is
> > >> >>> clumsy,
> > >> >>>>      Matthias?
> > >> >>>>      3. I added a method to the *VersionedKeyValueStore
> *interface,
> > >> >>>> as I
> > >> >>> did
> > >> >>>>      for KIP-968.
> > >> >>>>      4. Matthias: I do not get what you mean by your second
> comment.
> > >> >>>> Isn't
> > >> >>>>      the KIP already explicit about that?
> > >> >>>>
> > >> >>>>      > I assume, results are returned by timestamp for each key.
> The
> > >> >>>> KIP
> > >> >>>>      should be explicit about it.
> > >> >>>>
> > >> >>>>
> > >> >>>> Cheers,
> > >> >>>> Alieh
> > >> >>>>
> > >> >>>>
> > >> >>>>
> > >> >>>> On Tue, Oct 3, 2023 at 6:07 AM Matthias J. Sax <mjsax@apache.org
> >
> > >> >>>> wrote:
> > >> >>>>
> > >> >>>>> Thanks for updating the KIP.
> > >> >>>>>
> > >> >>>>> Not sure if I agree or not with Bruno's idea to split the query
> > >> types
> > >> >>>>> further? In the end, we split them only because there is three
> > >> >>>>> different
> > >> >>>>> return types: single value, value-iterator, key-value-iterator.
> > >> >>>>>
> > >> >>>>> What do we gain by splitting out single-ts-range-key? In the
> end,
> > >> for
> > >> >>>>> range-ts-range-key the proposed class is necessary and is a
> superset
> > >> >>>>> (one can set both timestamps to the same value, for single-ts
> > >> lookup).
> > >> >>>>>
> > >> >>>>> The mentioned simplification might apply to
> "single-ts-range-key"
> > >> >>>>> but I
> > >> >>>>> don't see a simplification for the proposed (and necessary)
> query
> > >> >>>>> type?
> > >> >>>>>
> > >> >>>>> On the other hand, I see an advantage of a single-ts-range-key
> for
> > >> >>>>> querying over the "latest version" with a range of keys. For a
> > >> >>>>> single-ts-range-key query, this it would be the default
> (similar to
> > >> >>>>> VersionedKeyQuery with not asOf-timestamped defined).
> > >> >>>>>
> > >> >>>>> In the current version of the KIP, (if we agree that default
> should
> > >> >>>>> actually return "all versions" not "latest" -- this default was
> > >> >>>>> suggested by Bruno on KIP-968 and makes sense to me, so we would
> > >> >>>>> need to
> > >> >>>>> have the same default here to stay consistent), users would
> need to
> > >> >>>>> pass
> > >> >>>>> in `from(Long.MAX).to(Long.MAX)` (if I got this right) to query
> the
> > >> >>>>> latest point in time only, what seems to be clumsy? Or we could
> add
> > >> a
> > >> >>>>> `lastestKeyOnly` option to `MultiVersionedRangeQuery`, but it
> does
> > >> >>>>> seems
> > >> >>>>> a little clumsy, too.
> > >> >>>>>
> > >> >>>>>
> > >> >>>>>
> > >> >>>>>> The overall order of the returned records is by Key
> > >> >>>>>
> > >> >>>>> I assume, results are returned by timestamp for each key. The
> KIP
> > >> >>>>> should
> > >> >>>>> be explicit about it.
> > >> >>>>>
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> To be very explicit, should we rename the methods to specify
> the key
> > >> >>> bound?
> > >> >>>>>
> > >> >>>>>     - withRange -> withKeyRange
> > >> >>>>>     - withLowerBound -> withLowerKeyBound
> > >> >>>>>     - withUpperBound -> withUpperKeyBound
> > >> >>>>>     - withNoBounds -> allKeys (or withNoKeyBounds, but we use
> > >> >>>>> `allVersions` and not `noTimeBound` and should align the
> naming?)
> > >> >>>>>
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> -Matthias
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> On 9/6/23 5:25 AM, Bruno Cadonna wrote:
> > >> >>>>>> Hi Alieh,
> > >> >>>>>>
> > >> >>>>>> Thanks for the KIP!
> > >> >>>>>>
> > >> >>>>>> One high level comment/question:
> > >> >>>>>>
> > >> >>>>>> I assume you separated single key queries into two classes
> because
> > >> >>>>>> versioned key queries return a single value and multi version
> key
> > >> >>>>>> queries return iterators. Although, range queries always return
> > >> >>>>>> iterators, it would make sense to also separate range queries
> for
> > >> >>>>>> versioned state stores into range queries that return one
> single
> > >> >>> version
> > >> >>>>>> of the keys within a range and range queries that return
> multiple
> > >> >>>>>> version of the keys within a range, IMO. That would reduce the
> > >> >>>>>> meaningless combinations.
> > >> >>>>>> WDYT?
> > >> >>>>>>
> > >> >>>>>> Best,
> > >> >>>>>> Bruno
> > >> >>>>>>
> > >> >>>>>> On 8/16/23 8:01 PM, Alieh Saeedi wrote:
> > >> >>>>>>> Hi all,
> > >> >>>>>>>
> > >> >>>>>>> I splitted KIP-960
> > >> >>>>>>> <
> > >> >>>>>
> > >> >>>
> > >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-960%3A+Support+single-key_single-timestamp+interactive+queries+%28IQv2%29+for+versioned+state+stores
> > >> >>>>>>
> > >> >>>>>>> into
> > >> >>>>>>> three separate KIPs. Therefore, please continue discussions
> about
> > >> >>> range
> > >> >>>>>>> interactive queries here. You can see all the addressed
> reviews
> > >> >>>>>>> on the
> > >> >>>>>>> following page. Thanks in advance.
> > >> >>>>>>>
> > >> >>>>>>> KIP-969: Support range interactive queries (IQv2) for
> versioned
> > >> >>>>>>> state
> > >> >>>>>>> stores
> > >> >>>>>>> <
> > >> >>>>>
> > >> >>>
> > >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-969%3A+Support+range+interactive+queries+%28IQv2%29+for+versioned+state+stores
> > >> >>>>>>
> > >> >>>>>>>
> > >> >>>>>>> I look forward to your feedback!
> > >> >>>>>>>
> > >> >>>>>>> Cheers,
> > >> >>>>>>> Alieh
> > >> >>>>>>>
> > >> >>>>>
> > >> >>>>
> > >> >>>
> > >> >>
> > >>
> > >
>

Re: [DISCUSS] KIP-969: Support range interactive queries for versioned state stores

Posted by Lucas Brutschy <lb...@confluent.io.INVALID>.
Hi Alieh,

I think we do not have to restrict ourselves too much for the future
and complicate the implementation. The user can always store away and
sort, so we should only provide the ordering guarantee we can provide
efficiently, and we shouldn't restrict our future evolution too much
by this. I think a global ordering by timestamp is sufficient for this
KIP, so I vote for option 2.

Cheers,
Lucas

On Fri, Dec 1, 2023 at 8:45 PM Alieh Saeedi
<as...@confluent.io.invalid> wrote:
>
> Hi all,
> I updated the KIP based on the changes made in the former KIP (KIP-968). So
> with the `ResultOrder` enum, the class `MultiVersionedRangeQuery` had some
> changes both in the defined fields and defined methods.
>
> Based on the PoC PR, what we currently promise in the KIP about ordering
> seems like a dream. I intended to enable the user to have a global ordering
> based on the key or timestamp (using `orderByKey()` or
> `orderByTimestamp()`) and then even have a partial ordering based on the
> other parameter.  For example, if the user specifies
> `orderByKey().withDescendingKey().withAscendingTimestamps()`, then the
> global ordering is based on keys in a descending order, and then all the
> records with the same key are ordered ascendingly based on ts. The result
> will be something like (k3,v1,t1), (k3,v2,t2), (k2,v2,t3), (k1.v1.t1)
> (assuming that k1<k2<k3 and t1<t2<t3).
>
> But in reality, we have limitations for having a global ordering based on
> keys since we are iterating over the segments in a lazy manner. Therefore,
> when we are processing the current segment, we have no knowledge of the
> keys in the next segment.
>
> Now I have two suggestions:
> 1. Changing the `MultiVersionedRangeQuery` class as follows:
>
> private final ResultOrder *segmentOrder*;
> private final contentOrder *segmentContentOrder*; // can be KEY_WISE or
> TIMESTAMP_WISE
> private final ResultOrder  *keyOrder*;
> private final ResultOrder *timestampOrder*;
>
> This way, the global ordering is specified by the `segmentOrder`. It means
> we either show the results from the oldest to the latest segment
> (ASCENDING) or from the latest to the oldest segment (DESCENDING).
> Then, inside each segment, we guarantee a `segmentContentOrder` which can
> be `KEY_WISE` or `TIMESTAMP_WISE`. The key order and timestamp order are
> specified by the `keyOrder` and `timestampOrder` properties, respectively.
> If the content of a segment must be ordered key-wise and then we have two
> records with the same key (it happens in older segments), then the
> `timestampOrder` determines the order between them.
>
> 2. We define that global ordering can only be based on timestamps (the
> `timestampOrder` property), and if two records have the same timestamp, the
> `keyOrder` determines the order between them.
>
> I think the first suggestion gives more flexibility to the user, but it is
> more complicated. I mean, it needs good Javadocs.
>
> I look forward to your ideas.
>
> Cheers,
> Alieh
>
>
> On Mon, Nov 6, 2023 at 3:08 PM Alieh Saeedi <as...@confluent.io> wrote:
>
> > Thank you, Bruno and Matthias.
> > I updated the KIP as follows:
> > 1. The one remaining `asOf` word in the KIP is removed.
> > 2. Example 2 is updated. Thanks, Bruno for the correction.
> >
> > Discussions and open questions
> > 1. Yes, Bruno. We need `orderByKey()` and `orderByTimestamp()` as well.
> > Because the results must have a global ordering. Either based on key or
> > based on ts. For example, we can have
> > `orderByKey().withDescendingKey().withAscendingTimestamps()`. Then the
> > global ordering is based on keys in a descending order, and then all the
> > records with the same key are ordered ascendingly based on ts. The result
> > will be something like (k3,v1,t1), (k3,v2,t2), (k3,v1,t1), (k2,v2,t2),
> > (k1.v1.t1) (assuming that k1<k2<k3 and t1<t2<t3)
> > 2. About having the `latest()` method: it seems like we are undecided yet.
> > Adding a new class or ignoring `latest()` for VersionedRangeQuery and
> > instead using the `TimestampedRangeQuery` as Matthias suggested.
> >
> > Cheers,
> > Alieh
> >
> > On Sat, Nov 4, 2023 at 1:38 AM Matthias J. Sax <mj...@apache.org> wrote:
> >
> >> Great discussion. Seems we are making good progress.
> >>
> >> I see advantages and disadvantages in splitting out a "single-ts
> >> key-range" query type. I guess, the main question might be, what
> >> use-cases do we anticipate and how common we believe they are?
> >>
> >> We should also take KIP-992 (adding `TimestampedRangeQuery`) into account.
> >>
> >> (1) The most common use case seems to be, a key-range over latest. For
> >> this one, we could use `TimestampedRangeQuery` -- it would return a
> >> `ValueAndTimestamp<V>` instead of a `VersionedRecord<V>` but the won't
> >> be any information loss, because "toTimestamp" would be `null` anyway.
> >>
> >>
> >> (2) The open question is, how common is a key-range in a point in the
> >> past? For this case, using
> >> `MultiVersionedRangeQuery.withKeyRange().from(myTs).to(myTs)` seems
> >> actually not to be a bad UX, and also does not really need to be
> >> explained how to do this (compared to "latest" that required to pass in
> >> MAX_INT).
> >>
> >>
> >> If we add a new query type, we avoid both issues (are they actually
> >> issues?) and add some nice syntactic sugar to the API. The question is,
> >> if it's worth the effort and expanded API surface area?
> >>
> >> To summarize:
> >>
> >> Add new query type:
> >>
> >> > // queries latest; returns VersionedRecords
> >> > VersionedRangeQuery.withKeyRange(...);
> >> >
> >> > VersionedRangeQuery.withKeyRange(...).asOf(ts);
> >>
> >> vs
> >>
> >> No new query type:
> >>
> >> > // queries latest; returns ValueAndTimestamps
> >> > TimestampedRangeQuery.withRange(...);
> >> >
> >> > MultiVersionedRangeQuery.withKeyRange(...).from(myTs).to(myTs)
> >>
> >>
> >>
> >> I guess, bottom line, I would be ok with either one and I am actually
> >> not even sure which one I prefer personally. Just wanted to lay out the
> >> tradeoffs I see. Not sure if three are other considerations that would
> >> tip the scale into either direction?
> >>
> >>
> >>
> >> -Matthias
> >>
> >> On 11/3/23 3:43 AM, Bruno Cadonna wrote:
> >> > Hi Alieh,
> >> >
> >> > I like the examples!
> >> >
> >> >
> >> > 1.
> >> > Some terms like `asOf` in the descriptions still need to be replaced in
> >> > the KIP.
> >> >
> >> >
> >> > 2.
> >> > In your last e-mail you state:
> >> >
> >> > "How can a user retrieve the latest value? We have the same issue with
> >> > kip-968 as well."
> >> >
> >> > Why do we have the same issue in KIP-968?
> >> > If users need to retrieve the latest value for a specific key, they
> >> > should use KIP-960.
> >> >
> >> >
> >> > 3.
> >> > Regarding querying the latest version (or an asOf version) of records
> >> in
> >> > a given key range, that is exactly why I proposed to split the query
> >> > class. One class would return the latest and the asOf versions (i.e. a
> >> > single version) of records in a key range and the other class would
> >> > return all versions in a given time range (i.e. multiple versions) of
> >> > the records in a given key range. The splitting in two classes avoids
> >> to
> >> > specify a time range and latest or a time range and asOf on a given key
> >> > range.
> >> >
> >> > Alternatively, you could keep one class and you could specify that the
> >> > last call wins as you specified for fromTime() and toTime(). For
> >> > example, for
> >> >
> >> >
> >> MultiVersionedRangeQuery.withLowerKeyBound(k1).fromTime(t1).toTime(t2).latest()
> >> >
> >> > latest() wins. However, how would you interpret
> >> >
> >> >
> >> MultiVersionedRangeQuery.withLowerKeyBound(k1).fromTime(t1).latest().toTime(t2)
> >> >
> >> > Is it [t1, t2] or [-INF, t2]?
> >> > (I would say the latter, but somebody else would say differently)
> >> >
> >> > The two class solution seems cleaner to me since we do not need to
> >> > consider those edge cases.
> >> > You could propose both classes in this KIP.
> >> >
> >> >
> >> > 4.
> >> > Why do we need orderByKey() and orderByTimestamp()?
> >> > Aren't withAscendingKeys(), withDescendingKeys(),
> >> > withAscendingTimestamps(), and withDescendingTimestamps() sufficient?
> >> >
> >> >
> >> > 5.
> >> > In example 2, why is
> >> >
> >> > key,value: 2,20, timestamp: 2023-01-10T10:00:00.00Z, valid till:
> >> > 2023-01-25T10:00:00.00Z
> >> >
> >> > not part of the result?
> >> > It is valid from 2023-01-10T10:00:00.00Z to 2023-01-25T10:00:00.00Z
> >> > which overlaps with the time range [2023-01-17T10:00:00.00Z,
> >> > 2023-01-30T10:00:00.00Z] of the query.
> >> >
> >> > (BTW, in the second example, you forgot to add "key" to the output.)
> >> >
> >> >
> >> > Best,
> >> > Bruno
> >> >
> >> >
> >> > On 10/25/23 4:01 PM, Alieh Saeedi wrote:
> >> >> Thanks, Matthias and Bruno.
> >> >> Here is a list of updates:
> >> >>
> >> >>     1. I changed the variable and method names as I did for KIP-968, as
> >> >>     follows:
> >> >>        - "fromTimestamp" -> fromTime
> >> >>        - "asOfTimestamp"-> toTime
> >> >>        - "from(instant)" -> fromTime(instant)"
> >> >>        - asOf(instant)"->toTime(instant)
> >> >>     2. As Bruno suggested for KIP-968, I added `orderByKey()`,
> >> >>     `withAscendingKeys()`, and `withAscendingTimestamps()` methods for
> >> >> user
> >> >>     readability.
> >> >>     3. I updated the "Example" section as well.
> >> >>
> >> >> Some points:
> >> >>
> >> >>     1. Even though the kip suggests adding the `get(k lowerkeybound, k
> >> >>     upperkeybound, long fromtime, long totime)` method to the
> >> >> interface, I
> >> >>     added this method to the `rocksdbversionedstore` class for now.
> >> >>     2. Matthias, you mentioned a very important point. How can a user
> >> >>     retrieve the latest value? We have the same issue with kip-968 as
> >> >> well.
> >> >>     Asking a user to call `toTime(max)` violates the API design rules,
> >> >> as you
> >> >>     mentioned. So I think we must have a `latest()` method for both
> >> >> KIP-968 and
> >> >>     KIP-969. What do you think about that?
> >> >>
> >> >>
> >> >> Cheers,
> >> >> Alieh
> >> >>
> >> >> On Thu, Oct 12, 2023 at 6:33 AM Matthias J. Sax <mj...@apache.org>
> >> wrote:
> >> >>
> >> >>> Thanks for the update.
> >> >>>
> >> >>>
> >> >>>> To retrieve
> >> >>>>>      the latest value(s), the user must call just the asOf method
> >> with
> >> >>> the MAX
> >> >>>>>      value (asOf(MAX)). The same applies to KIP-968. Do you think
> >> >>>>> it is
> >> >>> clumsy,
> >> >>>>>      Matthias?
> >> >>>
> >> >>>
> >> >>> Well, in KIP-968 calling `asOf` and passing in a timestamp is
> >> optional,
> >> >>> and default is "latest", right? So while `asOf(MAX)` does the same
> >> >>> thing, practically users would never call `asOf` for a "latest" query?
> >> >>>
> >> >>> In this KIP, we enforce that users give us a key range (we have the 4
> >> >>> static entry point methods to define a query for this), and we say we
> >> >>> default to "no bounds" for time range by default.
> >> >>>
> >> >>> The existing `RangeQuery` allows to query a range of keys for existing
> >> >>> stores. It seems to be a common pattern to query a key-range on
> >> latest.
> >> >>> -- in the current proposal, users would need to do:
> >> >>>
> >> >>> MultiVersionedRangeQuery.withKeyRange(startKey, endKey).asOf(MAX);
> >> >>>
> >> >>> Would like to hear from others if we think that's good user
> >> experience?
> >> >>> If we agree to accept this, I think we should explain how to do this
> >> in
> >> >>> the JavaDocs (and also regular docs... --- otherwise, I can already
> >> >>> anticipate user question on all question-asking-channels how to do a
> >> >>> "normal key range query". IMHO, the problem is not that the code
> >> itself
> >> >>> it too clumsy, but that it's totally not obvious to uses how to
> >> express
> >> >>> it without actually explaining it to them. It basically violated the
> >> API
> >> >>> design rule "make it easy to use / simple things should be easy".
> >> >>>
> >> >>> Btw: We could also re-use `RangeQuery` and add am implementation to
> >> >>> `VersionedStateStore` to just accept this query type, with "key range
> >> >>> over latest" semantics. -- The issue is of course, that uses need to
> >> >>> know that the query would return `ValueAndTimestamp` and not plain `V`
> >> >>> (or we add a translation step to unwrap the value, but we would lose
> >> the
> >> >>> "validFrom" timestamp -- validTo would be `null`). Because type safety
> >> >>> is a general issue in IQv2 it would not make it worse (in the strict
> >> >>> sense), but I am also not sure if we want to dig an even deeper
> >> hole...
> >> >>>
> >> >>>
> >> >>> -Matthias
> >> >>>
> >> >>>
> >> >>> On 10/10/23 11:55 AM, Alieh Saeedi wrote:
> >> >>>> Thanks, Matthias and Bruno, for the feedback on KIP-969. Here is a
> >> >>> summary
> >> >>>> of the updates I made to the KIP:
> >> >>>>
> >> >>>>      1.  I liked the idea of renaming methods as Matthias suggested.
> >> >>>>      2. I removed the allversions() method as I did in KIP-968. To
> >> >>> retrieve
> >> >>>>      the latest value(s), the user must call just the asOf method
> >> with
> >> >>> the MAX
> >> >>>>      value (asOf(MAX)). The same applies to KIP-968. Do you think it
> >> is
> >> >>> clumsy,
> >> >>>>      Matthias?
> >> >>>>      3. I added a method to the *VersionedKeyValueStore *interface,
> >> >>>> as I
> >> >>> did
> >> >>>>      for KIP-968.
> >> >>>>      4. Matthias: I do not get what you mean by your second comment.
> >> >>>> Isn't
> >> >>>>      the KIP already explicit about that?
> >> >>>>
> >> >>>>      > I assume, results are returned by timestamp for each key. The
> >> >>>> KIP
> >> >>>>      should be explicit about it.
> >> >>>>
> >> >>>>
> >> >>>> Cheers,
> >> >>>> Alieh
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> On Tue, Oct 3, 2023 at 6:07 AM Matthias J. Sax <mj...@apache.org>
> >> >>>> wrote:
> >> >>>>
> >> >>>>> Thanks for updating the KIP.
> >> >>>>>
> >> >>>>> Not sure if I agree or not with Bruno's idea to split the query
> >> types
> >> >>>>> further? In the end, we split them only because there is three
> >> >>>>> different
> >> >>>>> return types: single value, value-iterator, key-value-iterator.
> >> >>>>>
> >> >>>>> What do we gain by splitting out single-ts-range-key? In the end,
> >> for
> >> >>>>> range-ts-range-key the proposed class is necessary and is a superset
> >> >>>>> (one can set both timestamps to the same value, for single-ts
> >> lookup).
> >> >>>>>
> >> >>>>> The mentioned simplification might apply to "single-ts-range-key"
> >> >>>>> but I
> >> >>>>> don't see a simplification for the proposed (and necessary) query
> >> >>>>> type?
> >> >>>>>
> >> >>>>> On the other hand, I see an advantage of a single-ts-range-key for
> >> >>>>> querying over the "latest version" with a range of keys. For a
> >> >>>>> single-ts-range-key query, this it would be the default (similar to
> >> >>>>> VersionedKeyQuery with not asOf-timestamped defined).
> >> >>>>>
> >> >>>>> In the current version of the KIP, (if we agree that default should
> >> >>>>> actually return "all versions" not "latest" -- this default was
> >> >>>>> suggested by Bruno on KIP-968 and makes sense to me, so we would
> >> >>>>> need to
> >> >>>>> have the same default here to stay consistent), users would need to
> >> >>>>> pass
> >> >>>>> in `from(Long.MAX).to(Long.MAX)` (if I got this right) to query the
> >> >>>>> latest point in time only, what seems to be clumsy? Or we could add
> >> a
> >> >>>>> `lastestKeyOnly` option to `MultiVersionedRangeQuery`, but it does
> >> >>>>> seems
> >> >>>>> a little clumsy, too.
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>>> The overall order of the returned records is by Key
> >> >>>>>
> >> >>>>> I assume, results are returned by timestamp for each key. The KIP
> >> >>>>> should
> >> >>>>> be explicit about it.
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>> To be very explicit, should we rename the methods to specify the key
> >> >>> bound?
> >> >>>>>
> >> >>>>>     - withRange -> withKeyRange
> >> >>>>>     - withLowerBound -> withLowerKeyBound
> >> >>>>>     - withUpperBound -> withUpperKeyBound
> >> >>>>>     - withNoBounds -> allKeys (or withNoKeyBounds, but we use
> >> >>>>> `allVersions` and not `noTimeBound` and should align the naming?)
> >> >>>>>
> >> >>>>>
> >> >>>>>
> >> >>>>> -Matthias
> >> >>>>>
> >> >>>>>
> >> >>>>> On 9/6/23 5:25 AM, Bruno Cadonna wrote:
> >> >>>>>> Hi Alieh,
> >> >>>>>>
> >> >>>>>> Thanks for the KIP!
> >> >>>>>>
> >> >>>>>> One high level comment/question:
> >> >>>>>>
> >> >>>>>> I assume you separated single key queries into two classes because
> >> >>>>>> versioned key queries return a single value and multi version key
> >> >>>>>> queries return iterators. Although, range queries always return
> >> >>>>>> iterators, it would make sense to also separate range queries for
> >> >>>>>> versioned state stores into range queries that return one single
> >> >>> version
> >> >>>>>> of the keys within a range and range queries that return multiple
> >> >>>>>> version of the keys within a range, IMO. That would reduce the
> >> >>>>>> meaningless combinations.
> >> >>>>>> WDYT?
> >> >>>>>>
> >> >>>>>> Best,
> >> >>>>>> Bruno
> >> >>>>>>
> >> >>>>>> On 8/16/23 8:01 PM, Alieh Saeedi wrote:
> >> >>>>>>> Hi all,
> >> >>>>>>>
> >> >>>>>>> I splitted KIP-960
> >> >>>>>>> <
> >> >>>>>
> >> >>>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-960%3A+Support+single-key_single-timestamp+interactive+queries+%28IQv2%29+for+versioned+state+stores
> >> >>>>>>
> >> >>>>>>> into
> >> >>>>>>> three separate KIPs. Therefore, please continue discussions about
> >> >>> range
> >> >>>>>>> interactive queries here. You can see all the addressed reviews
> >> >>>>>>> on the
> >> >>>>>>> following page. Thanks in advance.
> >> >>>>>>>
> >> >>>>>>> KIP-969: Support range interactive queries (IQv2) for versioned
> >> >>>>>>> state
> >> >>>>>>> stores
> >> >>>>>>> <
> >> >>>>>
> >> >>>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-969%3A+Support+range+interactive+queries+%28IQv2%29+for+versioned+state+stores
> >> >>>>>>
> >> >>>>>>>
> >> >>>>>>> I look forward to your feedback!
> >> >>>>>>>
> >> >>>>>>> Cheers,
> >> >>>>>>> Alieh
> >> >>>>>>>
> >> >>>>>
> >> >>>>
> >> >>>
> >> >>
> >>
> >

Re: [DISCUSS] KIP-969: Support range interactive queries for versioned state stores

Posted by Alieh Saeedi <as...@confluent.io.INVALID>.
Hi all,
I updated the KIP based on the changes made in the former KIP (KIP-968). So
with the `ResultOrder` enum, the class `MultiVersionedRangeQuery` had some
changes both in the defined fields and defined methods.

Based on the PoC PR, what we currently promise in the KIP about ordering
seems like a dream. I intended to enable the user to have a global ordering
based on the key or timestamp (using `orderByKey()` or
`orderByTimestamp()`) and then even have a partial ordering based on the
other parameter.  For example, if the user specifies
`orderByKey().withDescendingKey().withAscendingTimestamps()`, then the
global ordering is based on keys in a descending order, and then all the
records with the same key are ordered ascendingly based on ts. The result
will be something like (k3,v1,t1), (k3,v2,t2), (k2,v2,t3), (k1.v1.t1)
(assuming that k1<k2<k3 and t1<t2<t3).

But in reality, we have limitations for having a global ordering based on
keys since we are iterating over the segments in a lazy manner. Therefore,
when we are processing the current segment, we have no knowledge of the
keys in the next segment.

Now I have two suggestions:
1. Changing the `MultiVersionedRangeQuery` class as follows:

private final ResultOrder *segmentOrder*;
private final contentOrder *segmentContentOrder*; // can be KEY_WISE or
TIMESTAMP_WISE
private final ResultOrder  *keyOrder*;
private final ResultOrder *timestampOrder*;

This way, the global ordering is specified by the `segmentOrder`. It means
we either show the results from the oldest to the latest segment
(ASCENDING) or from the latest to the oldest segment (DESCENDING).
Then, inside each segment, we guarantee a `segmentContentOrder` which can
be `KEY_WISE` or `TIMESTAMP_WISE`. The key order and timestamp order are
specified by the `keyOrder` and `timestampOrder` properties, respectively.
If the content of a segment must be ordered key-wise and then we have two
records with the same key (it happens in older segments), then the
`timestampOrder` determines the order between them.

2. We define that global ordering can only be based on timestamps (the
`timestampOrder` property), and if two records have the same timestamp, the
`keyOrder` determines the order between them.

I think the first suggestion gives more flexibility to the user, but it is
more complicated. I mean, it needs good Javadocs.

I look forward to your ideas.

Cheers,
Alieh


On Mon, Nov 6, 2023 at 3:08 PM Alieh Saeedi <as...@confluent.io> wrote:

> Thank you, Bruno and Matthias.
> I updated the KIP as follows:
> 1. The one remaining `asOf` word in the KIP is removed.
> 2. Example 2 is updated. Thanks, Bruno for the correction.
>
> Discussions and open questions
> 1. Yes, Bruno. We need `orderByKey()` and `orderByTimestamp()` as well.
> Because the results must have a global ordering. Either based on key or
> based on ts. For example, we can have
> `orderByKey().withDescendingKey().withAscendingTimestamps()`. Then the
> global ordering is based on keys in a descending order, and then all the
> records with the same key are ordered ascendingly based on ts. The result
> will be something like (k3,v1,t1), (k3,v2,t2), (k3,v1,t1), (k2,v2,t2),
> (k1.v1.t1) (assuming that k1<k2<k3 and t1<t2<t3)
> 2. About having the `latest()` method: it seems like we are undecided yet.
> Adding a new class or ignoring `latest()` for VersionedRangeQuery and
> instead using the `TimestampedRangeQuery` as Matthias suggested.
>
> Cheers,
> Alieh
>
> On Sat, Nov 4, 2023 at 1:38 AM Matthias J. Sax <mj...@apache.org> wrote:
>
>> Great discussion. Seems we are making good progress.
>>
>> I see advantages and disadvantages in splitting out a "single-ts
>> key-range" query type. I guess, the main question might be, what
>> use-cases do we anticipate and how common we believe they are?
>>
>> We should also take KIP-992 (adding `TimestampedRangeQuery`) into account.
>>
>> (1) The most common use case seems to be, a key-range over latest. For
>> this one, we could use `TimestampedRangeQuery` -- it would return a
>> `ValueAndTimestamp<V>` instead of a `VersionedRecord<V>` but the won't
>> be any information loss, because "toTimestamp" would be `null` anyway.
>>
>>
>> (2) The open question is, how common is a key-range in a point in the
>> past? For this case, using
>> `MultiVersionedRangeQuery.withKeyRange().from(myTs).to(myTs)` seems
>> actually not to be a bad UX, and also does not really need to be
>> explained how to do this (compared to "latest" that required to pass in
>> MAX_INT).
>>
>>
>> If we add a new query type, we avoid both issues (are they actually
>> issues?) and add some nice syntactic sugar to the API. The question is,
>> if it's worth the effort and expanded API surface area?
>>
>> To summarize:
>>
>> Add new query type:
>>
>> > // queries latest; returns VersionedRecords
>> > VersionedRangeQuery.withKeyRange(...);
>> >
>> > VersionedRangeQuery.withKeyRange(...).asOf(ts);
>>
>> vs
>>
>> No new query type:
>>
>> > // queries latest; returns ValueAndTimestamps
>> > TimestampedRangeQuery.withRange(...);
>> >
>> > MultiVersionedRangeQuery.withKeyRange(...).from(myTs).to(myTs)
>>
>>
>>
>> I guess, bottom line, I would be ok with either one and I am actually
>> not even sure which one I prefer personally. Just wanted to lay out the
>> tradeoffs I see. Not sure if three are other considerations that would
>> tip the scale into either direction?
>>
>>
>>
>> -Matthias
>>
>> On 11/3/23 3:43 AM, Bruno Cadonna wrote:
>> > Hi Alieh,
>> >
>> > I like the examples!
>> >
>> >
>> > 1.
>> > Some terms like `asOf` in the descriptions still need to be replaced in
>> > the KIP.
>> >
>> >
>> > 2.
>> > In your last e-mail you state:
>> >
>> > "How can a user retrieve the latest value? We have the same issue with
>> > kip-968 as well."
>> >
>> > Why do we have the same issue in KIP-968?
>> > If users need to retrieve the latest value for a specific key, they
>> > should use KIP-960.
>> >
>> >
>> > 3.
>> > Regarding querying the latest version (or an asOf version) of records
>> in
>> > a given key range, that is exactly why I proposed to split the query
>> > class. One class would return the latest and the asOf versions (i.e. a
>> > single version) of records in a key range and the other class would
>> > return all versions in a given time range (i.e. multiple versions) of
>> > the records in a given key range. The splitting in two classes avoids
>> to
>> > specify a time range and latest or a time range and asOf on a given key
>> > range.
>> >
>> > Alternatively, you could keep one class and you could specify that the
>> > last call wins as you specified for fromTime() and toTime(). For
>> > example, for
>> >
>> >
>> MultiVersionedRangeQuery.withLowerKeyBound(k1).fromTime(t1).toTime(t2).latest()
>> >
>> > latest() wins. However, how would you interpret
>> >
>> >
>> MultiVersionedRangeQuery.withLowerKeyBound(k1).fromTime(t1).latest().toTime(t2)
>> >
>> > Is it [t1, t2] or [-INF, t2]?
>> > (I would say the latter, but somebody else would say differently)
>> >
>> > The two class solution seems cleaner to me since we do not need to
>> > consider those edge cases.
>> > You could propose both classes in this KIP.
>> >
>> >
>> > 4.
>> > Why do we need orderByKey() and orderByTimestamp()?
>> > Aren't withAscendingKeys(), withDescendingKeys(),
>> > withAscendingTimestamps(), and withDescendingTimestamps() sufficient?
>> >
>> >
>> > 5.
>> > In example 2, why is
>> >
>> > key,value: 2,20, timestamp: 2023-01-10T10:00:00.00Z, valid till:
>> > 2023-01-25T10:00:00.00Z
>> >
>> > not part of the result?
>> > It is valid from 2023-01-10T10:00:00.00Z to 2023-01-25T10:00:00.00Z
>> > which overlaps with the time range [2023-01-17T10:00:00.00Z,
>> > 2023-01-30T10:00:00.00Z] of the query.
>> >
>> > (BTW, in the second example, you forgot to add "key" to the output.)
>> >
>> >
>> > Best,
>> > Bruno
>> >
>> >
>> > On 10/25/23 4:01 PM, Alieh Saeedi wrote:
>> >> Thanks, Matthias and Bruno.
>> >> Here is a list of updates:
>> >>
>> >>     1. I changed the variable and method names as I did for KIP-968, as
>> >>     follows:
>> >>        - "fromTimestamp" -> fromTime
>> >>        - "asOfTimestamp"-> toTime
>> >>        - "from(instant)" -> fromTime(instant)"
>> >>        - asOf(instant)"->toTime(instant)
>> >>     2. As Bruno suggested for KIP-968, I added `orderByKey()`,
>> >>     `withAscendingKeys()`, and `withAscendingTimestamps()` methods for
>> >> user
>> >>     readability.
>> >>     3. I updated the "Example" section as well.
>> >>
>> >> Some points:
>> >>
>> >>     1. Even though the kip suggests adding the `get(k lowerkeybound, k
>> >>     upperkeybound, long fromtime, long totime)` method to the
>> >> interface, I
>> >>     added this method to the `rocksdbversionedstore` class for now.
>> >>     2. Matthias, you mentioned a very important point. How can a user
>> >>     retrieve the latest value? We have the same issue with kip-968 as
>> >> well.
>> >>     Asking a user to call `toTime(max)` violates the API design rules,
>> >> as you
>> >>     mentioned. So I think we must have a `latest()` method for both
>> >> KIP-968 and
>> >>     KIP-969. What do you think about that?
>> >>
>> >>
>> >> Cheers,
>> >> Alieh
>> >>
>> >> On Thu, Oct 12, 2023 at 6:33 AM Matthias J. Sax <mj...@apache.org>
>> wrote:
>> >>
>> >>> Thanks for the update.
>> >>>
>> >>>
>> >>>> To retrieve
>> >>>>>      the latest value(s), the user must call just the asOf method
>> with
>> >>> the MAX
>> >>>>>      value (asOf(MAX)). The same applies to KIP-968. Do you think
>> >>>>> it is
>> >>> clumsy,
>> >>>>>      Matthias?
>> >>>
>> >>>
>> >>> Well, in KIP-968 calling `asOf` and passing in a timestamp is
>> optional,
>> >>> and default is "latest", right? So while `asOf(MAX)` does the same
>> >>> thing, practically users would never call `asOf` for a "latest" query?
>> >>>
>> >>> In this KIP, we enforce that users give us a key range (we have the 4
>> >>> static entry point methods to define a query for this), and we say we
>> >>> default to "no bounds" for time range by default.
>> >>>
>> >>> The existing `RangeQuery` allows to query a range of keys for existing
>> >>> stores. It seems to be a common pattern to query a key-range on
>> latest.
>> >>> -- in the current proposal, users would need to do:
>> >>>
>> >>> MultiVersionedRangeQuery.withKeyRange(startKey, endKey).asOf(MAX);
>> >>>
>> >>> Would like to hear from others if we think that's good user
>> experience?
>> >>> If we agree to accept this, I think we should explain how to do this
>> in
>> >>> the JavaDocs (and also regular docs... --- otherwise, I can already
>> >>> anticipate user question on all question-asking-channels how to do a
>> >>> "normal key range query". IMHO, the problem is not that the code
>> itself
>> >>> it too clumsy, but that it's totally not obvious to uses how to
>> express
>> >>> it without actually explaining it to them. It basically violated the
>> API
>> >>> design rule "make it easy to use / simple things should be easy".
>> >>>
>> >>> Btw: We could also re-use `RangeQuery` and add am implementation to
>> >>> `VersionedStateStore` to just accept this query type, with "key range
>> >>> over latest" semantics. -- The issue is of course, that uses need to
>> >>> know that the query would return `ValueAndTimestamp` and not plain `V`
>> >>> (or we add a translation step to unwrap the value, but we would lose
>> the
>> >>> "validFrom" timestamp -- validTo would be `null`). Because type safety
>> >>> is a general issue in IQv2 it would not make it worse (in the strict
>> >>> sense), but I am also not sure if we want to dig an even deeper
>> hole...
>> >>>
>> >>>
>> >>> -Matthias
>> >>>
>> >>>
>> >>> On 10/10/23 11:55 AM, Alieh Saeedi wrote:
>> >>>> Thanks, Matthias and Bruno, for the feedback on KIP-969. Here is a
>> >>> summary
>> >>>> of the updates I made to the KIP:
>> >>>>
>> >>>>      1.  I liked the idea of renaming methods as Matthias suggested.
>> >>>>      2. I removed the allversions() method as I did in KIP-968. To
>> >>> retrieve
>> >>>>      the latest value(s), the user must call just the asOf method
>> with
>> >>> the MAX
>> >>>>      value (asOf(MAX)). The same applies to KIP-968. Do you think it
>> is
>> >>> clumsy,
>> >>>>      Matthias?
>> >>>>      3. I added a method to the *VersionedKeyValueStore *interface,
>> >>>> as I
>> >>> did
>> >>>>      for KIP-968.
>> >>>>      4. Matthias: I do not get what you mean by your second comment.
>> >>>> Isn't
>> >>>>      the KIP already explicit about that?
>> >>>>
>> >>>>      > I assume, results are returned by timestamp for each key. The
>> >>>> KIP
>> >>>>      should be explicit about it.
>> >>>>
>> >>>>
>> >>>> Cheers,
>> >>>> Alieh
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Tue, Oct 3, 2023 at 6:07 AM Matthias J. Sax <mj...@apache.org>
>> >>>> wrote:
>> >>>>
>> >>>>> Thanks for updating the KIP.
>> >>>>>
>> >>>>> Not sure if I agree or not with Bruno's idea to split the query
>> types
>> >>>>> further? In the end, we split them only because there is three
>> >>>>> different
>> >>>>> return types: single value, value-iterator, key-value-iterator.
>> >>>>>
>> >>>>> What do we gain by splitting out single-ts-range-key? In the end,
>> for
>> >>>>> range-ts-range-key the proposed class is necessary and is a superset
>> >>>>> (one can set both timestamps to the same value, for single-ts
>> lookup).
>> >>>>>
>> >>>>> The mentioned simplification might apply to "single-ts-range-key"
>> >>>>> but I
>> >>>>> don't see a simplification for the proposed (and necessary) query
>> >>>>> type?
>> >>>>>
>> >>>>> On the other hand, I see an advantage of a single-ts-range-key for
>> >>>>> querying over the "latest version" with a range of keys. For a
>> >>>>> single-ts-range-key query, this it would be the default (similar to
>> >>>>> VersionedKeyQuery with not asOf-timestamped defined).
>> >>>>>
>> >>>>> In the current version of the KIP, (if we agree that default should
>> >>>>> actually return "all versions" not "latest" -- this default was
>> >>>>> suggested by Bruno on KIP-968 and makes sense to me, so we would
>> >>>>> need to
>> >>>>> have the same default here to stay consistent), users would need to
>> >>>>> pass
>> >>>>> in `from(Long.MAX).to(Long.MAX)` (if I got this right) to query the
>> >>>>> latest point in time only, what seems to be clumsy? Or we could add
>> a
>> >>>>> `lastestKeyOnly` option to `MultiVersionedRangeQuery`, but it does
>> >>>>> seems
>> >>>>> a little clumsy, too.
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>> The overall order of the returned records is by Key
>> >>>>>
>> >>>>> I assume, results are returned by timestamp for each key. The KIP
>> >>>>> should
>> >>>>> be explicit about it.
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> To be very explicit, should we rename the methods to specify the key
>> >>> bound?
>> >>>>>
>> >>>>>     - withRange -> withKeyRange
>> >>>>>     - withLowerBound -> withLowerKeyBound
>> >>>>>     - withUpperBound -> withUpperKeyBound
>> >>>>>     - withNoBounds -> allKeys (or withNoKeyBounds, but we use
>> >>>>> `allVersions` and not `noTimeBound` and should align the naming?)
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> -Matthias
>> >>>>>
>> >>>>>
>> >>>>> On 9/6/23 5:25 AM, Bruno Cadonna wrote:
>> >>>>>> Hi Alieh,
>> >>>>>>
>> >>>>>> Thanks for the KIP!
>> >>>>>>
>> >>>>>> One high level comment/question:
>> >>>>>>
>> >>>>>> I assume you separated single key queries into two classes because
>> >>>>>> versioned key queries return a single value and multi version key
>> >>>>>> queries return iterators. Although, range queries always return
>> >>>>>> iterators, it would make sense to also separate range queries for
>> >>>>>> versioned state stores into range queries that return one single
>> >>> version
>> >>>>>> of the keys within a range and range queries that return multiple
>> >>>>>> version of the keys within a range, IMO. That would reduce the
>> >>>>>> meaningless combinations.
>> >>>>>> WDYT?
>> >>>>>>
>> >>>>>> Best,
>> >>>>>> Bruno
>> >>>>>>
>> >>>>>> On 8/16/23 8:01 PM, Alieh Saeedi wrote:
>> >>>>>>> Hi all,
>> >>>>>>>
>> >>>>>>> I splitted KIP-960
>> >>>>>>> <
>> >>>>>
>> >>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-960%3A+Support+single-key_single-timestamp+interactive+queries+%28IQv2%29+for+versioned+state+stores
>> >>>>>>
>> >>>>>>> into
>> >>>>>>> three separate KIPs. Therefore, please continue discussions about
>> >>> range
>> >>>>>>> interactive queries here. You can see all the addressed reviews
>> >>>>>>> on the
>> >>>>>>> following page. Thanks in advance.
>> >>>>>>>
>> >>>>>>> KIP-969: Support range interactive queries (IQv2) for versioned
>> >>>>>>> state
>> >>>>>>> stores
>> >>>>>>> <
>> >>>>>
>> >>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-969%3A+Support+range+interactive+queries+%28IQv2%29+for+versioned+state+stores
>> >>>>>>
>> >>>>>>>
>> >>>>>>> I look forward to your feedback!
>> >>>>>>>
>> >>>>>>> Cheers,
>> >>>>>>> Alieh
>> >>>>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>>
>

Re: [DISCUSS] KIP-969: Support range interactive queries for versioned state stores

Posted by Alieh Saeedi <as...@confluent.io.INVALID>.
Thank you, Bruno and Matthias.
I updated the KIP as follows:
1. The one remaining `asOf` word in the KIP is removed.
2. Example 2 is updated. Thanks, Bruno for the correction.

Discussions and open questions
1. Yes, Bruno. We need `orderByKey()` and `orderByTimestamp()` as well.
Because the results must have a global ordering. Either based on key or
based on ts. For example, we can have
`orderByKey().withDescendingKey().withAscendingTimestamps()`. Then the
global ordering is based on keys in a descending order, and then all the
records with the same key are ordered ascendingly based on ts. The result
will be something like (k3,v1,t1), (k3,v2,t2), (k3,v1,t1), (k2,v2,t2),
(k1.v1.t1) (assuming that k1<k2<k3 and t1<t2<t3)
2. About having the `latest()` method: it seems like we are undecided yet.
Adding a new class or ignoring `latest()` for VersionedRangeQuery and
instead using the `TimestampedRangeQuery` as Matthias suggested.

Cheers,
Alieh

On Sat, Nov 4, 2023 at 1:38 AM Matthias J. Sax <mj...@apache.org> wrote:

> Great discussion. Seems we are making good progress.
>
> I see advantages and disadvantages in splitting out a "single-ts
> key-range" query type. I guess, the main question might be, what
> use-cases do we anticipate and how common we believe they are?
>
> We should also take KIP-992 (adding `TimestampedRangeQuery`) into account.
>
> (1) The most common use case seems to be, a key-range over latest. For
> this one, we could use `TimestampedRangeQuery` -- it would return a
> `ValueAndTimestamp<V>` instead of a `VersionedRecord<V>` but the won't
> be any information loss, because "toTimestamp" would be `null` anyway.
>
>
> (2) The open question is, how common is a key-range in a point in the
> past? For this case, using
> `MultiVersionedRangeQuery.withKeyRange().from(myTs).to(myTs)` seems
> actually not to be a bad UX, and also does not really need to be
> explained how to do this (compared to "latest" that required to pass in
> MAX_INT).
>
>
> If we add a new query type, we avoid both issues (are they actually
> issues?) and add some nice syntactic sugar to the API. The question is,
> if it's worth the effort and expanded API surface area?
>
> To summarize:
>
> Add new query type:
>
> > // queries latest; returns VersionedRecords
> > VersionedRangeQuery.withKeyRange(...);
> >
> > VersionedRangeQuery.withKeyRange(...).asOf(ts);
>
> vs
>
> No new query type:
>
> > // queries latest; returns ValueAndTimestamps
> > TimestampedRangeQuery.withRange(...);
> >
> > MultiVersionedRangeQuery.withKeyRange(...).from(myTs).to(myTs)
>
>
>
> I guess, bottom line, I would be ok with either one and I am actually
> not even sure which one I prefer personally. Just wanted to lay out the
> tradeoffs I see. Not sure if three are other considerations that would
> tip the scale into either direction?
>
>
>
> -Matthias
>
> On 11/3/23 3:43 AM, Bruno Cadonna wrote:
> > Hi Alieh,
> >
> > I like the examples!
> >
> >
> > 1.
> > Some terms like `asOf` in the descriptions still need to be replaced in
> > the KIP.
> >
> >
> > 2.
> > In your last e-mail you state:
> >
> > "How can a user retrieve the latest value? We have the same issue with
> > kip-968 as well."
> >
> > Why do we have the same issue in KIP-968?
> > If users need to retrieve the latest value for a specific key, they
> > should use KIP-960.
> >
> >
> > 3.
> > Regarding querying the latest version (or an asOf version) of records in
> > a given key range, that is exactly why I proposed to split the query
> > class. One class would return the latest and the asOf versions (i.e. a
> > single version) of records in a key range and the other class would
> > return all versions in a given time range (i.e. multiple versions) of
> > the records in a given key range. The splitting in two classes avoids to
> > specify a time range and latest or a time range and asOf on a given key
> > range.
> >
> > Alternatively, you could keep one class and you could specify that the
> > last call wins as you specified for fromTime() and toTime(). For
> > example, for
> >
> >
> MultiVersionedRangeQuery.withLowerKeyBound(k1).fromTime(t1).toTime(t2).latest()
> >
> > latest() wins. However, how would you interpret
> >
> >
> MultiVersionedRangeQuery.withLowerKeyBound(k1).fromTime(t1).latest().toTime(t2)
> >
> > Is it [t1, t2] or [-INF, t2]?
> > (I would say the latter, but somebody else would say differently)
> >
> > The two class solution seems cleaner to me since we do not need to
> > consider those edge cases.
> > You could propose both classes in this KIP.
> >
> >
> > 4.
> > Why do we need orderByKey() and orderByTimestamp()?
> > Aren't withAscendingKeys(), withDescendingKeys(),
> > withAscendingTimestamps(), and withDescendingTimestamps() sufficient?
> >
> >
> > 5.
> > In example 2, why is
> >
> > key,value: 2,20, timestamp: 2023-01-10T10:00:00.00Z, valid till:
> > 2023-01-25T10:00:00.00Z
> >
> > not part of the result?
> > It is valid from 2023-01-10T10:00:00.00Z to 2023-01-25T10:00:00.00Z
> > which overlaps with the time range [2023-01-17T10:00:00.00Z,
> > 2023-01-30T10:00:00.00Z] of the query.
> >
> > (BTW, in the second example, you forgot to add "key" to the output.)
> >
> >
> > Best,
> > Bruno
> >
> >
> > On 10/25/23 4:01 PM, Alieh Saeedi wrote:
> >> Thanks, Matthias and Bruno.
> >> Here is a list of updates:
> >>
> >>     1. I changed the variable and method names as I did for KIP-968, as
> >>     follows:
> >>        - "fromTimestamp" -> fromTime
> >>        - "asOfTimestamp"-> toTime
> >>        - "from(instant)" -> fromTime(instant)"
> >>        - asOf(instant)"->toTime(instant)
> >>     2. As Bruno suggested for KIP-968, I added `orderByKey()`,
> >>     `withAscendingKeys()`, and `withAscendingTimestamps()` methods for
> >> user
> >>     readability.
> >>     3. I updated the "Example" section as well.
> >>
> >> Some points:
> >>
> >>     1. Even though the kip suggests adding the `get(k lowerkeybound, k
> >>     upperkeybound, long fromtime, long totime)` method to the
> >> interface, I
> >>     added this method to the `rocksdbversionedstore` class for now.
> >>     2. Matthias, you mentioned a very important point. How can a user
> >>     retrieve the latest value? We have the same issue with kip-968 as
> >> well.
> >>     Asking a user to call `toTime(max)` violates the API design rules,
> >> as you
> >>     mentioned. So I think we must have a `latest()` method for both
> >> KIP-968 and
> >>     KIP-969. What do you think about that?
> >>
> >>
> >> Cheers,
> >> Alieh
> >>
> >> On Thu, Oct 12, 2023 at 6:33 AM Matthias J. Sax <mj...@apache.org>
> wrote:
> >>
> >>> Thanks for the update.
> >>>
> >>>
> >>>> To retrieve
> >>>>>      the latest value(s), the user must call just the asOf method
> with
> >>> the MAX
> >>>>>      value (asOf(MAX)). The same applies to KIP-968. Do you think
> >>>>> it is
> >>> clumsy,
> >>>>>      Matthias?
> >>>
> >>>
> >>> Well, in KIP-968 calling `asOf` and passing in a timestamp is optional,
> >>> and default is "latest", right? So while `asOf(MAX)` does the same
> >>> thing, practically users would never call `asOf` for a "latest" query?
> >>>
> >>> In this KIP, we enforce that users give us a key range (we have the 4
> >>> static entry point methods to define a query for this), and we say we
> >>> default to "no bounds" for time range by default.
> >>>
> >>> The existing `RangeQuery` allows to query a range of keys for existing
> >>> stores. It seems to be a common pattern to query a key-range on latest.
> >>> -- in the current proposal, users would need to do:
> >>>
> >>> MultiVersionedRangeQuery.withKeyRange(startKey, endKey).asOf(MAX);
> >>>
> >>> Would like to hear from others if we think that's good user experience?
> >>> If we agree to accept this, I think we should explain how to do this in
> >>> the JavaDocs (and also regular docs... --- otherwise, I can already
> >>> anticipate user question on all question-asking-channels how to do a
> >>> "normal key range query". IMHO, the problem is not that the code itself
> >>> it too clumsy, but that it's totally not obvious to uses how to express
> >>> it without actually explaining it to them. It basically violated the
> API
> >>> design rule "make it easy to use / simple things should be easy".
> >>>
> >>> Btw: We could also re-use `RangeQuery` and add am implementation to
> >>> `VersionedStateStore` to just accept this query type, with "key range
> >>> over latest" semantics. -- The issue is of course, that uses need to
> >>> know that the query would return `ValueAndTimestamp` and not plain `V`
> >>> (or we add a translation step to unwrap the value, but we would lose
> the
> >>> "validFrom" timestamp -- validTo would be `null`). Because type safety
> >>> is a general issue in IQv2 it would not make it worse (in the strict
> >>> sense), but I am also not sure if we want to dig an even deeper hole...
> >>>
> >>>
> >>> -Matthias
> >>>
> >>>
> >>> On 10/10/23 11:55 AM, Alieh Saeedi wrote:
> >>>> Thanks, Matthias and Bruno, for the feedback on KIP-969. Here is a
> >>> summary
> >>>> of the updates I made to the KIP:
> >>>>
> >>>>      1.  I liked the idea of renaming methods as Matthias suggested.
> >>>>      2. I removed the allversions() method as I did in KIP-968. To
> >>> retrieve
> >>>>      the latest value(s), the user must call just the asOf method with
> >>> the MAX
> >>>>      value (asOf(MAX)). The same applies to KIP-968. Do you think it
> is
> >>> clumsy,
> >>>>      Matthias?
> >>>>      3. I added a method to the *VersionedKeyValueStore *interface,
> >>>> as I
> >>> did
> >>>>      for KIP-968.
> >>>>      4. Matthias: I do not get what you mean by your second comment.
> >>>> Isn't
> >>>>      the KIP already explicit about that?
> >>>>
> >>>>      > I assume, results are returned by timestamp for each key. The
> >>>> KIP
> >>>>      should be explicit about it.
> >>>>
> >>>>
> >>>> Cheers,
> >>>> Alieh
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Oct 3, 2023 at 6:07 AM Matthias J. Sax <mj...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> Thanks for updating the KIP.
> >>>>>
> >>>>> Not sure if I agree or not with Bruno's idea to split the query types
> >>>>> further? In the end, we split them only because there is three
> >>>>> different
> >>>>> return types: single value, value-iterator, key-value-iterator.
> >>>>>
> >>>>> What do we gain by splitting out single-ts-range-key? In the end, for
> >>>>> range-ts-range-key the proposed class is necessary and is a superset
> >>>>> (one can set both timestamps to the same value, for single-ts
> lookup).
> >>>>>
> >>>>> The mentioned simplification might apply to "single-ts-range-key"
> >>>>> but I
> >>>>> don't see a simplification for the proposed (and necessary) query
> >>>>> type?
> >>>>>
> >>>>> On the other hand, I see an advantage of a single-ts-range-key for
> >>>>> querying over the "latest version" with a range of keys. For a
> >>>>> single-ts-range-key query, this it would be the default (similar to
> >>>>> VersionedKeyQuery with not asOf-timestamped defined).
> >>>>>
> >>>>> In the current version of the KIP, (if we agree that default should
> >>>>> actually return "all versions" not "latest" -- this default was
> >>>>> suggested by Bruno on KIP-968 and makes sense to me, so we would
> >>>>> need to
> >>>>> have the same default here to stay consistent), users would need to
> >>>>> pass
> >>>>> in `from(Long.MAX).to(Long.MAX)` (if I got this right) to query the
> >>>>> latest point in time only, what seems to be clumsy? Or we could add a
> >>>>> `lastestKeyOnly` option to `MultiVersionedRangeQuery`, but it does
> >>>>> seems
> >>>>> a little clumsy, too.
> >>>>>
> >>>>>
> >>>>>
> >>>>>> The overall order of the returned records is by Key
> >>>>>
> >>>>> I assume, results are returned by timestamp for each key. The KIP
> >>>>> should
> >>>>> be explicit about it.
> >>>>>
> >>>>>
> >>>>>
> >>>>> To be very explicit, should we rename the methods to specify the key
> >>> bound?
> >>>>>
> >>>>>     - withRange -> withKeyRange
> >>>>>     - withLowerBound -> withLowerKeyBound
> >>>>>     - withUpperBound -> withUpperKeyBound
> >>>>>     - withNoBounds -> allKeys (or withNoKeyBounds, but we use
> >>>>> `allVersions` and not `noTimeBound` and should align the naming?)
> >>>>>
> >>>>>
> >>>>>
> >>>>> -Matthias
> >>>>>
> >>>>>
> >>>>> On 9/6/23 5:25 AM, Bruno Cadonna wrote:
> >>>>>> Hi Alieh,
> >>>>>>
> >>>>>> Thanks for the KIP!
> >>>>>>
> >>>>>> One high level comment/question:
> >>>>>>
> >>>>>> I assume you separated single key queries into two classes because
> >>>>>> versioned key queries return a single value and multi version key
> >>>>>> queries return iterators. Although, range queries always return
> >>>>>> iterators, it would make sense to also separate range queries for
> >>>>>> versioned state stores into range queries that return one single
> >>> version
> >>>>>> of the keys within a range and range queries that return multiple
> >>>>>> version of the keys within a range, IMO. That would reduce the
> >>>>>> meaningless combinations.
> >>>>>> WDYT?
> >>>>>>
> >>>>>> Best,
> >>>>>> Bruno
> >>>>>>
> >>>>>> On 8/16/23 8:01 PM, Alieh Saeedi wrote:
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> I splitted KIP-960
> >>>>>>> <
> >>>>>
> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-960%3A+Support+single-key_single-timestamp+interactive+queries+%28IQv2%29+for+versioned+state+stores
> >>>>>>
> >>>>>>> into
> >>>>>>> three separate KIPs. Therefore, please continue discussions about
> >>> range
> >>>>>>> interactive queries here. You can see all the addressed reviews
> >>>>>>> on the
> >>>>>>> following page. Thanks in advance.
> >>>>>>>
> >>>>>>> KIP-969: Support range interactive queries (IQv2) for versioned
> >>>>>>> state
> >>>>>>> stores
> >>>>>>> <
> >>>>>
> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-969%3A+Support+range+interactive+queries+%28IQv2%29+for+versioned+state+stores
> >>>>>>
> >>>>>>>
> >>>>>>> I look forward to your feedback!
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Alieh
> >>>>>>>
> >>>>>
> >>>>
> >>>
> >>
>

Re: [DISCUSS] KIP-969: Support range interactive queries for versioned state stores

Posted by "Matthias J. Sax" <mj...@apache.org>.
Great discussion. Seems we are making good progress.

I see advantages and disadvantages in splitting out a "single-ts 
key-range" query type. I guess, the main question might be, what 
use-cases do we anticipate and how common we believe they are?

We should also take KIP-992 (adding `TimestampedRangeQuery`) into account.

(1) The most common use case seems to be, a key-range over latest. For 
this one, we could use `TimestampedRangeQuery` -- it would return a 
`ValueAndTimestamp<V>` instead of a `VersionedRecord<V>` but the won't 
be any information loss, because "toTimestamp" would be `null` anyway.


(2) The open question is, how common is a key-range in a point in the 
past? For this case, using 
`MultiVersionedRangeQuery.withKeyRange().from(myTs).to(myTs)` seems 
actually not to be a bad UX, and also does not really need to be 
explained how to do this (compared to "latest" that required to pass in 
MAX_INT).


If we add a new query type, we avoid both issues (are they actually 
issues?) and add some nice syntactic sugar to the API. The question is, 
if it's worth the effort and expanded API surface area?

To summarize:

Add new query type:

> // queries latest; returns VersionedRecords
> VersionedRangeQuery.withKeyRange(...);
> 
> VersionedRangeQuery.withKeyRange(...).asOf(ts);

vs

No new query type:

> // queries latest; returns ValueAndTimestamps
> TimestampedRangeQuery.withRange(...); 
> 
> MultiVersionedRangeQuery.withKeyRange(...).from(myTs).to(myTs)



I guess, bottom line, I would be ok with either one and I am actually 
not even sure which one I prefer personally. Just wanted to lay out the 
tradeoffs I see. Not sure if three are other considerations that would 
tip the scale into either direction?



-Matthias

On 11/3/23 3:43 AM, Bruno Cadonna wrote:
> Hi Alieh,
> 
> I like the examples!
> 
> 
> 1.
> Some terms like `asOf` in the descriptions still need to be replaced in 
> the KIP.
> 
> 
> 2.
> In your last e-mail you state:
> 
> "How can a user retrieve the latest value? We have the same issue with 
> kip-968 as well."
> 
> Why do we have the same issue in KIP-968?
> If users need to retrieve the latest value for a specific key, they 
> should use KIP-960.
> 
> 
> 3.
> Regarding querying the latest version (or an asOf version) of records in 
> a given key range, that is exactly why I proposed to split the query 
> class. One class would return the latest and the asOf versions (i.e. a 
> single version) of records in a key range and the other class would 
> return all versions in a given time range (i.e. multiple versions) of 
> the records in a given key range. The splitting in two classes avoids to 
> specify a time range and latest or a time range and asOf on a given key 
> range.
> 
> Alternatively, you could keep one class and you could specify that the 
> last call wins as you specified for fromTime() and toTime(). For 
> example, for
> 
> MultiVersionedRangeQuery.withLowerKeyBound(k1).fromTime(t1).toTime(t2).latest()
> 
> latest() wins. However, how would you interpret
> 
> MultiVersionedRangeQuery.withLowerKeyBound(k1).fromTime(t1).latest().toTime(t2)
> 
> Is it [t1, t2] or [-INF, t2]?
> (I would say the latter, but somebody else would say differently)
> 
> The two class solution seems cleaner to me since we do not need to 
> consider those edge cases.
> You could propose both classes in this KIP.
> 
> 
> 4.
> Why do we need orderByKey() and orderByTimestamp()?
> Aren't withAscendingKeys(), withDescendingKeys(), 
> withAscendingTimestamps(), and withDescendingTimestamps() sufficient?
> 
> 
> 5.
> In example 2, why is
> 
> key,value: 2,20, timestamp: 2023-01-10T10:00:00.00Z, valid till: 
> 2023-01-25T10:00:00.00Z
> 
> not part of the result?
> It is valid from 2023-01-10T10:00:00.00Z to 2023-01-25T10:00:00.00Z 
> which overlaps with the time range [2023-01-17T10:00:00.00Z, 
> 2023-01-30T10:00:00.00Z] of the query.
> 
> (BTW, in the second example, you forgot to add "key" to the output.)
> 
> 
> Best,
> Bruno
> 
> 
> On 10/25/23 4:01 PM, Alieh Saeedi wrote:
>> Thanks, Matthias and Bruno.
>> Here is a list of updates:
>>
>>     1. I changed the variable and method names as I did for KIP-968, as
>>     follows:
>>        - "fromTimestamp" -> fromTime
>>        - "asOfTimestamp"-> toTime
>>        - "from(instant)" -> fromTime(instant)"
>>        - asOf(instant)"->toTime(instant)
>>     2. As Bruno suggested for KIP-968, I added `orderByKey()`,
>>     `withAscendingKeys()`, and `withAscendingTimestamps()` methods for 
>> user
>>     readability.
>>     3. I updated the "Example" section as well.
>>
>> Some points:
>>
>>     1. Even though the kip suggests adding the `get(k lowerkeybound, k
>>     upperkeybound, long fromtime, long totime)` method to the 
>> interface, I
>>     added this method to the `rocksdbversionedstore` class for now.
>>     2. Matthias, you mentioned a very important point. How can a user
>>     retrieve the latest value? We have the same issue with kip-968 as 
>> well.
>>     Asking a user to call `toTime(max)` violates the API design rules, 
>> as you
>>     mentioned. So I think we must have a `latest()` method for both 
>> KIP-968 and
>>     KIP-969. What do you think about that?
>>
>>
>> Cheers,
>> Alieh
>>
>> On Thu, Oct 12, 2023 at 6:33 AM Matthias J. Sax <mj...@apache.org> wrote:
>>
>>> Thanks for the update.
>>>
>>>
>>>> To retrieve
>>>>>      the latest value(s), the user must call just the asOf method with
>>> the MAX
>>>>>      value (asOf(MAX)). The same applies to KIP-968. Do you think 
>>>>> it is
>>> clumsy,
>>>>>      Matthias?
>>>
>>>
>>> Well, in KIP-968 calling `asOf` and passing in a timestamp is optional,
>>> and default is "latest", right? So while `asOf(MAX)` does the same
>>> thing, practically users would never call `asOf` for a "latest" query?
>>>
>>> In this KIP, we enforce that users give us a key range (we have the 4
>>> static entry point methods to define a query for this), and we say we
>>> default to "no bounds" for time range by default.
>>>
>>> The existing `RangeQuery` allows to query a range of keys for existing
>>> stores. It seems to be a common pattern to query a key-range on latest.
>>> -- in the current proposal, users would need to do:
>>>
>>> MultiVersionedRangeQuery.withKeyRange(startKey, endKey).asOf(MAX);
>>>
>>> Would like to hear from others if we think that's good user experience?
>>> If we agree to accept this, I think we should explain how to do this in
>>> the JavaDocs (and also regular docs... --- otherwise, I can already
>>> anticipate user question on all question-asking-channels how to do a
>>> "normal key range query". IMHO, the problem is not that the code itself
>>> it too clumsy, but that it's totally not obvious to uses how to express
>>> it without actually explaining it to them. It basically violated the API
>>> design rule "make it easy to use / simple things should be easy".
>>>
>>> Btw: We could also re-use `RangeQuery` and add am implementation to
>>> `VersionedStateStore` to just accept this query type, with "key range
>>> over latest" semantics. -- The issue is of course, that uses need to
>>> know that the query would return `ValueAndTimestamp` and not plain `V`
>>> (or we add a translation step to unwrap the value, but we would lose the
>>> "validFrom" timestamp -- validTo would be `null`). Because type safety
>>> is a general issue in IQv2 it would not make it worse (in the strict
>>> sense), but I am also not sure if we want to dig an even deeper hole...
>>>
>>>
>>> -Matthias
>>>
>>>
>>> On 10/10/23 11:55 AM, Alieh Saeedi wrote:
>>>> Thanks, Matthias and Bruno, for the feedback on KIP-969. Here is a
>>> summary
>>>> of the updates I made to the KIP:
>>>>
>>>>      1.  I liked the idea of renaming methods as Matthias suggested.
>>>>      2. I removed the allversions() method as I did in KIP-968. To
>>> retrieve
>>>>      the latest value(s), the user must call just the asOf method with
>>> the MAX
>>>>      value (asOf(MAX)). The same applies to KIP-968. Do you think it is
>>> clumsy,
>>>>      Matthias?
>>>>      3. I added a method to the *VersionedKeyValueStore *interface, 
>>>> as I
>>> did
>>>>      for KIP-968.
>>>>      4. Matthias: I do not get what you mean by your second comment. 
>>>> Isn't
>>>>      the KIP already explicit about that?
>>>>
>>>>      > I assume, results are returned by timestamp for each key. The 
>>>> KIP
>>>>      should be explicit about it.
>>>>
>>>>
>>>> Cheers,
>>>> Alieh
>>>>
>>>>
>>>>
>>>> On Tue, Oct 3, 2023 at 6:07 AM Matthias J. Sax <mj...@apache.org> 
>>>> wrote:
>>>>
>>>>> Thanks for updating the KIP.
>>>>>
>>>>> Not sure if I agree or not with Bruno's idea to split the query types
>>>>> further? In the end, we split them only because there is three 
>>>>> different
>>>>> return types: single value, value-iterator, key-value-iterator.
>>>>>
>>>>> What do we gain by splitting out single-ts-range-key? In the end, for
>>>>> range-ts-range-key the proposed class is necessary and is a superset
>>>>> (one can set both timestamps to the same value, for single-ts lookup).
>>>>>
>>>>> The mentioned simplification might apply to "single-ts-range-key" 
>>>>> but I
>>>>> don't see a simplification for the proposed (and necessary) query 
>>>>> type?
>>>>>
>>>>> On the other hand, I see an advantage of a single-ts-range-key for
>>>>> querying over the "latest version" with a range of keys. For a
>>>>> single-ts-range-key query, this it would be the default (similar to
>>>>> VersionedKeyQuery with not asOf-timestamped defined).
>>>>>
>>>>> In the current version of the KIP, (if we agree that default should
>>>>> actually return "all versions" not "latest" -- this default was
>>>>> suggested by Bruno on KIP-968 and makes sense to me, so we would 
>>>>> need to
>>>>> have the same default here to stay consistent), users would need to 
>>>>> pass
>>>>> in `from(Long.MAX).to(Long.MAX)` (if I got this right) to query the
>>>>> latest point in time only, what seems to be clumsy? Or we could add a
>>>>> `lastestKeyOnly` option to `MultiVersionedRangeQuery`, but it does 
>>>>> seems
>>>>> a little clumsy, too.
>>>>>
>>>>>
>>>>>
>>>>>> The overall order of the returned records is by Key
>>>>>
>>>>> I assume, results are returned by timestamp for each key. The KIP 
>>>>> should
>>>>> be explicit about it.
>>>>>
>>>>>
>>>>>
>>>>> To be very explicit, should we rename the methods to specify the key
>>> bound?
>>>>>
>>>>>     - withRange -> withKeyRange
>>>>>     - withLowerBound -> withLowerKeyBound
>>>>>     - withUpperBound -> withUpperKeyBound
>>>>>     - withNoBounds -> allKeys (or withNoKeyBounds, but we use
>>>>> `allVersions` and not `noTimeBound` and should align the naming?)
>>>>>
>>>>>
>>>>>
>>>>> -Matthias
>>>>>
>>>>>
>>>>> On 9/6/23 5:25 AM, Bruno Cadonna wrote:
>>>>>> Hi Alieh,
>>>>>>
>>>>>> Thanks for the KIP!
>>>>>>
>>>>>> One high level comment/question:
>>>>>>
>>>>>> I assume you separated single key queries into two classes because
>>>>>> versioned key queries return a single value and multi version key
>>>>>> queries return iterators. Although, range queries always return
>>>>>> iterators, it would make sense to also separate range queries for
>>>>>> versioned state stores into range queries that return one single
>>> version
>>>>>> of the keys within a range and range queries that return multiple
>>>>>> version of the keys within a range, IMO. That would reduce the
>>>>>> meaningless combinations.
>>>>>> WDYT?
>>>>>>
>>>>>> Best,
>>>>>> Bruno
>>>>>>
>>>>>> On 8/16/23 8:01 PM, Alieh Saeedi wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I splitted KIP-960
>>>>>>> <
>>>>>
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-960%3A+Support+single-key_single-timestamp+interactive+queries+%28IQv2%29+for+versioned+state+stores
>>>>>>
>>>>>>> into
>>>>>>> three separate KIPs. Therefore, please continue discussions about
>>> range
>>>>>>> interactive queries here. You can see all the addressed reviews 
>>>>>>> on the
>>>>>>> following page. Thanks in advance.
>>>>>>>
>>>>>>> KIP-969: Support range interactive queries (IQv2) for versioned 
>>>>>>> state
>>>>>>> stores
>>>>>>> <
>>>>>
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-969%3A+Support+range+interactive+queries+%28IQv2%29+for+versioned+state+stores
>>>>>>
>>>>>>>
>>>>>>> I look forward to your feedback!
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Alieh
>>>>>>>
>>>>>
>>>>
>>>
>>

Re: [DISCUSS] KIP-969: Support range interactive queries for versioned state stores

Posted by Bruno Cadonna <ca...@apache.org>.
Hi Alieh,

I like the examples!


1.
Some terms like `asOf` in the descriptions still need to be replaced in 
the KIP.


2.
In your last e-mail you state:

"How can a user retrieve the latest value? We have the same issue with 
kip-968 as well."

Why do we have the same issue in KIP-968?
If users need to retrieve the latest value for a specific key, they 
should use KIP-960.


3.
Regarding querying the latest version (or an asOf version) of records in 
a given key range, that is exactly why I proposed to split the query 
class. One class would return the latest and the asOf versions (i.e. a 
single version) of records in a key range and the other class would 
return all versions in a given time range (i.e. multiple versions) of 
the records in a given key range. The splitting in two classes avoids to 
specify a time range and latest or a time range and asOf on a given key 
range.

Alternatively, you could keep one class and you could specify that the 
last call wins as you specified for fromTime() and toTime(). For 
example, for

MultiVersionedRangeQuery.withLowerKeyBound(k1).fromTime(t1).toTime(t2).latest()

latest() wins. However, how would you interpret

MultiVersionedRangeQuery.withLowerKeyBound(k1).fromTime(t1).latest().toTime(t2)

Is it [t1, t2] or [-INF, t2]?
(I would say the latter, but somebody else would say differently)

The two class solution seems cleaner to me since we do not need to 
consider those edge cases.
You could propose both classes in this KIP.


4.
Why do we need orderByKey() and orderByTimestamp()?
Aren't withAscendingKeys(), withDescendingKeys(), 
withAscendingTimestamps(), and withDescendingTimestamps() sufficient?


5.
In example 2, why is

key,value: 2,20, timestamp: 2023-01-10T10:00:00.00Z, valid till: 
2023-01-25T10:00:00.00Z

not part of the result?
It is valid from 2023-01-10T10:00:00.00Z to 2023-01-25T10:00:00.00Z 
which overlaps with the time range [2023-01-17T10:00:00.00Z, 
2023-01-30T10:00:00.00Z] of the query.

(BTW, in the second example, you forgot to add "key" to the output.)


Best,
Bruno


On 10/25/23 4:01 PM, Alieh Saeedi wrote:
> Thanks, Matthias and Bruno.
> Here is a list of updates:
> 
>     1. I changed the variable and method names as I did for KIP-968, as
>     follows:
>        - "fromTimestamp" -> fromTime
>        - "asOfTimestamp"-> toTime
>        - "from(instant)" -> fromTime(instant)"
>        - asOf(instant)"->toTime(instant)
>     2. As Bruno suggested for KIP-968, I added `orderByKey()`,
>     `withAscendingKeys()`, and `withAscendingTimestamps()` methods for user
>     readability.
>     3. I updated the "Example" section as well.
> 
> Some points:
> 
>     1. Even though the kip suggests adding the `get(k lowerkeybound, k
>     upperkeybound, long fromtime, long totime)` method to the interface, I
>     added this method to the `rocksdbversionedstore` class for now.
>     2. Matthias, you mentioned a very important point. How can a user
>     retrieve the latest value? We have the same issue with kip-968 as well.
>     Asking a user to call `toTime(max)` violates the API design rules, as you
>     mentioned. So I think we must have a `latest()` method for both KIP-968 and
>     KIP-969. What do you think about that?
> 
> 
> Cheers,
> Alieh
> 
> On Thu, Oct 12, 2023 at 6:33 AM Matthias J. Sax <mj...@apache.org> wrote:
> 
>> Thanks for the update.
>>
>>
>>> To retrieve
>>>>      the latest value(s), the user must call just the asOf method with
>> the MAX
>>>>      value (asOf(MAX)). The same applies to KIP-968. Do you think it is
>> clumsy,
>>>>      Matthias?
>>
>>
>> Well, in KIP-968 calling `asOf` and passing in a timestamp is optional,
>> and default is "latest", right? So while `asOf(MAX)` does the same
>> thing, practically users would never call `asOf` for a "latest" query?
>>
>> In this KIP, we enforce that users give us a key range (we have the 4
>> static entry point methods to define a query for this), and we say we
>> default to "no bounds" for time range by default.
>>
>> The existing `RangeQuery` allows to query a range of keys for existing
>> stores. It seems to be a common pattern to query a key-range on latest.
>> -- in the current proposal, users would need to do:
>>
>> MultiVersionedRangeQuery.withKeyRange(startKey, endKey).asOf(MAX);
>>
>> Would like to hear from others if we think that's good user experience?
>> If we agree to accept this, I think we should explain how to do this in
>> the JavaDocs (and also regular docs... --- otherwise, I can already
>> anticipate user question on all question-asking-channels how to do a
>> "normal key range query". IMHO, the problem is not that the code itself
>> it too clumsy, but that it's totally not obvious to uses how to express
>> it without actually explaining it to them. It basically violated the API
>> design rule "make it easy to use / simple things should be easy".
>>
>> Btw: We could also re-use `RangeQuery` and add am implementation to
>> `VersionedStateStore` to just accept this query type, with "key range
>> over latest" semantics. -- The issue is of course, that uses need to
>> know that the query would return `ValueAndTimestamp` and not plain `V`
>> (or we add a translation step to unwrap the value, but we would lose the
>> "validFrom" timestamp -- validTo would be `null`). Because type safety
>> is a general issue in IQv2 it would not make it worse (in the strict
>> sense), but I am also not sure if we want to dig an even deeper hole...
>>
>>
>> -Matthias
>>
>>
>> On 10/10/23 11:55 AM, Alieh Saeedi wrote:
>>> Thanks, Matthias and Bruno, for the feedback on KIP-969. Here is a
>> summary
>>> of the updates I made to the KIP:
>>>
>>>      1.  I liked the idea of renaming methods as Matthias suggested.
>>>      2. I removed the allversions() method as I did in KIP-968. To
>> retrieve
>>>      the latest value(s), the user must call just the asOf method with
>> the MAX
>>>      value (asOf(MAX)). The same applies to KIP-968. Do you think it is
>> clumsy,
>>>      Matthias?
>>>      3. I added a method to the *VersionedKeyValueStore *interface, as I
>> did
>>>      for KIP-968.
>>>      4. Matthias: I do not get what you mean by your second comment. Isn't
>>>      the KIP already explicit about that?
>>>
>>>      > I assume, results are returned by timestamp for each key. The KIP
>>>      should be explicit about it.
>>>
>>>
>>> Cheers,
>>> Alieh
>>>
>>>
>>>
>>> On Tue, Oct 3, 2023 at 6:07 AM Matthias J. Sax <mj...@apache.org> wrote:
>>>
>>>> Thanks for updating the KIP.
>>>>
>>>> Not sure if I agree or not with Bruno's idea to split the query types
>>>> further? In the end, we split them only because there is three different
>>>> return types: single value, value-iterator, key-value-iterator.
>>>>
>>>> What do we gain by splitting out single-ts-range-key? In the end, for
>>>> range-ts-range-key the proposed class is necessary and is a superset
>>>> (one can set both timestamps to the same value, for single-ts lookup).
>>>>
>>>> The mentioned simplification might apply to "single-ts-range-key" but I
>>>> don't see a simplification for the proposed (and necessary) query type?
>>>>
>>>> On the other hand, I see an advantage of a single-ts-range-key for
>>>> querying over the "latest version" with a range of keys. For a
>>>> single-ts-range-key query, this it would be the default (similar to
>>>> VersionedKeyQuery with not asOf-timestamped defined).
>>>>
>>>> In the current version of the KIP, (if we agree that default should
>>>> actually return "all versions" not "latest" -- this default was
>>>> suggested by Bruno on KIP-968 and makes sense to me, so we would need to
>>>> have the same default here to stay consistent), users would need to pass
>>>> in `from(Long.MAX).to(Long.MAX)` (if I got this right) to query the
>>>> latest point in time only, what seems to be clumsy? Or we could add a
>>>> `lastestKeyOnly` option to `MultiVersionedRangeQuery`, but it does seems
>>>> a little clumsy, too.
>>>>
>>>>
>>>>
>>>>> The overall order of the returned records is by Key
>>>>
>>>> I assume, results are returned by timestamp for each key. The KIP should
>>>> be explicit about it.
>>>>
>>>>
>>>>
>>>> To be very explicit, should we rename the methods to specify the key
>> bound?
>>>>
>>>>     - withRange -> withKeyRange
>>>>     - withLowerBound -> withLowerKeyBound
>>>>     - withUpperBound -> withUpperKeyBound
>>>>     - withNoBounds -> allKeys (or withNoKeyBounds, but we use
>>>> `allVersions` and not `noTimeBound` and should align the naming?)
>>>>
>>>>
>>>>
>>>> -Matthias
>>>>
>>>>
>>>> On 9/6/23 5:25 AM, Bruno Cadonna wrote:
>>>>> Hi Alieh,
>>>>>
>>>>> Thanks for the KIP!
>>>>>
>>>>> One high level comment/question:
>>>>>
>>>>> I assume you separated single key queries into two classes because
>>>>> versioned key queries return a single value and multi version key
>>>>> queries return iterators. Although, range queries always return
>>>>> iterators, it would make sense to also separate range queries for
>>>>> versioned state stores into range queries that return one single
>> version
>>>>> of the keys within a range and range queries that return multiple
>>>>> version of the keys within a range, IMO. That would reduce the
>>>>> meaningless combinations.
>>>>> WDYT?
>>>>>
>>>>> Best,
>>>>> Bruno
>>>>>
>>>>> On 8/16/23 8:01 PM, Alieh Saeedi wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I splitted KIP-960
>>>>>> <
>>>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-960%3A+Support+single-key_single-timestamp+interactive+queries+%28IQv2%29+for+versioned+state+stores
>>>>>
>>>>>> into
>>>>>> three separate KIPs. Therefore, please continue discussions about
>> range
>>>>>> interactive queries here. You can see all the addressed reviews on the
>>>>>> following page. Thanks in advance.
>>>>>>
>>>>>> KIP-969: Support range interactive queries (IQv2) for versioned state
>>>>>> stores
>>>>>> <
>>>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-969%3A+Support+range+interactive+queries+%28IQv2%29+for+versioned+state+stores
>>>>>
>>>>>>
>>>>>> I look forward to your feedback!
>>>>>>
>>>>>> Cheers,
>>>>>> Alieh
>>>>>>
>>>>
>>>
>>
> 

Re: [DISCUSS] KIP-969: Support range interactive queries for versioned state stores

Posted by Alieh Saeedi <as...@confluent.io.INVALID>.
Thanks, Matthias and Bruno.
Here is a list of updates:

   1. I changed the variable and method names as I did for KIP-968, as
   follows:
      - "fromTimestamp" -> fromTime
      - "asOfTimestamp"-> toTime
      - "from(instant)" -> fromTime(instant)"
      - asOf(instant)"->toTime(instant)
   2. As Bruno suggested for KIP-968, I added `orderByKey()`,
   `withAscendingKeys()`, and `withAscendingTimestamps()` methods for user
   readability.
   3. I updated the "Example" section as well.

Some points:

   1. Even though the kip suggests adding the `get(k lowerkeybound, k
   upperkeybound, long fromtime, long totime)` method to the interface, I
   added this method to the `rocksdbversionedstore` class for now.
   2. Matthias, you mentioned a very important point. How can a user
   retrieve the latest value? We have the same issue with kip-968 as well.
   Asking a user to call `toTime(max)` violates the API design rules, as you
   mentioned. So I think we must have a `latest()` method for both KIP-968 and
   KIP-969. What do you think about that?


Cheers,
Alieh

On Thu, Oct 12, 2023 at 6:33 AM Matthias J. Sax <mj...@apache.org> wrote:

> Thanks for the update.
>
>
> > To retrieve
> >>     the latest value(s), the user must call just the asOf method with
> the MAX
> >>     value (asOf(MAX)). The same applies to KIP-968. Do you think it is
> clumsy,
> >>     Matthias?
>
>
> Well, in KIP-968 calling `asOf` and passing in a timestamp is optional,
> and default is "latest", right? So while `asOf(MAX)` does the same
> thing, practically users would never call `asOf` for a "latest" query?
>
> In this KIP, we enforce that users give us a key range (we have the 4
> static entry point methods to define a query for this), and we say we
> default to "no bounds" for time range by default.
>
> The existing `RangeQuery` allows to query a range of keys for existing
> stores. It seems to be a common pattern to query a key-range on latest.
> -- in the current proposal, users would need to do:
>
> MultiVersionedRangeQuery.withKeyRange(startKey, endKey).asOf(MAX);
>
> Would like to hear from others if we think that's good user experience?
> If we agree to accept this, I think we should explain how to do this in
> the JavaDocs (and also regular docs... --- otherwise, I can already
> anticipate user question on all question-asking-channels how to do a
> "normal key range query". IMHO, the problem is not that the code itself
> it too clumsy, but that it's totally not obvious to uses how to express
> it without actually explaining it to them. It basically violated the API
> design rule "make it easy to use / simple things should be easy".
>
> Btw: We could also re-use `RangeQuery` and add am implementation to
> `VersionedStateStore` to just accept this query type, with "key range
> over latest" semantics. -- The issue is of course, that uses need to
> know that the query would return `ValueAndTimestamp` and not plain `V`
> (or we add a translation step to unwrap the value, but we would lose the
> "validFrom" timestamp -- validTo would be `null`). Because type safety
> is a general issue in IQv2 it would not make it worse (in the strict
> sense), but I am also not sure if we want to dig an even deeper hole...
>
>
> -Matthias
>
>
> On 10/10/23 11:55 AM, Alieh Saeedi wrote:
> > Thanks, Matthias and Bruno, for the feedback on KIP-969. Here is a
> summary
> > of the updates I made to the KIP:
> >
> >     1.  I liked the idea of renaming methods as Matthias suggested.
> >     2. I removed the allversions() method as I did in KIP-968. To
> retrieve
> >     the latest value(s), the user must call just the asOf method with
> the MAX
> >     value (asOf(MAX)). The same applies to KIP-968. Do you think it is
> clumsy,
> >     Matthias?
> >     3. I added a method to the *VersionedKeyValueStore *interface, as I
> did
> >     for KIP-968.
> >     4. Matthias: I do not get what you mean by your second comment. Isn't
> >     the KIP already explicit about that?
> >
> >     > I assume, results are returned by timestamp for each key. The KIP
> >     should be explicit about it.
> >
> >
> > Cheers,
> > Alieh
> >
> >
> >
> > On Tue, Oct 3, 2023 at 6:07 AM Matthias J. Sax <mj...@apache.org> wrote:
> >
> >> Thanks for updating the KIP.
> >>
> >> Not sure if I agree or not with Bruno's idea to split the query types
> >> further? In the end, we split them only because there is three different
> >> return types: single value, value-iterator, key-value-iterator.
> >>
> >> What do we gain by splitting out single-ts-range-key? In the end, for
> >> range-ts-range-key the proposed class is necessary and is a superset
> >> (one can set both timestamps to the same value, for single-ts lookup).
> >>
> >> The mentioned simplification might apply to "single-ts-range-key" but I
> >> don't see a simplification for the proposed (and necessary) query type?
> >>
> >> On the other hand, I see an advantage of a single-ts-range-key for
> >> querying over the "latest version" with a range of keys. For a
> >> single-ts-range-key query, this it would be the default (similar to
> >> VersionedKeyQuery with not asOf-timestamped defined).
> >>
> >> In the current version of the KIP, (if we agree that default should
> >> actually return "all versions" not "latest" -- this default was
> >> suggested by Bruno on KIP-968 and makes sense to me, so we would need to
> >> have the same default here to stay consistent), users would need to pass
> >> in `from(Long.MAX).to(Long.MAX)` (if I got this right) to query the
> >> latest point in time only, what seems to be clumsy? Or we could add a
> >> `lastestKeyOnly` option to `MultiVersionedRangeQuery`, but it does seems
> >> a little clumsy, too.
> >>
> >>
> >>
> >>> The overall order of the returned records is by Key
> >>
> >> I assume, results are returned by timestamp for each key. The KIP should
> >> be explicit about it.
> >>
> >>
> >>
> >> To be very explicit, should we rename the methods to specify the key
> bound?
> >>
> >>    - withRange -> withKeyRange
> >>    - withLowerBound -> withLowerKeyBound
> >>    - withUpperBound -> withUpperKeyBound
> >>    - withNoBounds -> allKeys (or withNoKeyBounds, but we use
> >> `allVersions` and not `noTimeBound` and should align the naming?)
> >>
> >>
> >>
> >> -Matthias
> >>
> >>
> >> On 9/6/23 5:25 AM, Bruno Cadonna wrote:
> >>> Hi Alieh,
> >>>
> >>> Thanks for the KIP!
> >>>
> >>> One high level comment/question:
> >>>
> >>> I assume you separated single key queries into two classes because
> >>> versioned key queries return a single value and multi version key
> >>> queries return iterators. Although, range queries always return
> >>> iterators, it would make sense to also separate range queries for
> >>> versioned state stores into range queries that return one single
> version
> >>> of the keys within a range and range queries that return multiple
> >>> version of the keys within a range, IMO. That would reduce the
> >>> meaningless combinations.
> >>> WDYT?
> >>>
> >>> Best,
> >>> Bruno
> >>>
> >>> On 8/16/23 8:01 PM, Alieh Saeedi wrote:
> >>>> Hi all,
> >>>>
> >>>> I splitted KIP-960
> >>>> <
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-960%3A+Support+single-key_single-timestamp+interactive+queries+%28IQv2%29+for+versioned+state+stores
> >>>
> >>>> into
> >>>> three separate KIPs. Therefore, please continue discussions about
> range
> >>>> interactive queries here. You can see all the addressed reviews on the
> >>>> following page. Thanks in advance.
> >>>>
> >>>> KIP-969: Support range interactive queries (IQv2) for versioned state
> >>>> stores
> >>>> <
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-969%3A+Support+range+interactive+queries+%28IQv2%29+for+versioned+state+stores
> >>>
> >>>>
> >>>> I look forward to your feedback!
> >>>>
> >>>> Cheers,
> >>>> Alieh
> >>>>
> >>
> >
>

Re: [DISCUSS] KIP-969: Support range interactive queries for versioned state stores

Posted by "Matthias J. Sax" <mj...@apache.org>.
Thanks for the update.


> To retrieve
>>     the latest value(s), the user must call just the asOf method with the MAX
>>     value (asOf(MAX)). The same applies to KIP-968. Do you think it is clumsy,
>>     Matthias?


Well, in KIP-968 calling `asOf` and passing in a timestamp is optional, 
and default is "latest", right? So while `asOf(MAX)` does the same 
thing, practically users would never call `asOf` for a "latest" query?

In this KIP, we enforce that users give us a key range (we have the 4 
static entry point methods to define a query for this), and we say we 
default to "no bounds" for time range by default.

The existing `RangeQuery` allows to query a range of keys for existing 
stores. It seems to be a common pattern to query a key-range on latest. 
-- in the current proposal, users would need to do:

MultiVersionedRangeQuery.withKeyRange(startKey, endKey).asOf(MAX);

Would like to hear from others if we think that's good user experience? 
If we agree to accept this, I think we should explain how to do this in 
the JavaDocs (and also regular docs... --- otherwise, I can already 
anticipate user question on all question-asking-channels how to do a 
"normal key range query". IMHO, the problem is not that the code itself 
it too clumsy, but that it's totally not obvious to uses how to express 
it without actually explaining it to them. It basically violated the API 
design rule "make it easy to use / simple things should be easy".

Btw: We could also re-use `RangeQuery` and add am implementation to 
`VersionedStateStore` to just accept this query type, with "key range 
over latest" semantics. -- The issue is of course, that uses need to 
know that the query would return `ValueAndTimestamp` and not plain `V` 
(or we add a translation step to unwrap the value, but we would lose the 
"validFrom" timestamp -- validTo would be `null`). Because type safety 
is a general issue in IQv2 it would not make it worse (in the strict 
sense), but I am also not sure if we want to dig an even deeper hole...


-Matthias


On 10/10/23 11:55 AM, Alieh Saeedi wrote:
> Thanks, Matthias and Bruno, for the feedback on KIP-969. Here is a summary
> of the updates I made to the KIP:
> 
>     1.  I liked the idea of renaming methods as Matthias suggested.
>     2. I removed the allversions() method as I did in KIP-968. To retrieve
>     the latest value(s), the user must call just the asOf method with the MAX
>     value (asOf(MAX)). The same applies to KIP-968. Do you think it is clumsy,
>     Matthias?
>     3. I added a method to the *VersionedKeyValueStore *interface, as I did
>     for KIP-968.
>     4. Matthias: I do not get what you mean by your second comment. Isn't
>     the KIP already explicit about that?
> 
>     > I assume, results are returned by timestamp for each key. The KIP
>     should be explicit about it.
> 
> 
> Cheers,
> Alieh
> 
> 
> 
> On Tue, Oct 3, 2023 at 6:07 AM Matthias J. Sax <mj...@apache.org> wrote:
> 
>> Thanks for updating the KIP.
>>
>> Not sure if I agree or not with Bruno's idea to split the query types
>> further? In the end, we split them only because there is three different
>> return types: single value, value-iterator, key-value-iterator.
>>
>> What do we gain by splitting out single-ts-range-key? In the end, for
>> range-ts-range-key the proposed class is necessary and is a superset
>> (one can set both timestamps to the same value, for single-ts lookup).
>>
>> The mentioned simplification might apply to "single-ts-range-key" but I
>> don't see a simplification for the proposed (and necessary) query type?
>>
>> On the other hand, I see an advantage of a single-ts-range-key for
>> querying over the "latest version" with a range of keys. For a
>> single-ts-range-key query, this it would be the default (similar to
>> VersionedKeyQuery with not asOf-timestamped defined).
>>
>> In the current version of the KIP, (if we agree that default should
>> actually return "all versions" not "latest" -- this default was
>> suggested by Bruno on KIP-968 and makes sense to me, so we would need to
>> have the same default here to stay consistent), users would need to pass
>> in `from(Long.MAX).to(Long.MAX)` (if I got this right) to query the
>> latest point in time only, what seems to be clumsy? Or we could add a
>> `lastestKeyOnly` option to `MultiVersionedRangeQuery`, but it does seems
>> a little clumsy, too.
>>
>>
>>
>>> The overall order of the returned records is by Key
>>
>> I assume, results are returned by timestamp for each key. The KIP should
>> be explicit about it.
>>
>>
>>
>> To be very explicit, should we rename the methods to specify the key bound?
>>
>>    - withRange -> withKeyRange
>>    - withLowerBound -> withLowerKeyBound
>>    - withUpperBound -> withUpperKeyBound
>>    - withNoBounds -> allKeys (or withNoKeyBounds, but we use
>> `allVersions` and not `noTimeBound` and should align the naming?)
>>
>>
>>
>> -Matthias
>>
>>
>> On 9/6/23 5:25 AM, Bruno Cadonna wrote:
>>> Hi Alieh,
>>>
>>> Thanks for the KIP!
>>>
>>> One high level comment/question:
>>>
>>> I assume you separated single key queries into two classes because
>>> versioned key queries return a single value and multi version key
>>> queries return iterators. Although, range queries always return
>>> iterators, it would make sense to also separate range queries for
>>> versioned state stores into range queries that return one single version
>>> of the keys within a range and range queries that return multiple
>>> version of the keys within a range, IMO. That would reduce the
>>> meaningless combinations.
>>> WDYT?
>>>
>>> Best,
>>> Bruno
>>>
>>> On 8/16/23 8:01 PM, Alieh Saeedi wrote:
>>>> Hi all,
>>>>
>>>> I splitted KIP-960
>>>> <
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-960%3A+Support+single-key_single-timestamp+interactive+queries+%28IQv2%29+for+versioned+state+stores
>>>
>>>> into
>>>> three separate KIPs. Therefore, please continue discussions about range
>>>> interactive queries here. You can see all the addressed reviews on the
>>>> following page. Thanks in advance.
>>>>
>>>> KIP-969: Support range interactive queries (IQv2) for versioned state
>>>> stores
>>>> <
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-969%3A+Support+range+interactive+queries+%28IQv2%29+for+versioned+state+stores
>>>
>>>>
>>>> I look forward to your feedback!
>>>>
>>>> Cheers,
>>>> Alieh
>>>>
>>
> 

Re: [DISCUSS] KIP-969: Support range interactive queries for versioned state stores

Posted by Alieh Saeedi <as...@confluent.io.INVALID>.
Thanks, Matthias and Bruno, for the feedback on KIP-969. Here is a summary
of the updates I made to the KIP:

   1.  I liked the idea of renaming methods as Matthias suggested.
   2. I removed the allversions() method as I did in KIP-968. To retrieve
   the latest value(s), the user must call just the asOf method with the MAX
   value (asOf(MAX)). The same applies to KIP-968. Do you think it is clumsy,
   Matthias?
   3. I added a method to the *VersionedKeyValueStore *interface, as I did
   for KIP-968.
   4. Matthias: I do not get what you mean by your second comment. Isn't
   the KIP already explicit about that?

   > I assume, results are returned by timestamp for each key. The KIP
   should be explicit about it.


Cheers,
Alieh



On Tue, Oct 3, 2023 at 6:07 AM Matthias J. Sax <mj...@apache.org> wrote:

> Thanks for updating the KIP.
>
> Not sure if I agree or not with Bruno's idea to split the query types
> further? In the end, we split them only because there is three different
> return types: single value, value-iterator, key-value-iterator.
>
> What do we gain by splitting out single-ts-range-key? In the end, for
> range-ts-range-key the proposed class is necessary and is a superset
> (one can set both timestamps to the same value, for single-ts lookup).
>
> The mentioned simplification might apply to "single-ts-range-key" but I
> don't see a simplification for the proposed (and necessary) query type?
>
> On the other hand, I see an advantage of a single-ts-range-key for
> querying over the "latest version" with a range of keys. For a
> single-ts-range-key query, this it would be the default (similar to
> VersionedKeyQuery with not asOf-timestamped defined).
>
> In the current version of the KIP, (if we agree that default should
> actually return "all versions" not "latest" -- this default was
> suggested by Bruno on KIP-968 and makes sense to me, so we would need to
> have the same default here to stay consistent), users would need to pass
> in `from(Long.MAX).to(Long.MAX)` (if I got this right) to query the
> latest point in time only, what seems to be clumsy? Or we could add a
> `lastestKeyOnly` option to `MultiVersionedRangeQuery`, but it does seems
> a little clumsy, too.
>
>
>
> > The overall order of the returned records is by Key
>
> I assume, results are returned by timestamp for each key. The KIP should
> be explicit about it.
>
>
>
> To be very explicit, should we rename the methods to specify the key bound?
>
>   - withRange -> withKeyRange
>   - withLowerBound -> withLowerKeyBound
>   - withUpperBound -> withUpperKeyBound
>   - withNoBounds -> allKeys (or withNoKeyBounds, but we use
> `allVersions` and not `noTimeBound` and should align the naming?)
>
>
>
> -Matthias
>
>
> On 9/6/23 5:25 AM, Bruno Cadonna wrote:
> > Hi Alieh,
> >
> > Thanks for the KIP!
> >
> > One high level comment/question:
> >
> > I assume you separated single key queries into two classes because
> > versioned key queries return a single value and multi version key
> > queries return iterators. Although, range queries always return
> > iterators, it would make sense to also separate range queries for
> > versioned state stores into range queries that return one single version
> > of the keys within a range and range queries that return multiple
> > version of the keys within a range, IMO. That would reduce the
> > meaningless combinations.
> > WDYT?
> >
> > Best,
> > Bruno
> >
> > On 8/16/23 8:01 PM, Alieh Saeedi wrote:
> >> Hi all,
> >>
> >> I splitted KIP-960
> >> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-960%3A+Support+single-key_single-timestamp+interactive+queries+%28IQv2%29+for+versioned+state+stores
> >
> >> into
> >> three separate KIPs. Therefore, please continue discussions about range
> >> interactive queries here. You can see all the addressed reviews on the
> >> following page. Thanks in advance.
> >>
> >> KIP-969: Support range interactive queries (IQv2) for versioned state
> >> stores
> >> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-969%3A+Support+range+interactive+queries+%28IQv2%29+for+versioned+state+stores
> >
> >>
> >> I look forward to your feedback!
> >>
> >> Cheers,
> >> Alieh
> >>
>

Re: [DISCUSS] KIP-969: Support range interactive queries for versioned state stores

Posted by "Matthias J. Sax" <mj...@apache.org>.
Thanks for updating the KIP.

Not sure if I agree or not with Bruno's idea to split the query types 
further? In the end, we split them only because there is three different 
return types: single value, value-iterator, key-value-iterator.

What do we gain by splitting out single-ts-range-key? In the end, for 
range-ts-range-key the proposed class is necessary and is a superset 
(one can set both timestamps to the same value, for single-ts lookup).

The mentioned simplification might apply to "single-ts-range-key" but I 
don't see a simplification for the proposed (and necessary) query type?

On the other hand, I see an advantage of a single-ts-range-key for 
querying over the "latest version" with a range of keys. For a 
single-ts-range-key query, this it would be the default (similar to 
VersionedKeyQuery with not asOf-timestamped defined).

In the current version of the KIP, (if we agree that default should 
actually return "all versions" not "latest" -- this default was 
suggested by Bruno on KIP-968 and makes sense to me, so we would need to 
have the same default here to stay consistent), users would need to pass 
in `from(Long.MAX).to(Long.MAX)` (if I got this right) to query the 
latest point in time only, what seems to be clumsy? Or we could add a 
`lastestKeyOnly` option to `MultiVersionedRangeQuery`, but it does seems 
a little clumsy, too.



> The overall order of the returned records is by Key

I assume, results are returned by timestamp for each key. The KIP should 
be explicit about it.



To be very explicit, should we rename the methods to specify the key bound?

  - withRange -> withKeyRange
  - withLowerBound -> withLowerKeyBound
  - withUpperBound -> withUpperKeyBound
  - withNoBounds -> allKeys (or withNoKeyBounds, but we use 
`allVersions` and not `noTimeBound` and should align the naming?)



-Matthias


On 9/6/23 5:25 AM, Bruno Cadonna wrote:
> Hi Alieh,
> 
> Thanks for the KIP!
> 
> One high level comment/question:
> 
> I assume you separated single key queries into two classes because 
> versioned key queries return a single value and multi version key 
> queries return iterators. Although, range queries always return 
> iterators, it would make sense to also separate range queries for 
> versioned state stores into range queries that return one single version 
> of the keys within a range and range queries that return multiple 
> version of the keys within a range, IMO. That would reduce the 
> meaningless combinations.
> WDYT?
> 
> Best,
> Bruno
> 
> On 8/16/23 8:01 PM, Alieh Saeedi wrote:
>> Hi all,
>>
>> I splitted KIP-960
>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-960%3A+Support+single-key_single-timestamp+interactive+queries+%28IQv2%29+for+versioned+state+stores>
>> into
>> three separate KIPs. Therefore, please continue discussions about range
>> interactive queries here. You can see all the addressed reviews on the
>> following page. Thanks in advance.
>>
>> KIP-969: Support range interactive queries (IQv2) for versioned state 
>> stores
>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-969%3A+Support+range+interactive+queries+%28IQv2%29+for+versioned+state+stores>
>>
>> I look forward to your feedback!
>>
>> Cheers,
>> Alieh
>>