You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@kafka.apache.org by "Matthias J. Sax" <ma...@confluent.io> on 2017/10/05 21:58:59 UTC

Re: [DISCUSS] KIP-205: Add getAllKeys() API to ReadOnlyWindowStore

Thanks for driving this and sorry for late response. With release
deadline it was pretty busy lately.

Can you please add a description for the suggested method, what they are
going to return? It's a little unclear to me atm.

It would also be helpful to discuss, for which use case each method is
useful. This might also help to identify potential gaps for which
another API might be more helpful.

Also, we should talk about provided guarantees when using those APIs
with regard to consistency -- not saying that we need to provide strong
guarantees, but he KIP should describe what user can expect.

-Matthias

On 9/24/17 8:11 PM, Richard Yu wrote:
> Hello, I would like to solicit review and comment on this issue (link
> below):
> 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-205%3A+Add+getAllKeys%28%29+API+to+ReadOnlyWindowStore
>

Re: [DISCUSS] KIP-205: Add getAllKeys() API to ReadOnlyWindowStore

Posted by Guozhang Wang <wa...@gmail.com>.

Hello Richard, Xavier:

I think I am convinced my your arguments. How about keeping all() as is and
rename "range" to "fetchAll" then?


Guozhang



On Thu, Oct 26, 2017 at 9:21 AM, Xavier Léauté <xa...@confluent.io> wrote:

> I don't feel this worth holding up the vote for, if no one else shares my
> concerns.
> On Wed, Oct 25, 2017 at 15:59 Richard Yu <yo...@gmail.com>
> wrote:
>
> > Xavier: There has been two pluses on the voting thread. Are you fine with
> > the current formation?
> >
> > On Tue, Oct 24, 2017 at 4:26 PM, Richard Yu <yo...@gmail.com>
> > wrote:
> >
> > > I think we can come up with this compromise: range(long timeFrom, long
> > > timeTo) will be changed to getKeys(long timeFrom, long timeTo). Sounds
> > fair?
> > >
> > >
> > > On Tue, Oct 24, 2017 at 10:44 AM, Xavier Léauté <xa...@confluent.io>
> > > wrote:
> > >
> > >> >
> > >> > Generally I think having `all / range` is better in terms of
> > consistency
> > >> > with key-value windows. I.e. queries with key are named as `get /
> > fetch`
> > >> > for kv / window stores, and queries without key are named as `range
> /
> > >> all`.
> > >> >
> > >>
> > >> For kv stores, range takes a range of keys, and with this proposal
> range
> > >> on
> > >> window stores would take a range of time, that does not sound
> consistent
> > >> to
> > >> me at all.
> > >>
> > >> We also already have fetch which take both a range of time and keys.
> > >>
> > >
> > >
> >
>



-- 
-- Guozhang

Re: [DISCUSS] KIP-205: Add getAllKeys() API to ReadOnlyWindowStore

Posted by Xavier Léauté <xa...@confluent.io>.

I don't feel this worth holding up the vote for, if no one else shares my
concerns.
On Wed, Oct 25, 2017 at 15:59 Richard Yu <yo...@gmail.com> wrote:

> Xavier: There has been two pluses on the voting thread. Are you fine with
> the current formation?
>
> On Tue, Oct 24, 2017 at 4:26 PM, Richard Yu <yo...@gmail.com>
> wrote:
>
> > I think we can come up with this compromise: range(long timeFrom, long
> > timeTo) will be changed to getKeys(long timeFrom, long timeTo). Sounds
> fair?
> >
> >
> > On Tue, Oct 24, 2017 at 10:44 AM, Xavier Léauté <xa...@confluent.io>
> > wrote:
> >
> >> >
> >> > Generally I think having `all / range` is better in terms of
> consistency
> >> > with key-value windows. I.e. queries with key are named as `get /
> fetch`
> >> > for kv / window stores, and queries without key are named as `range /
> >> all`.
> >> >
> >>
> >> For kv stores, range takes a range of keys, and with this proposal range
> >> on
> >> window stores would take a range of time, that does not sound consistent
> >> to
> >> me at all.
> >>
> >> We also already have fetch which take both a range of time and keys.
> >>
> >
> >
>

Re: [DISCUSS] KIP-205: Add getAllKeys() API to ReadOnlyWindowStore

Posted by Richard Yu <yo...@gmail.com>.

Xavier: There has been two pluses on the voting thread. Are you fine with
the current formation?

On Tue, Oct 24, 2017 at 4:26 PM, Richard Yu <yo...@gmail.com>
wrote:

> I think we can come up with this compromise: range(long timeFrom, long
> timeTo) will be changed to getKeys(long timeFrom, long timeTo). Sounds fair?
>
>
> On Tue, Oct 24, 2017 at 10:44 AM, Xavier Léauté <xa...@confluent.io>
> wrote:
>
>> >
>> > Generally I think having `all / range` is better in terms of consistency
>> > with key-value windows. I.e. queries with key are named as `get / fetch`
>> > for kv / window stores, and queries without key are named as `range /
>> all`.
>> >
>>
>> For kv stores, range takes a range of keys, and with this proposal range
>> on
>> window stores would take a range of time, that does not sound consistent
>> to
>> me at all.
>>
>> We also already have fetch which take both a range of time and keys.
>>
>
>

Re: [DISCUSS] KIP-205: Add getAllKeys() API to ReadOnlyWindowStore

Posted by Richard Yu <yo...@gmail.com>.

I think we can come up with this compromise: range(long timeFrom, long
timeTo) will be changed to getKeys(long timeFrom, long timeTo). Sounds fair?

On Tue, Oct 24, 2017 at 10:44 AM, Xavier Léauté <xa...@confluent.io> wrote:

> >
> > Generally I think having `all / range` is better in terms of consistency
> > with key-value windows. I.e. queries with key are named as `get / fetch`
> > for kv / window stores, and queries without key are named as `range /
> all`.
> >
>
> For kv stores, range takes a range of keys, and with this proposal range on
> window stores would take a range of time, that does not sound consistent to
> me at all.
>
> We also already have fetch which take both a range of time and keys.
>

Re: [DISCUSS] KIP-205: Add getAllKeys() API to ReadOnlyWindowStore

Posted by Xavier Léauté <xa...@confluent.io>.

>
> Generally I think having `all / range` is better in terms of consistency
> with key-value windows. I.e. queries with key are named as `get / fetch`
> for kv / window stores, and queries without key are named as `range / all`.
>

For kv stores, range takes a range of keys, and with this proposal range on
window stores would take a range of time, that does not sound consistent to
me at all.

We also already have fetch which take both a range of time and keys.

Re: [DISCUSS] KIP-205: Add getAllKeys() API to ReadOnlyWindowStore

Posted by Guozhang Wang <wa...@gmail.com>.

Thanks Richard for updating the wiki.

Made another pass over it, overall it LGTM. Just a few minor comments:

1) Could you update the title to reflect the function names accordingly?
Generally I think having `all / range` is better in terms of consistency
with key-value windows. I.e. queries with key are named as `get / fetch`
for kv / window stores, and queries without key are named as `range / all`.

2) In the Javadoc, "@throws NullPointerException if null is used for any key"
is not needed as the API does not have any key parameters.


Please feel free to start the voting thread, also at the same time start
implementing the PR in case you may find some impl corner cases that have
not been thought about in the wiki. That would help us discover any
potential blockers earlier.


Guozhang


On Wed, Oct 18, 2017 at 8:42 PM, Richard Yu <yo...@gmail.com>
wrote:

> Soliciting more feedback before vote.
>
> On Wed, Oct 18, 2017 at 8:26 PM, Richard Yu <yo...@gmail.com>
> wrote:
>
> > Is this KIP close to completion? Because we could start working on the
> > code itself now. (Its at about this stage).
> >
> > On Mon, Oct 16, 2017 at 7:37 PM, Richard Yu <yo...@gmail.com>
> > wrote:
> >
> >> As Guozhang Wang mentioned earlier, we want to mirror the structure of
> >> similar Store class (namely KTable). The WindowedStore class might be
> >> unique in itself as it uses fetch() methods, but in my opinion,
> uniformity
> >> should be better suited for simplicity.
> >>
> >> On Mon, Oct 16, 2017 at 11:54 AM, Xavier Léauté <xa...@confluent.io>
> >> wrote:
> >>
> >>> Thank you Richard! Do you or Guozhang have any thoughts on my
> suggestions
> >>> to use fetchAll() and fetchAll(timeFrom, timeTo) and reserve the
> "range"
> >>> keyword for when we query a specific range of keys?
> >>>
> >>> Xavier
> >>>
> >>> On Sat, Oct 14, 2017 at 2:32 PM Richard Yu <yohan.richard.yu@gmail.com
> >
> >>> wrote:
> >>>
> >>> > Thanks for the clarifications, Xavier.
> >>> > I have removed most of the methods except for keys() and all() which
> >>> has
> >>> > been renamed to Guozhang Wang's suggestions.
> >>> >
> >>> > Hope this helps.
> >>> >
> >>> > On Fri, Oct 13, 2017 at 3:28 PM, Xavier Léauté <xa...@confluent.io>
> >>> > wrote:
> >>> >
> >>> > > Thanks for the KIP Richard, this is a very useful addition!
> >>> > >
> >>> > > As far as the API changes, I just have a few comments on the
> methods
> >>> that
> >>> > > don't seem directly related to the KIP title, and naming of course
> >>> :).
> >>> > > On the implementation, see my notes further down that will
> hopefully
> >>> > > clarify a few things.
> >>> > >
> >>> > > Regarding the "bonus" methods:
> >>> > > I agree with Guozhang that the KIP lacks proper motivation for
> >>> adding the
> >>> > > min, max, and allLatest methods.
> >>> > > It is also not clear to me what min and max would really mean, what
> >>> > > ordering do we refer to here? Are we first ordering by time, then
> >>> key, or
> >>> > > first by key, then time?
> >>> > > The allLatest method might be useful, but I don't really see how it
> >>> would
> >>> > > be used in practice if we have to scan the entire range of keys for
> >>> all
> >>> > the
> >>> > > state stores, every single time.
> >>> > >
> >>> > > Maybe we could flesh the motivation behind those extra methods, but
> >>> in
> >>> > the
> >>> > > interest of time, and moving the KIP forward it might make sense to
> >>> file
> >>> > a
> >>> > > follow-up once we have more concrete use-cases.
> >>> > >
> >>> > > On naming:
> >>> > > I also agree with Guozhang that "keys()" should be renamed. It
> feels
> >>> a
> >>> > bit
> >>> > > of a misnomer, since it not only returns keys, but also the values.
> >>> > >
> >>> > > As far as what to rename it to, I would argue we already have some
> >>> > > discrepancy between key-value stores using range() vs. window
> stores
> >>> > using
> >>> > > fetch().
> >>> > > I assume we called the window method "fetch" instead of "get"
> >>> because you
> >>> > > might get back more than one window for the requested key.
> >>> > >
> >>> > > If we wanted to make things consistent with both existing key-value
> >>> store
> >>> > > naming and window store naming, we could do the following:
> >>> > > Decide that "all" always refers to the entire range of keys,
> >>> independent
> >>> > of
> >>> > > the window and similarly "range" always refers to a particular
> range
> >>> of
> >>> > > keys, irrespective of the window.
> >>> > > We can then prefix methods with "fetch" to indicate that more than
> >>> one
> >>> > > window may be returned for each key in the range.
> >>> > >
> >>> > > This would give us:
> >>> > > - a new fetchAll() method for all the keys, which makes it clear
> >>> that you
> >>> > > might get back the same key in different windows
> >>> > > - a new fetchAll(timeFrom, timeTo) method to get all the keys in a
> >>> given
> >>> > > time range, again with possibly more than one window per key
> >>> > > - and we'd have to rename fetch(K,K,long, long) to fetchRange(K, K,
> >>> long,
> >>> > > long)  and deprecate the old one to indicate a range of keys
> >>> > >
> >>> > > One inconsistency I noted: the "Proposed Changes" section in your
> KIP
> >>> > talks
> >>> > > about a "range(timeFrom, timeTo)" method, I think you meant to
> refer
> >>> to
> >>> > the
> >>> > > all(from, to) method, but I'm sure you'll fix that once we decide
> on
> >>> > > naming.
> >>> > >
> >>> > > On the implementation side:
> >>> > > You mentioned that caching and rocksdb store have very different
> >>> > key/value
> >>> > > structures, and while it appears to be that way on the surface, the
> >>> > > structure between the two is actually very similar. Keys in the
> >>> cache are
> >>> > > prefixed with a segment ID to ensure the ordering in the cache
> stays
> >>> > > consistent with the rocksdb implementation, which maintains
> multiple
> >>> > > rocksdb instances, one for each segment. So we just "artificially"
> >>> mirror
> >>> > > the segment structure in the cache.
> >>> > >
> >>> > > The reason for keeping the ordering consistent is pretty simple:
> >>> keep in
> >>> > > mind that when we query a cached window store we are effectively
> >>> querying
> >>> > > both the cache and the persistent rocksdb store at the same time,
> >>> merging
> >>> > > results from both. To make that merge as painless as possible, we
> >>> ensure
> >>> > > the ordering is consistent when querying a range of keys in both
> >>> stores.
> >>> > >
> >>> > > Also keep in mind CompositeReadonlyWindowStore, which wraps
> multiple
> >>> > window
> >>> > > stores within a topology.
> >>> > >
> >>> > > Hope this clarifies some of the less trivial parts of caching
> window
> >>> > store.
> >>> > >
> >>> > > Cheers,
> >>> > > Xavier
> >>> > >
> >>> > > On Sun, Oct 8, 2017 at 9:21 PM Guozhang Wang <wa...@gmail.com>
> >>> wrote:
> >>> > >
> >>> > > > Richard, Matthias:
> >>> > > >
> >>> > > > 0. Could you describe a bit what are the possible use cases of
> >>> > > `allLatest`,
> >>> > > > `minKey` and `maxKey`? I'd prefer keeping the APIs to add at a
> >>> minimum
> >>> > > > necessary amount, to avoid a swamp of new APIs that no one would
> >>> really
> >>> > > use
> >>> > > > but just complicated the internal code base.
> >>> > > >
> >>> > > > 1. One minor comment on the other two new APIs: could we rename
> >>> `keys`
> >>> > to
> >>> > > > `all` and `all` to `range` to be consistent with the other
> store's
> >>> > APIs?
> >>> > > >
> >>> > > > 2. One meta comment on the implementation details: since both
> >>> `keys`
> >>> > and
> >>> > > > `all` would likely touch multiple segments, we may need to use
> the
> >>> > > internal
> >>> > > > `SegmentIterator` class, but currently it always requires a Bytes
> >>> from
> >>> > > and
> >>> > > > Bytes to for its constructor. What changes we need to make for
> that
> >>> > > class?
> >>> > > >
> >>> > > >
> >>> > > > Guozhang
> >>> > > >
> >>> > > >
> >>> > > > On Thu, Oct 5, 2017 at 4:42 PM, Richard Yu <
> >>> yohan.richard.yu@gmail.com
> >>> > >
> >>> > > > wrote:
> >>> > > >
> >>> > > > > We should split KAFKA-4499 into several sub-issues with 4499
> >>> being
> >>> > the
> >>> > > > > parent issue.
> >>> > > > > Adding the implementation to CachingWindowStore,
> >>> RocksDBWindowStore,
> >>> > > etc
> >>> > > > > will each require the addition of a test and implementing the
> >>> methods
> >>> > > > which
> >>> > > > > is not trivial.
> >>> > > > > This way, it should be easier to manage the progress of the
> KIP.
> >>> > > > >
> >>> > > > >
> >>> > > > > On Thu, Oct 5, 2017 at 2:58 PM, Matthias J. Sax <
> >>> > matthias@confluent.io
> >>> > > >
> >>> > > > > wrote:
> >>> > > > >
> >>> > > > > > Thanks for driving this and sorry for late response. With
> >>> release
> >>> > > > > > deadline it was pretty busy lately.
> >>> > > > > >
> >>> > > > > > Can you please add a description for the suggested method,
> what
> >>> > they
> >>> > > > are
> >>> > > > > > going to return? It's a little unclear to me atm.
> >>> > > > > >
> >>> > > > > > It would also be helpful to discuss, for which use case each
> >>> method
> >>> > > is
> >>> > > > > > useful. This might also help to identify potential gaps for
> >>> which
> >>> > > > > > another API might be more helpful.
> >>> > > > > >
> >>> > > > > > Also, we should talk about provided guarantees when using
> those
> >>> > APIs
> >>> > > > > > with regard to consistency -- not saying that we need to
> >>> provide
> >>> > > strong
> >>> > > > > > guarantees, but he KIP should describe what user can expect.
> >>> > > > > >
> >>> > > > > >
> >>> > > > > > -Matthias
> >>> > > > > >
> >>> > > > > > On 9/24/17 8:11 PM, Richard Yu wrote:
> >>> > > > > > > Hello, I would like to solicit review and comment on this
> >>> issue
> >>> > > (link
> >>> > > > > > > below):
> >>> > > > > > >
> >>> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>> > > > > > 205%3A+Add+getAllKeys%28%29+API+to+ReadOnlyWindowStore
> >>> > > > > > >
> >>> > > > > >
> >>> > > > > >
> >>> > > > >
> >>> > > >
> >>> > > >
> >>> > > >
> >>> > > > --
> >>> > > > -- Guozhang
> >>> > > >
> >>> > >
> >>> >
> >>>
> >>
> >>
> >
>



-- 
-- Guozhang

Re: [DISCUSS] KIP-205: Add getAllKeys() API to ReadOnlyWindowStore

Posted by Richard Yu <yo...@gmail.com>.

Soliciting more feedback before vote.

On Wed, Oct 18, 2017 at 8:26 PM, Richard Yu <yo...@gmail.com>
wrote:

> Is this KIP close to completion? Because we could start working on the
> code itself now. (Its at about this stage).
>
> On Mon, Oct 16, 2017 at 7:37 PM, Richard Yu <yo...@gmail.com>
> wrote:
>
>> As Guozhang Wang mentioned earlier, we want to mirror the structure of
>> similar Store class (namely KTable). The WindowedStore class might be
>> unique in itself as it uses fetch() methods, but in my opinion, uniformity
>> should be better suited for simplicity.
>>
>> On Mon, Oct 16, 2017 at 11:54 AM, Xavier Léauté <xa...@confluent.io>
>> wrote:
>>
>>> Thank you Richard! Do you or Guozhang have any thoughts on my suggestions
>>> to use fetchAll() and fetchAll(timeFrom, timeTo) and reserve the "range"
>>> keyword for when we query a specific range of keys?
>>>
>>> Xavier
>>>
>>> On Sat, Oct 14, 2017 at 2:32 PM Richard Yu <yo...@gmail.com>
>>> wrote:
>>>
>>> > Thanks for the clarifications, Xavier.
>>> > I have removed most of the methods except for keys() and all() which
>>> has
>>> > been renamed to Guozhang Wang's suggestions.
>>> >
>>> > Hope this helps.
>>> >
>>> > On Fri, Oct 13, 2017 at 3:28 PM, Xavier Léauté <xa...@confluent.io>
>>> > wrote:
>>> >
>>> > > Thanks for the KIP Richard, this is a very useful addition!
>>> > >
>>> > > As far as the API changes, I just have a few comments on the methods
>>> that
>>> > > don't seem directly related to the KIP title, and naming of course
>>> :).
>>> > > On the implementation, see my notes further down that will hopefully
>>> > > clarify a few things.
>>> > >
>>> > > Regarding the "bonus" methods:
>>> > > I agree with Guozhang that the KIP lacks proper motivation for
>>> adding the
>>> > > min, max, and allLatest methods.
>>> > > It is also not clear to me what min and max would really mean, what
>>> > > ordering do we refer to here? Are we first ordering by time, then
>>> key, or
>>> > > first by key, then time?
>>> > > The allLatest method might be useful, but I don't really see how it
>>> would
>>> > > be used in practice if we have to scan the entire range of keys for
>>> all
>>> > the
>>> > > state stores, every single time.
>>> > >
>>> > > Maybe we could flesh the motivation behind those extra methods, but
>>> in
>>> > the
>>> > > interest of time, and moving the KIP forward it might make sense to
>>> file
>>> > a
>>> > > follow-up once we have more concrete use-cases.
>>> > >
>>> > > On naming:
>>> > > I also agree with Guozhang that "keys()" should be renamed. It feels
>>> a
>>> > bit
>>> > > of a misnomer, since it not only returns keys, but also the values.
>>> > >
>>> > > As far as what to rename it to, I would argue we already have some
>>> > > discrepancy between key-value stores using range() vs. window stores
>>> > using
>>> > > fetch().
>>> > > I assume we called the window method "fetch" instead of "get"
>>> because you
>>> > > might get back more than one window for the requested key.
>>> > >
>>> > > If we wanted to make things consistent with both existing key-value
>>> store
>>> > > naming and window store naming, we could do the following:
>>> > > Decide that "all" always refers to the entire range of keys,
>>> independent
>>> > of
>>> > > the window and similarly "range" always refers to a particular range
>>> of
>>> > > keys, irrespective of the window.
>>> > > We can then prefix methods with "fetch" to indicate that more than
>>> one
>>> > > window may be returned for each key in the range.
>>> > >
>>> > > This would give us:
>>> > > - a new fetchAll() method for all the keys, which makes it clear
>>> that you
>>> > > might get back the same key in different windows
>>> > > - a new fetchAll(timeFrom, timeTo) method to get all the keys in a
>>> given
>>> > > time range, again with possibly more than one window per key
>>> > > - and we'd have to rename fetch(K,K,long, long) to fetchRange(K, K,
>>> long,
>>> > > long)  and deprecate the old one to indicate a range of keys
>>> > >
>>> > > One inconsistency I noted: the "Proposed Changes" section in your KIP
>>> > talks
>>> > > about a "range(timeFrom, timeTo)" method, I think you meant to refer
>>> to
>>> > the
>>> > > all(from, to) method, but I'm sure you'll fix that once we decide on
>>> > > naming.
>>> > >
>>> > > On the implementation side:
>>> > > You mentioned that caching and rocksdb store have very different
>>> > key/value
>>> > > structures, and while it appears to be that way on the surface, the
>>> > > structure between the two is actually very similar. Keys in the
>>> cache are
>>> > > prefixed with a segment ID to ensure the ordering in the cache stays
>>> > > consistent with the rocksdb implementation, which maintains multiple
>>> > > rocksdb instances, one for each segment. So we just "artificially"
>>> mirror
>>> > > the segment structure in the cache.
>>> > >
>>> > > The reason for keeping the ordering consistent is pretty simple:
>>> keep in
>>> > > mind that when we query a cached window store we are effectively
>>> querying
>>> > > both the cache and the persistent rocksdb store at the same time,
>>> merging
>>> > > results from both. To make that merge as painless as possible, we
>>> ensure
>>> > > the ordering is consistent when querying a range of keys in both
>>> stores.
>>> > >
>>> > > Also keep in mind CompositeReadonlyWindowStore, which wraps multiple
>>> > window
>>> > > stores within a topology.
>>> > >
>>> > > Hope this clarifies some of the less trivial parts of caching window
>>> > store.
>>> > >
>>> > > Cheers,
>>> > > Xavier
>>> > >
>>> > > On Sun, Oct 8, 2017 at 9:21 PM Guozhang Wang <wa...@gmail.com>
>>> wrote:
>>> > >
>>> > > > Richard, Matthias:
>>> > > >
>>> > > > 0. Could you describe a bit what are the possible use cases of
>>> > > `allLatest`,
>>> > > > `minKey` and `maxKey`? I'd prefer keeping the APIs to add at a
>>> minimum
>>> > > > necessary amount, to avoid a swamp of new APIs that no one would
>>> really
>>> > > use
>>> > > > but just complicated the internal code base.
>>> > > >
>>> > > > 1. One minor comment on the other two new APIs: could we rename
>>> `keys`
>>> > to
>>> > > > `all` and `all` to `range` to be consistent with the other store's
>>> > APIs?
>>> > > >
>>> > > > 2. One meta comment on the implementation details: since both
>>> `keys`
>>> > and
>>> > > > `all` would likely touch multiple segments, we may need to use the
>>> > > internal
>>> > > > `SegmentIterator` class, but currently it always requires a Bytes
>>> from
>>> > > and
>>> > > > Bytes to for its constructor. What changes we need to make for that
>>> > > class?
>>> > > >
>>> > > >
>>> > > > Guozhang
>>> > > >
>>> > > >
>>> > > > On Thu, Oct 5, 2017 at 4:42 PM, Richard Yu <
>>> yohan.richard.yu@gmail.com
>>> > >
>>> > > > wrote:
>>> > > >
>>> > > > > We should split KAFKA-4499 into several sub-issues with 4499
>>> being
>>> > the
>>> > > > > parent issue.
>>> > > > > Adding the implementation to CachingWindowStore,
>>> RocksDBWindowStore,
>>> > > etc
>>> > > > > will each require the addition of a test and implementing the
>>> methods
>>> > > > which
>>> > > > > is not trivial.
>>> > > > > This way, it should be easier to manage the progress of the KIP.
>>> > > > >
>>> > > > >
>>> > > > > On Thu, Oct 5, 2017 at 2:58 PM, Matthias J. Sax <
>>> > matthias@confluent.io
>>> > > >
>>> > > > > wrote:
>>> > > > >
>>> > > > > > Thanks for driving this and sorry for late response. With
>>> release
>>> > > > > > deadline it was pretty busy lately.
>>> > > > > >
>>> > > > > > Can you please add a description for the suggested method, what
>>> > they
>>> > > > are
>>> > > > > > going to return? It's a little unclear to me atm.
>>> > > > > >
>>> > > > > > It would also be helpful to discuss, for which use case each
>>> method
>>> > > is
>>> > > > > > useful. This might also help to identify potential gaps for
>>> which
>>> > > > > > another API might be more helpful.
>>> > > > > >
>>> > > > > > Also, we should talk about provided guarantees when using those
>>> > APIs
>>> > > > > > with regard to consistency -- not saying that we need to
>>> provide
>>> > > strong
>>> > > > > > guarantees, but he KIP should describe what user can expect.
>>> > > > > >
>>> > > > > >
>>> > > > > > -Matthias
>>> > > > > >
>>> > > > > > On 9/24/17 8:11 PM, Richard Yu wrote:
>>> > > > > > > Hello, I would like to solicit review and comment on this
>>> issue
>>> > > (link
>>> > > > > > > below):
>>> > > > > > >
>>> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>> > > > > > 205%3A+Add+getAllKeys%28%29+API+to+ReadOnlyWindowStore
>>> > > > > > >
>>> > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > > >
>>> > > >
>>> > > > --
>>> > > > -- Guozhang
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>

Re: [DISCUSS] KIP-205: Add getAllKeys() API to ReadOnlyWindowStore

Posted by Richard Yu <yo...@gmail.com>.

Is this KIP close to completion? Because we could start working on the code
itself now. (Its at about this stage).

On Mon, Oct 16, 2017 at 7:37 PM, Richard Yu <yo...@gmail.com>
wrote:

> As Guozhang Wang mentioned earlier, we want to mirror the structure of
> similar Store class (namely KTable). The WindowedStore class might be
> unique in itself as it uses fetch() methods, but in my opinion, uniformity
> should be better suited for simplicity.
>
> On Mon, Oct 16, 2017 at 11:54 AM, Xavier Léauté <xa...@confluent.io>
> wrote:
>
>> Thank you Richard! Do you or Guozhang have any thoughts on my suggestions
>> to use fetchAll() and fetchAll(timeFrom, timeTo) and reserve the "range"
>> keyword for when we query a specific range of keys?
>>
>> Xavier
>>
>> On Sat, Oct 14, 2017 at 2:32 PM Richard Yu <yo...@gmail.com>
>> wrote:
>>
>> > Thanks for the clarifications, Xavier.
>> > I have removed most of the methods except for keys() and all() which has
>> > been renamed to Guozhang Wang's suggestions.
>> >
>> > Hope this helps.
>> >
>> > On Fri, Oct 13, 2017 at 3:28 PM, Xavier Léauté <xa...@confluent.io>
>> > wrote:
>> >
>> > > Thanks for the KIP Richard, this is a very useful addition!
>> > >
>> > > As far as the API changes, I just have a few comments on the methods
>> that
>> > > don't seem directly related to the KIP title, and naming of course :).
>> > > On the implementation, see my notes further down that will hopefully
>> > > clarify a few things.
>> > >
>> > > Regarding the "bonus" methods:
>> > > I agree with Guozhang that the KIP lacks proper motivation for adding
>> the
>> > > min, max, and allLatest methods.
>> > > It is also not clear to me what min and max would really mean, what
>> > > ordering do we refer to here? Are we first ordering by time, then
>> key, or
>> > > first by key, then time?
>> > > The allLatest method might be useful, but I don't really see how it
>> would
>> > > be used in practice if we have to scan the entire range of keys for
>> all
>> > the
>> > > state stores, every single time.
>> > >
>> > > Maybe we could flesh the motivation behind those extra methods, but in
>> > the
>> > > interest of time, and moving the KIP forward it might make sense to
>> file
>> > a
>> > > follow-up once we have more concrete use-cases.
>> > >
>> > > On naming:
>> > > I also agree with Guozhang that "keys()" should be renamed. It feels a
>> > bit
>> > > of a misnomer, since it not only returns keys, but also the values.
>> > >
>> > > As far as what to rename it to, I would argue we already have some
>> > > discrepancy between key-value stores using range() vs. window stores
>> > using
>> > > fetch().
>> > > I assume we called the window method "fetch" instead of "get" because
>> you
>> > > might get back more than one window for the requested key.
>> > >
>> > > If we wanted to make things consistent with both existing key-value
>> store
>> > > naming and window store naming, we could do the following:
>> > > Decide that "all" always refers to the entire range of keys,
>> independent
>> > of
>> > > the window and similarly "range" always refers to a particular range
>> of
>> > > keys, irrespective of the window.
>> > > We can then prefix methods with "fetch" to indicate that more than one
>> > > window may be returned for each key in the range.
>> > >
>> > > This would give us:
>> > > - a new fetchAll() method for all the keys, which makes it clear that
>> you
>> > > might get back the same key in different windows
>> > > - a new fetchAll(timeFrom, timeTo) method to get all the keys in a
>> given
>> > > time range, again with possibly more than one window per key
>> > > - and we'd have to rename fetch(K,K,long, long) to fetchRange(K, K,
>> long,
>> > > long)  and deprecate the old one to indicate a range of keys
>> > >
>> > > One inconsistency I noted: the "Proposed Changes" section in your KIP
>> > talks
>> > > about a "range(timeFrom, timeTo)" method, I think you meant to refer
>> to
>> > the
>> > > all(from, to) method, but I'm sure you'll fix that once we decide on
>> > > naming.
>> > >
>> > > On the implementation side:
>> > > You mentioned that caching and rocksdb store have very different
>> > key/value
>> > > structures, and while it appears to be that way on the surface, the
>> > > structure between the two is actually very similar. Keys in the cache
>> are
>> > > prefixed with a segment ID to ensure the ordering in the cache stays
>> > > consistent with the rocksdb implementation, which maintains multiple
>> > > rocksdb instances, one for each segment. So we just "artificially"
>> mirror
>> > > the segment structure in the cache.
>> > >
>> > > The reason for keeping the ordering consistent is pretty simple: keep
>> in
>> > > mind that when we query a cached window store we are effectively
>> querying
>> > > both the cache and the persistent rocksdb store at the same time,
>> merging
>> > > results from both. To make that merge as painless as possible, we
>> ensure
>> > > the ordering is consistent when querying a range of keys in both
>> stores.
>> > >
>> > > Also keep in mind CompositeReadonlyWindowStore, which wraps multiple
>> > window
>> > > stores within a topology.
>> > >
>> > > Hope this clarifies some of the less trivial parts of caching window
>> > store.
>> > >
>> > > Cheers,
>> > > Xavier
>> > >
>> > > On Sun, Oct 8, 2017 at 9:21 PM Guozhang Wang <wa...@gmail.com>
>> wrote:
>> > >
>> > > > Richard, Matthias:
>> > > >
>> > > > 0. Could you describe a bit what are the possible use cases of
>> > > `allLatest`,
>> > > > `minKey` and `maxKey`? I'd prefer keeping the APIs to add at a
>> minimum
>> > > > necessary amount, to avoid a swamp of new APIs that no one would
>> really
>> > > use
>> > > > but just complicated the internal code base.
>> > > >
>> > > > 1. One minor comment on the other two new APIs: could we rename
>> `keys`
>> > to
>> > > > `all` and `all` to `range` to be consistent with the other store's
>> > APIs?
>> > > >
>> > > > 2. One meta comment on the implementation details: since both `keys`
>> > and
>> > > > `all` would likely touch multiple segments, we may need to use the
>> > > internal
>> > > > `SegmentIterator` class, but currently it always requires a Bytes
>> from
>> > > and
>> > > > Bytes to for its constructor. What changes we need to make for that
>> > > class?
>> > > >
>> > > >
>> > > > Guozhang
>> > > >
>> > > >
>> > > > On Thu, Oct 5, 2017 at 4:42 PM, Richard Yu <
>> yohan.richard.yu@gmail.com
>> > >
>> > > > wrote:
>> > > >
>> > > > > We should split KAFKA-4499 into several sub-issues with 4499 being
>> > the
>> > > > > parent issue.
>> > > > > Adding the implementation to CachingWindowStore,
>> RocksDBWindowStore,
>> > > etc
>> > > > > will each require the addition of a test and implementing the
>> methods
>> > > > which
>> > > > > is not trivial.
>> > > > > This way, it should be easier to manage the progress of the KIP.
>> > > > >
>> > > > >
>> > > > > On Thu, Oct 5, 2017 at 2:58 PM, Matthias J. Sax <
>> > matthias@confluent.io
>> > > >
>> > > > > wrote:
>> > > > >
>> > > > > > Thanks for driving this and sorry for late response. With
>> release
>> > > > > > deadline it was pretty busy lately.
>> > > > > >
>> > > > > > Can you please add a description for the suggested method, what
>> > they
>> > > > are
>> > > > > > going to return? It's a little unclear to me atm.
>> > > > > >
>> > > > > > It would also be helpful to discuss, for which use case each
>> method
>> > > is
>> > > > > > useful. This might also help to identify potential gaps for
>> which
>> > > > > > another API might be more helpful.
>> > > > > >
>> > > > > > Also, we should talk about provided guarantees when using those
>> > APIs
>> > > > > > with regard to consistency -- not saying that we need to provide
>> > > strong
>> > > > > > guarantees, but he KIP should describe what user can expect.
>> > > > > >
>> > > > > >
>> > > > > > -Matthias
>> > > > > >
>> > > > > > On 9/24/17 8:11 PM, Richard Yu wrote:
>> > > > > > > Hello, I would like to solicit review and comment on this
>> issue
>> > > (link
>> > > > > > > below):
>> > > > > > >
>> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> > > > > > 205%3A+Add+getAllKeys%28%29+API+to+ReadOnlyWindowStore
>> > > > > > >
>> > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > -- Guozhang
>> > > >
>> > >
>> >
>>
>
>

Re: [DISCUSS] KIP-205: Add getAllKeys() API to ReadOnlyWindowStore

Posted by Richard Yu <yo...@gmail.com>.

As Guozhang Wang mentioned earlier, we want to mirror the structure of
similar Store class (namely KTable). The WindowedStore class might be
unique in itself as it uses fetch() methods, but in my opinion, uniformity
should be better suited for simplicity.

On Mon, Oct 16, 2017 at 11:54 AM, Xavier Léauté <xa...@confluent.io> wrote:

> Thank you Richard! Do you or Guozhang have any thoughts on my suggestions
> to use fetchAll() and fetchAll(timeFrom, timeTo) and reserve the "range"
> keyword for when we query a specific range of keys?
>
> Xavier
>
> On Sat, Oct 14, 2017 at 2:32 PM Richard Yu <yo...@gmail.com>
> wrote:
>
> > Thanks for the clarifications, Xavier.
> > I have removed most of the methods except for keys() and all() which has
> > been renamed to Guozhang Wang's suggestions.
> >
> > Hope this helps.
> >
> > On Fri, Oct 13, 2017 at 3:28 PM, Xavier Léauté <xa...@confluent.io>
> > wrote:
> >
> > > Thanks for the KIP Richard, this is a very useful addition!
> > >
> > > As far as the API changes, I just have a few comments on the methods
> that
> > > don't seem directly related to the KIP title, and naming of course :).
> > > On the implementation, see my notes further down that will hopefully
> > > clarify a few things.
> > >
> > > Regarding the "bonus" methods:
> > > I agree with Guozhang that the KIP lacks proper motivation for adding
> the
> > > min, max, and allLatest methods.
> > > It is also not clear to me what min and max would really mean, what
> > > ordering do we refer to here? Are we first ordering by time, then key,
> or
> > > first by key, then time?
> > > The allLatest method might be useful, but I don't really see how it
> would
> > > be used in practice if we have to scan the entire range of keys for all
> > the
> > > state stores, every single time.
> > >
> > > Maybe we could flesh the motivation behind those extra methods, but in
> > the
> > > interest of time, and moving the KIP forward it might make sense to
> file
> > a
> > > follow-up once we have more concrete use-cases.
> > >
> > > On naming:
> > > I also agree with Guozhang that "keys()" should be renamed. It feels a
> > bit
> > > of a misnomer, since it not only returns keys, but also the values.
> > >
> > > As far as what to rename it to, I would argue we already have some
> > > discrepancy between key-value stores using range() vs. window stores
> > using
> > > fetch().
> > > I assume we called the window method "fetch" instead of "get" because
> you
> > > might get back more than one window for the requested key.
> > >
> > > If we wanted to make things consistent with both existing key-value
> store
> > > naming and window store naming, we could do the following:
> > > Decide that "all" always refers to the entire range of keys,
> independent
> > of
> > > the window and similarly "range" always refers to a particular range of
> > > keys, irrespective of the window.
> > > We can then prefix methods with "fetch" to indicate that more than one
> > > window may be returned for each key in the range.
> > >
> > > This would give us:
> > > - a new fetchAll() method for all the keys, which makes it clear that
> you
> > > might get back the same key in different windows
> > > - a new fetchAll(timeFrom, timeTo) method to get all the keys in a
> given
> > > time range, again with possibly more than one window per key
> > > - and we'd have to rename fetch(K,K,long, long) to fetchRange(K, K,
> long,
> > > long)  and deprecate the old one to indicate a range of keys
> > >
> > > One inconsistency I noted: the "Proposed Changes" section in your KIP
> > talks
> > > about a "range(timeFrom, timeTo)" method, I think you meant to refer to
> > the
> > > all(from, to) method, but I'm sure you'll fix that once we decide on
> > > naming.
> > >
> > > On the implementation side:
> > > You mentioned that caching and rocksdb store have very different
> > key/value
> > > structures, and while it appears to be that way on the surface, the
> > > structure between the two is actually very similar. Keys in the cache
> are
> > > prefixed with a segment ID to ensure the ordering in the cache stays
> > > consistent with the rocksdb implementation, which maintains multiple
> > > rocksdb instances, one for each segment. So we just "artificially"
> mirror
> > > the segment structure in the cache.
> > >
> > > The reason for keeping the ordering consistent is pretty simple: keep
> in
> > > mind that when we query a cached window store we are effectively
> querying
> > > both the cache and the persistent rocksdb store at the same time,
> merging
> > > results from both. To make that merge as painless as possible, we
> ensure
> > > the ordering is consistent when querying a range of keys in both
> stores.
> > >
> > > Also keep in mind CompositeReadonlyWindowStore, which wraps multiple
> > window
> > > stores within a topology.
> > >
> > > Hope this clarifies some of the less trivial parts of caching window
> > store.
> > >
> > > Cheers,
> > > Xavier
> > >
> > > On Sun, Oct 8, 2017 at 9:21 PM Guozhang Wang <wa...@gmail.com>
> wrote:
> > >
> > > > Richard, Matthias:
> > > >
> > > > 0. Could you describe a bit what are the possible use cases of
> > > `allLatest`,
> > > > `minKey` and `maxKey`? I'd prefer keeping the APIs to add at a
> minimum
> > > > necessary amount, to avoid a swamp of new APIs that no one would
> really
> > > use
> > > > but just complicated the internal code base.
> > > >
> > > > 1. One minor comment on the other two new APIs: could we rename
> `keys`
> > to
> > > > `all` and `all` to `range` to be consistent with the other store's
> > APIs?
> > > >
> > > > 2. One meta comment on the implementation details: since both `keys`
> > and
> > > > `all` would likely touch multiple segments, we may need to use the
> > > internal
> > > > `SegmentIterator` class, but currently it always requires a Bytes
> from
> > > and
> > > > Bytes to for its constructor. What changes we need to make for that
> > > class?
> > > >
> > > >
> > > > Guozhang
> > > >
> > > >
> > > > On Thu, Oct 5, 2017 at 4:42 PM, Richard Yu <
> yohan.richard.yu@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > We should split KAFKA-4499 into several sub-issues with 4499 being
> > the
> > > > > parent issue.
> > > > > Adding the implementation to CachingWindowStore,
> RocksDBWindowStore,
> > > etc
> > > > > will each require the addition of a test and implementing the
> methods
> > > > which
> > > > > is not trivial.
> > > > > This way, it should be easier to manage the progress of the KIP.
> > > > >
> > > > >
> > > > > On Thu, Oct 5, 2017 at 2:58 PM, Matthias J. Sax <
> > matthias@confluent.io
> > > >
> > > > > wrote:
> > > > >
> > > > > > Thanks for driving this and sorry for late response. With release
> > > > > > deadline it was pretty busy lately.
> > > > > >
> > > > > > Can you please add a description for the suggested method, what
> > they
> > > > are
> > > > > > going to return? It's a little unclear to me atm.
> > > > > >
> > > > > > It would also be helpful to discuss, for which use case each
> method
> > > is
> > > > > > useful. This might also help to identify potential gaps for which
> > > > > > another API might be more helpful.
> > > > > >
> > > > > > Also, we should talk about provided guarantees when using those
> > APIs
> > > > > > with regard to consistency -- not saying that we need to provide
> > > strong
> > > > > > guarantees, but he KIP should describe what user can expect.
> > > > > >
> > > > > >
> > > > > > -Matthias
> > > > > >
> > > > > > On 9/24/17 8:11 PM, Richard Yu wrote:
> > > > > > > Hello, I would like to solicit review and comment on this issue
> > > (link
> > > > > > > below):
> > > > > > >
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > 205%3A+Add+getAllKeys%28%29+API+to+ReadOnlyWindowStore
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-205: Add getAllKeys() API to ReadOnlyWindowStore

Posted by Xavier Léauté <xa...@confluent.io>.

Thank you Richard! Do you or Guozhang have any thoughts on my suggestions
to use fetchAll() and fetchAll(timeFrom, timeTo) and reserve the "range"
keyword for when we query a specific range of keys?

Xavier

On Sat, Oct 14, 2017 at 2:32 PM Richard Yu <yo...@gmail.com>
wrote:

> Thanks for the clarifications, Xavier.
> I have removed most of the methods except for keys() and all() which has
> been renamed to Guozhang Wang's suggestions.
>
> Hope this helps.
>
> On Fri, Oct 13, 2017 at 3:28 PM, Xavier Léauté <xa...@confluent.io>
> wrote:
>
> > Thanks for the KIP Richard, this is a very useful addition!
> >
> > As far as the API changes, I just have a few comments on the methods that
> > don't seem directly related to the KIP title, and naming of course :).
> > On the implementation, see my notes further down that will hopefully
> > clarify a few things.
> >
> > Regarding the "bonus" methods:
> > I agree with Guozhang that the KIP lacks proper motivation for adding the
> > min, max, and allLatest methods.
> > It is also not clear to me what min and max would really mean, what
> > ordering do we refer to here? Are we first ordering by time, then key, or
> > first by key, then time?
> > The allLatest method might be useful, but I don't really see how it would
> > be used in practice if we have to scan the entire range of keys for all
> the
> > state stores, every single time.
> >
> > Maybe we could flesh the motivation behind those extra methods, but in
> the
> > interest of time, and moving the KIP forward it might make sense to file
> a
> > follow-up once we have more concrete use-cases.
> >
> > On naming:
> > I also agree with Guozhang that "keys()" should be renamed. It feels a
> bit
> > of a misnomer, since it not only returns keys, but also the values.
> >
> > As far as what to rename it to, I would argue we already have some
> > discrepancy between key-value stores using range() vs. window stores
> using
> > fetch().
> > I assume we called the window method "fetch" instead of "get" because you
> > might get back more than one window for the requested key.
> >
> > If we wanted to make things consistent with both existing key-value store
> > naming and window store naming, we could do the following:
> > Decide that "all" always refers to the entire range of keys, independent
> of
> > the window and similarly "range" always refers to a particular range of
> > keys, irrespective of the window.
> > We can then prefix methods with "fetch" to indicate that more than one
> > window may be returned for each key in the range.
> >
> > This would give us:
> > - a new fetchAll() method for all the keys, which makes it clear that you
> > might get back the same key in different windows
> > - a new fetchAll(timeFrom, timeTo) method to get all the keys in a given
> > time range, again with possibly more than one window per key
> > - and we'd have to rename fetch(K,K,long, long) to fetchRange(K, K, long,
> > long)  and deprecate the old one to indicate a range of keys
> >
> > One inconsistency I noted: the "Proposed Changes" section in your KIP
> talks
> > about a "range(timeFrom, timeTo)" method, I think you meant to refer to
> the
> > all(from, to) method, but I'm sure you'll fix that once we decide on
> > naming.
> >
> > On the implementation side:
> > You mentioned that caching and rocksdb store have very different
> key/value
> > structures, and while it appears to be that way on the surface, the
> > structure between the two is actually very similar. Keys in the cache are
> > prefixed with a segment ID to ensure the ordering in the cache stays
> > consistent with the rocksdb implementation, which maintains multiple
> > rocksdb instances, one for each segment. So we just "artificially" mirror
> > the segment structure in the cache.
> >
> > The reason for keeping the ordering consistent is pretty simple: keep in
> > mind that when we query a cached window store we are effectively querying
> > both the cache and the persistent rocksdb store at the same time, merging
> > results from both. To make that merge as painless as possible, we ensure
> > the ordering is consistent when querying a range of keys in both stores.
> >
> > Also keep in mind CompositeReadonlyWindowStore, which wraps multiple
> window
> > stores within a topology.
> >
> > Hope this clarifies some of the less trivial parts of caching window
> store.
> >
> > Cheers,
> > Xavier
> >
> > On Sun, Oct 8, 2017 at 9:21 PM Guozhang Wang <wa...@gmail.com> wrote:
> >
> > > Richard, Matthias:
> > >
> > > 0. Could you describe a bit what are the possible use cases of
> > `allLatest`,
> > > `minKey` and `maxKey`? I'd prefer keeping the APIs to add at a minimum
> > > necessary amount, to avoid a swamp of new APIs that no one would really
> > use
> > > but just complicated the internal code base.
> > >
> > > 1. One minor comment on the other two new APIs: could we rename `keys`
> to
> > > `all` and `all` to `range` to be consistent with the other store's
> APIs?
> > >
> > > 2. One meta comment on the implementation details: since both `keys`
> and
> > > `all` would likely touch multiple segments, we may need to use the
> > internal
> > > `SegmentIterator` class, but currently it always requires a Bytes from
> > and
> > > Bytes to for its constructor. What changes we need to make for that
> > class?
> > >
> > >
> > > Guozhang
> > >
> > >
> > > On Thu, Oct 5, 2017 at 4:42 PM, Richard Yu <yohan.richard.yu@gmail.com
> >
> > > wrote:
> > >
> > > > We should split KAFKA-4499 into several sub-issues with 4499 being
> the
> > > > parent issue.
> > > > Adding the implementation to CachingWindowStore, RocksDBWindowStore,
> > etc
> > > > will each require the addition of a test and implementing the methods
> > > which
> > > > is not trivial.
> > > > This way, it should be easier to manage the progress of the KIP.
> > > >
> > > >
> > > > On Thu, Oct 5, 2017 at 2:58 PM, Matthias J. Sax <
> matthias@confluent.io
> > >
> > > > wrote:
> > > >
> > > > > Thanks for driving this and sorry for late response. With release
> > > > > deadline it was pretty busy lately.
> > > > >
> > > > > Can you please add a description for the suggested method, what
> they
> > > are
> > > > > going to return? It's a little unclear to me atm.
> > > > >
> > > > > It would also be helpful to discuss, for which use case each method
> > is
> > > > > useful. This might also help to identify potential gaps for which
> > > > > another API might be more helpful.
> > > > >
> > > > > Also, we should talk about provided guarantees when using those
> APIs
> > > > > with regard to consistency -- not saying that we need to provide
> > strong
> > > > > guarantees, but he KIP should describe what user can expect.
> > > > >
> > > > >
> > > > > -Matthias
> > > > >
> > > > > On 9/24/17 8:11 PM, Richard Yu wrote:
> > > > > > Hello, I would like to solicit review and comment on this issue
> > (link
> > > > > > below):
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > 205%3A+Add+getAllKeys%28%29+API+to+ReadOnlyWindowStore
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>

Re: [DISCUSS] KIP-205: Add getAllKeys() API to ReadOnlyWindowStore

Posted by Richard Yu <yo...@gmail.com>.

Thanks for the clarifications, Xavier.
I have removed most of the methods except for keys() and all() which has
been renamed to Guozhang Wang's suggestions.

Hope this helps.

On Fri, Oct 13, 2017 at 3:28 PM, Xavier Léauté <xa...@confluent.io> wrote:

> Thanks for the KIP Richard, this is a very useful addition!
>
> As far as the API changes, I just have a few comments on the methods that
> don't seem directly related to the KIP title, and naming of course :).
> On the implementation, see my notes further down that will hopefully
> clarify a few things.
>
> Regarding the "bonus" methods:
> I agree with Guozhang that the KIP lacks proper motivation for adding the
> min, max, and allLatest methods.
> It is also not clear to me what min and max would really mean, what
> ordering do we refer to here? Are we first ordering by time, then key, or
> first by key, then time?
> The allLatest method might be useful, but I don't really see how it would
> be used in practice if we have to scan the entire range of keys for all the
> state stores, every single time.
>
> Maybe we could flesh the motivation behind those extra methods, but in the
> interest of time, and moving the KIP forward it might make sense to file a
> follow-up once we have more concrete use-cases.
>
> On naming:
> I also agree with Guozhang that "keys()" should be renamed. It feels a bit
> of a misnomer, since it not only returns keys, but also the values.
>
> As far as what to rename it to, I would argue we already have some
> discrepancy between key-value stores using range() vs. window stores using
> fetch().
> I assume we called the window method "fetch" instead of "get" because you
> might get back more than one window for the requested key.
>
> If we wanted to make things consistent with both existing key-value store
> naming and window store naming, we could do the following:
> Decide that "all" always refers to the entire range of keys, independent of
> the window and similarly "range" always refers to a particular range of
> keys, irrespective of the window.
> We can then prefix methods with "fetch" to indicate that more than one
> window may be returned for each key in the range.
>
> This would give us:
> - a new fetchAll() method for all the keys, which makes it clear that you
> might get back the same key in different windows
> - a new fetchAll(timeFrom, timeTo) method to get all the keys in a given
> time range, again with possibly more than one window per key
> - and we'd have to rename fetch(K,K,long, long) to fetchRange(K, K, long,
> long)  and deprecate the old one to indicate a range of keys
>
> One inconsistency I noted: the "Proposed Changes" section in your KIP talks
> about a "range(timeFrom, timeTo)" method, I think you meant to refer to the
> all(from, to) method, but I'm sure you'll fix that once we decide on
> naming.
>
> On the implementation side:
> You mentioned that caching and rocksdb store have very different key/value
> structures, and while it appears to be that way on the surface, the
> structure between the two is actually very similar. Keys in the cache are
> prefixed with a segment ID to ensure the ordering in the cache stays
> consistent with the rocksdb implementation, which maintains multiple
> rocksdb instances, one for each segment. So we just "artificially" mirror
> the segment structure in the cache.
>
> The reason for keeping the ordering consistent is pretty simple: keep in
> mind that when we query a cached window store we are effectively querying
> both the cache and the persistent rocksdb store at the same time, merging
> results from both. To make that merge as painless as possible, we ensure
> the ordering is consistent when querying a range of keys in both stores.
>
> Also keep in mind CompositeReadonlyWindowStore, which wraps multiple window
> stores within a topology.
>
> Hope this clarifies some of the less trivial parts of caching window store.
>
> Cheers,
> Xavier
>
> On Sun, Oct 8, 2017 at 9:21 PM Guozhang Wang <wa...@gmail.com> wrote:
>
> > Richard, Matthias:
> >
> > 0. Could you describe a bit what are the possible use cases of
> `allLatest`,
> > `minKey` and `maxKey`? I'd prefer keeping the APIs to add at a minimum
> > necessary amount, to avoid a swamp of new APIs that no one would really
> use
> > but just complicated the internal code base.
> >
> > 1. One minor comment on the other two new APIs: could we rename `keys` to
> > `all` and `all` to `range` to be consistent with the other store's APIs?
> >
> > 2. One meta comment on the implementation details: since both `keys` and
> > `all` would likely touch multiple segments, we may need to use the
> internal
> > `SegmentIterator` class, but currently it always requires a Bytes from
> and
> > Bytes to for its constructor. What changes we need to make for that
> class?
> >
> >
> > Guozhang
> >
> >
> > On Thu, Oct 5, 2017 at 4:42 PM, Richard Yu <yo...@gmail.com>
> > wrote:
> >
> > > We should split KAFKA-4499 into several sub-issues with 4499 being the
> > > parent issue.
> > > Adding the implementation to CachingWindowStore, RocksDBWindowStore,
> etc
> > > will each require the addition of a test and implementing the methods
> > which
> > > is not trivial.
> > > This way, it should be easier to manage the progress of the KIP.
> > >
> > >
> > > On Thu, Oct 5, 2017 at 2:58 PM, Matthias J. Sax <matthias@confluent.io
> >
> > > wrote:
> > >
> > > > Thanks for driving this and sorry for late response. With release
> > > > deadline it was pretty busy lately.
> > > >
> > > > Can you please add a description for the suggested method, what they
> > are
> > > > going to return? It's a little unclear to me atm.
> > > >
> > > > It would also be helpful to discuss, for which use case each method
> is
> > > > useful. This might also help to identify potential gaps for which
> > > > another API might be more helpful.
> > > >
> > > > Also, we should talk about provided guarantees when using those APIs
> > > > with regard to consistency -- not saying that we need to provide
> strong
> > > > guarantees, but he KIP should describe what user can expect.
> > > >
> > > >
> > > > -Matthias
> > > >
> > > > On 9/24/17 8:11 PM, Richard Yu wrote:
> > > > > Hello, I would like to solicit review and comment on this issue
> (link
> > > > > below):
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > 205%3A+Add+getAllKeys%28%29+API+to+ReadOnlyWindowStore
> > > > >
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>

Re: [DISCUSS] KIP-205: Add getAllKeys() API to ReadOnlyWindowStore

Posted by Xavier Léauté <xa...@confluent.io>.

Thanks for the KIP Richard, this is a very useful addition!

As far as the API changes, I just have a few comments on the methods that
don't seem directly related to the KIP title, and naming of course :).
On the implementation, see my notes further down that will hopefully
clarify a few things.

Regarding the "bonus" methods:
I agree with Guozhang that the KIP lacks proper motivation for adding the
min, max, and allLatest methods.
It is also not clear to me what min and max would really mean, what
ordering do we refer to here? Are we first ordering by time, then key, or
first by key, then time?
The allLatest method might be useful, but I don't really see how it would
be used in practice if we have to scan the entire range of keys for all the
state stores, every single time.

Maybe we could flesh the motivation behind those extra methods, but in the
interest of time, and moving the KIP forward it might make sense to file a
follow-up once we have more concrete use-cases.

On naming:
I also agree with Guozhang that "keys()" should be renamed. It feels a bit
of a misnomer, since it not only returns keys, but also the values.

As far as what to rename it to, I would argue we already have some
discrepancy between key-value stores using range() vs. window stores using
fetch().
I assume we called the window method "fetch" instead of "get" because you
might get back more than one window for the requested key.

If we wanted to make things consistent with both existing key-value store
naming and window store naming, we could do the following:
Decide that "all" always refers to the entire range of keys, independent of
the window and similarly "range" always refers to a particular range of
keys, irrespective of the window.
We can then prefix methods with "fetch" to indicate that more than one
window may be returned for each key in the range.

This would give us:
- a new fetchAll() method for all the keys, which makes it clear that you
might get back the same key in different windows
- a new fetchAll(timeFrom, timeTo) method to get all the keys in a given
time range, again with possibly more than one window per key
- and we'd have to rename fetch(K,K,long, long) to fetchRange(K, K, long,
long)  and deprecate the old one to indicate a range of keys

One inconsistency I noted: the "Proposed Changes" section in your KIP talks
about a "range(timeFrom, timeTo)" method, I think you meant to refer to the
all(from, to) method, but I'm sure you'll fix that once we decide on naming.

On the implementation side:
You mentioned that caching and rocksdb store have very different key/value
structures, and while it appears to be that way on the surface, the
structure between the two is actually very similar. Keys in the cache are
prefixed with a segment ID to ensure the ordering in the cache stays
consistent with the rocksdb implementation, which maintains multiple
rocksdb instances, one for each segment. So we just "artificially" mirror
the segment structure in the cache.

The reason for keeping the ordering consistent is pretty simple: keep in
mind that when we query a cached window store we are effectively querying
both the cache and the persistent rocksdb store at the same time, merging
results from both. To make that merge as painless as possible, we ensure
the ordering is consistent when querying a range of keys in both stores.

Also keep in mind CompositeReadonlyWindowStore, which wraps multiple window
stores within a topology.

Hope this clarifies some of the less trivial parts of caching window store.

Cheers,
Xavier

On Sun, Oct 8, 2017 at 9:21 PM Guozhang Wang <wa...@gmail.com> wrote:

> Richard, Matthias:
>
> 0. Could you describe a bit what are the possible use cases of `allLatest`,
> `minKey` and `maxKey`? I'd prefer keeping the APIs to add at a minimum
> necessary amount, to avoid a swamp of new APIs that no one would really use
> but just complicated the internal code base.
>
> 1. One minor comment on the other two new APIs: could we rename `keys` to
> `all` and `all` to `range` to be consistent with the other store's APIs?
>
> 2. One meta comment on the implementation details: since both `keys` and
> `all` would likely touch multiple segments, we may need to use the internal
> `SegmentIterator` class, but currently it always requires a Bytes from and
> Bytes to for its constructor. What changes we need to make for that class?
>
>
> Guozhang
>
>
> On Thu, Oct 5, 2017 at 4:42 PM, Richard Yu <yo...@gmail.com>
> wrote:
>
> > We should split KAFKA-4499 into several sub-issues with 4499 being the
> > parent issue.
> > Adding the implementation to CachingWindowStore, RocksDBWindowStore, etc
> > will each require the addition of a test and implementing the methods
> which
> > is not trivial.
> > This way, it should be easier to manage the progress of the KIP.
> >
> >
> > On Thu, Oct 5, 2017 at 2:58 PM, Matthias J. Sax <ma...@confluent.io>
> > wrote:
> >
> > > Thanks for driving this and sorry for late response. With release
> > > deadline it was pretty busy lately.
> > >
> > > Can you please add a description for the suggested method, what they
> are
> > > going to return? It's a little unclear to me atm.
> > >
> > > It would also be helpful to discuss, for which use case each method is
> > > useful. This might also help to identify potential gaps for which
> > > another API might be more helpful.
> > >
> > > Also, we should talk about provided guarantees when using those APIs
> > > with regard to consistency -- not saying that we need to provide strong
> > > guarantees, but he KIP should describe what user can expect.
> > >
> > >
> > > -Matthias
> > >
> > > On 9/24/17 8:11 PM, Richard Yu wrote:
> > > > Hello, I would like to solicit review and comment on this issue (link
> > > > below):
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 205%3A+Add+getAllKeys%28%29+API+to+ReadOnlyWindowStore
> > > >
> > >
> > >
> >
>
>
>
> --
> -- Guozhang
>

Re: [DISCUSS] KIP-205: Add getAllKeys() API to ReadOnlyWindowStore

Posted by Guozhang Wang <wa...@gmail.com>.

Richard, Matthias:

0. Could you describe a bit what are the possible use cases of `allLatest`,
`minKey` and `maxKey`? I'd prefer keeping the APIs to add at a minimum
necessary amount, to avoid a swamp of new APIs that no one would really use
but just complicated the internal code base.

1. One minor comment on the other two new APIs: could we rename `keys` to
`all` and `all` to `range` to be consistent with the other store's APIs?

2. One meta comment on the implementation details: since both `keys` and
`all` would likely touch multiple segments, we may need to use the internal
`SegmentIterator` class, but currently it always requires a Bytes from and
Bytes to for its constructor. What changes we need to make for that class?

Guozhang

On Thu, Oct 5, 2017 at 4:42 PM, Richard Yu <yo...@gmail.com>
wrote:

> We should split KAFKA-4499 into several sub-issues with 4499 being the
> parent issue.
> Adding the implementation to CachingWindowStore, RocksDBWindowStore, etc
> will each require the addition of a test and implementing the methods which
> is not trivial.
> This way, it should be easier to manage the progress of the KIP.
>
>
> On Thu, Oct 5, 2017 at 2:58 PM, Matthias J. Sax <ma...@confluent.io>
> wrote:
>
> > Thanks for driving this and sorry for late response. With release
> > deadline it was pretty busy lately.
> >
> > Can you please add a description for the suggested method, what they are
> > going to return? It's a little unclear to me atm.
> >
> > It would also be helpful to discuss, for which use case each method is
> > useful. This might also help to identify potential gaps for which
> > another API might be more helpful.
> >
> > Also, we should talk about provided guarantees when using those APIs
> > with regard to consistency -- not saying that we need to provide strong
> > guarantees, but he KIP should describe what user can expect.
> >
> >
> > -Matthias
> >
> > On 9/24/17 8:11 PM, Richard Yu wrote:
> > > Hello, I would like to solicit review and comment on this issue (link
> > > below):
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 205%3A+Add+getAllKeys%28%29+API+to+ReadOnlyWindowStore
> > >
> >
> >
>

-- 
-- Guozhang

Re: [DISCUSS] KIP-205: Add getAllKeys() API to ReadOnlyWindowStore

Posted by Richard Yu <yo...@gmail.com>.

We should split KAFKA-4499 into several sub-issues with 4499 being the
parent issue.
Adding the implementation to CachingWindowStore, RocksDBWindowStore, etc
will each require the addition of a test and implementing the methods which
is not trivial.
This way, it should be easier to manage the progress of the KIP.


On Thu, Oct 5, 2017 at 2:58 PM, Matthias J. Sax <ma...@confluent.io>
wrote:

> Thanks for driving this and sorry for late response. With release
> deadline it was pretty busy lately.
>
> Can you please add a description for the suggested method, what they are
> going to return? It's a little unclear to me atm.
>
> It would also be helpful to discuss, for which use case each method is
> useful. This might also help to identify potential gaps for which
> another API might be more helpful.
>
> Also, we should talk about provided guarantees when using those APIs
> with regard to consistency -- not saying that we need to provide strong
> guarantees, but he KIP should describe what user can expect.
>
>
> -Matthias
>
> On 9/24/17 8:11 PM, Richard Yu wrote:
> > Hello, I would like to solicit review and comment on this issue (link
> > below):
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 205%3A+Add+getAllKeys%28%29+API+to+ReadOnlyWindowStore
> >
>
>