You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by John Roesler <jo...@confluent.io> on 2019/05/06 15:37:22 UTC

Re: [DISCUSS] KIP-439: Deprecate Interface WindowStoreIterator

Thanks all (or range? ;) ) for the discussion. Good points all around.

Although I see the allure of naming the metrics the same as the things
they're measuring, it seems not to be perfect. Seconding Matthias's latter
thought, I think it's likely you'd want to measure the method calls
independently, since the different range variants would have wildly
different characteristics, which could then lead you to want to orient the
storage differently to support particular use cases.

Pointing out some structural characteristics (I know you all know this
stuff, I'm just constructing a table for analysis):
* Java supports method-name overloading. *Different* methods can have the
same names, distinguished by arg lists; it doesn't change the fact that
they are different methods.
* Metrics does not support metric-name overloading, but metric names do
have some structure we could exploit, if you consider the tags.

It seems to me that there's actually more domain mismatch if we just have
one metric named "range", since (e.g., in the SessionStore proposal above)
the Java API has *four* methods named "range".

Two potential solutions I see:
* hierarchical metric names: "range-single-key-all-time",
"range-key-range-all-time", "range-single-key-time-range",
"range-key-range-time-range", maybe with better names... I'm not the best
at this stuff. Hopefully, you see the point, though... they all start with
"range", which provides the association to the method, and all have a
suffix which identifies the overload being measured.
* tagged metric names: "range" {"variant": "single-key-all-time"}, "range"
{"variant": "key-range-all-time"}, "range" {"variant":
"single-key-time-range"}, "range" {"variant": "key-range-time-range"} . Or
you could even split the tags up semantically, but my instinct says that
that would just make it harder to digest the metrics later on.

Just some ideas.
-John

On Fri, Apr 26, 2019 at 3:51 AM Matthias J. Sax <ma...@confluent.io>
wrote:

> Thanks for the input Guozhang. I was not aware of those dependencies.
>
> It might be good to align this KIP with the metrics cleanup. Not sure
> atm, if we should use different metric names for different overloads,
> even if those have the same method name?
>
> If we rename all method to `range()` and use the same metric name for
> all, one could argue that this is still fine, because the metric
> collects how often a range query is executed (regardless of the range
> itself).
>
> On the other hand, this would basically be a "roll up". It could still
> be valuable to distinguish between single-key-time-range,
> key-range-time-range, and all-range queries. Users could still aggregate
> those later if they are not interested in the details, while it's not
> possible for user to split a pre-aggregated metric into it's component.
>
>
> Input from others might be helpful here, too.
>
>
> -Matthias
>
> On 4/11/19 6:00 PM, Guozhang Wang wrote:
> > While working at KIP-444 (https://github.com/apache/kafka/pull/6498) I
> > realized there are a bunch of issues on metric names v.s. function names,
> > e.g. some function named `fetchAll` are actually measure with `fetch`,
> etc.
> > So in that KIP I proposed to make the function name aligned with metrics
> > name. So suppose we rename the functions from `fetch` to `range` I'd
> > suggest we make this change as part of KIP-444 as well. Note that it
> means
> > different functions with the same name `range` will be measured under a
> > single metric then.
> >
> > But still for function named `all` it will be measured under a separate
> > metric named `all`, so I'm just clarifying with you if that's the
> intention.
> >
> >
> > Guozhang
> >
> > On Thu, Apr 11, 2019 at 2:04 PM Matthias J. Sax <ma...@confluent.io>
> > wrote:
> >
> >> I did not see a reason to rename `all()` to `range()`. `all()` does not
> >> take any parameters to limit a range and is a good name IMHO. But I am
> >> not married to keep `all()` and if we think we should rename it, too, I
> >> am fine with it.
> >>
> >> Not sure what connection you make to metrics though. Can you elaborate?
> >>
> >>
> >> Would be interested to hear others opinions on this, too.
> >>
> >>
> >> -Matthias
> >>
> >> On 4/11/19 8:38 AM, Guozhang Wang wrote:
> >>> I like the renaming, since it also aligns with our metrics cleanup
> >>> (KIP-444) which touches upon the store level metrics as well.
> >>>
> >>> One question: you seems still suggesting to keep "all" with the current
> >>> name (and also using a separate metric for it), what's the difference
> >>> between this one and other "range" functions?
> >>>
> >>>
> >>> Guozhang
> >>>
> >>> On Thu, Apr 11, 2019 at 2:26 AM Matthias J. Sax <matthias@confluent.io
> >
> >>> wrote:
> >>>
> >>>> Thanks for the input.
> >>>>
> >>>>>> Just to clarify the naming conflicts is between the newly added
> >> function
> >>>>>> and the old functions that we want to deprecate / remove right? The
> >>>>
> >>>> Yes, the conflict is just fort the existing `fetch()` methods for
> which
> >>>> we want to change the return type.
> >>>>
> >>>> IMHO, we should not make a breaking change in a minor release. Thus,
> we
> >>>> could either only deprecate those fetch methods that return
> >>>> `WindowStoreIterator` and do a "clean cut" in 3.0.
> >>>>
> >>>> Or we, follow the renaming path. No get a clean renaming, we need to
> >>>> consider all methods that are called "fetch":
> >>>>
> >>>> ReadOnlyWindowStore:
> >>>>
> >>>>> V fetch(K, long)
> >>>>> WindowStoreIterator<V> fetch(K, Instant, Instant)
> >>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, Instant, Instant)
> >>>>
> >>>> WindowStore:
> >>>>
> >>>>> WindowStoreIterator<V> fetch(K, long, long)
> >>>>> WindowStoreIterator<V> fetch(K, Instant, Instant)>
> >>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, long, long)
> >>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, Instant, Instant)
> >>>>
> >>>> There is also fetchAll(long, long) and fetchAll(Instant, Instant) that
> >>>> fetch over all keys.
> >>>>
> >>>> Maybe we could rename `V fetch(K, long)` to `V get(K, long)` and
> rename
> >>>> all `fetch()/fetchAll()` to `range()`? There is actually no reason to
> >>>> have range()/rangeAll().
> >>>>
> >>>> If we do this, we might consider to rename methods for SessionStore,
> >>>> too. There is
> >>>>
> >>>>> ReadOnlySessionStore#fetch(K)
> >>>>> ReadOnlySessionStore#fetch(K, K)
> >>>>> SessionStore#findSessions(K, long, long)
> >>>>> SessionStore#findSessions(K, K, long, long)
> >>>>> SessionStore#fetchSession(K, long, long);
> >>>>
> >>>> Consistent renaming might be:
> >>>>
> >>>> ReadOnlySessionStore#range(K)
> >>>> ReadOnlySessionStore#range(K, K)
> >>>> SessionStore#range(K, long, long)
> >>>> SessionStore#range(K, K, long, long)
> >>>> SessionStore#get(K, long, long);
> >>>>
> >>>>
> >>>> Not sure if extensive renaming might have too big of an impact.
> However,
> >>>> `range()` and `get()` might actually be better names than `fetch()`,
> >>>> thus, it might also provide some additional value.
> >>>>
> >>>> Thoughts?
> >>>>
> >>>>
> >>>> -Matthias
> >>>>
> >>>>
> >>>> On 3/27/19 10:19 AM, Guozhang Wang wrote:
> >>>>> Hello Matthias,
> >>>>>
> >>>>> Just to clarify the naming conflicts is between the newly added
> >> function
> >>>>> and the old functions that we want to deprecate / remove right? The
> >>>>> existing ones have different signatures with parameters so that they
> >>>> should
> >>>>> not have conflicts.
> >>>>>
> >>>>> I was thinking about just make the change directly without
> deprecating
> >>>>> existing ones which would require users of 2.3 to make code changes
> --
> >>>> this
> >>>>> code change looks reasonably straight-forward to me and not much
> worth
> >>>>> deferring to later when the deprecated ones are removed.
> >>>>>
> >>>>> On the other hand, just deprecating "WindowIterator" without add new
> >>>>> functions seems not very useful for users either since it is only
> used
> >> as
> >>>>> an indicator but users cannot make code changes during this phase
> >>>> anyways,
> >>>>> so it is still a `one-cut` deal when we eventually remove the
> >> deprecated
> >>>>> ones and add the new one.
> >>>>>
> >>>>> Hence I'm slightly inclining to trade compatibility and replace it
> with
> >>>> new
> >>>>> functions in one release, but if people have a good idea of the
> >> renaming
> >>>>> approach (I do not have a good one on top of my head :) I can also be
> >>>>> convinced that way.
> >>>>>
> >>>>> Guozhang
> >>>>>
> >>>>>
> >>>>> On Mon, Mar 11, 2019 at 10:41 AM Matthias J. Sax <
> >> matthias@confluent.io>
> >>>>> wrote:
> >>>>>
> >>>>>> I am open to change the return type to
> >>>>>>
> >>>>>> KeyValueIterator<Windowed<K>, V>
> >>>>>>
> >>>>>> However, this requires to rename
> >>>>>>
> >>>>>> #fetch(K key, long startTimestamp, long endTimestamp)
> >>>>>> #fetch(K key, Instant startTimestamp, Instant endTimestamp)
> >>>>>>
> >>>>>> to avoid naming conflicts.
> >>>>>>
> >>>>>> What new name would you suggest? The existing methods are called
> >>>>>> `fetch()`, `fetchAll()`, `all()`, `put()`.
> >>>>>>
> >>>>>> While I think it would be good to get fully aligned return types, I
> am
> >>>>>> not sure how we can get aligned method names (without renaming all
> of
> >>>>>> them...)? If we think it's worth to rename all to get this cleaned
> >> up, I
> >>>>>> am no opposed.
> >>>>>>
> >>>>>>
> >>>>>> Thoughts?
> >>>>>>
> >>>>>>
> >>>>>> -Matthias
> >>>>>>
> >>>>>>
> >>>>>> On 3/11/19 10:27 AM, Guozhang Wang wrote:
> >>>>>>> I was thinking about changing the return type even, to
> >>>>>>> `KeyValueIterator<Windowed<K>, V>` since it is confusing to users
> >> about
> >>>>>> the
> >>>>>>> key typed `Long` (Streams javadoc today did not explain it clearly
> >>>>>> either),
> >>>>>>> note it is not backward compatible at all.
> >>>>>>>
> >>>>>>> Personally I'd prefer to just deprecate the API and new new ones
> that
> >>>>>>> return `KeyValueIterator<Windowed<K>, V>` directly, but if most
> >> people
> >>>>>> felt
> >>>>>>> it is too intrusive for compatibility I can be convinced with
> >>>>>>> `KeyValueIterator<Long, V>` as well.
> >>>>>>>
> >>>>>>> Guozhang
> >>>>>>>
> >>>>>>> On Mon, Mar 11, 2019 at 10:17 AM Sophie Blee-Goldman <
> >>>>>> sophie@confluent.io>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> I remember thinking this while working on window stores, am
> >> definitely
> >>>>>> for
> >>>>>>>> it.
> >>>>>>>>
> >>>>>>>> On Mon, Mar 11, 2019 at 9:20 AM John Roesler <jo...@confluent.io>
> >>>> wrote:
> >>>>>>>>
> >>>>>>>>> Sounds great to me. Thanks, Matthias!
> >>>>>>>>> -John
> >>>>>>>>>
> >>>>>>>>> On Sun, Mar 10, 2019 at 11:58 PM Matthias J. Sax <
> >>>>>> matthias@confluent.io>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> I would like to propose KIP-439 to deprecate interface
> >>>>>>>>>> `WindowStoreIterator`.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-439%3A+Deprecate+Interface+WindowStoreIterator
> >>>>>>>>>>
> >>>>>>>>>> Looking forward to your feedback.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> -Matthias
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >>
> >
>
>

Re: [DISCUSS] KIP-439: Deprecate Interface WindowStoreIterator

Posted by "Matthias J. Sax" <ma...@confluent.io>.
I have performance concerns about this proposal, because each time a
window/session store is accessed, a new (unnecessary) object needs to be
created, and accessing the store in the processors is on the hot code path.


-Matthias


On 6/20/19 10:57 AM, John Roesler wrote:
> Hi again, all,
> 
> After wrestling with some other issues around the window interface,
> I'd propose that we consider normalizing the WindowStore (and
> SessionStore) interfaces with respect to KeyValueStore. We can't
> actually make the classes related because they'll clash on the
> deprecated methods, but we can align them. Doing so would be an
> ergonomic advantage, since all our provided store interfaces would
> have the same "shape".
> 
> Specifically, this means that we would deprecate any method that takes
> a raw "K key, long windowStartTime" or returns a raw "K key", and
> instead always take as arguments Windowed<K> keys and also return
> Windowed<K> results. This proposal already comes close to this, by
> re-using the KeyValueIterator<Windowed<K, V>>, so it would just be a
> few tweaks to "perfect" it.
> 
> What do you think?
> 
> Thanks,
> -John
> 
> On Tue, May 28, 2019 at 2:58 PM Matthias J. Sax <ma...@confluent.io> wrote:
>>
>> Let's see what other think about (2)
>>
>>
>>
>> (3) Interesting. My reasoning follows the code:
>>
>> For example, `RocksDbKeyValueBytesStoreSupplier#metricsScope()`returns
>> "rocksdb-state" and this is concatenated later in `MeteredKeyValueStore`
>> that adds a tag
>>
>>   key:       metricScope + "-id"
>>   value:     name()                  // this is the store name
>>
>> Hence, the `-state` part comes from the supplier and thus it seems to be
>> part of `[store-type]` (ie, `store-type` == metricScope()` -- not sure
>> if this interpretation holds).
>>
>> If it's not part of `[store-type]` the supplier should not return it as
>> `metricScope()` IMHO, but the `MeteredKeyValueStore` should add it
>> together with `-id` suffix from my understanding.
>>
>> Thoughts?
>>
>>
>>
>> (5) Renaming to `streams` might imply that we need to rename _all_
>> metrics, not just the store metrics that are affected, to avoid
>> inconsistencies.
>>
>> If we really want to rename it, I would rather suggest to do it as part
>> of KIP-444, that is a larger rework anyway. It seems to be out of scope
>> for this KIP.
>>
>>
>> -Matthias
>>
>>
>> On 5/28/19 12:30 PM, Bruno Cadonna wrote:
>>> Hi Matthias,
>>>
>>> 2)
>>> Yes, this is how I meant it.
>>> I am still in favour of `value-by-...` instead of `get`, because it
>>> gives you immediately the meaning of the metric without the need to
>>> know the API of the stores.
>>>
>>> 3)
>>> I asked to avoid misunderstandings because in KIP-444, `-state-` is
>>> stated explicitly in the tag.
>>>
>>> 5)
>>> The question was not about correctness. I know that we use `stream` in
>>> the metrics group. The question was rather if we want to change it.
>>> The streams metrics interface is called `StreamsMetrics`. I understand
>>> that this has low priority. Just wanted to mention it.
>>>
>>> Best,
>>> Bruno
>>>
>>>
>>> On Tue, May 28, 2019 at 9:08 PM Matthias J. Sax <ma...@confluent.io> wrote:
>>>>
>>>> Thanks Bruno.
>>>>
>>>> I updated the KIP without changing the metric name yet -- want to
>>>> discuss this first.
>>>>
>>>>
>>>>
>>>> 1) I am also ok to keep `all()`
>>>>
>>>>
>>>> 2) That's a good idea. To make sure I understand your suggestion
>>>> correctly, below the mapping between metric name an methods.
>>>> (Some metric names are use by multiple stores):
>>>>
>>>> (ReadOnly)KeyValueStore:
>>>>
>>>>>> value-by-key : ReadOnlyKeyValueStore#get(K key)
>>>>
>>>> -> replaces `get` metric
>>>>
>>>> I am actually not sure if I like this. Just keeping `get` instead of
>>>> `value-by-key`, `value-by-key-and-time` seems more reasonable to me.
>>>>
>>>>
>>>>
>>>> (ReadOnly)WindowStore:
>>>>
>>>>>> value-by-key-and-time : ReadOnlyWindowStore#get(K key, long windowStartTime)
>>>>
>>>> I think a simple `get` might be better
>>>>
>>>>>> range-by-key-and-time-range : ReadOnlyWindowStore#range(K key, Instant from, Instant to)
>>>>>> range-by-key-and-time-range : WindowStore#range(K key, long fromTime, long toTime)
>>>>
>>>>>> range-by-key-range-and-time-range : ReadOnlyWindowStore#range(K from, K to, Instant from, Instant to)
>>>>>> range-by-key-range-and-time-range : WindowStore#range(K from, K to, long fromTime, long toTime)
>>>>
>>>>>> range-by-time-range : ReadOnlyWindowStore#range(Instant from, Instant to)
>>>>>> range-by-time-range : WindowStore#range(long fromTime, long toTime)
>>>>
>>>>>> range-all : ReadOnlyWindowStore#all()
>>>>
>>>>
>>>> (ReadOnly)SessionStore:
>>>>
>>>>>> range-by-key : ReadOnlySessionStore#range(K key)
>>>>
>>>>>> range-by-key-range : ReadOnlySessionStore#range(K from, K to)
>>>>>> range-by-key-and-time-range : SessionStore#range(K key, long earliestSessionEndTime, long latestSessionStartTime)
>>>>
>>>>>> range-by-key-range-and-time-range : SessionStore#range(K keyFrom, K keyTo, long earliestSessionEndTime, long latestSessionStartTime)
>>>>
>>>>>> value-by-key-and-time : SessionStore#get(K key, long sessionStartTime, long sessionEndTime)
>>>>
>>>> I think a simple `get` might be better
>>>>
>>>>>> range-all : ReadOnlyKeyValueStore#all()
>>>>
>>>>
>>>>
>>>>
>>>> 3) the `state-` part is already contained in `[storeType]` do I think
>>>> it's correct as-is
>>>>
>>>>
>>>> 4) Ack. Fixed.
>>>>
>>>>
>>>> 5) I think `stream` (plural) is correct. Cf
>>>> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/internals/MeteredKeyValueStore.java#L87
>>>>
>>>>
>>>>
>>>> -Matthias
>>>>
>>>>
>>>> On 5/28/19 3:26 AM, Bruno Cadonna wrote:
>>>>> Hi all,
>>>>>
>>>>> My comments on this KIP:
>>>>>
>>>>> 1. I would use `all()` instead of `range()` because the functionality
>>>>> is immediately clear without the need to look at the parameter list.
>>>>>
>>>>> 2. I would decouple method names from metrics name, because this
>>>>> allows us to change one naming independently from the other in the
>>>>> future. From the previous discussion on this thread, I have also the
>>>>> feeling that the naming requirements differ between Java code and
>>>>> metrics.
>>>>>
>>>>> My proposal for the metric names is the following:
>>>>>
>>>>> value-by-key
>>>>> range-by-key
>>>>> value-by-key-and-time
>>>>> range-by-key-range
>>>>> range-by-time-range
>>>>> range-by-key-and-time-range
>>>>> range-by-key-range-and-time-range
>>>>> range-all
>>>>>
>>>>> I omitted the metric types like latency-avg, etc. The first word
>>>>> denotes whether the query is a point query (`value`) or a range query
>>>>> (`range`). The words after the `by` denote the parameter types of the
>>>>> query. `range-all` is a special case. I hope, I did not miss any.
>>>>> We could also prefix the above names with `get-` if you think, it
>>>>> would make the names clearer.
>>>>>
>>>>> 3. Should `[storeType]-id` be `[storeType]-state-id`?
>>>>>
>>>>> 4. Some method signatures in the KIP under comment `// new` have still
>>>>> the `@Deprecated` annotation. Copy&Paste error?
>>>>>
>>>>> 5. Should `type = stream-[storeType]-metrics` be `type =
>>>>> streams-[storeType]-metrics` or do we need to keep `stream` for
>>>>> historical/backward-compatibility reasons?
>>>>>
>>>>> Best,
>>>>> Bruno
>>>>>
>>>>> On Fri, May 24, 2019 at 5:46 AM Matthias J. Sax <ma...@confluent.io> wrote:
>>>>>>
>>>>>> Thanks John and Guozhang.
>>>>>>
>>>>>> I also lean towards different metric names.
>>>>>>
>>>>>> While updating the KIP, I also realized that the whole store API for
>>>>>> window and session store is actually rather inconsistent. Hence, I
>>>>>> extended the scope of this KIP to cleanup both interfaces and changed
>>>>>> the KIP name to
>>>>>>
>>>>>>    "Cleanup built-in Store interfaces"
>>>>>>
>>>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103094198
>>>>>>
>>>>>> I also included a small change to KeyValueStore to align all built-in
>>>>>> stores holistically.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Looking forward to your feedback.
>>>>>>
>>>>>>
>>>>>>
>>>>>> -Matthias
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 5/7/19 1:19 AM, Guozhang Wang wrote:
>>>>>>> Thanks John.
>>>>>>>
>>>>>>> I think I'm convinced about not collapsing the metrics measuring for
>>>>>>> various function calls. Regarding the possible solution I'm a bit inclined
>>>>>>> to just distinguish on the metric names directly over on tags: since our
>>>>>>> current metrics naming are a tad messed up anyways, fixing them in one shot
>>>>>>> as a breaking change sounds reasonable to me.
>>>>>>>
>>>>>>>
>>>>>>> Guozhang
>>>>>>>
>>>>>>>
>>>>>>> On Mon, May 6, 2019 at 9:14 AM John Roesler <jo...@confluent.io> wrote:
>>>>>>>
>>>>>>>> Thanks all (or range? ;) ) for the discussion. Good points all around.
>>>>>>>>
>>>>>>>> Although I see the allure of naming the metrics the same as the things
>>>>>>>> they're measuring, it seems not to be perfect. Seconding Matthias's latter
>>>>>>>> thought, I think it's likely you'd want to measure the method calls
>>>>>>>> independently, since the different range variants would have wildly
>>>>>>>> different characteristics, which could then lead you to want to orient the
>>>>>>>> storage differently to support particular use cases.
>>>>>>>>
>>>>>>>> Pointing out some structural characteristics (I know you all know this
>>>>>>>> stuff, I'm just constructing a table for analysis):
>>>>>>>> * Java supports method-name overloading. *Different* methods can have the
>>>>>>>> same names, distinguished by arg lists; it doesn't change the fact that
>>>>>>>> they are different methods.
>>>>>>>> * Metrics does not support metric-name overloading, but metric names do
>>>>>>>> have some structure we could exploit, if you consider the tags.
>>>>>>>>
>>>>>>>> It seems to me that there's actually more domain mismatch if we just have
>>>>>>>> one metric named "range", since (e.g., in the SessionStore proposal above)
>>>>>>>> the Java API has *four* methods named "range".
>>>>>>>>
>>>>>>>> Two potential solutions I see:
>>>>>>>> * hierarchical metric names: "range-single-key-all-time",
>>>>>>>> "range-key-range-all-time", "range-single-key-time-range",
>>>>>>>> "range-key-range-time-range", maybe with better names... I'm not the best
>>>>>>>> at this stuff. Hopefully, you see the point, though... they all start with
>>>>>>>> "range", which provides the association to the method, and all have a
>>>>>>>> suffix which identifies the overload being measured.
>>>>>>>> * tagged metric names: "range" {"variant": "single-key-all-time"}, "range"
>>>>>>>> {"variant": "key-range-all-time"}, "range" {"variant":
>>>>>>>> "single-key-time-range"}, "range" {"variant": "key-range-time-range"} . Or
>>>>>>>> you could even split the tags up semantically, but my instinct says that
>>>>>>>> that would just make it harder to digest the metrics later on.
>>>>>>>>
>>>>>>>> Just some ideas.
>>>>>>>> -John
>>>>>>>>
>>>>>>>> On Fri, Apr 26, 2019 at 3:51 AM Matthias J. Sax <ma...@confluent.io>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks for the input Guozhang. I was not aware of those dependencies.
>>>>>>>>>
>>>>>>>>> It might be good to align this KIP with the metrics cleanup. Not sure
>>>>>>>>> atm, if we should use different metric names for different overloads,
>>>>>>>>> even if those have the same method name?
>>>>>>>>>
>>>>>>>>> If we rename all method to `range()` and use the same metric name for
>>>>>>>>> all, one could argue that this is still fine, because the metric
>>>>>>>>> collects how often a range query is executed (regardless of the range
>>>>>>>>> itself).
>>>>>>>>>
>>>>>>>>> On the other hand, this would basically be a "roll up". It could still
>>>>>>>>> be valuable to distinguish between single-key-time-range,
>>>>>>>>> key-range-time-range, and all-range queries. Users could still aggregate
>>>>>>>>> those later if they are not interested in the details, while it's not
>>>>>>>>> possible for user to split a pre-aggregated metric into it's component.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Input from others might be helpful here, too.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Matthias
>>>>>>>>>
>>>>>>>>> On 4/11/19 6:00 PM, Guozhang Wang wrote:
>>>>>>>>>> While working at KIP-444 (https://github.com/apache/kafka/pull/6498) I
>>>>>>>>>> realized there are a bunch of issues on metric names v.s. function
>>>>>>>> names,
>>>>>>>>>> e.g. some function named `fetchAll` are actually measure with `fetch`,
>>>>>>>>> etc.
>>>>>>>>>> So in that KIP I proposed to make the function name aligned with
>>>>>>>> metrics
>>>>>>>>>> name. So suppose we rename the functions from `fetch` to `range` I'd
>>>>>>>>>> suggest we make this change as part of KIP-444 as well. Note that it
>>>>>>>>> means
>>>>>>>>>> different functions with the same name `range` will be measured under a
>>>>>>>>>> single metric then.
>>>>>>>>>>
>>>>>>>>>> But still for function named `all` it will be measured under a separate
>>>>>>>>>> metric named `all`, so I'm just clarifying with you if that's the
>>>>>>>>> intention.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Guozhang
>>>>>>>>>>
>>>>>>>>>> On Thu, Apr 11, 2019 at 2:04 PM Matthias J. Sax <matthias@confluent.io
>>>>>>>>>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I did not see a reason to rename `all()` to `range()`. `all()` does
>>>>>>>> not
>>>>>>>>>>> take any parameters to limit a range and is a good name IMHO. But I am
>>>>>>>>>>> not married to keep `all()` and if we think we should rename it, too,
>>>>>>>> I
>>>>>>>>>>> am fine with it.
>>>>>>>>>>>
>>>>>>>>>>> Not sure what connection you make to metrics though. Can you
>>>>>>>> elaborate?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Would be interested to hear others opinions on this, too.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -Matthias
>>>>>>>>>>>
>>>>>>>>>>> On 4/11/19 8:38 AM, Guozhang Wang wrote:
>>>>>>>>>>>> I like the renaming, since it also aligns with our metrics cleanup
>>>>>>>>>>>> (KIP-444) which touches upon the store level metrics as well.
>>>>>>>>>>>>
>>>>>>>>>>>> One question: you seems still suggesting to keep "all" with the
>>>>>>>> current
>>>>>>>>>>>> name (and also using a separate metric for it), what's the difference
>>>>>>>>>>>> between this one and other "range" functions?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Apr 11, 2019 at 2:26 AM Matthias J. Sax <
>>>>>>>> matthias@confluent.io
>>>>>>>>>>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for the input.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Just to clarify the naming conflicts is between the newly added
>>>>>>>>>>> function
>>>>>>>>>>>>>>> and the old functions that we want to deprecate / remove right?
>>>>>>>> The
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, the conflict is just fort the existing `fetch()` methods for
>>>>>>>>> which
>>>>>>>>>>>>> we want to change the return type.
>>>>>>>>>>>>>
>>>>>>>>>>>>> IMHO, we should not make a breaking change in a minor release. Thus,
>>>>>>>>> we
>>>>>>>>>>>>> could either only deprecate those fetch methods that return
>>>>>>>>>>>>> `WindowStoreIterator` and do a "clean cut" in 3.0.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Or we, follow the renaming path. No get a clean renaming, we need to
>>>>>>>>>>>>> consider all methods that are called "fetch":
>>>>>>>>>>>>>
>>>>>>>>>>>>> ReadOnlyWindowStore:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> V fetch(K, long)
>>>>>>>>>>>>>> WindowStoreIterator<V> fetch(K, Instant, Instant)
>>>>>>>>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, Instant, Instant)
>>>>>>>>>>>>>
>>>>>>>>>>>>> WindowStore:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> WindowStoreIterator<V> fetch(K, long, long)
>>>>>>>>>>>>>> WindowStoreIterator<V> fetch(K, Instant, Instant)>
>>>>>>>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, long, long)
>>>>>>>>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, Instant, Instant)
>>>>>>>>>>>>>
>>>>>>>>>>>>> There is also fetchAll(long, long) and fetchAll(Instant, Instant)
>>>>>>>> that
>>>>>>>>>>>>> fetch over all keys.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Maybe we could rename `V fetch(K, long)` to `V get(K, long)` and
>>>>>>>>> rename
>>>>>>>>>>>>> all `fetch()/fetchAll()` to `range()`? There is actually no reason
>>>>>>>> to
>>>>>>>>>>>>> have range()/rangeAll().
>>>>>>>>>>>>>
>>>>>>>>>>>>> If we do this, we might consider to rename methods for SessionStore,
>>>>>>>>>>>>> too. There is
>>>>>>>>>>>>>
>>>>>>>>>>>>>> ReadOnlySessionStore#fetch(K)
>>>>>>>>>>>>>> ReadOnlySessionStore#fetch(K, K)
>>>>>>>>>>>>>> SessionStore#findSessions(K, long, long)
>>>>>>>>>>>>>> SessionStore#findSessions(K, K, long, long)
>>>>>>>>>>>>>> SessionStore#fetchSession(K, long, long);
>>>>>>>>>>>>>
>>>>>>>>>>>>> Consistent renaming might be:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ReadOnlySessionStore#range(K)
>>>>>>>>>>>>> ReadOnlySessionStore#range(K, K)
>>>>>>>>>>>>> SessionStore#range(K, long, long)
>>>>>>>>>>>>> SessionStore#range(K, K, long, long)
>>>>>>>>>>>>> SessionStore#get(K, long, long);
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Not sure if extensive renaming might have too big of an impact.
>>>>>>>>> However,
>>>>>>>>>>>>> `range()` and `get()` might actually be better names than `fetch()`,
>>>>>>>>>>>>> thus, it might also provide some additional value.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 3/27/19 10:19 AM, Guozhang Wang wrote:
>>>>>>>>>>>>>> Hello Matthias,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Just to clarify the naming conflicts is between the newly added
>>>>>>>>>>> function
>>>>>>>>>>>>>> and the old functions that we want to deprecate / remove right? The
>>>>>>>>>>>>>> existing ones have different signatures with parameters so that
>>>>>>>> they
>>>>>>>>>>>>> should
>>>>>>>>>>>>>> not have conflicts.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I was thinking about just make the change directly without
>>>>>>>>> deprecating
>>>>>>>>>>>>>> existing ones which would require users of 2.3 to make code changes
>>>>>>>>> --
>>>>>>>>>>>>> this
>>>>>>>>>>>>>> code change looks reasonably straight-forward to me and not much
>>>>>>>>> worth
>>>>>>>>>>>>>> deferring to later when the deprecated ones are removed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On the other hand, just deprecating "WindowIterator" without add
>>>>>>>> new
>>>>>>>>>>>>>> functions seems not very useful for users either since it is only
>>>>>>>>> used
>>>>>>>>>>> as
>>>>>>>>>>>>>> an indicator but users cannot make code changes during this phase
>>>>>>>>>>>>> anyways,
>>>>>>>>>>>>>> so it is still a `one-cut` deal when we eventually remove the
>>>>>>>>>>> deprecated
>>>>>>>>>>>>>> ones and add the new one.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hence I'm slightly inclining to trade compatibility and replace it
>>>>>>>>> with
>>>>>>>>>>>>> new
>>>>>>>>>>>>>> functions in one release, but if people have a good idea of the
>>>>>>>>>>> renaming
>>>>>>>>>>>>>> approach (I do not have a good one on top of my head :) I can also
>>>>>>>> be
>>>>>>>>>>>>>> convinced that way.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Mar 11, 2019 at 10:41 AM Matthias J. Sax <
>>>>>>>>>>> matthias@confluent.io>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am open to change the return type to
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> KeyValueIterator<Windowed<K>, V>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> However, this requires to rename
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> #fetch(K key, long startTimestamp, long endTimestamp)
>>>>>>>>>>>>>>> #fetch(K key, Instant startTimestamp, Instant endTimestamp)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> to avoid naming conflicts.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What new name would you suggest? The existing methods are called
>>>>>>>>>>>>>>> `fetch()`, `fetchAll()`, `all()`, `put()`.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> While I think it would be good to get fully aligned return types,
>>>>>>>> I
>>>>>>>>> am
>>>>>>>>>>>>>>> not sure how we can get aligned method names (without renaming all
>>>>>>>>> of
>>>>>>>>>>>>>>> them...)? If we think it's worth to rename all to get this cleaned
>>>>>>>>>>> up, I
>>>>>>>>>>>>>>> am no opposed.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 3/11/19 10:27 AM, Guozhang Wang wrote:
>>>>>>>>>>>>>>>> I was thinking about changing the return type even, to
>>>>>>>>>>>>>>>> `KeyValueIterator<Windowed<K>, V>` since it is confusing to users
>>>>>>>>>>> about
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> key typed `Long` (Streams javadoc today did not explain it
>>>>>>>> clearly
>>>>>>>>>>>>>>> either),
>>>>>>>>>>>>>>>> note it is not backward compatible at all.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Personally I'd prefer to just deprecate the API and new new ones
>>>>>>>>> that
>>>>>>>>>>>>>>>> return `KeyValueIterator<Windowed<K>, V>` directly, but if most
>>>>>>>>>>> people
>>>>>>>>>>>>>>> felt
>>>>>>>>>>>>>>>> it is too intrusive for compatibility I can be convinced with
>>>>>>>>>>>>>>>> `KeyValueIterator<Long, V>` as well.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Mar 11, 2019 at 10:17 AM Sophie Blee-Goldman <
>>>>>>>>>>>>>>> sophie@confluent.io>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I remember thinking this while working on window stores, am
>>>>>>>>>>> definitely
>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>> it.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Mar 11, 2019 at 9:20 AM John Roesler <john@confluent.io
>>>>>>>>>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Sounds great to me. Thanks, Matthias!
>>>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Sun, Mar 10, 2019 at 11:58 PM Matthias J. Sax <
>>>>>>>>>>>>>>> matthias@confluent.io>
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I would like to propose KIP-439 to deprecate interface
>>>>>>>>>>>>>>>>>>> `WindowStoreIterator`.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-439%3A+Deprecate+Interface+WindowStoreIterator
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Looking forward to your feedback.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>


Re: [DISCUSS] KIP-439: Deprecate Interface WindowStoreIterator

Posted by John Roesler <jo...@confluent.io>.
Hi again, all,

After wrestling with some other issues around the window interface,
I'd propose that we consider normalizing the WindowStore (and
SessionStore) interfaces with respect to KeyValueStore. We can't
actually make the classes related because they'll clash on the
deprecated methods, but we can align them. Doing so would be an
ergonomic advantage, since all our provided store interfaces would
have the same "shape".

Specifically, this means that we would deprecate any method that takes
a raw "K key, long windowStartTime" or returns a raw "K key", and
instead always take as arguments Windowed<K> keys and also return
Windowed<K> results. This proposal already comes close to this, by
re-using the KeyValueIterator<Windowed<K, V>>, so it would just be a
few tweaks to "perfect" it.

What do you think?

Thanks,
-John

On Tue, May 28, 2019 at 2:58 PM Matthias J. Sax <ma...@confluent.io> wrote:
>
> Let's see what other think about (2)
>
>
>
> (3) Interesting. My reasoning follows the code:
>
> For example, `RocksDbKeyValueBytesStoreSupplier#metricsScope()`returns
> "rocksdb-state" and this is concatenated later in `MeteredKeyValueStore`
> that adds a tag
>
>   key:       metricScope + "-id"
>   value:     name()                  // this is the store name
>
> Hence, the `-state` part comes from the supplier and thus it seems to be
> part of `[store-type]` (ie, `store-type` == metricScope()` -- not sure
> if this interpretation holds).
>
> If it's not part of `[store-type]` the supplier should not return it as
> `metricScope()` IMHO, but the `MeteredKeyValueStore` should add it
> together with `-id` suffix from my understanding.
>
> Thoughts?
>
>
>
> (5) Renaming to `streams` might imply that we need to rename _all_
> metrics, not just the store metrics that are affected, to avoid
> inconsistencies.
>
> If we really want to rename it, I would rather suggest to do it as part
> of KIP-444, that is a larger rework anyway. It seems to be out of scope
> for this KIP.
>
>
> -Matthias
>
>
> On 5/28/19 12:30 PM, Bruno Cadonna wrote:
> > Hi Matthias,
> >
> > 2)
> > Yes, this is how I meant it.
> > I am still in favour of `value-by-...` instead of `get`, because it
> > gives you immediately the meaning of the metric without the need to
> > know the API of the stores.
> >
> > 3)
> > I asked to avoid misunderstandings because in KIP-444, `-state-` is
> > stated explicitly in the tag.
> >
> > 5)
> > The question was not about correctness. I know that we use `stream` in
> > the metrics group. The question was rather if we want to change it.
> > The streams metrics interface is called `StreamsMetrics`. I understand
> > that this has low priority. Just wanted to mention it.
> >
> > Best,
> > Bruno
> >
> >
> > On Tue, May 28, 2019 at 9:08 PM Matthias J. Sax <ma...@confluent.io> wrote:
> >>
> >> Thanks Bruno.
> >>
> >> I updated the KIP without changing the metric name yet -- want to
> >> discuss this first.
> >>
> >>
> >>
> >> 1) I am also ok to keep `all()`
> >>
> >>
> >> 2) That's a good idea. To make sure I understand your suggestion
> >> correctly, below the mapping between metric name an methods.
> >> (Some metric names are use by multiple stores):
> >>
> >> (ReadOnly)KeyValueStore:
> >>
> >>>> value-by-key : ReadOnlyKeyValueStore#get(K key)
> >>
> >> -> replaces `get` metric
> >>
> >> I am actually not sure if I like this. Just keeping `get` instead of
> >> `value-by-key`, `value-by-key-and-time` seems more reasonable to me.
> >>
> >>
> >>
> >> (ReadOnly)WindowStore:
> >>
> >>>> value-by-key-and-time : ReadOnlyWindowStore#get(K key, long windowStartTime)
> >>
> >> I think a simple `get` might be better
> >>
> >>>> range-by-key-and-time-range : ReadOnlyWindowStore#range(K key, Instant from, Instant to)
> >>>> range-by-key-and-time-range : WindowStore#range(K key, long fromTime, long toTime)
> >>
> >>>> range-by-key-range-and-time-range : ReadOnlyWindowStore#range(K from, K to, Instant from, Instant to)
> >>>> range-by-key-range-and-time-range : WindowStore#range(K from, K to, long fromTime, long toTime)
> >>
> >>>> range-by-time-range : ReadOnlyWindowStore#range(Instant from, Instant to)
> >>>> range-by-time-range : WindowStore#range(long fromTime, long toTime)
> >>
> >>>> range-all : ReadOnlyWindowStore#all()
> >>
> >>
> >> (ReadOnly)SessionStore:
> >>
> >>>> range-by-key : ReadOnlySessionStore#range(K key)
> >>
> >>>> range-by-key-range : ReadOnlySessionStore#range(K from, K to)
> >>>> range-by-key-and-time-range : SessionStore#range(K key, long earliestSessionEndTime, long latestSessionStartTime)
> >>
> >>>> range-by-key-range-and-time-range : SessionStore#range(K keyFrom, K keyTo, long earliestSessionEndTime, long latestSessionStartTime)
> >>
> >>>> value-by-key-and-time : SessionStore#get(K key, long sessionStartTime, long sessionEndTime)
> >>
> >> I think a simple `get` might be better
> >>
> >>>> range-all : ReadOnlyKeyValueStore#all()
> >>
> >>
> >>
> >>
> >> 3) the `state-` part is already contained in `[storeType]` do I think
> >> it's correct as-is
> >>
> >>
> >> 4) Ack. Fixed.
> >>
> >>
> >> 5) I think `stream` (plural) is correct. Cf
> >> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/internals/MeteredKeyValueStore.java#L87
> >>
> >>
> >>
> >> -Matthias
> >>
> >>
> >> On 5/28/19 3:26 AM, Bruno Cadonna wrote:
> >>> Hi all,
> >>>
> >>> My comments on this KIP:
> >>>
> >>> 1. I would use `all()` instead of `range()` because the functionality
> >>> is immediately clear without the need to look at the parameter list.
> >>>
> >>> 2. I would decouple method names from metrics name, because this
> >>> allows us to change one naming independently from the other in the
> >>> future. From the previous discussion on this thread, I have also the
> >>> feeling that the naming requirements differ between Java code and
> >>> metrics.
> >>>
> >>> My proposal for the metric names is the following:
> >>>
> >>> value-by-key
> >>> range-by-key
> >>> value-by-key-and-time
> >>> range-by-key-range
> >>> range-by-time-range
> >>> range-by-key-and-time-range
> >>> range-by-key-range-and-time-range
> >>> range-all
> >>>
> >>> I omitted the metric types like latency-avg, etc. The first word
> >>> denotes whether the query is a point query (`value`) or a range query
> >>> (`range`). The words after the `by` denote the parameter types of the
> >>> query. `range-all` is a special case. I hope, I did not miss any.
> >>> We could also prefix the above names with `get-` if you think, it
> >>> would make the names clearer.
> >>>
> >>> 3. Should `[storeType]-id` be `[storeType]-state-id`?
> >>>
> >>> 4. Some method signatures in the KIP under comment `// new` have still
> >>> the `@Deprecated` annotation. Copy&Paste error?
> >>>
> >>> 5. Should `type = stream-[storeType]-metrics` be `type =
> >>> streams-[storeType]-metrics` or do we need to keep `stream` for
> >>> historical/backward-compatibility reasons?
> >>>
> >>> Best,
> >>> Bruno
> >>>
> >>> On Fri, May 24, 2019 at 5:46 AM Matthias J. Sax <ma...@confluent.io> wrote:
> >>>>
> >>>> Thanks John and Guozhang.
> >>>>
> >>>> I also lean towards different metric names.
> >>>>
> >>>> While updating the KIP, I also realized that the whole store API for
> >>>> window and session store is actually rather inconsistent. Hence, I
> >>>> extended the scope of this KIP to cleanup both interfaces and changed
> >>>> the KIP name to
> >>>>
> >>>>    "Cleanup built-in Store interfaces"
> >>>>
> >>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103094198
> >>>>
> >>>> I also included a small change to KeyValueStore to align all built-in
> >>>> stores holistically.
> >>>>
> >>>>
> >>>>
> >>>> Looking forward to your feedback.
> >>>>
> >>>>
> >>>>
> >>>> -Matthias
> >>>>
> >>>>
> >>>>
> >>>> On 5/7/19 1:19 AM, Guozhang Wang wrote:
> >>>>> Thanks John.
> >>>>>
> >>>>> I think I'm convinced about not collapsing the metrics measuring for
> >>>>> various function calls. Regarding the possible solution I'm a bit inclined
> >>>>> to just distinguish on the metric names directly over on tags: since our
> >>>>> current metrics naming are a tad messed up anyways, fixing them in one shot
> >>>>> as a breaking change sounds reasonable to me.
> >>>>>
> >>>>>
> >>>>> Guozhang
> >>>>>
> >>>>>
> >>>>> On Mon, May 6, 2019 at 9:14 AM John Roesler <jo...@confluent.io> wrote:
> >>>>>
> >>>>>> Thanks all (or range? ;) ) for the discussion. Good points all around.
> >>>>>>
> >>>>>> Although I see the allure of naming the metrics the same as the things
> >>>>>> they're measuring, it seems not to be perfect. Seconding Matthias's latter
> >>>>>> thought, I think it's likely you'd want to measure the method calls
> >>>>>> independently, since the different range variants would have wildly
> >>>>>> different characteristics, which could then lead you to want to orient the
> >>>>>> storage differently to support particular use cases.
> >>>>>>
> >>>>>> Pointing out some structural characteristics (I know you all know this
> >>>>>> stuff, I'm just constructing a table for analysis):
> >>>>>> * Java supports method-name overloading. *Different* methods can have the
> >>>>>> same names, distinguished by arg lists; it doesn't change the fact that
> >>>>>> they are different methods.
> >>>>>> * Metrics does not support metric-name overloading, but metric names do
> >>>>>> have some structure we could exploit, if you consider the tags.
> >>>>>>
> >>>>>> It seems to me that there's actually more domain mismatch if we just have
> >>>>>> one metric named "range", since (e.g., in the SessionStore proposal above)
> >>>>>> the Java API has *four* methods named "range".
> >>>>>>
> >>>>>> Two potential solutions I see:
> >>>>>> * hierarchical metric names: "range-single-key-all-time",
> >>>>>> "range-key-range-all-time", "range-single-key-time-range",
> >>>>>> "range-key-range-time-range", maybe with better names... I'm not the best
> >>>>>> at this stuff. Hopefully, you see the point, though... they all start with
> >>>>>> "range", which provides the association to the method, and all have a
> >>>>>> suffix which identifies the overload being measured.
> >>>>>> * tagged metric names: "range" {"variant": "single-key-all-time"}, "range"
> >>>>>> {"variant": "key-range-all-time"}, "range" {"variant":
> >>>>>> "single-key-time-range"}, "range" {"variant": "key-range-time-range"} . Or
> >>>>>> you could even split the tags up semantically, but my instinct says that
> >>>>>> that would just make it harder to digest the metrics later on.
> >>>>>>
> >>>>>> Just some ideas.
> >>>>>> -John
> >>>>>>
> >>>>>> On Fri, Apr 26, 2019 at 3:51 AM Matthias J. Sax <ma...@confluent.io>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Thanks for the input Guozhang. I was not aware of those dependencies.
> >>>>>>>
> >>>>>>> It might be good to align this KIP with the metrics cleanup. Not sure
> >>>>>>> atm, if we should use different metric names for different overloads,
> >>>>>>> even if those have the same method name?
> >>>>>>>
> >>>>>>> If we rename all method to `range()` and use the same metric name for
> >>>>>>> all, one could argue that this is still fine, because the metric
> >>>>>>> collects how often a range query is executed (regardless of the range
> >>>>>>> itself).
> >>>>>>>
> >>>>>>> On the other hand, this would basically be a "roll up". It could still
> >>>>>>> be valuable to distinguish between single-key-time-range,
> >>>>>>> key-range-time-range, and all-range queries. Users could still aggregate
> >>>>>>> those later if they are not interested in the details, while it's not
> >>>>>>> possible for user to split a pre-aggregated metric into it's component.
> >>>>>>>
> >>>>>>>
> >>>>>>> Input from others might be helpful here, too.
> >>>>>>>
> >>>>>>>
> >>>>>>> -Matthias
> >>>>>>>
> >>>>>>> On 4/11/19 6:00 PM, Guozhang Wang wrote:
> >>>>>>>> While working at KIP-444 (https://github.com/apache/kafka/pull/6498) I
> >>>>>>>> realized there are a bunch of issues on metric names v.s. function
> >>>>>> names,
> >>>>>>>> e.g. some function named `fetchAll` are actually measure with `fetch`,
> >>>>>>> etc.
> >>>>>>>> So in that KIP I proposed to make the function name aligned with
> >>>>>> metrics
> >>>>>>>> name. So suppose we rename the functions from `fetch` to `range` I'd
> >>>>>>>> suggest we make this change as part of KIP-444 as well. Note that it
> >>>>>>> means
> >>>>>>>> different functions with the same name `range` will be measured under a
> >>>>>>>> single metric then.
> >>>>>>>>
> >>>>>>>> But still for function named `all` it will be measured under a separate
> >>>>>>>> metric named `all`, so I'm just clarifying with you if that's the
> >>>>>>> intention.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Guozhang
> >>>>>>>>
> >>>>>>>> On Thu, Apr 11, 2019 at 2:04 PM Matthias J. Sax <matthias@confluent.io
> >>>>>>>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> I did not see a reason to rename `all()` to `range()`. `all()` does
> >>>>>> not
> >>>>>>>>> take any parameters to limit a range and is a good name IMHO. But I am
> >>>>>>>>> not married to keep `all()` and if we think we should rename it, too,
> >>>>>> I
> >>>>>>>>> am fine with it.
> >>>>>>>>>
> >>>>>>>>> Not sure what connection you make to metrics though. Can you
> >>>>>> elaborate?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Would be interested to hear others opinions on this, too.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> -Matthias
> >>>>>>>>>
> >>>>>>>>> On 4/11/19 8:38 AM, Guozhang Wang wrote:
> >>>>>>>>>> I like the renaming, since it also aligns with our metrics cleanup
> >>>>>>>>>> (KIP-444) which touches upon the store level metrics as well.
> >>>>>>>>>>
> >>>>>>>>>> One question: you seems still suggesting to keep "all" with the
> >>>>>> current
> >>>>>>>>>> name (and also using a separate metric for it), what's the difference
> >>>>>>>>>> between this one and other "range" functions?
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Guozhang
> >>>>>>>>>>
> >>>>>>>>>> On Thu, Apr 11, 2019 at 2:26 AM Matthias J. Sax <
> >>>>>> matthias@confluent.io
> >>>>>>>>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Thanks for the input.
> >>>>>>>>>>>
> >>>>>>>>>>>>> Just to clarify the naming conflicts is between the newly added
> >>>>>>>>> function
> >>>>>>>>>>>>> and the old functions that we want to deprecate / remove right?
> >>>>>> The
> >>>>>>>>>>>
> >>>>>>>>>>> Yes, the conflict is just fort the existing `fetch()` methods for
> >>>>>>> which
> >>>>>>>>>>> we want to change the return type.
> >>>>>>>>>>>
> >>>>>>>>>>> IMHO, we should not make a breaking change in a minor release. Thus,
> >>>>>>> we
> >>>>>>>>>>> could either only deprecate those fetch methods that return
> >>>>>>>>>>> `WindowStoreIterator` and do a "clean cut" in 3.0.
> >>>>>>>>>>>
> >>>>>>>>>>> Or we, follow the renaming path. No get a clean renaming, we need to
> >>>>>>>>>>> consider all methods that are called "fetch":
> >>>>>>>>>>>
> >>>>>>>>>>> ReadOnlyWindowStore:
> >>>>>>>>>>>
> >>>>>>>>>>>> V fetch(K, long)
> >>>>>>>>>>>> WindowStoreIterator<V> fetch(K, Instant, Instant)
> >>>>>>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, Instant, Instant)
> >>>>>>>>>>>
> >>>>>>>>>>> WindowStore:
> >>>>>>>>>>>
> >>>>>>>>>>>> WindowStoreIterator<V> fetch(K, long, long)
> >>>>>>>>>>>> WindowStoreIterator<V> fetch(K, Instant, Instant)>
> >>>>>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, long, long)
> >>>>>>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, Instant, Instant)
> >>>>>>>>>>>
> >>>>>>>>>>> There is also fetchAll(long, long) and fetchAll(Instant, Instant)
> >>>>>> that
> >>>>>>>>>>> fetch over all keys.
> >>>>>>>>>>>
> >>>>>>>>>>> Maybe we could rename `V fetch(K, long)` to `V get(K, long)` and
> >>>>>>> rename
> >>>>>>>>>>> all `fetch()/fetchAll()` to `range()`? There is actually no reason
> >>>>>> to
> >>>>>>>>>>> have range()/rangeAll().
> >>>>>>>>>>>
> >>>>>>>>>>> If we do this, we might consider to rename methods for SessionStore,
> >>>>>>>>>>> too. There is
> >>>>>>>>>>>
> >>>>>>>>>>>> ReadOnlySessionStore#fetch(K)
> >>>>>>>>>>>> ReadOnlySessionStore#fetch(K, K)
> >>>>>>>>>>>> SessionStore#findSessions(K, long, long)
> >>>>>>>>>>>> SessionStore#findSessions(K, K, long, long)
> >>>>>>>>>>>> SessionStore#fetchSession(K, long, long);
> >>>>>>>>>>>
> >>>>>>>>>>> Consistent renaming might be:
> >>>>>>>>>>>
> >>>>>>>>>>> ReadOnlySessionStore#range(K)
> >>>>>>>>>>> ReadOnlySessionStore#range(K, K)
> >>>>>>>>>>> SessionStore#range(K, long, long)
> >>>>>>>>>>> SessionStore#range(K, K, long, long)
> >>>>>>>>>>> SessionStore#get(K, long, long);
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Not sure if extensive renaming might have too big of an impact.
> >>>>>>> However,
> >>>>>>>>>>> `range()` and `get()` might actually be better names than `fetch()`,
> >>>>>>>>>>> thus, it might also provide some additional value.
> >>>>>>>>>>>
> >>>>>>>>>>> Thoughts?
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> -Matthias
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On 3/27/19 10:19 AM, Guozhang Wang wrote:
> >>>>>>>>>>>> Hello Matthias,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Just to clarify the naming conflicts is between the newly added
> >>>>>>>>> function
> >>>>>>>>>>>> and the old functions that we want to deprecate / remove right? The
> >>>>>>>>>>>> existing ones have different signatures with parameters so that
> >>>>>> they
> >>>>>>>>>>> should
> >>>>>>>>>>>> not have conflicts.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I was thinking about just make the change directly without
> >>>>>>> deprecating
> >>>>>>>>>>>> existing ones which would require users of 2.3 to make code changes
> >>>>>>> --
> >>>>>>>>>>> this
> >>>>>>>>>>>> code change looks reasonably straight-forward to me and not much
> >>>>>>> worth
> >>>>>>>>>>>> deferring to later when the deprecated ones are removed.
> >>>>>>>>>>>>
> >>>>>>>>>>>> On the other hand, just deprecating "WindowIterator" without add
> >>>>>> new
> >>>>>>>>>>>> functions seems not very useful for users either since it is only
> >>>>>>> used
> >>>>>>>>> as
> >>>>>>>>>>>> an indicator but users cannot make code changes during this phase
> >>>>>>>>>>> anyways,
> >>>>>>>>>>>> so it is still a `one-cut` deal when we eventually remove the
> >>>>>>>>> deprecated
> >>>>>>>>>>>> ones and add the new one.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hence I'm slightly inclining to trade compatibility and replace it
> >>>>>>> with
> >>>>>>>>>>> new
> >>>>>>>>>>>> functions in one release, but if people have a good idea of the
> >>>>>>>>> renaming
> >>>>>>>>>>>> approach (I do not have a good one on top of my head :) I can also
> >>>>>> be
> >>>>>>>>>>>> convinced that way.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Guozhang
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Mon, Mar 11, 2019 at 10:41 AM Matthias J. Sax <
> >>>>>>>>> matthias@confluent.io>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> I am open to change the return type to
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> KeyValueIterator<Windowed<K>, V>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> However, this requires to rename
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> #fetch(K key, long startTimestamp, long endTimestamp)
> >>>>>>>>>>>>> #fetch(K key, Instant startTimestamp, Instant endTimestamp)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> to avoid naming conflicts.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> What new name would you suggest? The existing methods are called
> >>>>>>>>>>>>> `fetch()`, `fetchAll()`, `all()`, `put()`.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> While I think it would be good to get fully aligned return types,
> >>>>>> I
> >>>>>>> am
> >>>>>>>>>>>>> not sure how we can get aligned method names (without renaming all
> >>>>>>> of
> >>>>>>>>>>>>> them...)? If we think it's worth to rename all to get this cleaned
> >>>>>>>>> up, I
> >>>>>>>>>>>>> am no opposed.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thoughts?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 3/11/19 10:27 AM, Guozhang Wang wrote:
> >>>>>>>>>>>>>> I was thinking about changing the return type even, to
> >>>>>>>>>>>>>> `KeyValueIterator<Windowed<K>, V>` since it is confusing to users
> >>>>>>>>> about
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>> key typed `Long` (Streams javadoc today did not explain it
> >>>>>> clearly
> >>>>>>>>>>>>> either),
> >>>>>>>>>>>>>> note it is not backward compatible at all.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Personally I'd prefer to just deprecate the API and new new ones
> >>>>>>> that
> >>>>>>>>>>>>>> return `KeyValueIterator<Windowed<K>, V>` directly, but if most
> >>>>>>>>> people
> >>>>>>>>>>>>> felt
> >>>>>>>>>>>>>> it is too intrusive for compatibility I can be convinced with
> >>>>>>>>>>>>>> `KeyValueIterator<Long, V>` as well.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Guozhang
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Mon, Mar 11, 2019 at 10:17 AM Sophie Blee-Goldman <
> >>>>>>>>>>>>> sophie@confluent.io>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I remember thinking this while working on window stores, am
> >>>>>>>>> definitely
> >>>>>>>>>>>>> for
> >>>>>>>>>>>>>>> it.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Mon, Mar 11, 2019 at 9:20 AM John Roesler <john@confluent.io
> >>>>>>>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Sounds great to me. Thanks, Matthias!
> >>>>>>>>>>>>>>>> -John
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Sun, Mar 10, 2019 at 11:58 PM Matthias J. Sax <
> >>>>>>>>>>>>> matthias@confluent.io>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I would like to propose KIP-439 to deprecate interface
> >>>>>>>>>>>>>>>>> `WindowStoreIterator`.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-439%3A+Deprecate+Interface+WindowStoreIterator
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Looking forward to your feedback.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>
>

Re: [DISCUSS] KIP-439: Deprecate Interface WindowStoreIterator

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Let's see what other think about (2)



(3) Interesting. My reasoning follows the code:

For example, `RocksDbKeyValueBytesStoreSupplier#metricsScope()`returns
"rocksdb-state" and this is concatenated later in `MeteredKeyValueStore`
that adds a tag

  key:       metricScope + "-id"
  value:     name()                  // this is the store name

Hence, the `-state` part comes from the supplier and thus it seems to be
part of `[store-type]` (ie, `store-type` == metricScope()` -- not sure
if this interpretation holds).

If it's not part of `[store-type]` the supplier should not return it as
`metricScope()` IMHO, but the `MeteredKeyValueStore` should add it
together with `-id` suffix from my understanding.

Thoughts?



(5) Renaming to `streams` might imply that we need to rename _all_
metrics, not just the store metrics that are affected, to avoid
inconsistencies.

If we really want to rename it, I would rather suggest to do it as part
of KIP-444, that is a larger rework anyway. It seems to be out of scope
for this KIP.


-Matthias


On 5/28/19 12:30 PM, Bruno Cadonna wrote:
> Hi Matthias,
> 
> 2)
> Yes, this is how I meant it.
> I am still in favour of `value-by-...` instead of `get`, because it
> gives you immediately the meaning of the metric without the need to
> know the API of the stores.
> 
> 3)
> I asked to avoid misunderstandings because in KIP-444, `-state-` is
> stated explicitly in the tag.
> 
> 5)
> The question was not about correctness. I know that we use `stream` in
> the metrics group. The question was rather if we want to change it.
> The streams metrics interface is called `StreamsMetrics`. I understand
> that this has low priority. Just wanted to mention it.
> 
> Best,
> Bruno
> 
> 
> On Tue, May 28, 2019 at 9:08 PM Matthias J. Sax <ma...@confluent.io> wrote:
>>
>> Thanks Bruno.
>>
>> I updated the KIP without changing the metric name yet -- want to
>> discuss this first.
>>
>>
>>
>> 1) I am also ok to keep `all()`
>>
>>
>> 2) That's a good idea. To make sure I understand your suggestion
>> correctly, below the mapping between metric name an methods.
>> (Some metric names are use by multiple stores):
>>
>> (ReadOnly)KeyValueStore:
>>
>>>> value-by-key : ReadOnlyKeyValueStore#get(K key)
>>
>> -> replaces `get` metric
>>
>> I am actually not sure if I like this. Just keeping `get` instead of
>> `value-by-key`, `value-by-key-and-time` seems more reasonable to me.
>>
>>
>>
>> (ReadOnly)WindowStore:
>>
>>>> value-by-key-and-time : ReadOnlyWindowStore#get(K key, long windowStartTime)
>>
>> I think a simple `get` might be better
>>
>>>> range-by-key-and-time-range : ReadOnlyWindowStore#range(K key, Instant from, Instant to)
>>>> range-by-key-and-time-range : WindowStore#range(K key, long fromTime, long toTime)
>>
>>>> range-by-key-range-and-time-range : ReadOnlyWindowStore#range(K from, K to, Instant from, Instant to)
>>>> range-by-key-range-and-time-range : WindowStore#range(K from, K to, long fromTime, long toTime)
>>
>>>> range-by-time-range : ReadOnlyWindowStore#range(Instant from, Instant to)
>>>> range-by-time-range : WindowStore#range(long fromTime, long toTime)
>>
>>>> range-all : ReadOnlyWindowStore#all()
>>
>>
>> (ReadOnly)SessionStore:
>>
>>>> range-by-key : ReadOnlySessionStore#range(K key)
>>
>>>> range-by-key-range : ReadOnlySessionStore#range(K from, K to)
>>>> range-by-key-and-time-range : SessionStore#range(K key, long earliestSessionEndTime, long latestSessionStartTime)
>>
>>>> range-by-key-range-and-time-range : SessionStore#range(K keyFrom, K keyTo, long earliestSessionEndTime, long latestSessionStartTime)
>>
>>>> value-by-key-and-time : SessionStore#get(K key, long sessionStartTime, long sessionEndTime)
>>
>> I think a simple `get` might be better
>>
>>>> range-all : ReadOnlyKeyValueStore#all()
>>
>>
>>
>>
>> 3) the `state-` part is already contained in `[storeType]` do I think
>> it's correct as-is
>>
>>
>> 4) Ack. Fixed.
>>
>>
>> 5) I think `stream` (plural) is correct. Cf
>> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/internals/MeteredKeyValueStore.java#L87
>>
>>
>>
>> -Matthias
>>
>>
>> On 5/28/19 3:26 AM, Bruno Cadonna wrote:
>>> Hi all,
>>>
>>> My comments on this KIP:
>>>
>>> 1. I would use `all()` instead of `range()` because the functionality
>>> is immediately clear without the need to look at the parameter list.
>>>
>>> 2. I would decouple method names from metrics name, because this
>>> allows us to change one naming independently from the other in the
>>> future. From the previous discussion on this thread, I have also the
>>> feeling that the naming requirements differ between Java code and
>>> metrics.
>>>
>>> My proposal for the metric names is the following:
>>>
>>> value-by-key
>>> range-by-key
>>> value-by-key-and-time
>>> range-by-key-range
>>> range-by-time-range
>>> range-by-key-and-time-range
>>> range-by-key-range-and-time-range
>>> range-all
>>>
>>> I omitted the metric types like latency-avg, etc. The first word
>>> denotes whether the query is a point query (`value`) or a range query
>>> (`range`). The words after the `by` denote the parameter types of the
>>> query. `range-all` is a special case. I hope, I did not miss any.
>>> We could also prefix the above names with `get-` if you think, it
>>> would make the names clearer.
>>>
>>> 3. Should `[storeType]-id` be `[storeType]-state-id`?
>>>
>>> 4. Some method signatures in the KIP under comment `// new` have still
>>> the `@Deprecated` annotation. Copy&Paste error?
>>>
>>> 5. Should `type = stream-[storeType]-metrics` be `type =
>>> streams-[storeType]-metrics` or do we need to keep `stream` for
>>> historical/backward-compatibility reasons?
>>>
>>> Best,
>>> Bruno
>>>
>>> On Fri, May 24, 2019 at 5:46 AM Matthias J. Sax <ma...@confluent.io> wrote:
>>>>
>>>> Thanks John and Guozhang.
>>>>
>>>> I also lean towards different metric names.
>>>>
>>>> While updating the KIP, I also realized that the whole store API for
>>>> window and session store is actually rather inconsistent. Hence, I
>>>> extended the scope of this KIP to cleanup both interfaces and changed
>>>> the KIP name to
>>>>
>>>>    "Cleanup built-in Store interfaces"
>>>>
>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103094198
>>>>
>>>> I also included a small change to KeyValueStore to align all built-in
>>>> stores holistically.
>>>>
>>>>
>>>>
>>>> Looking forward to your feedback.
>>>>
>>>>
>>>>
>>>> -Matthias
>>>>
>>>>
>>>>
>>>> On 5/7/19 1:19 AM, Guozhang Wang wrote:
>>>>> Thanks John.
>>>>>
>>>>> I think I'm convinced about not collapsing the metrics measuring for
>>>>> various function calls. Regarding the possible solution I'm a bit inclined
>>>>> to just distinguish on the metric names directly over on tags: since our
>>>>> current metrics naming are a tad messed up anyways, fixing them in one shot
>>>>> as a breaking change sounds reasonable to me.
>>>>>
>>>>>
>>>>> Guozhang
>>>>>
>>>>>
>>>>> On Mon, May 6, 2019 at 9:14 AM John Roesler <jo...@confluent.io> wrote:
>>>>>
>>>>>> Thanks all (or range? ;) ) for the discussion. Good points all around.
>>>>>>
>>>>>> Although I see the allure of naming the metrics the same as the things
>>>>>> they're measuring, it seems not to be perfect. Seconding Matthias's latter
>>>>>> thought, I think it's likely you'd want to measure the method calls
>>>>>> independently, since the different range variants would have wildly
>>>>>> different characteristics, which could then lead you to want to orient the
>>>>>> storage differently to support particular use cases.
>>>>>>
>>>>>> Pointing out some structural characteristics (I know you all know this
>>>>>> stuff, I'm just constructing a table for analysis):
>>>>>> * Java supports method-name overloading. *Different* methods can have the
>>>>>> same names, distinguished by arg lists; it doesn't change the fact that
>>>>>> they are different methods.
>>>>>> * Metrics does not support metric-name overloading, but metric names do
>>>>>> have some structure we could exploit, if you consider the tags.
>>>>>>
>>>>>> It seems to me that there's actually more domain mismatch if we just have
>>>>>> one metric named "range", since (e.g., in the SessionStore proposal above)
>>>>>> the Java API has *four* methods named "range".
>>>>>>
>>>>>> Two potential solutions I see:
>>>>>> * hierarchical metric names: "range-single-key-all-time",
>>>>>> "range-key-range-all-time", "range-single-key-time-range",
>>>>>> "range-key-range-time-range", maybe with better names... I'm not the best
>>>>>> at this stuff. Hopefully, you see the point, though... they all start with
>>>>>> "range", which provides the association to the method, and all have a
>>>>>> suffix which identifies the overload being measured.
>>>>>> * tagged metric names: "range" {"variant": "single-key-all-time"}, "range"
>>>>>> {"variant": "key-range-all-time"}, "range" {"variant":
>>>>>> "single-key-time-range"}, "range" {"variant": "key-range-time-range"} . Or
>>>>>> you could even split the tags up semantically, but my instinct says that
>>>>>> that would just make it harder to digest the metrics later on.
>>>>>>
>>>>>> Just some ideas.
>>>>>> -John
>>>>>>
>>>>>> On Fri, Apr 26, 2019 at 3:51 AM Matthias J. Sax <ma...@confluent.io>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks for the input Guozhang. I was not aware of those dependencies.
>>>>>>>
>>>>>>> It might be good to align this KIP with the metrics cleanup. Not sure
>>>>>>> atm, if we should use different metric names for different overloads,
>>>>>>> even if those have the same method name?
>>>>>>>
>>>>>>> If we rename all method to `range()` and use the same metric name for
>>>>>>> all, one could argue that this is still fine, because the metric
>>>>>>> collects how often a range query is executed (regardless of the range
>>>>>>> itself).
>>>>>>>
>>>>>>> On the other hand, this would basically be a "roll up". It could still
>>>>>>> be valuable to distinguish between single-key-time-range,
>>>>>>> key-range-time-range, and all-range queries. Users could still aggregate
>>>>>>> those later if they are not interested in the details, while it's not
>>>>>>> possible for user to split a pre-aggregated metric into it's component.
>>>>>>>
>>>>>>>
>>>>>>> Input from others might be helpful here, too.
>>>>>>>
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>> On 4/11/19 6:00 PM, Guozhang Wang wrote:
>>>>>>>> While working at KIP-444 (https://github.com/apache/kafka/pull/6498) I
>>>>>>>> realized there are a bunch of issues on metric names v.s. function
>>>>>> names,
>>>>>>>> e.g. some function named `fetchAll` are actually measure with `fetch`,
>>>>>>> etc.
>>>>>>>> So in that KIP I proposed to make the function name aligned with
>>>>>> metrics
>>>>>>>> name. So suppose we rename the functions from `fetch` to `range` I'd
>>>>>>>> suggest we make this change as part of KIP-444 as well. Note that it
>>>>>>> means
>>>>>>>> different functions with the same name `range` will be measured under a
>>>>>>>> single metric then.
>>>>>>>>
>>>>>>>> But still for function named `all` it will be measured under a separate
>>>>>>>> metric named `all`, so I'm just clarifying with you if that's the
>>>>>>> intention.
>>>>>>>>
>>>>>>>>
>>>>>>>> Guozhang
>>>>>>>>
>>>>>>>> On Thu, Apr 11, 2019 at 2:04 PM Matthias J. Sax <matthias@confluent.io
>>>>>>>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I did not see a reason to rename `all()` to `range()`. `all()` does
>>>>>> not
>>>>>>>>> take any parameters to limit a range and is a good name IMHO. But I am
>>>>>>>>> not married to keep `all()` and if we think we should rename it, too,
>>>>>> I
>>>>>>>>> am fine with it.
>>>>>>>>>
>>>>>>>>> Not sure what connection you make to metrics though. Can you
>>>>>> elaborate?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Would be interested to hear others opinions on this, too.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Matthias
>>>>>>>>>
>>>>>>>>> On 4/11/19 8:38 AM, Guozhang Wang wrote:
>>>>>>>>>> I like the renaming, since it also aligns with our metrics cleanup
>>>>>>>>>> (KIP-444) which touches upon the store level metrics as well.
>>>>>>>>>>
>>>>>>>>>> One question: you seems still suggesting to keep "all" with the
>>>>>> current
>>>>>>>>>> name (and also using a separate metric for it), what's the difference
>>>>>>>>>> between this one and other "range" functions?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Guozhang
>>>>>>>>>>
>>>>>>>>>> On Thu, Apr 11, 2019 at 2:26 AM Matthias J. Sax <
>>>>>> matthias@confluent.io
>>>>>>>>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks for the input.
>>>>>>>>>>>
>>>>>>>>>>>>> Just to clarify the naming conflicts is between the newly added
>>>>>>>>> function
>>>>>>>>>>>>> and the old functions that we want to deprecate / remove right?
>>>>>> The
>>>>>>>>>>>
>>>>>>>>>>> Yes, the conflict is just fort the existing `fetch()` methods for
>>>>>>> which
>>>>>>>>>>> we want to change the return type.
>>>>>>>>>>>
>>>>>>>>>>> IMHO, we should not make a breaking change in a minor release. Thus,
>>>>>>> we
>>>>>>>>>>> could either only deprecate those fetch methods that return
>>>>>>>>>>> `WindowStoreIterator` and do a "clean cut" in 3.0.
>>>>>>>>>>>
>>>>>>>>>>> Or we, follow the renaming path. No get a clean renaming, we need to
>>>>>>>>>>> consider all methods that are called "fetch":
>>>>>>>>>>>
>>>>>>>>>>> ReadOnlyWindowStore:
>>>>>>>>>>>
>>>>>>>>>>>> V fetch(K, long)
>>>>>>>>>>>> WindowStoreIterator<V> fetch(K, Instant, Instant)
>>>>>>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, Instant, Instant)
>>>>>>>>>>>
>>>>>>>>>>> WindowStore:
>>>>>>>>>>>
>>>>>>>>>>>> WindowStoreIterator<V> fetch(K, long, long)
>>>>>>>>>>>> WindowStoreIterator<V> fetch(K, Instant, Instant)>
>>>>>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, long, long)
>>>>>>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, Instant, Instant)
>>>>>>>>>>>
>>>>>>>>>>> There is also fetchAll(long, long) and fetchAll(Instant, Instant)
>>>>>> that
>>>>>>>>>>> fetch over all keys.
>>>>>>>>>>>
>>>>>>>>>>> Maybe we could rename `V fetch(K, long)` to `V get(K, long)` and
>>>>>>> rename
>>>>>>>>>>> all `fetch()/fetchAll()` to `range()`? There is actually no reason
>>>>>> to
>>>>>>>>>>> have range()/rangeAll().
>>>>>>>>>>>
>>>>>>>>>>> If we do this, we might consider to rename methods for SessionStore,
>>>>>>>>>>> too. There is
>>>>>>>>>>>
>>>>>>>>>>>> ReadOnlySessionStore#fetch(K)
>>>>>>>>>>>> ReadOnlySessionStore#fetch(K, K)
>>>>>>>>>>>> SessionStore#findSessions(K, long, long)
>>>>>>>>>>>> SessionStore#findSessions(K, K, long, long)
>>>>>>>>>>>> SessionStore#fetchSession(K, long, long);
>>>>>>>>>>>
>>>>>>>>>>> Consistent renaming might be:
>>>>>>>>>>>
>>>>>>>>>>> ReadOnlySessionStore#range(K)
>>>>>>>>>>> ReadOnlySessionStore#range(K, K)
>>>>>>>>>>> SessionStore#range(K, long, long)
>>>>>>>>>>> SessionStore#range(K, K, long, long)
>>>>>>>>>>> SessionStore#get(K, long, long);
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Not sure if extensive renaming might have too big of an impact.
>>>>>>> However,
>>>>>>>>>>> `range()` and `get()` might actually be better names than `fetch()`,
>>>>>>>>>>> thus, it might also provide some additional value.
>>>>>>>>>>>
>>>>>>>>>>> Thoughts?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -Matthias
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 3/27/19 10:19 AM, Guozhang Wang wrote:
>>>>>>>>>>>> Hello Matthias,
>>>>>>>>>>>>
>>>>>>>>>>>> Just to clarify the naming conflicts is between the newly added
>>>>>>>>> function
>>>>>>>>>>>> and the old functions that we want to deprecate / remove right? The
>>>>>>>>>>>> existing ones have different signatures with parameters so that
>>>>>> they
>>>>>>>>>>> should
>>>>>>>>>>>> not have conflicts.
>>>>>>>>>>>>
>>>>>>>>>>>> I was thinking about just make the change directly without
>>>>>>> deprecating
>>>>>>>>>>>> existing ones which would require users of 2.3 to make code changes
>>>>>>> --
>>>>>>>>>>> this
>>>>>>>>>>>> code change looks reasonably straight-forward to me and not much
>>>>>>> worth
>>>>>>>>>>>> deferring to later when the deprecated ones are removed.
>>>>>>>>>>>>
>>>>>>>>>>>> On the other hand, just deprecating "WindowIterator" without add
>>>>>> new
>>>>>>>>>>>> functions seems not very useful for users either since it is only
>>>>>>> used
>>>>>>>>> as
>>>>>>>>>>>> an indicator but users cannot make code changes during this phase
>>>>>>>>>>> anyways,
>>>>>>>>>>>> so it is still a `one-cut` deal when we eventually remove the
>>>>>>>>> deprecated
>>>>>>>>>>>> ones and add the new one.
>>>>>>>>>>>>
>>>>>>>>>>>> Hence I'm slightly inclining to trade compatibility and replace it
>>>>>>> with
>>>>>>>>>>> new
>>>>>>>>>>>> functions in one release, but if people have a good idea of the
>>>>>>>>> renaming
>>>>>>>>>>>> approach (I do not have a good one on top of my head :) I can also
>>>>>> be
>>>>>>>>>>>> convinced that way.
>>>>>>>>>>>>
>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Mar 11, 2019 at 10:41 AM Matthias J. Sax <
>>>>>>>>> matthias@confluent.io>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I am open to change the return type to
>>>>>>>>>>>>>
>>>>>>>>>>>>> KeyValueIterator<Windowed<K>, V>
>>>>>>>>>>>>>
>>>>>>>>>>>>> However, this requires to rename
>>>>>>>>>>>>>
>>>>>>>>>>>>> #fetch(K key, long startTimestamp, long endTimestamp)
>>>>>>>>>>>>> #fetch(K key, Instant startTimestamp, Instant endTimestamp)
>>>>>>>>>>>>>
>>>>>>>>>>>>> to avoid naming conflicts.
>>>>>>>>>>>>>
>>>>>>>>>>>>> What new name would you suggest? The existing methods are called
>>>>>>>>>>>>> `fetch()`, `fetchAll()`, `all()`, `put()`.
>>>>>>>>>>>>>
>>>>>>>>>>>>> While I think it would be good to get fully aligned return types,
>>>>>> I
>>>>>>> am
>>>>>>>>>>>>> not sure how we can get aligned method names (without renaming all
>>>>>>> of
>>>>>>>>>>>>> them...)? If we think it's worth to rename all to get this cleaned
>>>>>>>>> up, I
>>>>>>>>>>>>> am no opposed.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 3/11/19 10:27 AM, Guozhang Wang wrote:
>>>>>>>>>>>>>> I was thinking about changing the return type even, to
>>>>>>>>>>>>>> `KeyValueIterator<Windowed<K>, V>` since it is confusing to users
>>>>>>>>> about
>>>>>>>>>>>>> the
>>>>>>>>>>>>>> key typed `Long` (Streams javadoc today did not explain it
>>>>>> clearly
>>>>>>>>>>>>> either),
>>>>>>>>>>>>>> note it is not backward compatible at all.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Personally I'd prefer to just deprecate the API and new new ones
>>>>>>> that
>>>>>>>>>>>>>> return `KeyValueIterator<Windowed<K>, V>` directly, but if most
>>>>>>>>> people
>>>>>>>>>>>>> felt
>>>>>>>>>>>>>> it is too intrusive for compatibility I can be convinced with
>>>>>>>>>>>>>> `KeyValueIterator<Long, V>` as well.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Mar 11, 2019 at 10:17 AM Sophie Blee-Goldman <
>>>>>>>>>>>>> sophie@confluent.io>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I remember thinking this while working on window stores, am
>>>>>>>>> definitely
>>>>>>>>>>>>> for
>>>>>>>>>>>>>>> it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Mar 11, 2019 at 9:20 AM John Roesler <john@confluent.io
>>>>>>>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sounds great to me. Thanks, Matthias!
>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sun, Mar 10, 2019 at 11:58 PM Matthias J. Sax <
>>>>>>>>>>>>> matthias@confluent.io>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I would like to propose KIP-439 to deprecate interface
>>>>>>>>>>>>>>>>> `WindowStoreIterator`.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-439%3A+Deprecate+Interface+WindowStoreIterator
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Looking forward to your feedback.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>


Re: [DISCUSS] KIP-439: Deprecate Interface WindowStoreIterator

Posted by Bruno Cadonna <br...@confluent.io>.
Hi Matthias,

2)
Yes, this is how I meant it.
I am still in favour of `value-by-...` instead of `get`, because it
gives you immediately the meaning of the metric without the need to
know the API of the stores.

3)
I asked to avoid misunderstandings because in KIP-444, `-state-` is
stated explicitly in the tag.

5)
The question was not about correctness. I know that we use `stream` in
the metrics group. The question was rather if we want to change it.
The streams metrics interface is called `StreamsMetrics`. I understand
that this has low priority. Just wanted to mention it.

Best,
Bruno


On Tue, May 28, 2019 at 9:08 PM Matthias J. Sax <ma...@confluent.io> wrote:
>
> Thanks Bruno.
>
> I updated the KIP without changing the metric name yet -- want to
> discuss this first.
>
>
>
> 1) I am also ok to keep `all()`
>
>
> 2) That's a good idea. To make sure I understand your suggestion
> correctly, below the mapping between metric name an methods.
> (Some metric names are use by multiple stores):
>
> (ReadOnly)KeyValueStore:
>
> >> value-by-key : ReadOnlyKeyValueStore#get(K key)
>
> -> replaces `get` metric
>
> I am actually not sure if I like this. Just keeping `get` instead of
> `value-by-key`, `value-by-key-and-time` seems more reasonable to me.
>
>
>
> (ReadOnly)WindowStore:
>
> >> value-by-key-and-time : ReadOnlyWindowStore#get(K key, long windowStartTime)
>
> I think a simple `get` might be better
>
> >> range-by-key-and-time-range : ReadOnlyWindowStore#range(K key, Instant from, Instant to)
> >> range-by-key-and-time-range : WindowStore#range(K key, long fromTime, long toTime)
>
> >> range-by-key-range-and-time-range : ReadOnlyWindowStore#range(K from, K to, Instant from, Instant to)
> >> range-by-key-range-and-time-range : WindowStore#range(K from, K to, long fromTime, long toTime)
>
> >> range-by-time-range : ReadOnlyWindowStore#range(Instant from, Instant to)
> >> range-by-time-range : WindowStore#range(long fromTime, long toTime)
>
> >> range-all : ReadOnlyWindowStore#all()
>
>
> (ReadOnly)SessionStore:
>
> >> range-by-key : ReadOnlySessionStore#range(K key)
>
> >> range-by-key-range : ReadOnlySessionStore#range(K from, K to)
> >> range-by-key-and-time-range : SessionStore#range(K key, long earliestSessionEndTime, long latestSessionStartTime)
>
> >> range-by-key-range-and-time-range : SessionStore#range(K keyFrom, K keyTo, long earliestSessionEndTime, long latestSessionStartTime)
>
> >> value-by-key-and-time : SessionStore#get(K key, long sessionStartTime, long sessionEndTime)
>
> I think a simple `get` might be better
>
> >> range-all : ReadOnlyKeyValueStore#all()
>
>
>
>
> 3) the `state-` part is already contained in `[storeType]` do I think
> it's correct as-is
>
>
> 4) Ack. Fixed.
>
>
> 5) I think `stream` (plural) is correct. Cf
> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/internals/MeteredKeyValueStore.java#L87
>
>
>
> -Matthias
>
>
> On 5/28/19 3:26 AM, Bruno Cadonna wrote:
> > Hi all,
> >
> > My comments on this KIP:
> >
> > 1. I would use `all()` instead of `range()` because the functionality
> > is immediately clear without the need to look at the parameter list.
> >
> > 2. I would decouple method names from metrics name, because this
> > allows us to change one naming independently from the other in the
> > future. From the previous discussion on this thread, I have also the
> > feeling that the naming requirements differ between Java code and
> > metrics.
> >
> > My proposal for the metric names is the following:
> >
> > value-by-key
> > range-by-key
> > value-by-key-and-time
> > range-by-key-range
> > range-by-time-range
> > range-by-key-and-time-range
> > range-by-key-range-and-time-range
> > range-all
> >
> > I omitted the metric types like latency-avg, etc. The first word
> > denotes whether the query is a point query (`value`) or a range query
> > (`range`). The words after the `by` denote the parameter types of the
> > query. `range-all` is a special case. I hope, I did not miss any.
> > We could also prefix the above names with `get-` if you think, it
> > would make the names clearer.
> >
> > 3. Should `[storeType]-id` be `[storeType]-state-id`?
> >
> > 4. Some method signatures in the KIP under comment `// new` have still
> > the `@Deprecated` annotation. Copy&Paste error?
> >
> > 5. Should `type = stream-[storeType]-metrics` be `type =
> > streams-[storeType]-metrics` or do we need to keep `stream` for
> > historical/backward-compatibility reasons?
> >
> > Best,
> > Bruno
> >
> > On Fri, May 24, 2019 at 5:46 AM Matthias J. Sax <ma...@confluent.io> wrote:
> >>
> >> Thanks John and Guozhang.
> >>
> >> I also lean towards different metric names.
> >>
> >> While updating the KIP, I also realized that the whole store API for
> >> window and session store is actually rather inconsistent. Hence, I
> >> extended the scope of this KIP to cleanup both interfaces and changed
> >> the KIP name to
> >>
> >>    "Cleanup built-in Store interfaces"
> >>
> >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103094198
> >>
> >> I also included a small change to KeyValueStore to align all built-in
> >> stores holistically.
> >>
> >>
> >>
> >> Looking forward to your feedback.
> >>
> >>
> >>
> >> -Matthias
> >>
> >>
> >>
> >> On 5/7/19 1:19 AM, Guozhang Wang wrote:
> >>> Thanks John.
> >>>
> >>> I think I'm convinced about not collapsing the metrics measuring for
> >>> various function calls. Regarding the possible solution I'm a bit inclined
> >>> to just distinguish on the metric names directly over on tags: since our
> >>> current metrics naming are a tad messed up anyways, fixing them in one shot
> >>> as a breaking change sounds reasonable to me.
> >>>
> >>>
> >>> Guozhang
> >>>
> >>>
> >>> On Mon, May 6, 2019 at 9:14 AM John Roesler <jo...@confluent.io> wrote:
> >>>
> >>>> Thanks all (or range? ;) ) for the discussion. Good points all around.
> >>>>
> >>>> Although I see the allure of naming the metrics the same as the things
> >>>> they're measuring, it seems not to be perfect. Seconding Matthias's latter
> >>>> thought, I think it's likely you'd want to measure the method calls
> >>>> independently, since the different range variants would have wildly
> >>>> different characteristics, which could then lead you to want to orient the
> >>>> storage differently to support particular use cases.
> >>>>
> >>>> Pointing out some structural characteristics (I know you all know this
> >>>> stuff, I'm just constructing a table for analysis):
> >>>> * Java supports method-name overloading. *Different* methods can have the
> >>>> same names, distinguished by arg lists; it doesn't change the fact that
> >>>> they are different methods.
> >>>> * Metrics does not support metric-name overloading, but metric names do
> >>>> have some structure we could exploit, if you consider the tags.
> >>>>
> >>>> It seems to me that there's actually more domain mismatch if we just have
> >>>> one metric named "range", since (e.g., in the SessionStore proposal above)
> >>>> the Java API has *four* methods named "range".
> >>>>
> >>>> Two potential solutions I see:
> >>>> * hierarchical metric names: "range-single-key-all-time",
> >>>> "range-key-range-all-time", "range-single-key-time-range",
> >>>> "range-key-range-time-range", maybe with better names... I'm not the best
> >>>> at this stuff. Hopefully, you see the point, though... they all start with
> >>>> "range", which provides the association to the method, and all have a
> >>>> suffix which identifies the overload being measured.
> >>>> * tagged metric names: "range" {"variant": "single-key-all-time"}, "range"
> >>>> {"variant": "key-range-all-time"}, "range" {"variant":
> >>>> "single-key-time-range"}, "range" {"variant": "key-range-time-range"} . Or
> >>>> you could even split the tags up semantically, but my instinct says that
> >>>> that would just make it harder to digest the metrics later on.
> >>>>
> >>>> Just some ideas.
> >>>> -John
> >>>>
> >>>> On Fri, Apr 26, 2019 at 3:51 AM Matthias J. Sax <ma...@confluent.io>
> >>>> wrote:
> >>>>
> >>>>> Thanks for the input Guozhang. I was not aware of those dependencies.
> >>>>>
> >>>>> It might be good to align this KIP with the metrics cleanup. Not sure
> >>>>> atm, if we should use different metric names for different overloads,
> >>>>> even if those have the same method name?
> >>>>>
> >>>>> If we rename all method to `range()` and use the same metric name for
> >>>>> all, one could argue that this is still fine, because the metric
> >>>>> collects how often a range query is executed (regardless of the range
> >>>>> itself).
> >>>>>
> >>>>> On the other hand, this would basically be a "roll up". It could still
> >>>>> be valuable to distinguish between single-key-time-range,
> >>>>> key-range-time-range, and all-range queries. Users could still aggregate
> >>>>> those later if they are not interested in the details, while it's not
> >>>>> possible for user to split a pre-aggregated metric into it's component.
> >>>>>
> >>>>>
> >>>>> Input from others might be helpful here, too.
> >>>>>
> >>>>>
> >>>>> -Matthias
> >>>>>
> >>>>> On 4/11/19 6:00 PM, Guozhang Wang wrote:
> >>>>>> While working at KIP-444 (https://github.com/apache/kafka/pull/6498) I
> >>>>>> realized there are a bunch of issues on metric names v.s. function
> >>>> names,
> >>>>>> e.g. some function named `fetchAll` are actually measure with `fetch`,
> >>>>> etc.
> >>>>>> So in that KIP I proposed to make the function name aligned with
> >>>> metrics
> >>>>>> name. So suppose we rename the functions from `fetch` to `range` I'd
> >>>>>> suggest we make this change as part of KIP-444 as well. Note that it
> >>>>> means
> >>>>>> different functions with the same name `range` will be measured under a
> >>>>>> single metric then.
> >>>>>>
> >>>>>> But still for function named `all` it will be measured under a separate
> >>>>>> metric named `all`, so I'm just clarifying with you if that's the
> >>>>> intention.
> >>>>>>
> >>>>>>
> >>>>>> Guozhang
> >>>>>>
> >>>>>> On Thu, Apr 11, 2019 at 2:04 PM Matthias J. Sax <matthias@confluent.io
> >>>>>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> I did not see a reason to rename `all()` to `range()`. `all()` does
> >>>> not
> >>>>>>> take any parameters to limit a range and is a good name IMHO. But I am
> >>>>>>> not married to keep `all()` and if we think we should rename it, too,
> >>>> I
> >>>>>>> am fine with it.
> >>>>>>>
> >>>>>>> Not sure what connection you make to metrics though. Can you
> >>>> elaborate?
> >>>>>>>
> >>>>>>>
> >>>>>>> Would be interested to hear others opinions on this, too.
> >>>>>>>
> >>>>>>>
> >>>>>>> -Matthias
> >>>>>>>
> >>>>>>> On 4/11/19 8:38 AM, Guozhang Wang wrote:
> >>>>>>>> I like the renaming, since it also aligns with our metrics cleanup
> >>>>>>>> (KIP-444) which touches upon the store level metrics as well.
> >>>>>>>>
> >>>>>>>> One question: you seems still suggesting to keep "all" with the
> >>>> current
> >>>>>>>> name (and also using a separate metric for it), what's the difference
> >>>>>>>> between this one and other "range" functions?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Guozhang
> >>>>>>>>
> >>>>>>>> On Thu, Apr 11, 2019 at 2:26 AM Matthias J. Sax <
> >>>> matthias@confluent.io
> >>>>>>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Thanks for the input.
> >>>>>>>>>
> >>>>>>>>>>> Just to clarify the naming conflicts is between the newly added
> >>>>>>> function
> >>>>>>>>>>> and the old functions that we want to deprecate / remove right?
> >>>> The
> >>>>>>>>>
> >>>>>>>>> Yes, the conflict is just fort the existing `fetch()` methods for
> >>>>> which
> >>>>>>>>> we want to change the return type.
> >>>>>>>>>
> >>>>>>>>> IMHO, we should not make a breaking change in a minor release. Thus,
> >>>>> we
> >>>>>>>>> could either only deprecate those fetch methods that return
> >>>>>>>>> `WindowStoreIterator` and do a "clean cut" in 3.0.
> >>>>>>>>>
> >>>>>>>>> Or we, follow the renaming path. No get a clean renaming, we need to
> >>>>>>>>> consider all methods that are called "fetch":
> >>>>>>>>>
> >>>>>>>>> ReadOnlyWindowStore:
> >>>>>>>>>
> >>>>>>>>>> V fetch(K, long)
> >>>>>>>>>> WindowStoreIterator<V> fetch(K, Instant, Instant)
> >>>>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, Instant, Instant)
> >>>>>>>>>
> >>>>>>>>> WindowStore:
> >>>>>>>>>
> >>>>>>>>>> WindowStoreIterator<V> fetch(K, long, long)
> >>>>>>>>>> WindowStoreIterator<V> fetch(K, Instant, Instant)>
> >>>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, long, long)
> >>>>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, Instant, Instant)
> >>>>>>>>>
> >>>>>>>>> There is also fetchAll(long, long) and fetchAll(Instant, Instant)
> >>>> that
> >>>>>>>>> fetch over all keys.
> >>>>>>>>>
> >>>>>>>>> Maybe we could rename `V fetch(K, long)` to `V get(K, long)` and
> >>>>> rename
> >>>>>>>>> all `fetch()/fetchAll()` to `range()`? There is actually no reason
> >>>> to
> >>>>>>>>> have range()/rangeAll().
> >>>>>>>>>
> >>>>>>>>> If we do this, we might consider to rename methods for SessionStore,
> >>>>>>>>> too. There is
> >>>>>>>>>
> >>>>>>>>>> ReadOnlySessionStore#fetch(K)
> >>>>>>>>>> ReadOnlySessionStore#fetch(K, K)
> >>>>>>>>>> SessionStore#findSessions(K, long, long)
> >>>>>>>>>> SessionStore#findSessions(K, K, long, long)
> >>>>>>>>>> SessionStore#fetchSession(K, long, long);
> >>>>>>>>>
> >>>>>>>>> Consistent renaming might be:
> >>>>>>>>>
> >>>>>>>>> ReadOnlySessionStore#range(K)
> >>>>>>>>> ReadOnlySessionStore#range(K, K)
> >>>>>>>>> SessionStore#range(K, long, long)
> >>>>>>>>> SessionStore#range(K, K, long, long)
> >>>>>>>>> SessionStore#get(K, long, long);
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Not sure if extensive renaming might have too big of an impact.
> >>>>> However,
> >>>>>>>>> `range()` and `get()` might actually be better names than `fetch()`,
> >>>>>>>>> thus, it might also provide some additional value.
> >>>>>>>>>
> >>>>>>>>> Thoughts?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> -Matthias
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 3/27/19 10:19 AM, Guozhang Wang wrote:
> >>>>>>>>>> Hello Matthias,
> >>>>>>>>>>
> >>>>>>>>>> Just to clarify the naming conflicts is between the newly added
> >>>>>>> function
> >>>>>>>>>> and the old functions that we want to deprecate / remove right? The
> >>>>>>>>>> existing ones have different signatures with parameters so that
> >>>> they
> >>>>>>>>> should
> >>>>>>>>>> not have conflicts.
> >>>>>>>>>>
> >>>>>>>>>> I was thinking about just make the change directly without
> >>>>> deprecating
> >>>>>>>>>> existing ones which would require users of 2.3 to make code changes
> >>>>> --
> >>>>>>>>> this
> >>>>>>>>>> code change looks reasonably straight-forward to me and not much
> >>>>> worth
> >>>>>>>>>> deferring to later when the deprecated ones are removed.
> >>>>>>>>>>
> >>>>>>>>>> On the other hand, just deprecating "WindowIterator" without add
> >>>> new
> >>>>>>>>>> functions seems not very useful for users either since it is only
> >>>>> used
> >>>>>>> as
> >>>>>>>>>> an indicator but users cannot make code changes during this phase
> >>>>>>>>> anyways,
> >>>>>>>>>> so it is still a `one-cut` deal when we eventually remove the
> >>>>>>> deprecated
> >>>>>>>>>> ones and add the new one.
> >>>>>>>>>>
> >>>>>>>>>> Hence I'm slightly inclining to trade compatibility and replace it
> >>>>> with
> >>>>>>>>> new
> >>>>>>>>>> functions in one release, but if people have a good idea of the
> >>>>>>> renaming
> >>>>>>>>>> approach (I do not have a good one on top of my head :) I can also
> >>>> be
> >>>>>>>>>> convinced that way.
> >>>>>>>>>>
> >>>>>>>>>> Guozhang
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Mon, Mar 11, 2019 at 10:41 AM Matthias J. Sax <
> >>>>>>> matthias@confluent.io>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> I am open to change the return type to
> >>>>>>>>>>>
> >>>>>>>>>>> KeyValueIterator<Windowed<K>, V>
> >>>>>>>>>>>
> >>>>>>>>>>> However, this requires to rename
> >>>>>>>>>>>
> >>>>>>>>>>> #fetch(K key, long startTimestamp, long endTimestamp)
> >>>>>>>>>>> #fetch(K key, Instant startTimestamp, Instant endTimestamp)
> >>>>>>>>>>>
> >>>>>>>>>>> to avoid naming conflicts.
> >>>>>>>>>>>
> >>>>>>>>>>> What new name would you suggest? The existing methods are called
> >>>>>>>>>>> `fetch()`, `fetchAll()`, `all()`, `put()`.
> >>>>>>>>>>>
> >>>>>>>>>>> While I think it would be good to get fully aligned return types,
> >>>> I
> >>>>> am
> >>>>>>>>>>> not sure how we can get aligned method names (without renaming all
> >>>>> of
> >>>>>>>>>>> them...)? If we think it's worth to rename all to get this cleaned
> >>>>>>> up, I
> >>>>>>>>>>> am no opposed.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Thoughts?
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> -Matthias
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On 3/11/19 10:27 AM, Guozhang Wang wrote:
> >>>>>>>>>>>> I was thinking about changing the return type even, to
> >>>>>>>>>>>> `KeyValueIterator<Windowed<K>, V>` since it is confusing to users
> >>>>>>> about
> >>>>>>>>>>> the
> >>>>>>>>>>>> key typed `Long` (Streams javadoc today did not explain it
> >>>> clearly
> >>>>>>>>>>> either),
> >>>>>>>>>>>> note it is not backward compatible at all.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Personally I'd prefer to just deprecate the API and new new ones
> >>>>> that
> >>>>>>>>>>>> return `KeyValueIterator<Windowed<K>, V>` directly, but if most
> >>>>>>> people
> >>>>>>>>>>> felt
> >>>>>>>>>>>> it is too intrusive for compatibility I can be convinced with
> >>>>>>>>>>>> `KeyValueIterator<Long, V>` as well.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Guozhang
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Mon, Mar 11, 2019 at 10:17 AM Sophie Blee-Goldman <
> >>>>>>>>>>> sophie@confluent.io>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> I remember thinking this while working on window stores, am
> >>>>>>> definitely
> >>>>>>>>>>> for
> >>>>>>>>>>>>> it.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Mon, Mar 11, 2019 at 9:20 AM John Roesler <john@confluent.io
> >>>>>
> >>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Sounds great to me. Thanks, Matthias!
> >>>>>>>>>>>>>> -John
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Sun, Mar 10, 2019 at 11:58 PM Matthias J. Sax <
> >>>>>>>>>>> matthias@confluent.io>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I would like to propose KIP-439 to deprecate interface
> >>>>>>>>>>>>>>> `WindowStoreIterator`.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>
> >>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-439%3A+Deprecate+Interface+WindowStoreIterator
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Looking forward to your feedback.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>
>

Re: [DISCUSS] KIP-439: Deprecate Interface WindowStoreIterator

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Thanks Bruno.

I updated the KIP without changing the metric name yet -- want to
discuss this first.



1) I am also ok to keep `all()`


2) That's a good idea. To make sure I understand your suggestion
correctly, below the mapping between metric name an methods.
(Some metric names are use by multiple stores):

(ReadOnly)KeyValueStore:

>> value-by-key : ReadOnlyKeyValueStore#get(K key)

-> replaces `get` metric

I am actually not sure if I like this. Just keeping `get` instead of
`value-by-key`, `value-by-key-and-time` seems more reasonable to me.



(ReadOnly)WindowStore:

>> value-by-key-and-time : ReadOnlyWindowStore#get(K key, long windowStartTime)

I think a simple `get` might be better

>> range-by-key-and-time-range : ReadOnlyWindowStore#range(K key, Instant from, Instant to)
>> range-by-key-and-time-range : WindowStore#range(K key, long fromTime, long toTime)

>> range-by-key-range-and-time-range : ReadOnlyWindowStore#range(K from, K to, Instant from, Instant to)
>> range-by-key-range-and-time-range : WindowStore#range(K from, K to, long fromTime, long toTime)

>> range-by-time-range : ReadOnlyWindowStore#range(Instant from, Instant to) 
>> range-by-time-range : WindowStore#range(long fromTime, long toTime)

>> range-all : ReadOnlyWindowStore#all()


(ReadOnly)SessionStore:

>> range-by-key : ReadOnlySessionStore#range(K key)

>> range-by-key-range : ReadOnlySessionStore#range(K from, K to)
>> range-by-key-and-time-range : SessionStore#range(K key, long earliestSessionEndTime, long latestSessionStartTime)

>> range-by-key-range-and-time-range : SessionStore#range(K keyFrom, K keyTo, long earliestSessionEndTime, long latestSessionStartTime)

>> value-by-key-and-time : SessionStore#get(K key, long sessionStartTime, long sessionEndTime)

I think a simple `get` might be better

>> range-all : ReadOnlyKeyValueStore#all()




3) the `state-` part is already contained in `[storeType]` do I think
it's correct as-is


4) Ack. Fixed.


5) I think `stream` (plural) is correct. Cf
https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/internals/MeteredKeyValueStore.java#L87



-Matthias


On 5/28/19 3:26 AM, Bruno Cadonna wrote:
> Hi all,
> 
> My comments on this KIP:
> 
> 1. I would use `all()` instead of `range()` because the functionality
> is immediately clear without the need to look at the parameter list.
> 
> 2. I would decouple method names from metrics name, because this
> allows us to change one naming independently from the other in the
> future. From the previous discussion on this thread, I have also the
> feeling that the naming requirements differ between Java code and
> metrics.
> 
> My proposal for the metric names is the following:
> 
> value-by-key
> range-by-key
> value-by-key-and-time
> range-by-key-range
> range-by-time-range
> range-by-key-and-time-range
> range-by-key-range-and-time-range
> range-all
> 
> I omitted the metric types like latency-avg, etc. The first word
> denotes whether the query is a point query (`value`) or a range query
> (`range`). The words after the `by` denote the parameter types of the
> query. `range-all` is a special case. I hope, I did not miss any.
> We could also prefix the above names with `get-` if you think, it
> would make the names clearer.
> 
> 3. Should `[storeType]-id` be `[storeType]-state-id`?
> 
> 4. Some method signatures in the KIP under comment `// new` have still
> the `@Deprecated` annotation. Copy&Paste error?
> 
> 5. Should `type = stream-[storeType]-metrics` be `type =
> streams-[storeType]-metrics` or do we need to keep `stream` for
> historical/backward-compatibility reasons?
> 
> Best,
> Bruno
> 
> On Fri, May 24, 2019 at 5:46 AM Matthias J. Sax <ma...@confluent.io> wrote:
>>
>> Thanks John and Guozhang.
>>
>> I also lean towards different metric names.
>>
>> While updating the KIP, I also realized that the whole store API for
>> window and session store is actually rather inconsistent. Hence, I
>> extended the scope of this KIP to cleanup both interfaces and changed
>> the KIP name to
>>
>>    "Cleanup built-in Store interfaces"
>>
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103094198
>>
>> I also included a small change to KeyValueStore to align all built-in
>> stores holistically.
>>
>>
>>
>> Looking forward to your feedback.
>>
>>
>>
>> -Matthias
>>
>>
>>
>> On 5/7/19 1:19 AM, Guozhang Wang wrote:
>>> Thanks John.
>>>
>>> I think I'm convinced about not collapsing the metrics measuring for
>>> various function calls. Regarding the possible solution I'm a bit inclined
>>> to just distinguish on the metric names directly over on tags: since our
>>> current metrics naming are a tad messed up anyways, fixing them in one shot
>>> as a breaking change sounds reasonable to me.
>>>
>>>
>>> Guozhang
>>>
>>>
>>> On Mon, May 6, 2019 at 9:14 AM John Roesler <jo...@confluent.io> wrote:
>>>
>>>> Thanks all (or range? ;) ) for the discussion. Good points all around.
>>>>
>>>> Although I see the allure of naming the metrics the same as the things
>>>> they're measuring, it seems not to be perfect. Seconding Matthias's latter
>>>> thought, I think it's likely you'd want to measure the method calls
>>>> independently, since the different range variants would have wildly
>>>> different characteristics, which could then lead you to want to orient the
>>>> storage differently to support particular use cases.
>>>>
>>>> Pointing out some structural characteristics (I know you all know this
>>>> stuff, I'm just constructing a table for analysis):
>>>> * Java supports method-name overloading. *Different* methods can have the
>>>> same names, distinguished by arg lists; it doesn't change the fact that
>>>> they are different methods.
>>>> * Metrics does not support metric-name overloading, but metric names do
>>>> have some structure we could exploit, if you consider the tags.
>>>>
>>>> It seems to me that there's actually more domain mismatch if we just have
>>>> one metric named "range", since (e.g., in the SessionStore proposal above)
>>>> the Java API has *four* methods named "range".
>>>>
>>>> Two potential solutions I see:
>>>> * hierarchical metric names: "range-single-key-all-time",
>>>> "range-key-range-all-time", "range-single-key-time-range",
>>>> "range-key-range-time-range", maybe with better names... I'm not the best
>>>> at this stuff. Hopefully, you see the point, though... they all start with
>>>> "range", which provides the association to the method, and all have a
>>>> suffix which identifies the overload being measured.
>>>> * tagged metric names: "range" {"variant": "single-key-all-time"}, "range"
>>>> {"variant": "key-range-all-time"}, "range" {"variant":
>>>> "single-key-time-range"}, "range" {"variant": "key-range-time-range"} . Or
>>>> you could even split the tags up semantically, but my instinct says that
>>>> that would just make it harder to digest the metrics later on.
>>>>
>>>> Just some ideas.
>>>> -John
>>>>
>>>> On Fri, Apr 26, 2019 at 3:51 AM Matthias J. Sax <ma...@confluent.io>
>>>> wrote:
>>>>
>>>>> Thanks for the input Guozhang. I was not aware of those dependencies.
>>>>>
>>>>> It might be good to align this KIP with the metrics cleanup. Not sure
>>>>> atm, if we should use different metric names for different overloads,
>>>>> even if those have the same method name?
>>>>>
>>>>> If we rename all method to `range()` and use the same metric name for
>>>>> all, one could argue that this is still fine, because the metric
>>>>> collects how often a range query is executed (regardless of the range
>>>>> itself).
>>>>>
>>>>> On the other hand, this would basically be a "roll up". It could still
>>>>> be valuable to distinguish between single-key-time-range,
>>>>> key-range-time-range, and all-range queries. Users could still aggregate
>>>>> those later if they are not interested in the details, while it's not
>>>>> possible for user to split a pre-aggregated metric into it's component.
>>>>>
>>>>>
>>>>> Input from others might be helpful here, too.
>>>>>
>>>>>
>>>>> -Matthias
>>>>>
>>>>> On 4/11/19 6:00 PM, Guozhang Wang wrote:
>>>>>> While working at KIP-444 (https://github.com/apache/kafka/pull/6498) I
>>>>>> realized there are a bunch of issues on metric names v.s. function
>>>> names,
>>>>>> e.g. some function named `fetchAll` are actually measure with `fetch`,
>>>>> etc.
>>>>>> So in that KIP I proposed to make the function name aligned with
>>>> metrics
>>>>>> name. So suppose we rename the functions from `fetch` to `range` I'd
>>>>>> suggest we make this change as part of KIP-444 as well. Note that it
>>>>> means
>>>>>> different functions with the same name `range` will be measured under a
>>>>>> single metric then.
>>>>>>
>>>>>> But still for function named `all` it will be measured under a separate
>>>>>> metric named `all`, so I'm just clarifying with you if that's the
>>>>> intention.
>>>>>>
>>>>>>
>>>>>> Guozhang
>>>>>>
>>>>>> On Thu, Apr 11, 2019 at 2:04 PM Matthias J. Sax <matthias@confluent.io
>>>>>
>>>>>> wrote:
>>>>>>
>>>>>>> I did not see a reason to rename `all()` to `range()`. `all()` does
>>>> not
>>>>>>> take any parameters to limit a range and is a good name IMHO. But I am
>>>>>>> not married to keep `all()` and if we think we should rename it, too,
>>>> I
>>>>>>> am fine with it.
>>>>>>>
>>>>>>> Not sure what connection you make to metrics though. Can you
>>>> elaborate?
>>>>>>>
>>>>>>>
>>>>>>> Would be interested to hear others opinions on this, too.
>>>>>>>
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>> On 4/11/19 8:38 AM, Guozhang Wang wrote:
>>>>>>>> I like the renaming, since it also aligns with our metrics cleanup
>>>>>>>> (KIP-444) which touches upon the store level metrics as well.
>>>>>>>>
>>>>>>>> One question: you seems still suggesting to keep "all" with the
>>>> current
>>>>>>>> name (and also using a separate metric for it), what's the difference
>>>>>>>> between this one and other "range" functions?
>>>>>>>>
>>>>>>>>
>>>>>>>> Guozhang
>>>>>>>>
>>>>>>>> On Thu, Apr 11, 2019 at 2:26 AM Matthias J. Sax <
>>>> matthias@confluent.io
>>>>>>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks for the input.
>>>>>>>>>
>>>>>>>>>>> Just to clarify the naming conflicts is between the newly added
>>>>>>> function
>>>>>>>>>>> and the old functions that we want to deprecate / remove right?
>>>> The
>>>>>>>>>
>>>>>>>>> Yes, the conflict is just fort the existing `fetch()` methods for
>>>>> which
>>>>>>>>> we want to change the return type.
>>>>>>>>>
>>>>>>>>> IMHO, we should not make a breaking change in a minor release. Thus,
>>>>> we
>>>>>>>>> could either only deprecate those fetch methods that return
>>>>>>>>> `WindowStoreIterator` and do a "clean cut" in 3.0.
>>>>>>>>>
>>>>>>>>> Or we, follow the renaming path. No get a clean renaming, we need to
>>>>>>>>> consider all methods that are called "fetch":
>>>>>>>>>
>>>>>>>>> ReadOnlyWindowStore:
>>>>>>>>>
>>>>>>>>>> V fetch(K, long)
>>>>>>>>>> WindowStoreIterator<V> fetch(K, Instant, Instant)
>>>>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, Instant, Instant)
>>>>>>>>>
>>>>>>>>> WindowStore:
>>>>>>>>>
>>>>>>>>>> WindowStoreIterator<V> fetch(K, long, long)
>>>>>>>>>> WindowStoreIterator<V> fetch(K, Instant, Instant)>
>>>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, long, long)
>>>>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, Instant, Instant)
>>>>>>>>>
>>>>>>>>> There is also fetchAll(long, long) and fetchAll(Instant, Instant)
>>>> that
>>>>>>>>> fetch over all keys.
>>>>>>>>>
>>>>>>>>> Maybe we could rename `V fetch(K, long)` to `V get(K, long)` and
>>>>> rename
>>>>>>>>> all `fetch()/fetchAll()` to `range()`? There is actually no reason
>>>> to
>>>>>>>>> have range()/rangeAll().
>>>>>>>>>
>>>>>>>>> If we do this, we might consider to rename methods for SessionStore,
>>>>>>>>> too. There is
>>>>>>>>>
>>>>>>>>>> ReadOnlySessionStore#fetch(K)
>>>>>>>>>> ReadOnlySessionStore#fetch(K, K)
>>>>>>>>>> SessionStore#findSessions(K, long, long)
>>>>>>>>>> SessionStore#findSessions(K, K, long, long)
>>>>>>>>>> SessionStore#fetchSession(K, long, long);
>>>>>>>>>
>>>>>>>>> Consistent renaming might be:
>>>>>>>>>
>>>>>>>>> ReadOnlySessionStore#range(K)
>>>>>>>>> ReadOnlySessionStore#range(K, K)
>>>>>>>>> SessionStore#range(K, long, long)
>>>>>>>>> SessionStore#range(K, K, long, long)
>>>>>>>>> SessionStore#get(K, long, long);
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Not sure if extensive renaming might have too big of an impact.
>>>>> However,
>>>>>>>>> `range()` and `get()` might actually be better names than `fetch()`,
>>>>>>>>> thus, it might also provide some additional value.
>>>>>>>>>
>>>>>>>>> Thoughts?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Matthias
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3/27/19 10:19 AM, Guozhang Wang wrote:
>>>>>>>>>> Hello Matthias,
>>>>>>>>>>
>>>>>>>>>> Just to clarify the naming conflicts is between the newly added
>>>>>>> function
>>>>>>>>>> and the old functions that we want to deprecate / remove right? The
>>>>>>>>>> existing ones have different signatures with parameters so that
>>>> they
>>>>>>>>> should
>>>>>>>>>> not have conflicts.
>>>>>>>>>>
>>>>>>>>>> I was thinking about just make the change directly without
>>>>> deprecating
>>>>>>>>>> existing ones which would require users of 2.3 to make code changes
>>>>> --
>>>>>>>>> this
>>>>>>>>>> code change looks reasonably straight-forward to me and not much
>>>>> worth
>>>>>>>>>> deferring to later when the deprecated ones are removed.
>>>>>>>>>>
>>>>>>>>>> On the other hand, just deprecating "WindowIterator" without add
>>>> new
>>>>>>>>>> functions seems not very useful for users either since it is only
>>>>> used
>>>>>>> as
>>>>>>>>>> an indicator but users cannot make code changes during this phase
>>>>>>>>> anyways,
>>>>>>>>>> so it is still a `one-cut` deal when we eventually remove the
>>>>>>> deprecated
>>>>>>>>>> ones and add the new one.
>>>>>>>>>>
>>>>>>>>>> Hence I'm slightly inclining to trade compatibility and replace it
>>>>> with
>>>>>>>>> new
>>>>>>>>>> functions in one release, but if people have a good idea of the
>>>>>>> renaming
>>>>>>>>>> approach (I do not have a good one on top of my head :) I can also
>>>> be
>>>>>>>>>> convinced that way.
>>>>>>>>>>
>>>>>>>>>> Guozhang
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Mar 11, 2019 at 10:41 AM Matthias J. Sax <
>>>>>>> matthias@confluent.io>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I am open to change the return type to
>>>>>>>>>>>
>>>>>>>>>>> KeyValueIterator<Windowed<K>, V>
>>>>>>>>>>>
>>>>>>>>>>> However, this requires to rename
>>>>>>>>>>>
>>>>>>>>>>> #fetch(K key, long startTimestamp, long endTimestamp)
>>>>>>>>>>> #fetch(K key, Instant startTimestamp, Instant endTimestamp)
>>>>>>>>>>>
>>>>>>>>>>> to avoid naming conflicts.
>>>>>>>>>>>
>>>>>>>>>>> What new name would you suggest? The existing methods are called
>>>>>>>>>>> `fetch()`, `fetchAll()`, `all()`, `put()`.
>>>>>>>>>>>
>>>>>>>>>>> While I think it would be good to get fully aligned return types,
>>>> I
>>>>> am
>>>>>>>>>>> not sure how we can get aligned method names (without renaming all
>>>>> of
>>>>>>>>>>> them...)? If we think it's worth to rename all to get this cleaned
>>>>>>> up, I
>>>>>>>>>>> am no opposed.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thoughts?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -Matthias
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 3/11/19 10:27 AM, Guozhang Wang wrote:
>>>>>>>>>>>> I was thinking about changing the return type even, to
>>>>>>>>>>>> `KeyValueIterator<Windowed<K>, V>` since it is confusing to users
>>>>>>> about
>>>>>>>>>>> the
>>>>>>>>>>>> key typed `Long` (Streams javadoc today did not explain it
>>>> clearly
>>>>>>>>>>> either),
>>>>>>>>>>>> note it is not backward compatible at all.
>>>>>>>>>>>>
>>>>>>>>>>>> Personally I'd prefer to just deprecate the API and new new ones
>>>>> that
>>>>>>>>>>>> return `KeyValueIterator<Windowed<K>, V>` directly, but if most
>>>>>>> people
>>>>>>>>>>> felt
>>>>>>>>>>>> it is too intrusive for compatibility I can be convinced with
>>>>>>>>>>>> `KeyValueIterator<Long, V>` as well.
>>>>>>>>>>>>
>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Mar 11, 2019 at 10:17 AM Sophie Blee-Goldman <
>>>>>>>>>>> sophie@confluent.io>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I remember thinking this while working on window stores, am
>>>>>>> definitely
>>>>>>>>>>> for
>>>>>>>>>>>>> it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Mar 11, 2019 at 9:20 AM John Roesler <john@confluent.io
>>>>>
>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sounds great to me. Thanks, Matthias!
>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sun, Mar 10, 2019 at 11:58 PM Matthias J. Sax <
>>>>>>>>>>> matthias@confluent.io>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I would like to propose KIP-439 to deprecate interface
>>>>>>>>>>>>>>> `WindowStoreIterator`.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-439%3A+Deprecate+Interface+WindowStoreIterator
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Looking forward to your feedback.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>


Re: [DISCUSS] KIP-439: Deprecate Interface WindowStoreIterator

Posted by Bruno Cadonna <br...@confluent.io>.
Hi all,

My comments on this KIP:

1. I would use `all()` instead of `range()` because the functionality
is immediately clear without the need to look at the parameter list.

2. I would decouple method names from metrics name, because this
allows us to change one naming independently from the other in the
future. From the previous discussion on this thread, I have also the
feeling that the naming requirements differ between Java code and
metrics.

My proposal for the metric names is the following:

value-by-key
range-by-key
value-by-key-and-time
range-by-key-range
range-by-time-range
range-by-key-and-time-range
range-by-key-range-and-time-range
range-all

I omitted the metric types like latency-avg, etc. The first word
denotes whether the query is a point query (`value`) or a range query
(`range`). The words after the `by` denote the parameter types of the
query. `range-all` is a special case. I hope, I did not miss any.
We could also prefix the above names with `get-` if you think, it
would make the names clearer.

3. Should `[storeType]-id` be `[storeType]-state-id`?

4. Some method signatures in the KIP under comment `// new` have still
the `@Deprecated` annotation. Copy&Paste error?

5. Should `type = stream-[storeType]-metrics` be `type =
streams-[storeType]-metrics` or do we need to keep `stream` for
historical/backward-compatibility reasons?

Best,
Bruno

On Fri, May 24, 2019 at 5:46 AM Matthias J. Sax <ma...@confluent.io> wrote:
>
> Thanks John and Guozhang.
>
> I also lean towards different metric names.
>
> While updating the KIP, I also realized that the whole store API for
> window and session store is actually rather inconsistent. Hence, I
> extended the scope of this KIP to cleanup both interfaces and changed
> the KIP name to
>
>    "Cleanup built-in Store interfaces"
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103094198
>
> I also included a small change to KeyValueStore to align all built-in
> stores holistically.
>
>
>
> Looking forward to your feedback.
>
>
>
> -Matthias
>
>
>
> On 5/7/19 1:19 AM, Guozhang Wang wrote:
> > Thanks John.
> >
> > I think I'm convinced about not collapsing the metrics measuring for
> > various function calls. Regarding the possible solution I'm a bit inclined
> > to just distinguish on the metric names directly over on tags: since our
> > current metrics naming are a tad messed up anyways, fixing them in one shot
> > as a breaking change sounds reasonable to me.
> >
> >
> > Guozhang
> >
> >
> > On Mon, May 6, 2019 at 9:14 AM John Roesler <jo...@confluent.io> wrote:
> >
> >> Thanks all (or range? ;) ) for the discussion. Good points all around.
> >>
> >> Although I see the allure of naming the metrics the same as the things
> >> they're measuring, it seems not to be perfect. Seconding Matthias's latter
> >> thought, I think it's likely you'd want to measure the method calls
> >> independently, since the different range variants would have wildly
> >> different characteristics, which could then lead you to want to orient the
> >> storage differently to support particular use cases.
> >>
> >> Pointing out some structural characteristics (I know you all know this
> >> stuff, I'm just constructing a table for analysis):
> >> * Java supports method-name overloading. *Different* methods can have the
> >> same names, distinguished by arg lists; it doesn't change the fact that
> >> they are different methods.
> >> * Metrics does not support metric-name overloading, but metric names do
> >> have some structure we could exploit, if you consider the tags.
> >>
> >> It seems to me that there's actually more domain mismatch if we just have
> >> one metric named "range", since (e.g., in the SessionStore proposal above)
> >> the Java API has *four* methods named "range".
> >>
> >> Two potential solutions I see:
> >> * hierarchical metric names: "range-single-key-all-time",
> >> "range-key-range-all-time", "range-single-key-time-range",
> >> "range-key-range-time-range", maybe with better names... I'm not the best
> >> at this stuff. Hopefully, you see the point, though... they all start with
> >> "range", which provides the association to the method, and all have a
> >> suffix which identifies the overload being measured.
> >> * tagged metric names: "range" {"variant": "single-key-all-time"}, "range"
> >> {"variant": "key-range-all-time"}, "range" {"variant":
> >> "single-key-time-range"}, "range" {"variant": "key-range-time-range"} . Or
> >> you could even split the tags up semantically, but my instinct says that
> >> that would just make it harder to digest the metrics later on.
> >>
> >> Just some ideas.
> >> -John
> >>
> >> On Fri, Apr 26, 2019 at 3:51 AM Matthias J. Sax <ma...@confluent.io>
> >> wrote:
> >>
> >>> Thanks for the input Guozhang. I was not aware of those dependencies.
> >>>
> >>> It might be good to align this KIP with the metrics cleanup. Not sure
> >>> atm, if we should use different metric names for different overloads,
> >>> even if those have the same method name?
> >>>
> >>> If we rename all method to `range()` and use the same metric name for
> >>> all, one could argue that this is still fine, because the metric
> >>> collects how often a range query is executed (regardless of the range
> >>> itself).
> >>>
> >>> On the other hand, this would basically be a "roll up". It could still
> >>> be valuable to distinguish between single-key-time-range,
> >>> key-range-time-range, and all-range queries. Users could still aggregate
> >>> those later if they are not interested in the details, while it's not
> >>> possible for user to split a pre-aggregated metric into it's component.
> >>>
> >>>
> >>> Input from others might be helpful here, too.
> >>>
> >>>
> >>> -Matthias
> >>>
> >>> On 4/11/19 6:00 PM, Guozhang Wang wrote:
> >>>> While working at KIP-444 (https://github.com/apache/kafka/pull/6498) I
> >>>> realized there are a bunch of issues on metric names v.s. function
> >> names,
> >>>> e.g. some function named `fetchAll` are actually measure with `fetch`,
> >>> etc.
> >>>> So in that KIP I proposed to make the function name aligned with
> >> metrics
> >>>> name. So suppose we rename the functions from `fetch` to `range` I'd
> >>>> suggest we make this change as part of KIP-444 as well. Note that it
> >>> means
> >>>> different functions with the same name `range` will be measured under a
> >>>> single metric then.
> >>>>
> >>>> But still for function named `all` it will be measured under a separate
> >>>> metric named `all`, so I'm just clarifying with you if that's the
> >>> intention.
> >>>>
> >>>>
> >>>> Guozhang
> >>>>
> >>>> On Thu, Apr 11, 2019 at 2:04 PM Matthias J. Sax <matthias@confluent.io
> >>>
> >>>> wrote:
> >>>>
> >>>>> I did not see a reason to rename `all()` to `range()`. `all()` does
> >> not
> >>>>> take any parameters to limit a range and is a good name IMHO. But I am
> >>>>> not married to keep `all()` and if we think we should rename it, too,
> >> I
> >>>>> am fine with it.
> >>>>>
> >>>>> Not sure what connection you make to metrics though. Can you
> >> elaborate?
> >>>>>
> >>>>>
> >>>>> Would be interested to hear others opinions on this, too.
> >>>>>
> >>>>>
> >>>>> -Matthias
> >>>>>
> >>>>> On 4/11/19 8:38 AM, Guozhang Wang wrote:
> >>>>>> I like the renaming, since it also aligns with our metrics cleanup
> >>>>>> (KIP-444) which touches upon the store level metrics as well.
> >>>>>>
> >>>>>> One question: you seems still suggesting to keep "all" with the
> >> current
> >>>>>> name (and also using a separate metric for it), what's the difference
> >>>>>> between this one and other "range" functions?
> >>>>>>
> >>>>>>
> >>>>>> Guozhang
> >>>>>>
> >>>>>> On Thu, Apr 11, 2019 at 2:26 AM Matthias J. Sax <
> >> matthias@confluent.io
> >>>>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Thanks for the input.
> >>>>>>>
> >>>>>>>>> Just to clarify the naming conflicts is between the newly added
> >>>>> function
> >>>>>>>>> and the old functions that we want to deprecate / remove right?
> >> The
> >>>>>>>
> >>>>>>> Yes, the conflict is just fort the existing `fetch()` methods for
> >>> which
> >>>>>>> we want to change the return type.
> >>>>>>>
> >>>>>>> IMHO, we should not make a breaking change in a minor release. Thus,
> >>> we
> >>>>>>> could either only deprecate those fetch methods that return
> >>>>>>> `WindowStoreIterator` and do a "clean cut" in 3.0.
> >>>>>>>
> >>>>>>> Or we, follow the renaming path. No get a clean renaming, we need to
> >>>>>>> consider all methods that are called "fetch":
> >>>>>>>
> >>>>>>> ReadOnlyWindowStore:
> >>>>>>>
> >>>>>>>> V fetch(K, long)
> >>>>>>>> WindowStoreIterator<V> fetch(K, Instant, Instant)
> >>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, Instant, Instant)
> >>>>>>>
> >>>>>>> WindowStore:
> >>>>>>>
> >>>>>>>> WindowStoreIterator<V> fetch(K, long, long)
> >>>>>>>> WindowStoreIterator<V> fetch(K, Instant, Instant)>
> >>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, long, long)
> >>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, Instant, Instant)
> >>>>>>>
> >>>>>>> There is also fetchAll(long, long) and fetchAll(Instant, Instant)
> >> that
> >>>>>>> fetch over all keys.
> >>>>>>>
> >>>>>>> Maybe we could rename `V fetch(K, long)` to `V get(K, long)` and
> >>> rename
> >>>>>>> all `fetch()/fetchAll()` to `range()`? There is actually no reason
> >> to
> >>>>>>> have range()/rangeAll().
> >>>>>>>
> >>>>>>> If we do this, we might consider to rename methods for SessionStore,
> >>>>>>> too. There is
> >>>>>>>
> >>>>>>>> ReadOnlySessionStore#fetch(K)
> >>>>>>>> ReadOnlySessionStore#fetch(K, K)
> >>>>>>>> SessionStore#findSessions(K, long, long)
> >>>>>>>> SessionStore#findSessions(K, K, long, long)
> >>>>>>>> SessionStore#fetchSession(K, long, long);
> >>>>>>>
> >>>>>>> Consistent renaming might be:
> >>>>>>>
> >>>>>>> ReadOnlySessionStore#range(K)
> >>>>>>> ReadOnlySessionStore#range(K, K)
> >>>>>>> SessionStore#range(K, long, long)
> >>>>>>> SessionStore#range(K, K, long, long)
> >>>>>>> SessionStore#get(K, long, long);
> >>>>>>>
> >>>>>>>
> >>>>>>> Not sure if extensive renaming might have too big of an impact.
> >>> However,
> >>>>>>> `range()` and `get()` might actually be better names than `fetch()`,
> >>>>>>> thus, it might also provide some additional value.
> >>>>>>>
> >>>>>>> Thoughts?
> >>>>>>>
> >>>>>>>
> >>>>>>> -Matthias
> >>>>>>>
> >>>>>>>
> >>>>>>> On 3/27/19 10:19 AM, Guozhang Wang wrote:
> >>>>>>>> Hello Matthias,
> >>>>>>>>
> >>>>>>>> Just to clarify the naming conflicts is between the newly added
> >>>>> function
> >>>>>>>> and the old functions that we want to deprecate / remove right? The
> >>>>>>>> existing ones have different signatures with parameters so that
> >> they
> >>>>>>> should
> >>>>>>>> not have conflicts.
> >>>>>>>>
> >>>>>>>> I was thinking about just make the change directly without
> >>> deprecating
> >>>>>>>> existing ones which would require users of 2.3 to make code changes
> >>> --
> >>>>>>> this
> >>>>>>>> code change looks reasonably straight-forward to me and not much
> >>> worth
> >>>>>>>> deferring to later when the deprecated ones are removed.
> >>>>>>>>
> >>>>>>>> On the other hand, just deprecating "WindowIterator" without add
> >> new
> >>>>>>>> functions seems not very useful for users either since it is only
> >>> used
> >>>>> as
> >>>>>>>> an indicator but users cannot make code changes during this phase
> >>>>>>> anyways,
> >>>>>>>> so it is still a `one-cut` deal when we eventually remove the
> >>>>> deprecated
> >>>>>>>> ones and add the new one.
> >>>>>>>>
> >>>>>>>> Hence I'm slightly inclining to trade compatibility and replace it
> >>> with
> >>>>>>> new
> >>>>>>>> functions in one release, but if people have a good idea of the
> >>>>> renaming
> >>>>>>>> approach (I do not have a good one on top of my head :) I can also
> >> be
> >>>>>>>> convinced that way.
> >>>>>>>>
> >>>>>>>> Guozhang
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Mon, Mar 11, 2019 at 10:41 AM Matthias J. Sax <
> >>>>> matthias@confluent.io>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> I am open to change the return type to
> >>>>>>>>>
> >>>>>>>>> KeyValueIterator<Windowed<K>, V>
> >>>>>>>>>
> >>>>>>>>> However, this requires to rename
> >>>>>>>>>
> >>>>>>>>> #fetch(K key, long startTimestamp, long endTimestamp)
> >>>>>>>>> #fetch(K key, Instant startTimestamp, Instant endTimestamp)
> >>>>>>>>>
> >>>>>>>>> to avoid naming conflicts.
> >>>>>>>>>
> >>>>>>>>> What new name would you suggest? The existing methods are called
> >>>>>>>>> `fetch()`, `fetchAll()`, `all()`, `put()`.
> >>>>>>>>>
> >>>>>>>>> While I think it would be good to get fully aligned return types,
> >> I
> >>> am
> >>>>>>>>> not sure how we can get aligned method names (without renaming all
> >>> of
> >>>>>>>>> them...)? If we think it's worth to rename all to get this cleaned
> >>>>> up, I
> >>>>>>>>> am no opposed.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Thoughts?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> -Matthias
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 3/11/19 10:27 AM, Guozhang Wang wrote:
> >>>>>>>>>> I was thinking about changing the return type even, to
> >>>>>>>>>> `KeyValueIterator<Windowed<K>, V>` since it is confusing to users
> >>>>> about
> >>>>>>>>> the
> >>>>>>>>>> key typed `Long` (Streams javadoc today did not explain it
> >> clearly
> >>>>>>>>> either),
> >>>>>>>>>> note it is not backward compatible at all.
> >>>>>>>>>>
> >>>>>>>>>> Personally I'd prefer to just deprecate the API and new new ones
> >>> that
> >>>>>>>>>> return `KeyValueIterator<Windowed<K>, V>` directly, but if most
> >>>>> people
> >>>>>>>>> felt
> >>>>>>>>>> it is too intrusive for compatibility I can be convinced with
> >>>>>>>>>> `KeyValueIterator<Long, V>` as well.
> >>>>>>>>>>
> >>>>>>>>>> Guozhang
> >>>>>>>>>>
> >>>>>>>>>> On Mon, Mar 11, 2019 at 10:17 AM Sophie Blee-Goldman <
> >>>>>>>>> sophie@confluent.io>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> I remember thinking this while working on window stores, am
> >>>>> definitely
> >>>>>>>>> for
> >>>>>>>>>>> it.
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Mar 11, 2019 at 9:20 AM John Roesler <john@confluent.io
> >>>
> >>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Sounds great to me. Thanks, Matthias!
> >>>>>>>>>>>> -John
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Sun, Mar 10, 2019 at 11:58 PM Matthias J. Sax <
> >>>>>>>>> matthias@confluent.io>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I would like to propose KIP-439 to deprecate interface
> >>>>>>>>>>>>> `WindowStoreIterator`.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>
> >>>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-439%3A+Deprecate+Interface+WindowStoreIterator
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Looking forward to your feedback.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >
> >
>

Re: [DISCUSS] KIP-439: Deprecate Interface WindowStoreIterator

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Thanks John and Guozhang.

I also lean towards different metric names.

While updating the KIP, I also realized that the whole store API for
window and session store is actually rather inconsistent. Hence, I
extended the scope of this KIP to cleanup both interfaces and changed
the KIP name to

   "Cleanup built-in Store interfaces"

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103094198

I also included a small change to KeyValueStore to align all built-in
stores holistically.



Looking forward to your feedback.



-Matthias



On 5/7/19 1:19 AM, Guozhang Wang wrote:
> Thanks John.
> 
> I think I'm convinced about not collapsing the metrics measuring for
> various function calls. Regarding the possible solution I'm a bit inclined
> to just distinguish on the metric names directly over on tags: since our
> current metrics naming are a tad messed up anyways, fixing them in one shot
> as a breaking change sounds reasonable to me.
> 
> 
> Guozhang
> 
> 
> On Mon, May 6, 2019 at 9:14 AM John Roesler <jo...@confluent.io> wrote:
> 
>> Thanks all (or range? ;) ) for the discussion. Good points all around.
>>
>> Although I see the allure of naming the metrics the same as the things
>> they're measuring, it seems not to be perfect. Seconding Matthias's latter
>> thought, I think it's likely you'd want to measure the method calls
>> independently, since the different range variants would have wildly
>> different characteristics, which could then lead you to want to orient the
>> storage differently to support particular use cases.
>>
>> Pointing out some structural characteristics (I know you all know this
>> stuff, I'm just constructing a table for analysis):
>> * Java supports method-name overloading. *Different* methods can have the
>> same names, distinguished by arg lists; it doesn't change the fact that
>> they are different methods.
>> * Metrics does not support metric-name overloading, but metric names do
>> have some structure we could exploit, if you consider the tags.
>>
>> It seems to me that there's actually more domain mismatch if we just have
>> one metric named "range", since (e.g., in the SessionStore proposal above)
>> the Java API has *four* methods named "range".
>>
>> Two potential solutions I see:
>> * hierarchical metric names: "range-single-key-all-time",
>> "range-key-range-all-time", "range-single-key-time-range",
>> "range-key-range-time-range", maybe with better names... I'm not the best
>> at this stuff. Hopefully, you see the point, though... they all start with
>> "range", which provides the association to the method, and all have a
>> suffix which identifies the overload being measured.
>> * tagged metric names: "range" {"variant": "single-key-all-time"}, "range"
>> {"variant": "key-range-all-time"}, "range" {"variant":
>> "single-key-time-range"}, "range" {"variant": "key-range-time-range"} . Or
>> you could even split the tags up semantically, but my instinct says that
>> that would just make it harder to digest the metrics later on.
>>
>> Just some ideas.
>> -John
>>
>> On Fri, Apr 26, 2019 at 3:51 AM Matthias J. Sax <ma...@confluent.io>
>> wrote:
>>
>>> Thanks for the input Guozhang. I was not aware of those dependencies.
>>>
>>> It might be good to align this KIP with the metrics cleanup. Not sure
>>> atm, if we should use different metric names for different overloads,
>>> even if those have the same method name?
>>>
>>> If we rename all method to `range()` and use the same metric name for
>>> all, one could argue that this is still fine, because the metric
>>> collects how often a range query is executed (regardless of the range
>>> itself).
>>>
>>> On the other hand, this would basically be a "roll up". It could still
>>> be valuable to distinguish between single-key-time-range,
>>> key-range-time-range, and all-range queries. Users could still aggregate
>>> those later if they are not interested in the details, while it's not
>>> possible for user to split a pre-aggregated metric into it's component.
>>>
>>>
>>> Input from others might be helpful here, too.
>>>
>>>
>>> -Matthias
>>>
>>> On 4/11/19 6:00 PM, Guozhang Wang wrote:
>>>> While working at KIP-444 (https://github.com/apache/kafka/pull/6498) I
>>>> realized there are a bunch of issues on metric names v.s. function
>> names,
>>>> e.g. some function named `fetchAll` are actually measure with `fetch`,
>>> etc.
>>>> So in that KIP I proposed to make the function name aligned with
>> metrics
>>>> name. So suppose we rename the functions from `fetch` to `range` I'd
>>>> suggest we make this change as part of KIP-444 as well. Note that it
>>> means
>>>> different functions with the same name `range` will be measured under a
>>>> single metric then.
>>>>
>>>> But still for function named `all` it will be measured under a separate
>>>> metric named `all`, so I'm just clarifying with you if that's the
>>> intention.
>>>>
>>>>
>>>> Guozhang
>>>>
>>>> On Thu, Apr 11, 2019 at 2:04 PM Matthias J. Sax <matthias@confluent.io
>>>
>>>> wrote:
>>>>
>>>>> I did not see a reason to rename `all()` to `range()`. `all()` does
>> not
>>>>> take any parameters to limit a range and is a good name IMHO. But I am
>>>>> not married to keep `all()` and if we think we should rename it, too,
>> I
>>>>> am fine with it.
>>>>>
>>>>> Not sure what connection you make to metrics though. Can you
>> elaborate?
>>>>>
>>>>>
>>>>> Would be interested to hear others opinions on this, too.
>>>>>
>>>>>
>>>>> -Matthias
>>>>>
>>>>> On 4/11/19 8:38 AM, Guozhang Wang wrote:
>>>>>> I like the renaming, since it also aligns with our metrics cleanup
>>>>>> (KIP-444) which touches upon the store level metrics as well.
>>>>>>
>>>>>> One question: you seems still suggesting to keep "all" with the
>> current
>>>>>> name (and also using a separate metric for it), what's the difference
>>>>>> between this one and other "range" functions?
>>>>>>
>>>>>>
>>>>>> Guozhang
>>>>>>
>>>>>> On Thu, Apr 11, 2019 at 2:26 AM Matthias J. Sax <
>> matthias@confluent.io
>>>>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks for the input.
>>>>>>>
>>>>>>>>> Just to clarify the naming conflicts is between the newly added
>>>>> function
>>>>>>>>> and the old functions that we want to deprecate / remove right?
>> The
>>>>>>>
>>>>>>> Yes, the conflict is just fort the existing `fetch()` methods for
>>> which
>>>>>>> we want to change the return type.
>>>>>>>
>>>>>>> IMHO, we should not make a breaking change in a minor release. Thus,
>>> we
>>>>>>> could either only deprecate those fetch methods that return
>>>>>>> `WindowStoreIterator` and do a "clean cut" in 3.0.
>>>>>>>
>>>>>>> Or we, follow the renaming path. No get a clean renaming, we need to
>>>>>>> consider all methods that are called "fetch":
>>>>>>>
>>>>>>> ReadOnlyWindowStore:
>>>>>>>
>>>>>>>> V fetch(K, long)
>>>>>>>> WindowStoreIterator<V> fetch(K, Instant, Instant)
>>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, Instant, Instant)
>>>>>>>
>>>>>>> WindowStore:
>>>>>>>
>>>>>>>> WindowStoreIterator<V> fetch(K, long, long)
>>>>>>>> WindowStoreIterator<V> fetch(K, Instant, Instant)>
>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, long, long)
>>>>>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, Instant, Instant)
>>>>>>>
>>>>>>> There is also fetchAll(long, long) and fetchAll(Instant, Instant)
>> that
>>>>>>> fetch over all keys.
>>>>>>>
>>>>>>> Maybe we could rename `V fetch(K, long)` to `V get(K, long)` and
>>> rename
>>>>>>> all `fetch()/fetchAll()` to `range()`? There is actually no reason
>> to
>>>>>>> have range()/rangeAll().
>>>>>>>
>>>>>>> If we do this, we might consider to rename methods for SessionStore,
>>>>>>> too. There is
>>>>>>>
>>>>>>>> ReadOnlySessionStore#fetch(K)
>>>>>>>> ReadOnlySessionStore#fetch(K, K)
>>>>>>>> SessionStore#findSessions(K, long, long)
>>>>>>>> SessionStore#findSessions(K, K, long, long)
>>>>>>>> SessionStore#fetchSession(K, long, long);
>>>>>>>
>>>>>>> Consistent renaming might be:
>>>>>>>
>>>>>>> ReadOnlySessionStore#range(K)
>>>>>>> ReadOnlySessionStore#range(K, K)
>>>>>>> SessionStore#range(K, long, long)
>>>>>>> SessionStore#range(K, K, long, long)
>>>>>>> SessionStore#get(K, long, long);
>>>>>>>
>>>>>>>
>>>>>>> Not sure if extensive renaming might have too big of an impact.
>>> However,
>>>>>>> `range()` and `get()` might actually be better names than `fetch()`,
>>>>>>> thus, it might also provide some additional value.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>>
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>>
>>>>>>> On 3/27/19 10:19 AM, Guozhang Wang wrote:
>>>>>>>> Hello Matthias,
>>>>>>>>
>>>>>>>> Just to clarify the naming conflicts is between the newly added
>>>>> function
>>>>>>>> and the old functions that we want to deprecate / remove right? The
>>>>>>>> existing ones have different signatures with parameters so that
>> they
>>>>>>> should
>>>>>>>> not have conflicts.
>>>>>>>>
>>>>>>>> I was thinking about just make the change directly without
>>> deprecating
>>>>>>>> existing ones which would require users of 2.3 to make code changes
>>> --
>>>>>>> this
>>>>>>>> code change looks reasonably straight-forward to me and not much
>>> worth
>>>>>>>> deferring to later when the deprecated ones are removed.
>>>>>>>>
>>>>>>>> On the other hand, just deprecating "WindowIterator" without add
>> new
>>>>>>>> functions seems not very useful for users either since it is only
>>> used
>>>>> as
>>>>>>>> an indicator but users cannot make code changes during this phase
>>>>>>> anyways,
>>>>>>>> so it is still a `one-cut` deal when we eventually remove the
>>>>> deprecated
>>>>>>>> ones and add the new one.
>>>>>>>>
>>>>>>>> Hence I'm slightly inclining to trade compatibility and replace it
>>> with
>>>>>>> new
>>>>>>>> functions in one release, but if people have a good idea of the
>>>>> renaming
>>>>>>>> approach (I do not have a good one on top of my head :) I can also
>> be
>>>>>>>> convinced that way.
>>>>>>>>
>>>>>>>> Guozhang
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Mar 11, 2019 at 10:41 AM Matthias J. Sax <
>>>>> matthias@confluent.io>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I am open to change the return type to
>>>>>>>>>
>>>>>>>>> KeyValueIterator<Windowed<K>, V>
>>>>>>>>>
>>>>>>>>> However, this requires to rename
>>>>>>>>>
>>>>>>>>> #fetch(K key, long startTimestamp, long endTimestamp)
>>>>>>>>> #fetch(K key, Instant startTimestamp, Instant endTimestamp)
>>>>>>>>>
>>>>>>>>> to avoid naming conflicts.
>>>>>>>>>
>>>>>>>>> What new name would you suggest? The existing methods are called
>>>>>>>>> `fetch()`, `fetchAll()`, `all()`, `put()`.
>>>>>>>>>
>>>>>>>>> While I think it would be good to get fully aligned return types,
>> I
>>> am
>>>>>>>>> not sure how we can get aligned method names (without renaming all
>>> of
>>>>>>>>> them...)? If we think it's worth to rename all to get this cleaned
>>>>> up, I
>>>>>>>>> am no opposed.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thoughts?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Matthias
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3/11/19 10:27 AM, Guozhang Wang wrote:
>>>>>>>>>> I was thinking about changing the return type even, to
>>>>>>>>>> `KeyValueIterator<Windowed<K>, V>` since it is confusing to users
>>>>> about
>>>>>>>>> the
>>>>>>>>>> key typed `Long` (Streams javadoc today did not explain it
>> clearly
>>>>>>>>> either),
>>>>>>>>>> note it is not backward compatible at all.
>>>>>>>>>>
>>>>>>>>>> Personally I'd prefer to just deprecate the API and new new ones
>>> that
>>>>>>>>>> return `KeyValueIterator<Windowed<K>, V>` directly, but if most
>>>>> people
>>>>>>>>> felt
>>>>>>>>>> it is too intrusive for compatibility I can be convinced with
>>>>>>>>>> `KeyValueIterator<Long, V>` as well.
>>>>>>>>>>
>>>>>>>>>> Guozhang
>>>>>>>>>>
>>>>>>>>>> On Mon, Mar 11, 2019 at 10:17 AM Sophie Blee-Goldman <
>>>>>>>>> sophie@confluent.io>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I remember thinking this while working on window stores, am
>>>>> definitely
>>>>>>>>> for
>>>>>>>>>>> it.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Mar 11, 2019 at 9:20 AM John Roesler <john@confluent.io
>>>
>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Sounds great to me. Thanks, Matthias!
>>>>>>>>>>>> -John
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Mar 10, 2019 at 11:58 PM Matthias J. Sax <
>>>>>>>>> matthias@confluent.io>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would like to propose KIP-439 to deprecate interface
>>>>>>>>>>>>> `WindowStoreIterator`.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-439%3A+Deprecate+Interface+WindowStoreIterator
>>>>>>>>>>>>>
>>>>>>>>>>>>> Looking forward to your feedback.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
> 
> 


Re: [DISCUSS] KIP-439: Deprecate Interface WindowStoreIterator

Posted by Guozhang Wang <wa...@gmail.com>.
Thanks John.

I think I'm convinced about not collapsing the metrics measuring for
various function calls. Regarding the possible solution I'm a bit inclined
to just distinguish on the metric names directly over on tags: since our
current metrics naming are a tad messed up anyways, fixing them in one shot
as a breaking change sounds reasonable to me.


Guozhang


On Mon, May 6, 2019 at 9:14 AM John Roesler <jo...@confluent.io> wrote:

> Thanks all (or range? ;) ) for the discussion. Good points all around.
>
> Although I see the allure of naming the metrics the same as the things
> they're measuring, it seems not to be perfect. Seconding Matthias's latter
> thought, I think it's likely you'd want to measure the method calls
> independently, since the different range variants would have wildly
> different characteristics, which could then lead you to want to orient the
> storage differently to support particular use cases.
>
> Pointing out some structural characteristics (I know you all know this
> stuff, I'm just constructing a table for analysis):
> * Java supports method-name overloading. *Different* methods can have the
> same names, distinguished by arg lists; it doesn't change the fact that
> they are different methods.
> * Metrics does not support metric-name overloading, but metric names do
> have some structure we could exploit, if you consider the tags.
>
> It seems to me that there's actually more domain mismatch if we just have
> one metric named "range", since (e.g., in the SessionStore proposal above)
> the Java API has *four* methods named "range".
>
> Two potential solutions I see:
> * hierarchical metric names: "range-single-key-all-time",
> "range-key-range-all-time", "range-single-key-time-range",
> "range-key-range-time-range", maybe with better names... I'm not the best
> at this stuff. Hopefully, you see the point, though... they all start with
> "range", which provides the association to the method, and all have a
> suffix which identifies the overload being measured.
> * tagged metric names: "range" {"variant": "single-key-all-time"}, "range"
> {"variant": "key-range-all-time"}, "range" {"variant":
> "single-key-time-range"}, "range" {"variant": "key-range-time-range"} . Or
> you could even split the tags up semantically, but my instinct says that
> that would just make it harder to digest the metrics later on.
>
> Just some ideas.
> -John
>
> On Fri, Apr 26, 2019 at 3:51 AM Matthias J. Sax <ma...@confluent.io>
> wrote:
>
> > Thanks for the input Guozhang. I was not aware of those dependencies.
> >
> > It might be good to align this KIP with the metrics cleanup. Not sure
> > atm, if we should use different metric names for different overloads,
> > even if those have the same method name?
> >
> > If we rename all method to `range()` and use the same metric name for
> > all, one could argue that this is still fine, because the metric
> > collects how often a range query is executed (regardless of the range
> > itself).
> >
> > On the other hand, this would basically be a "roll up". It could still
> > be valuable to distinguish between single-key-time-range,
> > key-range-time-range, and all-range queries. Users could still aggregate
> > those later if they are not interested in the details, while it's not
> > possible for user to split a pre-aggregated metric into it's component.
> >
> >
> > Input from others might be helpful here, too.
> >
> >
> > -Matthias
> >
> > On 4/11/19 6:00 PM, Guozhang Wang wrote:
> > > While working at KIP-444 (https://github.com/apache/kafka/pull/6498) I
> > > realized there are a bunch of issues on metric names v.s. function
> names,
> > > e.g. some function named `fetchAll` are actually measure with `fetch`,
> > etc.
> > > So in that KIP I proposed to make the function name aligned with
> metrics
> > > name. So suppose we rename the functions from `fetch` to `range` I'd
> > > suggest we make this change as part of KIP-444 as well. Note that it
> > means
> > > different functions with the same name `range` will be measured under a
> > > single metric then.
> > >
> > > But still for function named `all` it will be measured under a separate
> > > metric named `all`, so I'm just clarifying with you if that's the
> > intention.
> > >
> > >
> > > Guozhang
> > >
> > > On Thu, Apr 11, 2019 at 2:04 PM Matthias J. Sax <matthias@confluent.io
> >
> > > wrote:
> > >
> > >> I did not see a reason to rename `all()` to `range()`. `all()` does
> not
> > >> take any parameters to limit a range and is a good name IMHO. But I am
> > >> not married to keep `all()` and if we think we should rename it, too,
> I
> > >> am fine with it.
> > >>
> > >> Not sure what connection you make to metrics though. Can you
> elaborate?
> > >>
> > >>
> > >> Would be interested to hear others opinions on this, too.
> > >>
> > >>
> > >> -Matthias
> > >>
> > >> On 4/11/19 8:38 AM, Guozhang Wang wrote:
> > >>> I like the renaming, since it also aligns with our metrics cleanup
> > >>> (KIP-444) which touches upon the store level metrics as well.
> > >>>
> > >>> One question: you seems still suggesting to keep "all" with the
> current
> > >>> name (and also using a separate metric for it), what's the difference
> > >>> between this one and other "range" functions?
> > >>>
> > >>>
> > >>> Guozhang
> > >>>
> > >>> On Thu, Apr 11, 2019 at 2:26 AM Matthias J. Sax <
> matthias@confluent.io
> > >
> > >>> wrote:
> > >>>
> > >>>> Thanks for the input.
> > >>>>
> > >>>>>> Just to clarify the naming conflicts is between the newly added
> > >> function
> > >>>>>> and the old functions that we want to deprecate / remove right?
> The
> > >>>>
> > >>>> Yes, the conflict is just fort the existing `fetch()` methods for
> > which
> > >>>> we want to change the return type.
> > >>>>
> > >>>> IMHO, we should not make a breaking change in a minor release. Thus,
> > we
> > >>>> could either only deprecate those fetch methods that return
> > >>>> `WindowStoreIterator` and do a "clean cut" in 3.0.
> > >>>>
> > >>>> Or we, follow the renaming path. No get a clean renaming, we need to
> > >>>> consider all methods that are called "fetch":
> > >>>>
> > >>>> ReadOnlyWindowStore:
> > >>>>
> > >>>>> V fetch(K, long)
> > >>>>> WindowStoreIterator<V> fetch(K, Instant, Instant)
> > >>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, Instant, Instant)
> > >>>>
> > >>>> WindowStore:
> > >>>>
> > >>>>> WindowStoreIterator<V> fetch(K, long, long)
> > >>>>> WindowStoreIterator<V> fetch(K, Instant, Instant)>
> > >>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, long, long)
> > >>>>> KeyValueIterator<Windowed<K>, V> fetch(K, K, Instant, Instant)
> > >>>>
> > >>>> There is also fetchAll(long, long) and fetchAll(Instant, Instant)
> that
> > >>>> fetch over all keys.
> > >>>>
> > >>>> Maybe we could rename `V fetch(K, long)` to `V get(K, long)` and
> > rename
> > >>>> all `fetch()/fetchAll()` to `range()`? There is actually no reason
> to
> > >>>> have range()/rangeAll().
> > >>>>
> > >>>> If we do this, we might consider to rename methods for SessionStore,
> > >>>> too. There is
> > >>>>
> > >>>>> ReadOnlySessionStore#fetch(K)
> > >>>>> ReadOnlySessionStore#fetch(K, K)
> > >>>>> SessionStore#findSessions(K, long, long)
> > >>>>> SessionStore#findSessions(K, K, long, long)
> > >>>>> SessionStore#fetchSession(K, long, long);
> > >>>>
> > >>>> Consistent renaming might be:
> > >>>>
> > >>>> ReadOnlySessionStore#range(K)
> > >>>> ReadOnlySessionStore#range(K, K)
> > >>>> SessionStore#range(K, long, long)
> > >>>> SessionStore#range(K, K, long, long)
> > >>>> SessionStore#get(K, long, long);
> > >>>>
> > >>>>
> > >>>> Not sure if extensive renaming might have too big of an impact.
> > However,
> > >>>> `range()` and `get()` might actually be better names than `fetch()`,
> > >>>> thus, it might also provide some additional value.
> > >>>>
> > >>>> Thoughts?
> > >>>>
> > >>>>
> > >>>> -Matthias
> > >>>>
> > >>>>
> > >>>> On 3/27/19 10:19 AM, Guozhang Wang wrote:
> > >>>>> Hello Matthias,
> > >>>>>
> > >>>>> Just to clarify the naming conflicts is between the newly added
> > >> function
> > >>>>> and the old functions that we want to deprecate / remove right? The
> > >>>>> existing ones have different signatures with parameters so that
> they
> > >>>> should
> > >>>>> not have conflicts.
> > >>>>>
> > >>>>> I was thinking about just make the change directly without
> > deprecating
> > >>>>> existing ones which would require users of 2.3 to make code changes
> > --
> > >>>> this
> > >>>>> code change looks reasonably straight-forward to me and not much
> > worth
> > >>>>> deferring to later when the deprecated ones are removed.
> > >>>>>
> > >>>>> On the other hand, just deprecating "WindowIterator" without add
> new
> > >>>>> functions seems not very useful for users either since it is only
> > used
> > >> as
> > >>>>> an indicator but users cannot make code changes during this phase
> > >>>> anyways,
> > >>>>> so it is still a `one-cut` deal when we eventually remove the
> > >> deprecated
> > >>>>> ones and add the new one.
> > >>>>>
> > >>>>> Hence I'm slightly inclining to trade compatibility and replace it
> > with
> > >>>> new
> > >>>>> functions in one release, but if people have a good idea of the
> > >> renaming
> > >>>>> approach (I do not have a good one on top of my head :) I can also
> be
> > >>>>> convinced that way.
> > >>>>>
> > >>>>> Guozhang
> > >>>>>
> > >>>>>
> > >>>>> On Mon, Mar 11, 2019 at 10:41 AM Matthias J. Sax <
> > >> matthias@confluent.io>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> I am open to change the return type to
> > >>>>>>
> > >>>>>> KeyValueIterator<Windowed<K>, V>
> > >>>>>>
> > >>>>>> However, this requires to rename
> > >>>>>>
> > >>>>>> #fetch(K key, long startTimestamp, long endTimestamp)
> > >>>>>> #fetch(K key, Instant startTimestamp, Instant endTimestamp)
> > >>>>>>
> > >>>>>> to avoid naming conflicts.
> > >>>>>>
> > >>>>>> What new name would you suggest? The existing methods are called
> > >>>>>> `fetch()`, `fetchAll()`, `all()`, `put()`.
> > >>>>>>
> > >>>>>> While I think it would be good to get fully aligned return types,
> I
> > am
> > >>>>>> not sure how we can get aligned method names (without renaming all
> > of
> > >>>>>> them...)? If we think it's worth to rename all to get this cleaned
> > >> up, I
> > >>>>>> am no opposed.
> > >>>>>>
> > >>>>>>
> > >>>>>> Thoughts?
> > >>>>>>
> > >>>>>>
> > >>>>>> -Matthias
> > >>>>>>
> > >>>>>>
> > >>>>>> On 3/11/19 10:27 AM, Guozhang Wang wrote:
> > >>>>>>> I was thinking about changing the return type even, to
> > >>>>>>> `KeyValueIterator<Windowed<K>, V>` since it is confusing to users
> > >> about
> > >>>>>> the
> > >>>>>>> key typed `Long` (Streams javadoc today did not explain it
> clearly
> > >>>>>> either),
> > >>>>>>> note it is not backward compatible at all.
> > >>>>>>>
> > >>>>>>> Personally I'd prefer to just deprecate the API and new new ones
> > that
> > >>>>>>> return `KeyValueIterator<Windowed<K>, V>` directly, but if most
> > >> people
> > >>>>>> felt
> > >>>>>>> it is too intrusive for compatibility I can be convinced with
> > >>>>>>> `KeyValueIterator<Long, V>` as well.
> > >>>>>>>
> > >>>>>>> Guozhang
> > >>>>>>>
> > >>>>>>> On Mon, Mar 11, 2019 at 10:17 AM Sophie Blee-Goldman <
> > >>>>>> sophie@confluent.io>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> I remember thinking this while working on window stores, am
> > >> definitely
> > >>>>>> for
> > >>>>>>>> it.
> > >>>>>>>>
> > >>>>>>>> On Mon, Mar 11, 2019 at 9:20 AM John Roesler <john@confluent.io
> >
> > >>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Sounds great to me. Thanks, Matthias!
> > >>>>>>>>> -John
> > >>>>>>>>>
> > >>>>>>>>> On Sun, Mar 10, 2019 at 11:58 PM Matthias J. Sax <
> > >>>>>> matthias@confluent.io>
> > >>>>>>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Hi,
> > >>>>>>>>>>
> > >>>>>>>>>> I would like to propose KIP-439 to deprecate interface
> > >>>>>>>>>> `WindowStoreIterator`.
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>
> > >>>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-439%3A+Deprecate+Interface+WindowStoreIterator
> > >>>>>>>>>>
> > >>>>>>>>>> Looking forward to your feedback.
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> -Matthias
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>>
> > >>>
> > >>
> > >>
> > >
> >
> >
>


-- 
-- Guozhang