Posted to dev@kafka.apache.org by Bruno Cadonna <br...@confluent.io> on 2020/07/20 18:00:45 UTC

Re: [DISCUSS] KIP-607: Add Metrics to Record the Memory Used by RocksDB to Kafka Streams

Hi,

During the implementation of this KIP and after some discussion about 
RocksDB metrics, I decided to make some major modifications to this KIP 
and kick off discussion again.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-607%3A+Add+Metrics+to+Kafka+Streams+to+Report+Properties+of+RocksDB

Best,
Bruno

On 15.05.20 17:11, Bill Bejeck wrote:
> Thanks for the KIP, Bruno. Having sensible, easy-to-access RocksDB memory
> reporting will be a welcome addition.
> 
> FWIW I also agree to have the metrics reported on a store level. I'm glad
> you changed the KIP to that effect.
> 
> -Bill
> 
> 
> 
> On Wed, May 13, 2020 at 6:24 PM Guozhang Wang <wa...@gmail.com> wrote:
> 
>> Hi Bruno,
>>
>> Sounds good to me.
>>
>> I think I'm just a bit more curious to see its impact on performance: as
>> long as we have one INFO-level RocksDB metric, we'd have to turn on the
>> scheduled RocksDB metrics recorder, whereas previously we could decide not
>> to turn on the recorder at all if all metrics were set to DEBUG and we
>> configured the INFO level in production. But this is an implementation
>> detail anyway and maybe the impact is negligible after all. We can check
>> and re-discuss this afterwards :)
>>
>>
>> Guozhang
>>
>>
>> On Wed, May 13, 2020 at 9:34 AM Sophie Blee-Goldman <so...@confluent.io>
>> wrote:
>>
>>> Thanks Bruno! I took a look at the revised KIP and it looks good to me.
>>>
>>> Sophie
>>>
>>> On Wed, May 13, 2020 at 6:59 AM Bruno Cadonna <br...@confluent.io>
>> wrote:
>>>
>>>> Hi John,
>>>>
>>>> Thank you for the feedback!
>>>>
>>>> I agree and I will change the KIP as I stated in my previous e-mail to
>>>> Guozhang.
>>>>
>>>> Best,
>>>> Bruno
>>>>
>>>> On Tue, May 12, 2020 at 3:07 AM John Roesler <vv...@apache.org>
>>> wrote:
>>>>>
>>>>> Thanks, all.
>>>>>
>>>>> If you don’t mind, I’ll pitch in a few cents’ worth.
>>>>>
>>>>> In my life I’ve generally found more granular metrics to be more
>>>>> useful, as long as there’s a sane way to roll them up. It does seem
>>>>> nice to see it on the per-store level. For roll-up purposes, the task
>>>>> and thread tags should be sufficient.
>>>>>
>>>>> I think the only reason we make some metrics Debug is that _recording_
>>>>> them can be expensive. If there’s no added expense, I think we can just
>>>>> register store-level metrics at Info level.
>>>>>
>>>>> Thanks for the KIP, Bruno!
>>>>> -John
>>>>>
>>>>> On Mon, May 11, 2020, at 17:32, Guozhang Wang wrote:
>>>>>> Hello Sophie / Bruno,
>>>>>>
>>>>>> I've also thought about the leveling question, and one motivation I
>>>>>> had for setting it at the instance level is that we want to expose it
>>>>>> at the INFO level: today our reporting levels are not very
>>>>>> fine-grained --- which I think is worth addressing in itself --- such
>>>>>> that one has to either turn on all DEBUG metrics recording or none of
>>>>>> it. If we could allow users to e.g. specify "turn on streams-metrics
>>>>>> and stream-state-metrics, but not others", then I think it should just
>>>>>> be at the store level. However, right now if we set it at store-level
>>>>>> then it would be DEBUG and not exposed by default.
>>>>>>
>>>>>> So it seems we have several options in addition to the proposed one:
>>>>>>
>>>>>> a) we set it at store-level as INFO; but then one can argue why this
>>>>>> is INFO while others (bytes-written, etc.) are DEBUG.
>>>>>> b) we set it at store-level as DEBUG, believing that we do not usually
>>>>>> need to turn it on.
>>>>>> c) maybe we can set it at task-level (? I'm not so sure myself about
>>>>>> this.. :P) as INFO.
>>>>>>
>>>>>>
>>>>>> Guozhang
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, May 11, 2020 at 12:29 PM Sophie Blee-Goldman <
>>>> sophie@confluent.io>
>>>>>> wrote:
>>>>>>
>>>>>>> Hey Bruno,
>>>>>>>
>>>>>>> Thanks for the KIP! I have one high-level concern, which is that we
>>>>>>> should consider reporting these metrics on the per-store level rather
>>>>>>> than instance-wide. I know I was the one who first proposed making it
>>>>>>> instance-wide, so bear with me:
>>>>>>>
>>>>>>> While I would still argue that the instance-wide memory usage is
>>>>>>> probably the most *useful*, exposing them at the store level does not
>>>>>>> prevent users from monitoring the instance-wide memory. They should
>>>>>>> be able to roll up all the store-level metrics on an instance to
>>>>>>> compute the total off-heap memory. But rolling it up for the users
>>>>>>> does prevent them from using this to debug rare cases where one store
>>>>>>> may be using significantly more memory than expected.
>>>>>>>
>>>>>>> It's also worth considering that some users may be using the bounded
>>>>>>> memory config setter to put a cap on the off-heap memory of the
>>>>>>> entire process, in which case the memory usage metric for any one
>>>>>>> store would reflect the memory usage of the entire instance. In that
>>>>>>> case any effort to roll up the memory usages ourselves would just be
>>>>>>> wasted.
>>>>>>>
>>>>>>> Sorry for the reversal, but after a second thought I'm pretty
>>>>>>> strongly in favor of reporting these at the store level.
>>>>>>>
>>>>>>> Best,
>>>>>>> Sophie
>>>>>>>
>>>>>>> On Wed, May 6, 2020 at 8:41 AM Bruno Cadonna <bruno@confluent.io>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I'd like to discuss KIP-607 that aims to add RocksDB memory usage
>>>>>>>> metrics to Kafka Streams.
>>>>>>>>
>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-607%3A+Add+Metrics+to+Record+the+Memory+Used+by+RocksDB+to+Kafka+Streams
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Bruno
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> -- Guozhang
>>>>>>
>>>>
>>>
>>
>>
>> --
>> -- Guozhang
>>
> 
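The roll-up that John and Sophie describe above can be sketched in plain Java. This is a minimal sketch, not the Kafka Streams metrics API: per-store values are modeled as a plain map, and the metric name "size-all-mem-tables" is an assumption about what the KIP ends up exposing, not taken from it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: store-level memory metrics summed into an instance-wide total.
// A plain map stands in for the Kafka Streams metrics API, and the metric
// name "size-all-mem-tables" is assumed for illustration only.
public class StoreMetricsRollup {

    // Sum one store-level metric across all stores of a Streams instance.
    public static long rollUp(Map<String, Long> bytesPerStore) {
        return bytesPerStore.values().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        Map<String, Long> sizeAllMemTables = new LinkedHashMap<>();
        sizeAllMemTables.put("store-a", 64L * 1024 * 1024);
        sizeAllMemTables.put("store-b", 16L * 1024 * 1024);
        // Instance-wide off-heap estimate is the sum of the per-store values.
        System.out.println(rollUp(sizeAllMemTables)); // 83886080
    }
}
```

Keeping the reported metrics per-store and summing them like this preserves the per-store debugging signal while still giving operators the instance-wide number.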

Re: [DISCUSS] KIP-607: Add Metrics to Record the Memory Used by RocksDB to Kafka Streams

Posted by Bruno Cadonna <br...@confluent.io>.
Thank you John for the proposal.

Indeed, I also did not like extending the RocksDBConfigSetter interface, 
but couldn't find a way around it.

I will remove the interface extension from the KIP and try out your 
proposal. I need to look into the details, but at first glance I think 
we need to subclass BlockBasedTableConfig instead of Options 
(actually, we already subclass Options). I also have to verify that 
this is the only place where the cache is passed along.

Best,
Bruno
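A rough sketch of that idea follows. Cache and BlockBasedTableConfig here are minimal local stand-ins for the RocksDB classes of the same names, so the snippet stays self-contained; the real setBlockCache signature may differ.

```java
// Sketch: capture the block cache by subclassing the table config.
// Cache and BlockBasedTableConfig are local stand-ins for the RocksDB
// classes, kept minimal so the example is self-contained.
class Cache {}

class BlockBasedTableConfig {
    public BlockBasedTableConfig setBlockCache(Cache cache) {
        return this;
    }
}

// Streams could hand an instance of this subclass to the config setter;
// whatever cache the user sets is recorded and can be read back later
// by the metrics recorder.
class RecordingTableConfig extends BlockBasedTableConfig {
    private Cache recordedCache;

    @Override
    public BlockBasedTableConfig setBlockCache(Cache cache) {
        this.recordedCache = cache;
        return super.setBlockCache(cache);
    }

    public Cache recordedCache() {
        return recordedCache;
    }
}

public class RecordingTableConfigDemo {
    public static void main(String[] args) {
        RecordingTableConfig config = new RecordingTableConfig();
        Cache cache = new Cache();
        config.setBlockCache(cache);
        System.out.println(config.recordedCache() == cache); // true
    }
}
```

The advantage over a new interface method is that existing user implementations keep working unchanged: the capture happens in the overridden setter, not in user code.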


Re: [DISCUSS] KIP-607: Add Metrics to Record the Memory Used by RocksDB to Kafka Streams

Posted by John Roesler <vv...@apache.org>.
Thanks for the update, Bruno!

In addition to Guozhang's feedback, I'm a little concerned
about the change to the RocksDBConfigSetter. If I understand
the proposal, people would have to separately supply
their Cache to the Options parameter in setConfig() and also
save it in a field so it can be returned in cache(). If they don't
return the same object, then the metrics won't be accurate,
but otherwise the mistake will be undetectable. Also, the
method defaults to returning null, so existing implementations
would have no indication that they need to change, except that
users who want to read the new metrics would see inaccurate
values. They probably don't have a priori knowledge that would
let them identify that the reported metrics aren't accurate, and
even if they do notice something is wrong, it would probably take
quite a bit of effort to get all the way to the root cause.

I'm wondering if we can instead avoid the new method and pass
to the ConfigSetter our own subclass of Options (which is non-final 
and has only public constructors) that would enable us to capture
and retrieve the Cache later. Or even just use reflection to get the
Cache out of the existing Options object after calling the ConfigSetter.

What do you think?
Thanks again for the update,
-John
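The reflection alternative could be sketched like this. Options and Cache below are minimal local stand-ins for the RocksDB types, and the field name "cache" is hypothetical; it would have to match whatever field the real class actually uses.

```java
import java.lang.reflect.Field;

// Sketch: pull the cache back out of the Options object after the user's
// ConfigSetter has run. Options and Cache are local stand-ins for the
// RocksDB types; the field name "cache" is hypothetical.
class Cache {}

class Options {
    private Cache cache;

    public void setCache(Cache cache) {
        this.cache = cache;
    }
}

public class CacheExtractor {
    // Read the private cache field reflectively after calling the setter.
    public static Cache extractCache(Options options) {
        try {
            Field field = Options.class.getDeclaredField("cache");
            field.setAccessible(true);
            return (Cache) field.get(options);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Options has no cache field", e);
        }
    }

    public static void main(String[] args) {
        Options options = new Options();
        Cache cache = new Cache();
        options.setCache(cache);
        System.out.println(extractCache(options) == cache); // true
    }
}
```

Either variant (subclassing or reflection) avoids asking users to return the cache themselves, which removes the silent-mismatch failure mode described above.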


Re: [DISCUSS] KIP-607: Add Metrics to Record the Memory Used by RocksDB to Kafka Streams

Posted by Guozhang Wang <wa...@gmail.com>.
Hello Bruno,

Thanks for the updated KIP. I made a pass and here are some comments:

1) What's the motivation of keeping it as INFO while KIP-471 metrics are
defined in DEBUG?

2) Some namings are a bit inconsistent with others and with KIP-471, for
example:

2.a) KIP-471 uses "memtable" while in this KIP we use "mem-table", also the
"memtable" is prefixed and then the metric name. I'd suggest we keep them
consistent. e.g. "num-immutable-mem-table" => "immutable-memtable-count",
"cur-size-active-mem-table" => "active-memable-bytes"

2.b) "immutable" is abbreviated as "imm" in some names but not in others;
I'd suggest we avoid abbreviations across all names,
e.g. "num-entries-imm-mem-tables" => "immutable-memtable-num-entries".

2.c) The "-size" and "-num" semantics are usually a bit unclear, and I'd
suggest we use more concrete terms, e.g. "total-sst-files-size" =>
"total-sst-files-bytes", "num-live-versions" => "live-versions-count",
"background-errors" => "background-errors-count".

3) Some metrics are a bit confusing, e.g.

3.a) What's the difference between "cur-size-all-mem-tables" and
"size-all-mem-tables"?

3.b) And the explanation of "estimate-table-readers-mem" does not read
very clearly to me either; does it refer to
"estimate-sst-file-read-buffer-bytes"?

3.c) How does "estimate-oldest-key-time" help with memory usage debugging?

4) For my own education, does "estimate-pending-compaction-bytes" capture
all the memory usage for compaction buffers?

5) This is just a nit comment to help readers better understand RocksDB:
maybe we can explain in the wiki doc which parts of RocksDB use memory
(block cache, OS cache, memtable, compaction buffer, read buffer), which
of them are on-heap and which are off-heap, and which can be hard-bounded,
which can only be soft-bounded, and which cannot be bounded at all.


Guozhang
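The naming conventions in points 2.a-2.c can be summarized as a simple mapping. This is only an illustration restating the examples from the mail above; the names on the right are suggestions under discussion, not final KIP names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the suggested naming conventions: consistent "memtable"
// spelling, no abbreviations, and "-count"/"-bytes" suffixes instead of
// "num-"/"-size" prefixes. The mappings restate the examples from the
// discussion; they are proposals, not final metric names.
public class MetricNameConventions {

    static final Map<String, String> SUGGESTED = new LinkedHashMap<>();

    static {
        SUGGESTED.put("num-immutable-mem-table", "immutable-memtable-count");
        SUGGESTED.put("cur-size-active-mem-table", "active-memtable-bytes");
        SUGGESTED.put("num-entries-imm-mem-tables", "immutable-memtable-num-entries");
        SUGGESTED.put("total-sst-files-size", "total-sst-files-bytes");
        SUGGESTED.put("num-live-versions", "live-versions-count");
        SUGGESTED.put("background-errors", "background-errors-count");
    }

    // Fall back to the original RocksDB property name if no suggestion exists.
    public static String suggested(String rocksDbName) {
        return SUGGESTED.getOrDefault(rocksDbName, rocksDbName);
    }

    public static void main(String[] args) {
        System.out.println(suggested("num-live-versions")); // live-versions-count
    }
}
```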

