Posted to dev@ignite.apache.org by Ivan Rakov <iv...@gmail.com> on 2020/08/10 16:11:18 UTC

Re: [DISCUSSION] Add index rebuild time metrics

Folks,

Sorry for coming late to the party. I've taken a look at this issue during
review.

How can a local count of processed keys help us understand when the
index rebuild will be finished?
We can't compare the metric value with cache.size(): the former is
node-local, while the cache size covers all partitions in the cluster.
Also, I don't understand why we need to keep separate metrics for all
caches. Of course, the metric becomes more accurate, but it is obviously
harder to conclude whether "the index rebuild" process is over (and the
cluster is ready to process queries quickly).

I find a single metric much more usable. It would be perfect if the metric
value were represented as a percentage, e.g. current progress of the local
node's index rebuild is 60%.

--
Best regards,
Ivan

On Fri, Jul 24, 2020 at 1:35 PM Stanislav Lukyanov <st...@gmail.com>
wrote:

> Got it. I thought that index building and index rebuilding are essentially
> the same,
> but now I see that they are different: index rebuilding cares about all
> indexes at once while index building cares about particular ones.
>
> Kirill's approach sounds good.
>
> Stan
>
> > On 20 Jul 2020, at 14:54, Alexey Goncharuk <al...@gmail.com>
> wrote:
> >
> > Stan,
> >
> > Currently we never build indexes one-by-one - we always use a cache data
> > row visitor which either updates all indexes (see IndexRebuildFullClosure)
> > or updates a set of all indexes that need to catch up (see
> > IndexRebuildPartialClosure). Given that, I do not see any need for
> > per-index rebuild status as this status will be updated for all outdated
> > indexes simultaneously.
> >
> > Kirill's approach for the total number of processed keys per cache seems
> > reasonable to me.
> >
> > --AG
> >
> > Fri, Jul 3, 2020 at 10:12, ткаленко кирилл <tk...@yandex.ru>:
> >
> >> Hi, Stan!
> >>
> >> Perhaps it is worth clarifying what exactly I wanted to say.
> >> Now we have 2 processes: building and rebuilding indexes.
> >>
> >> At the moment, we have some metrics for rebuilding indexes:
> >> "IsIndexRebuildInProgress", "IndexBuildCountPartitionsLeft".
> >>
> >> I suggest adding another metric "Indexrebuildkeyprocessed", which will
> >> allow you to determine how many records are left to rebuild for a cache.
> >>
> >> I think your comments are more about building an index, which may need
> >> more metrics, but I think that should be done in a separate ticket.
> >>
> >> 03.07.2020, 03:09, "Stanislav Lukyanov" <st...@gmail.com>:
> >>> If multiple indexes are to be built, the "number of indexed keys" metric
> >>> may be misleading.
> >>>
> >>> As a cluster admin, I'd like to know:
> >>> - Are all indexes ready on a node?
> >>> - How many indexes are to be built?
> >>> - How many resources are used by index building (how many threads are
> >>>   used)?
> >>> - Which index(es) is being built right now?
> >>> - How much time until the current (single) index building finishes? Here
> >>>   "time" can be a lot of things: partitions, entries, percent of the
> >>>   cache, minutes and hours
> >>> - How much time until all indexes are built?
> >>> - How long does it take to build each of my indexes / a single index of
> >>>   my cache on average?
> >>>
> >>> I think we need a set of metrics and/or log messages to answer all of
> >>> these questions.
> >>> I imagine something like:
> >>> - numberOfIndexesToBuild
> >>> - a standard set of metrics on the index building thread pool (do we
> >>>   already have it?)
> >>> - currentlyBuiltIndexName (assuming we only build one at a time, which is
> >>>   probably not true)
> >>> - for the "time" metrics I think percentage might be the best as it's
> >>>   the easiest to understand; we may add multiple metrics though.
> >>> - For "time per each index" I'd add detailed log messages stating how
> >>>   long it took to build a particular index
> >>>
> >>> Thanks,
> >>> Stan
> >>>
> >>>> On 26 Jun 2020, at 12:49, ткаленко кирилл <tk...@yandex.ru> wrote:
> >>>>
> >>>> Hi, Igniters.
> >>>>
> >>>> I would like to know if it is possible to estimate how long the index
> >>>> rebuild will take.
> >>>>
> >>>> At the moment, I have found the following metrics [1] and [2], and
> >>>> since the rebuild is based on caches, I think it would be useful to know
> >>>> how many records have been processed by indexing. This way we can
> >>>> estimate how long we have to wait for the index to be rebuilt by
> >>>> subtracting the number of indexed records from [3].
> >>>>
> >>>> I think we should add this metric [4].
> >>>>
> >>>> Comments, suggestions?
> >>>>
> >>>> [1] - https://issues.apache.org/jira/browse/IGNITE-12184
> >>>> [2] - org.apache.ignite.internal.processors.cache.CacheGroupMetricsImpl#idxBuildCntPartitionsLeft
> >>>> [3] - org.apache.ignite.cache.CacheMetrics#getCacheSize
> >>>> [4] - org.apache.ignite.cache.CacheMetrics#getNumberIndexedKeys
> >>
>
>

Re: [DISCUSSION] Add index rebuild time metrics

Posted by Ivan Rakov <iv...@gmail.com>.

I seem to be in the minority here :)
Fine, let's make it as clear as possible which metric method
(localCacheSize) should be called in order to retrieve a 100% progress
milestone.
I've left comments in the PR.
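
For readers of the archive, the manual calculation discussed in this thread (per-cache processed keys measured against the node-local cache size as the 100% milestone, then summed over caches) might look roughly like the sketch below. The class, field, and method names are hypothetical and are not part of the Ignite API; the only assumption is that both values are read on the same node.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; none of these names exist in Apache Ignite. */
final class NodeRebuildProgressSketch {
    /** One reading of the two node-local values for a single cache. */
    static final class CacheSample {
        final long processedKeys;   // keys already visited by the rebuild
        final long localCacheSize;  // node-local entry count (100% milestone)

        CacheSample(long processedKeys, long localCacheSize) {
            this.processedKeys = processedKeys;
            this.localCacheSize = localCacheSize;
        }
    }

    /** Sum of processed keys divided by sum of local sizes, in [0, 1]. */
    static double nodeProgress(List<CacheSample> samples) {
        long processed = 0;
        long total = 0;

        for (CacheSample s : samples) {
            long size = Math.max(0, s.localCacheSize);

            // Cap each cache at its own size so a shrinking cache cannot
            // push the overall value above 100%.
            processed += Math.max(0, Math.min(s.processedKeys, size));
            total += size;
        }

        // Assumption: with no local data there is nothing left to rebuild.
        return total == 0 ? 1.0 : (double) processed / total;
    }

    public static void main(String[] args) {
        List<CacheSample> samples = Arrays.asList(
            new CacheSample(600, 1_000),
            new CacheSample(0, 1_000));

        System.out.println(nodeProgress(samples));
    }
}
```

Comparing against a cluster-wide size instead of the local one would understate progress, which is the mismatch raised earlier in the thread.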

On Tue, Aug 11, 2020 at 4:31 PM Nikolay Izhikov <ni...@apache.org> wrote:

> > I propose to stick with a cache-group level metric (e.g.
> > getIndexBuildProgress)
>
> +1
>
> > that returns a float from 0 to 1, which is calculated as [processedKeys]
> > / [localCacheSize].
>
> From my point of view, we shouldn’t do calculations on the Ignite side if
> we can avoid it.
> I’d rather provide two separate metrics - processedKeys and localCacheSize.
>
> > On Aug 11, 2020, at 16:26, Ivan Rakov <iv...@gmail.com> wrote:
> >
> >>
> >> As a compromise, I can add JMX methods (whether an index rebuild is in
> >> progress and the rebuild percentage) for the entire node, but I tried to
> >> find a suitable place and did not find one. Where should I add them?
> >
> > I have checked existing JMX beans. To be honest, I struggle to find a
> > suitable place as well.
> > We have ClusterMetrics that may represent the state of a local node, but
> > this class is also used for aggregated cluster metrics. I can't propose a
> > reasonable way to merge percentages from different nodes.
> > On the other hand, total index rebuild for all caches isn't a common
> > scenario. It's either performed after manual index.bin removal or after
> > index creation; both operations are performed on cache / cache-group level.
> > Also, all other similar metrics are provided on cache-group level.
> >
> > I propose to stick with a cache-group level metric (e.g.
> > getIndexBuildProgress) that returns a float from 0 to 1, which is
> > calculated as [processedKeys] / [localCacheSize]. Even if a user handles
> > metrics through Zabbix, I anticipate that he'll perform this calculation
> > on his own in order to estimate progress. Let's help him a bit and perform
> > it on the system side.
> > If a per-group percentage metric is present, I
> > think getIndexRebuildKeyProcessed becomes redundant.
> >
> > On Tue, Aug 11, 2020 at 8:20 AM ткаленко кирилл <tk...@yandex.ru>
> > wrote:
> >
> >> Hi, Ivan!
> >>
> >> What precision would be sufficient?
> >>> If the progress is very slow, I don't see issues with tracking it if the
> >>> percentage float has enough precision.
> >>
> >> I think we can add a note about getting the cache size.
> >>> 1. Gain an understanding that local cache size
> >>> (CacheMetricsImpl#getCacheSize) should be used as a 100% milestone (it
> >>> is mentioned neither in javadoc nor in JMX method description).
> >>
> >> Do you think users collect metrics by hand? I think this is done
> >> by other systems, such as Zabbix.
> >>> 2. Manually calculate sum of all metrics and divide by sum of all cache
> >>> sizes.
> >>
> >> As a compromise, I can add JMX methods (whether an index rebuild is in
> >> progress and the rebuild percentage) for the entire node, but I tried to
> >> find a suitable place and did not find one. Where should I add them?
> >>> On the other hand, % of index rebuild progress is self-descriptive. I
> >>> don't understand why we tend to make user's life harder.
> >>
> >> 10.08.2020, 21:57, "Ivan Rakov" <iv...@gmail.com>:
> >>>> This metric can be used only for local node, to get size of cache use
> >>>> org.apache.ignite.internal.processors.cache.CacheMetricsImpl#getCacheSize.
> >>>
> >>> Got it, agree.
> >>>
> >>>> If there is a lot of data in node that can be rebuilt, percentage may
> >>>> change very rarely and may not give an estimate of how much time is
> >>>> left. If we see for example that 50_000 keys are rebuilt once a minute,
> >>>> and we have 1_000_000_000 keys, then we can have an approximate estimate.
> >>>> What do you think of that?
> >>>
> >>> If the progress is very slow, I don't see issues with tracking it if the
> >>> percentage float has enough precision.
> >>> Still, usability of the metric concerns me. In order to estimate the
> >>> remaining time of index rebuild, a user should:
> >>> 1. Gain an understanding that local cache size
> >>> (CacheMetricsImpl#getCacheSize) should be used as a 100% milestone (it
> >>> is mentioned neither in javadoc nor in JMX method description).
> >>> 2. Manually calculate the sum of all metrics and divide it by the sum of
> >>> all cache sizes.
> >>> On the other hand, % of index rebuild progress is self-descriptive. I
> >>> don't understand why we tend to make user's life harder.
> >>>
> >>> --
> >>> Best regards,
> >>> Ivan
> >>>
> >>> On Mon, Aug 10, 2020 at 8:53 PM ткаленко кирилл <tk...@yandex.ru>
> >>> wrote:
> >>>
> >>>> Hi, Ivan!
> >>>>
> >>>> For this you can use
> >>>> org.apache.ignite.cache.CacheMetrics#IsIndexRebuildInProgress
> >>>>> How can a local number of processed keys can help us to understand
> >>>>> when index rebuild will be finished?
> >>>>
> >>>> This metric can be used only for the local node; to get the size of the
> >>>> cache, use
> >>>> org.apache.ignite.internal.processors.cache.CacheMetricsImpl#getCacheSize.
> >>>>> We can't compare metric value with cache.size(). First one is
> >>>>> node-local, while cache size covers all partitions in the cluster.
> >>>>
> >>>> If there is a lot of data on a node to be rebuilt, the percentage may
> >>>> change very rarely and may not give an estimate of how much time is
> >>>> left. If we see, for example, that 50_000 keys are rebuilt per minute,
> >>>> and we have 1_000_000_000 keys, then we can have an approximate estimate.
> >>>> What do you think of that?
> >>>>> I find one single metric much more usable. It would be perfect if
> >>>>> metric value is represented in percentage, e.g. current progress of
> >>>>> local node index rebuild is 60%.
> >>>>

Re: [DISCUSSION] Add index rebuild time metrics

Posted by Nikolay Izhikov <ni...@apache.org>.

> I propose to stick with a cache-group level metric (e.g. getIndexBuildProgress) 

+1

> that returns a float from 0 to 1, which is calculated as [processedKeys] / [localCacheSize].

From my point of view, we shouldn’t do calculations on the Ignite side if we can avoid it.
I’d rather provide two separate metrics - processedKeys and localCacheSize.
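
As an illustration of what a monitoring system could do with two raw values, here is a rough sketch that derives a remaining-time estimate from two successive readings. It mirrors the "50_000 keys per minute out of 1_000_000_000" reasoning quoted in this thread. All names are hypothetical, and the sketch assumes a roughly constant rebuild rate between samples.

```java
import java.time.Duration;
import java.util.Optional;

/** Hypothetical monitoring-side helper; not part of the Ignite API. */
final class RebuildEtaSketch {
    /** One reading of the two raw metrics plus the time it was taken. */
    static final class Sample {
        final long processedKeys;
        final long localCacheSize;
        final long timestampMillis;

        Sample(long processedKeys, long localCacheSize, long timestampMillis) {
            this.processedKeys = processedKeys;
            this.localCacheSize = localCacheSize;
            this.timestampMillis = timestampMillis;
        }
    }

    /** Linear extrapolation; empty when no progress was observed. */
    static Optional<Duration> estimateRemaining(Sample earlier, Sample later) {
        long elapsedMs = later.timestampMillis - earlier.timestampMillis;
        long keysDone = later.processedKeys - earlier.processedKeys;

        if (elapsedMs <= 0 || keysDone <= 0)
            return Optional.empty();

        long remainingKeys = Math.max(0, later.localCacheSize - later.processedKeys);
        double msPerKey = (double) elapsedMs / keysDone;

        return Optional.of(Duration.ofMillis(Math.round(remainingKeys * msPerKey)));
    }

    public static void main(String[] args) {
        Sample t0 = new Sample(0, 1_000_000_000L, 0);
        Sample t1 = new Sample(50_000, 1_000_000_000L, 60_000);

        // With these inputs the estimate is on the order of two weeks.
        estimateRemaining(t0, t1)
            .ifPresent(d -> System.out.println(d.toMinutes() + " minutes left"));
    }
}
```

A ratio metric alone would not allow this kind of rate calculation without also knowing the cache size, which is one practical argument for exposing the raw counters.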


Re: [DISCUSSION] Add index rebuild time metrics

Posted by Ivan Rakov <iv...@gmail.com>.

>
> As a compromise, I can add jmx methods (rebuilding indexes in the process
> and the percentage of rebuilding) for the entire node, but I tried to find
> a suitable place and did not find it, tell me where to add it?

I have checked existing JMX beans. To be honest, I struggle to find a
suitable place as well.
We have ClusterMetrics that may represent the state of a local node, but
this class is also used for aggregated cluster metrics. I can't propose a
reasonable way to merge percentages from different nodes.
On the other hand, total index rebuild for all caches isn't a common
scenario. It's either performed after manual index.bin removal or after
index creation, both operations are performed on cache / cache-group level.
Also, all other similar metrics are provided on cache-group level.

I propose to stick with a cache-group level metric (e.g.
getIndexBuildProgress) that returns a float from 0 to 1, which is
calculated as [processedKeys] / [localCacheSize]. Even if a user handles
metrics through Zabbix, I anticipate that he'll perform this calculation on
his own in order to estimate progress. Let's help him a bit and perform it
on the system side.
If a per-group percentage metric is present, I
think getIndexRebuildKeyProcessed becomes redundant.
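
A minimal sketch of how such a ratio could be computed is given below. The helper class is hypothetical and is not the actual Ignite implementation; the handling of an empty cache and the clamping are assumptions made for illustration only.

```java
/** Hypothetical helper; not the actual Ignite implementation. */
final class IndexBuildProgressSketch {
    private IndexBuildProgressSketch() {
        // Static helper only.
    }

    /**
     * @param processedKeys keys already visited by the rebuild on this node
     * @param localCacheSize node-local number of entries (the 100% milestone)
     * @return progress as a fraction in [0, 1]
     */
    static float progress(long processedKeys, long localCacheSize) {
        // Assumption: an empty local cache has nothing left to index.
        if (localCacheSize <= 0)
            return 1.0f;

        // Divide in double first so large key counts keep their precision.
        double ratio = (double) processedKeys / localCacheSize;

        // Clamp, because the cache may grow or shrink during the rebuild.
        return (float) Math.max(0.0, Math.min(1.0, ratio));
    }

    public static void main(String[] args) {
        System.out.println(progress(600, 1_000));
        System.out.println(progress(50_000, 1_000_000_000L));
    }
}
```

On the precision question raised in the thread: a float carries about seven significant decimal digits, so a step of 50_000 keys out of 1_000_000_000 (a change of 0.00005) is still distinguishable from zero.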

On Tue, Aug 11, 2020 at 8:20 AM ткаленко кирилл <tk...@yandex.ru>
wrote:

> Hi, Ivan!
>
> What precision would be sufficient?
> > If the progress is very slow, I don't see issues with tracking it if the
> > percentage float has enough precision.
>
> I think we can add a mention getting cache size.
> > 1. Gain an understanding that local cache size
> > (CacheMetricsImpl#getCacheSize) should be used as a 100% milestone (it
> > isn't mentioned neither in javadoc nor in JMX method description).
>
> Do you think users collect metrics with their hands? I think this is done
> by other systems, such as zabbix.
> > 2. Manually calculate sum of all metrics and divide to sum of all cache
> > sizes.
>
> As a compromise, I can add jmx methods (rebuilding indexes in the process
> and the percentage of rebuilding) for the entire node, but I tried to find
> a suitable place and did not find it, tell me where to add it?
> > On the other hand, % of index rebuild progress is self-descriptive. I
> don't
> > understand why we tend to make user's life harder.
>

Re: [DISCUSSION] Add index rebuild time metrics

Posted by ткаленко кирилл <tk...@yandex.ru>.

Hi, Ivan!

What precision would be sufficient?
> If the progress is very slow, I don't see issues with tracking it if the
> percentage float has enough precision.
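For illustration only, a rough sketch of the arithmetic behind the precision question, using the figures quoted elsewhere in this thread (50_000 keys per minute, 1_000_000_000 keys on the node). The class and method names are made up for this example and are not part of Ignite. At that rate the percentage moves by 50_000 / 1_000_000_000 = 0.005 points per minute, so three decimals are enough to see it change every minute. A single-precision float near 100 has a spacing of roughly 0.0000076, so the type itself should not be the limit; the number of decimals shown is.

```java
import java.util.Locale;

/** Illustration only: how fast a rebuild percentage moves for the figures quoted in this thread. */
final class RebuildPercentPrecision {
    /** Progress in percent; returns 0.0 when there are no keys to rebuild. */
    static double percent(long processedKeys, long totalKeys) {
        if (totalKeys <= 0)
            return 0.0;

        return 100.0 * processedKeys / totalKeys;
    }

    /** Formats the value with three decimals, enough to resolve a 0.005-point step. */
    static String format(double percent) {
        return String.format(Locale.ROOT, "%.3f%%", percent);
    }

    public static void main(String[] args) {
        long total = 1_000_000_000L; // keys on the node (figure from the thread)
        long perMinute = 50_000L;    // rebuild rate (figure from the thread)

        // Print the progress after 0, 1, 2 and 3 minutes of rebuilding.
        for (int minute = 0; minute <= 3; minute++)
            System.out.println(format(percent(perMinute * minute, total)));
    }
}
```

With fewer decimals the value would look frozen for minutes at a time, which is the concern raised about a percentage on large nodes.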

I think we can add a mention of how to get the cache size.
> 1. Gain an understanding that local cache size
> (CacheMetricsImpl#getCacheSize) should be used as a 100% milestone (it
> is mentioned neither in javadoc nor in JMX method description).

Do you think users collect metrics by hand? I think this is done by other systems, such as Zabbix.
> 2. Manually calculate sum of all metrics and divide it by the sum of all
> cache sizes.

As a compromise, I can add JMX methods for the entire node (whether an index rebuild is in progress, and the rebuild percentage). I looked for a suitable place to add them but did not find one. Can you tell me where to add them?
> On the other hand, % of index rebuild progress is self-descriptive. I don't
> understand why we tend to make user's life harder.


Re: [DISCUSSION] Add index rebuild time metrics

Posted by Ivan Rakov <iv...@gmail.com>.

>
> This metric can be used only for local node, to get size of cache use
> org.apache.ignite.internal.processors.cache.CacheMetricsImpl#getCacheSize.

 Got it, agree.

> If there is a lot of data in node that can be rebuilt, percentage may
> change very rarely and may not give an estimate of how much time is left.
> If we see for example that 50_000 keys are rebuilt once a minute, and we
> have 1_000_000_000 keys, then we can have an approximate estimate. What do
> you think of that?

If the progress is very slow, I don't see issues with tracking it, provided
the percentage float has enough precision.
Still, the usability of the metric concerns me. In order to estimate the
remaining time of an index rebuild, a user should:
1. Understand that the local cache size (CacheMetricsImpl#getCacheSize)
should be used as the 100% milestone (this is mentioned neither in the
javadoc nor in the JMX method description).
2. Manually calculate the sum of all metrics and divide it by the sum of all
cache sizes.
On the other hand, a % of index rebuild progress is self-descriptive. I don't
understand why we tend to make the user's life harder.
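
To make the two manual steps concrete, here is a minimal sketch of what a user (or a monitoring script) would have to compute. It uses plain Java and made-up figures. The class, field and method names are hypothetical and are not Ignite API, and treating an empty node as 100% complete is an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.List;

/** Illustration only: the two manual steps above, on made-up per-cache figures. */
final class NodeRebuildProgress {
    /** Hypothetical holder for one cache's node-local figures. */
    static final class CacheProgress {
        final long processedKeys; // value of the proposed "processed keys" metric
        final long localSize;     // node-local cache size, used as the 100% milestone

        CacheProgress(long processedKeys, long localSize) {
            this.processedKeys = processedKeys;
            this.localSize = localSize;
        }
    }

    /** Sum of processed keys divided by sum of local cache sizes, in percent. */
    static double percent(List<CacheProgress> caches) {
        long processed = 0;
        long total = 0;

        for (CacheProgress c : caches) {
            processed += c.processedKeys;
            total += c.localSize;
        }

        // Assumption: a node with nothing to rebuild is reported as complete.
        return total == 0 ? 100.0 : 100.0 * processed / total;
    }

    public static void main(String[] args) {
        List<CacheProgress> caches = Arrays.asList(
            new CacheProgress(600, 1_000), // first cache: 600 of 1000 keys processed
            new CacheProgress(0, 1_000));  // second cache: not started yet

        System.out.println(percent(caches) + "%");
    }
}
```

A single node-wide percentage would hide exactly this bookkeeping from the user.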

--
Best regards,
Ivan



Re: [DISCUSSION] Add index rebuild time metrics

Posted by ткаленко кирилл <tk...@yandex.ru>.

Hi, Ivan!

For this you can use org.apache.ignite.cache.CacheMetrics#IsIndexRebuildInProgress
> How can a local number of processed keys help us to understand when
> index rebuild will be finished?

This metric applies only to the local node; to get the size of the cache, use org.apache.ignite.internal.processors.cache.CacheMetricsImpl#getCacheSize.
> We can't compare metric value with cache.size(). First one is node-local,
> while cache size covers all partitions in the cluster.

If a node holds a lot of data to be rebuilt, the percentage may change very rarely and may not give an estimate of how much time is left. If we see, for example, that 50_000 keys are rebuilt per minute and we have 1_000_000_000 keys, then we can make an approximate estimate. What do you think of that?
> I find one single metric much more usable. It would be perfect if metric
> value is represented in percentage, e.g. current progress of local node
> index rebuild is 60%.
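
As a rough worked example of that estimate, assuming a constant rebuild rate (which a real rebuild may not have): 1_000_000_000 keys at 50_000 keys per minute is 20_000 minutes, a bit under 14 days. The sketch below does the same arithmetic in plain Java. The class and method names are made up for illustration and are not part of Ignite.

```java
import java.time.Duration;

/** Illustration only: remaining rebuild time from a key count and an observed rate. */
final class RebuildEta {
    /** Remaining time at a constant rate, rounded up to whole minutes. */
    static Duration remaining(long totalKeys, long processedKeys, long keysPerMinute) {
        if (keysPerMinute <= 0)
            throw new IllegalArgumentException("keysPerMinute must be positive");

        long left = Math.max(0, totalKeys - processedKeys);
        long minutes = (left + keysPerMinute - 1) / keysPerMinute; // ceiling division

        return Duration.ofMinutes(minutes);
    }

    public static void main(String[] args) {
        // Figures from the message above: nothing processed yet.
        Duration eta = remaining(1_000_000_000L, 0L, 50_000L);

        System.out.println(eta.toDays() + " d " + (eta.toHours() % 24) + " h "
            + (eta.toMinutes() % 60) + " min");
    }
}
```

The same calculation works from a percentage as long as it is reported with enough decimals, so the two approaches differ mainly in who does the division.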

10.08.2020, 19:11, "Ivan Rakov" <iv...@gmail.com>:
> Folks,
>
> Sorry for coming late to the party. I've taken a look at this issue during
> review.
>
> How can a local number of processed keys help us to understand when
> index rebuild will be finished?
> We can't compare metric value with cache.size(). First one is node-local,
> while cache size covers all partitions in the cluster.
> Also, I don't understand why we need to keep separate metrics for all
> caches. Of course, the metric becomes more fair, but obviously harder to
> make conclusions on whether "the index rebuild" process is over (and the
> cluster is ready to process queries quickly).
>
> I find one single metric much more usable. It would be perfect if metric
> value is represented in percentage, e.g. current progress of local node
> index rebuild is 60%.
>
> --
> Best regards,
> Ivan
>
> On Fri, Jul 24, 2020 at 1:35 PM Stanislav Lukyanov <st...@gmail.com>
> wrote:
>
>>  Got it. I thought that index building and index rebuilding are essentially
>>  the same, but now I see that they are different: index rebuilding cares
>>  about all indexes at once while index building cares about particular ones.
>>
>>  Kirill's approach sounds good.
>>
>>  Stan
>>
>>  > On 20 Jul 2020, at 14:54, Alexey Goncharuk <al...@gmail.com>
>>  > wrote:
>>  >
>>  > Stan,
>>  >
>>  > Currently we never build indexes one-by-one - we always use a cache data
>>  > row visitor which either updates all indexes (see IndexRebuildFullClosure)
>>  > or updates a set of all indexes that need to catch up (see
>>  > IndexRebuildPartialClosure). Given that, I do not see any need for
>>  > per-index rebuild status as this status will be updated for all outdated
>>  > indexes simultaneously.
>>  >
>>  > Kirill's approach for the total number of processed keys per cache seems
>>  > reasonable to me.
>>  >
>>  > --AG
>>  >
>>  > Fri, 3 Jul 2020 at 10:12, ткаленко кирилл <tk...@yandex.ru>:
>>  >
>>  >> Hi, Stan!
>>  >>
>>  >> Perhaps it is worth clarifying what exactly I wanted to say.
>>  >> Now we have 2 processes: building and rebuilding indexes.
>>  >>
>>  >> At the moment, we have some metrics for rebuilding indexes:
>>  >> "IsIndexRebuildInProgress", "IndexBuildCountPartitionsLeft".
>>  >>
>>  >> I suggest adding another metric "Indexrebuildkeyprocessed", which will
>>  >> allow you to determine how many records are left to rebuild for cache.
>>  >>
>>  >> I think your comments are more about building an index that may need more
>>  >> metrics, but I think you should do it in a separate ticket.
>>  >>
>>  >> 03.07.2020, 03:09, "Stanislav Lukyanov" <st...@gmail.com>:
>>  >>> If multiple indexes are to be built "number of indexed keys" metric may
>>  >>> be misleading.
>>  >>>
>>  >>> As a cluster admin, I'd like to know:
>>  >>> - Are all indexes ready on a node?
>>  >>> - How many indexes are to be built?
>>  >>> - How much resources are used by the index building (how many threads
>>  >>> are used)?
>>  >>> - Which index(es?) is being built right now?
>>  >>> - How much time until the current (single) index building finishes? Here
>>  >>> "time" can be a lot of things: partitions, entries, percent of the cache,
>>  >>> minutes and hours
>>  >>> - How much time until all indexes are built?
>>  >>> - How much does it take to build each of my indexes / a single index of
>>  >>> my cache on average?
>>  >>>
>>  >>> I think we need a set of metrics and/or log messages to solve all of
>>  >>> these questions.
>>  >>> I imagine something like:
>>  >>> - numberOfIndexesToBuild
>>  >>> - a standard set of metrics on the index building thread pool (do we
>>  >>> already have it?)
>>  >>> - currentlyBuiltIndexName (assuming we only build one at a time which is
>>  >>> probably not true)
>>  >>> - for the "time" metrics I think percentage might be the best as it's
>>  >>> the easiest to understand; we may add multiple metrics though.
>>  >>> - For "time per each index" I'd add detailed log messages stating how
>>  >>> long it took to build a particular index
>>  >>>
>>  >>> Thanks,
>>  >>> Stan
>>  >>>
>>  >>>> On 26 Jun 2020, at 12:49, ткаленко кирилл <tk...@yandex.ru>
>>  >>>> wrote:
>>  >>>>
>>  >>>> Hi, Igniters.
>>  >>>>
>>  >>>> I would like to know if it is possible to estimate how long the index
>>  >>>> rebuild will take?
>>  >>>>
>>  >>>> At the moment, I have found the following metrics [1] and [2] and
>>  >>>> since the rebuild is based on caches, I think it would be useful to know
>>  >>>> how many records are processed in indexing. This way we can estimate how
>>  >>>> long we have to wait for the index to be rebuilt by subtracting the
>>  >>>> number of indexed records from [3].
>>  >>>>
>>  >>>> I think we should add this metric [4].
>>  >>>>
>>  >>>> Comments, suggestions?
>>  >>>>
>>  >>>> [1] - https://issues.apache.org/jira/browse/IGNITE-12184
>>  >>>> [2] - org.apache.ignite.internal.processors.cache.CacheGroupMetricsImpl#idxBuildCntPartitionsLeft
>>  >>>> [3] - org.apache.ignite.cache.CacheMetrics#getCacheSize
>>  >>>> [4] - org.apache.ignite.cache.CacheMetrics#getNumberIndexedKeys
>>  >>