
Posted to dev@cassandra.apache.org by Владимир Бухтояров <js...@mail.ru.INVALID> on 2016/10/20 15:06:04 UTC

Re[3]: Histogram error "Unable to compute ceiling for max when histogram overflowed"

I have investigated the problem and found that monitoring has changed substantially since 3.7 (the version in which I got the exception in com.codahale.metrics.servlets.MetricsServlet). As of version 3.9 it is enough to change the behavior of DecayingEstimatedHistogramReservoir; EstimatedHistogram can stay unchanged. Modifying DecayingEstimatedHistogramReservoir is safe because, unlike EstimatedHistogram, it is not used for Cassandra's internal needs.

I also found the resolution of CASSANDRA-12185 very strange: nothing was done to prevent the IllegalStateException, yet the issue is closed. Should I reopen #12185, or submit a pull request under a new issue?


Best regards,
Bukhtoyarov Vladimir
email jsecoder@mail.ru
skype live:fanat-tdd
Github: https://github.com/vladimir-bukhtoyarov
mobile +79618096798

>Wednesday, 19 October 2016, 21:12 +03:00 from Владимир Бухтояров <js...@mail.ru.INVALID>:
>
>Null (zero) snapshot values are useless for problem analysis, because it is impossible to distinguish the case where there were no events from the case where events were dispatched too slowly. I see nothing wrong with returning the 99.9th percentile as 3h when the histogram is configured with a 3h max and some latency is 4h.
>
>
>Best regards,
>Bukhtoyarov Vladimir
>email  jsecoder@mail.ru
>skype live:fanat-tdd
>Github:  https://github.com/vladimir-bukhtoyarov
>mobile  +79618096798
>
>>Wednesday, 19 October 2016, 20:17 +03:00 from Ken Hancock < ken.hancock@schange.com >:
>>
>>I would suggest metrics should return null values instead of false values.
>>
>>On Wed, Oct 19, 2016 at 12:21 PM, Владимир Бухтояров <
>> jsecoder@mail.ru.invalid > wrote:
>>
>>>
>>> Hi all,
>>>
>>> I want to fix  https://issues.apache.org/jira/browse/CASSANDRA-11063
>>> This issue is very painful for me: when something runs slowly, it
>>> is impossible to capture metrics and save them to a monitoring database for
>>> later investigation. Moreover, when one histogram throws an exception, many
>>> metrics exporters (for example MetricsServlet) are unable to export metrics
>>> for the whole MetricRegistry, so when one histogram overflows I
>>> have no historical data at all.
>>>
>>> I propose the following changes:
>>> 1. DecayingEstimatedHistogramReservoir and EstimatedHistogram will
>>> return the maximum trackable value instead of Long.MAX_VALUE.
>>> 2. DecayingEstimatedHistogramReservoir and EstimatedHistogram will
>>> never throw IllegalStateException; instead, they will use the maximum
>>> trackable value as a regular value in percentile and average calculations.
>>> 3. If anybody wants to keep the old behavior (preferring a crash to
>>> inaccurate reporting), I can add a configuration parameter that preserves
>>> it. I can even leave the old behavior as the default; for my
>>> needs it is enough to have some option to avoid crashes.
>>>
>>>
>>> Best regards,
>>> Bukhtoyarov Vladimir
>>> email  jsecoder@mail.ru
>>> skype live:fanat-tdd
>>> Github:  https://github.com/vladimir-bukhtoyarov
>>>
>
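
For readers following the thread, the three proposed changes quoted above can be illustrated with a minimal sketch. This is not the Cassandra code and not the patch that was eventually submitted: the class name ClampingHistogram, its fields, and the failOnOverflow flag are hypothetical, and the bucket layout is simplified to a plain array of upper bounds. It only shows the idea of clamping an out-of-range sample to the maximum trackable value, with an opt-in switch that keeps the old "throw on overflow" behavior.

```java
import java.util.Arrays;

/** Hypothetical illustration only; names and bucket layout are assumptions. */
final class ClampingHistogram {
    private final long[] upperBounds;     // ascending; last entry = maximum trackable value
    private final long[] counts;          // counts[i] = samples that fell into bucket i
    private final boolean failOnOverflow; // true = old behavior (throw), false = clamp
    private boolean overflowed;

    ClampingHistogram(long[] upperBounds, boolean failOnOverflow) {
        this.upperBounds = upperBounds.clone();
        this.counts = new long[upperBounds.length];
        this.failOnOverflow = failOnOverflow;
    }

    long maxTrackableValue() {
        return upperBounds[upperBounds.length - 1];
    }

    void update(long value) {
        int idx = Arrays.binarySearch(upperBounds, value);
        if (idx < 0) {
            idx = -idx - 1;               // insertion point: first bound greater than value
        }
        if (idx >= upperBounds.length) {  // value exceeds the maximum trackable value
            overflowed = true;
            idx = upperBounds.length - 1; // clamp into the last regular bucket
        }
        counts[idx]++;
    }

    /** Returns the upper bound of the bucket containing the p-quantile (0 < p <= 1). */
    long percentile(double p) {
        checkOverflow();
        long total = Arrays.stream(counts).sum();
        if (total == 0) {
            return 0;
        }
        long rank = Math.max(1, (long) Math.ceil(p * total));
        long seen = 0;
        for (int i = 0; i < counts.length; i++) {
            seen += counts[i];
            if (seen >= rank) {
                return upperBounds[i];
            }
        }
        return maxTrackableValue();
    }

    /** Returns the upper bound of the highest non-empty bucket, never Long.MAX_VALUE. */
    long max() {
        checkOverflow();
        for (int i = counts.length - 1; i >= 0; i--) {
            if (counts[i] > 0) {
                return upperBounds[i];
            }
        }
        return 0;
    }

    private void checkOverflow() {
        if (failOnOverflow && overflowed) {
            throw new IllegalStateException("histogram overflowed");
        }
    }
}
```

With bounds of 60, 3600 and 10800 seconds (3h max) and failOnOverflow set to false, recording a 4h sample (14400) makes both max() and percentile(0.999) report 10800, which matches the "3h instead of an exception" outcome argued for earlier in the thread. With failOnOverflow set to true the same calls throw IllegalStateException.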

Re: Re[3]: Histogram error "Unable to compute ceiling for max when histogram overflowed"

Posted by Chris Lohfink <ch...@datastax.com>.

I think this is already fixed in
https://issues.apache.org/jira/browse/CASSANDRA-11117

On Thu, Oct 20, 2016 at 3:56 PM, Nate McCall <zz...@gmail.com> wrote:

> Open a new issue and link to CASSANDRA-11063. Including a test case
> addressing your issue that fails after the 11063 change would be ideal
> as well.
>
> Either way, thanks for the continued attention on this.

Re: Re[3]: Histogram error "Unable to compute ceiling for max when histogram overflowed"

Posted by Nate McCall <zz...@gmail.com>.

Open a new issue and link to CASSANDRA-11063. Including a test case
addressing your issue that fails after the 11063 change would be ideal
as well.

Either way, thanks for the continued attention on this.
