
Posted to solr-user@lucene.apache.org by Radu Gheorghe <ra...@sematext.com> on 2020/04/28 11:57:36 UTC

Which Solr metrics do you find important?

Hi fellow Solr users,

I'm looking into improving our Solr monitoring
<https://sematext.com/docs/integration/solr/> and I was curious about which
metrics you consider relevant.

From what we currently have, I'm only really missing fieldCache, which we
collect but don't show in the UI yet (unless you add a custom chart - we'll
add it to the defaults soon).

You can click on a demo account <https://apps.sematext.com/demo> (there's a
Solr app there called PH.Prod.Solr7) to see what we already collect, but
here it is in short:
- query rate and latency (you can group per handler, per core, per
collection if it's SolrCloud)
- index size (number of segments, files...)
- indexing: added/deleted docs, commits
- caches (size, hit ratio, warmup...)
- OS- and JVM-level metrics (from CPU iowait to GC latency and everything
in between)

Anything that we should add?

I went through the Metrics API output, and the only significant thing I can
think of is the transaction log. But to be honest I never checked those
metrics in practice.
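
For anyone who wants to look at the transaction log metrics, here is a
minimal sketch in Python of how one might pull them out of a Metrics API
response. It assumes the `group` and `prefix` request parameters and a
response shaped as metrics -> registry -> metric name. The host, the
registry name and the values in `sample` are made up for illustration, and
the exact metric names may differ between Solr versions.

```python
from urllib.parse import urlencode


def metrics_url(base_url, group="core", prefix="TLOG"):
    """Build a Metrics API URL limited to one group and one name prefix."""
    query = urlencode({"group": group, "prefix": prefix, "wt": "json"})
    return f"{base_url}/admin/metrics?{query}"


def metrics_with_prefix(response, prefix):
    """Return {registry: {metric: value}} for metric names starting with prefix."""
    selected = {}
    for registry, metrics in response.get("metrics", {}).items():
        matching = {k: v for k, v in metrics.items() if k.startswith(prefix)}
        if matching:
            selected[registry] = matching
    return selected


# Hypothetical payload: registry and metric names are illustrative only.
sample = {
    "metrics": {
        "solr.core.products.shard1.replica_n1": {
            "TLOG.state": 1,
            "TLOG.replay.remaining.logs": 0,
            "INDEX.sizeInBytes": 123456,
        }
    }
}

tlog = metrics_with_prefix(sample, "TLOG.")
print(metrics_url("http://localhost:8983/solr"))
print(tlog)
```

Filtering on the client side as well keeps the sketch usable even if the
server returns more than the prefix asked for.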

Or maybe there's something outside the Metrics API that would be useful? I
thought about the breakdown of shards that are up/down/recovering, as well
as replica types. We plan on adding those, but there's a challenge in
de-duplicating metrics: one would install one agent per node, so every
agent would see the same cluster-wide state, and I'm not aware of a way to
show only local shards in the Collections API -> CLUSTERSTATUS.
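
One possible workaround for the de-duplication problem is to filter on the
agent side. Below is a minimal sketch in Python, assuming that each replica
entry in the CLUSTERSTATUS JSON carries `node_name`, `state` and `type`
fields and that the agent knows its own node name. The function name and
the sample payload are hypothetical, so check the field names against your
Solr version.

```python
from collections import Counter


def local_replica_states(cluster_status, local_node):
    """Count replicas hosted on local_node, keyed by (collection, state, type)."""
    counts = Counter()
    collections = cluster_status.get("cluster", {}).get("collections", {})
    for coll_name, coll in collections.items():
        for shard in coll.get("shards", {}).values():
            for replica in shard.get("replicas", {}).values():
                # Skip replicas that live on other nodes; their own agent reports them.
                if replica.get("node_name") != local_node:
                    continue
                key = (coll_name, replica.get("state", "unknown"), replica.get("type", "NRT"))
                counts[key] += 1
    return counts


# Hypothetical, trimmed-down CLUSTERSTATUS payload for illustration.
sample_status = {
    "cluster": {
        "collections": {
            "products": {
                "shards": {
                    "shard1": {"replicas": {
                        "core_node1": {"node_name": "host1:8983_solr", "state": "active", "type": "NRT"},
                        "core_node2": {"node_name": "host2:8983_solr", "state": "recovering", "type": "NRT"},
                    }},
                    "shard2": {"replicas": {
                        "core_node3": {"node_name": "host1:8983_solr", "state": "down", "type": "TLOG"},
                    }},
                }
            }
        }
    }
}

local_counts = local_replica_states(sample_status, "host1:8983_solr")
print(local_counts)
```

Since every agent only counts replicas on its own node, summing the counts
across agents should count each replica once.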

Thanks in advance for any feedback that you may have!
Radu
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/

Re: Which Solr metrics do you find important?

Posted by Radu Gheorghe <ra...@sematext.com>.

Thanks Matthew and Walter. OK, so you both use the clusterstatus output in
your regular monitoring. This seems to be missing from what we have now (we
collect everything else you mentioned, like response time percentiles, disk
IO, etc.). So I guess clusterstatus deserves a priority bump :)

Best regards,
Radu



Re: Which Solr metrics do you find important?

Posted by Walter Underwood <wu...@wunderwood.org>.

I also have some Python that pulls stuff from clusterstatus and sends it to InfluxDB.
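
As an illustration of that kind of glue (a sketch only, not the actual
script), here is how replica state counts could be turned into InfluxDB
line protocol in Python. The measurement name `solr_replicas`, the tag
names and the timestamp are hypothetical, and the escaping covers only
commas, equals signs and spaces in tag values.

```python
def escape_tag(value):
    """Escape the characters that are special inside line protocol tag values."""
    return str(value).replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def to_line(measurement, tags, fields, ts_ns):
    """Format one point with integer fields and a nanosecond timestamp."""
    tag_part = ",".join(f"{k}={escape_tag(v)}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={int(v)}i" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_part} {field_part} {ts_ns}"


line = to_line(
    "solr_replicas",
    {"collection": "products", "state": "active"},
    {"count": 3},
    1588075056000000000,
)
print(line)
```

The resulting lines can then be sent to whatever write endpoint the InfluxDB
setup exposes.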

We wrote a servlet filter that intercepts requests to Solr and sends performance data
to monitoring. That gives us per-request-handler traffic and response time percentiles.
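
To show what per-handler percentiles could look like once the timings are
collected, here is a minimal sketch in Python. It is not the servlet filter
itself; the nearest-rank method, the function names and the sample timings
are assumptions made for illustration.

```python
from collections import defaultdict


def percentile(sorted_values, p):
    """Nearest-rank percentile of a sorted, non-empty list (p between 1 and 100)."""
    rank = max(1, (p * len(sorted_values) + 99) // 100)  # integer ceil(p * n / 100)
    return sorted_values[rank - 1]


def handler_percentiles(samples, percentiles=(50, 95, 99)):
    """Group (handler, millis) samples by handler and compute percentiles."""
    by_handler = defaultdict(list)
    for handler, millis in samples:
        by_handler[handler].append(millis)
    return {
        handler: {p: percentile(sorted(values), p) for p in percentiles}
        for handler, values in by_handler.items()
    }


# Made-up response times in milliseconds.
samples = [("/select", ms) for ms in (12, 15, 20, 35, 50, 80, 120, 200, 400, 900)]
samples += [("/update", 30), ("/update", 70)]

report = handler_percentiles(samples)
print(report)
```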

Telegraf for CPU, run queue, disk IO, etc.

CloudWatch for load balancer traffic, errors, and healthy host count.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


Re: Which Solr metrics do you find important?

Posted by matthew sporleder <ms...@gmail.com>.

I think clusterstatus is how you find some of that stuff.

I wrote this when I was using Datadog to supplement what they offered:
https://github.com/msporleder/dd-solrcloud/blob/master/solrcloud.py
(sorry for the crappy Python) and it got me most of the monitoring I
needed for my particular situation.





Re: Which Solr metrics do you find important?

Posted by Radu Gheorghe <ra...@sematext.com>.

Thanks a lot, Matthew! OK, so you do care about the size of tlogs, as well
as Collections API stuff (clusterstatus, overseerstatus).

As for DIH, I didn't think that these stats would be interesting, but surely
they are for people who use DIH :)

Best regards,
Radu



Re: Which Solr metrics do you find important?

Posted by matthew sporleder <ms...@gmail.com>.

size-on-disk of cores, size of tlogs, DIH stats over time, last
modified date of cores

The most important alert-type things are: collections in recovery or
down state, SolrCloud election events, and various error rates
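
A rough sketch in Python of the first of those alerts, assuming the
CLUSTERSTATUS JSON exposes `live_nodes` plus a `state` and `node_name` per
replica. A replica whose node is missing from `live_nodes` is treated as
down here even if its last published state says active. The function name
and the sample payload are hypothetical.

```python
def unhealthy_replicas(cluster_status):
    """List (collection, shard, replica, state) for every replica that is not active."""
    cluster = cluster_status.get("cluster", {})
    live = set(cluster.get("live_nodes", []))
    problems = []
    for coll_name, coll in cluster.get("collections", {}).items():
        for shard_name, shard in coll.get("shards", {}).items():
            for replica_name, replica in shard.get("replicas", {}).items():
                state = replica.get("state", "unknown")
                # The stored state can be stale when the hosting node has gone away.
                if replica.get("node_name") not in live:
                    state = "down"
                if state != "active":
                    problems.append((coll_name, shard_name, replica_name, state))
    return sorted(problems)


# Hypothetical payload: host2 has dropped out of live_nodes.
sample_status = {
    "cluster": {
        "live_nodes": ["host1:8983_solr"],
        "collections": {
            "products": {
                "shards": {
                    "shard1": {"replicas": {
                        "core_node1": {"node_name": "host1:8983_solr", "state": "active"},
                        "core_node2": {"node_name": "host2:8983_solr", "state": "active"},
                    }},
                    "shard2": {"replicas": {
                        "core_node3": {"node_name": "host1:8983_solr", "state": "recovering"},
                    }},
                }
            }
        },
    }
}

alerts = unhealthy_replicas(sample_status)
print(alerts)
```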

It's also important to be able to tie these back to aliases so you are
only monitoring cores you care about, even if their backing collection
name changes every so often
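
One way to tie monitoring back to aliases, sketched in Python: resolve each
alias to its current backing collections before deciding what to watch.
This assumes CLUSTERSTATUS returns an `aliases` map whose values are
comma-separated collection names; the alias and collection names below are
made up.

```python
def collections_behind_aliases(cluster_status, wanted_aliases):
    """Map each alias to the list of collections it currently points to."""
    aliases = cluster_status.get("cluster", {}).get("aliases", {})
    resolved = {}
    for alias in wanted_aliases:
        target = aliases.get(alias, "")
        # An unknown alias resolves to an empty list instead of raising.
        resolved[alias] = [name.strip() for name in target.split(",") if name.strip()]
    return resolved


# Hypothetical alias map for illustration.
sample_status = {
    "cluster": {
        "aliases": {
            "catalog": "products_v2",
            "everything": "products_v2,orders_v1",
        }
    }
}

watched = collections_behind_aliases(sample_status, ["catalog", "everything", "missing"])
print(watched)
```

Re-resolving on every poll means a re-pointed alias is picked up without
touching the monitoring configuration.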


