Posted to dev@lucene.apache.org by "Shawn Heisey (JIRA)" <ji...@apache.org> on 2016/11/04 20:04:58 UTC
[jira] [Commented] (SOLR-9731) Add jvm-wide JMX statistics for Solr

    [ https://issues.apache.org/jira/browse/SOLR-9731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15637520#comment-15637520 ] 

Shawn Heisey commented on SOLR-9731:
------------------------------------

Sounds like a really good idea.  Any indicators of health that aren't horribly disruptive should be tracked and made available. Codahale metrics is already a dependency, and it can give us percentiles on any stats where they make sense.

Thinking out loud (and this might be per-core, not JVM-wide, but I don't have anywhere else to discuss it right now):

I wonder if there's any way to detect when and how much actual disk I/O is required to satisfy a query.  I suspect that this information is not readily available to Java, and even if it is, that it would need to be tracked down in the Lucene layer and made available via public getters that Solr could query.

Lucene *might* be able to track statistics about how many nanoseconds it takes to read X bytes from MMap, and that information could ultimately be interpreted by a user to indicate whether or not their disk caching is effective.  One problem with that idea: Lucene's core functionality has no dependencies, so that feature would probably have to be written using only classes/methods that ship with the JVM, not an external dependency like the metrics package.  It would be really awesome if we could see median and percentile info about how long the MMap accesses are taking.  We'd be able to use that info to determine whether a performance issue is due to insufficient disk cache.
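
To make the MMap timing idea a bit more concrete, here is a rough sketch using only JDK classes.  Everything in it is made up for illustration: MMapReadTimer, timedRead, record and percentile are hypothetical names, not Lucene or Solr APIs, and a real implementation would need a bounded or sampled structure rather than an ever-growing array.  It times bulk reads from a MappedByteBuffer with System.nanoTime() and reports nearest-rank percentiles.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical helper (not a Lucene API): records how long mmap reads take.
public class MMapReadTimer {
    private long[] samples = new long[1024]; // latencies in nanoseconds
    private int count = 0;

    /** Times one bulk read of dst.length bytes starting at offset. */
    public void timedRead(MappedByteBuffer buf, int offset, byte[] dst) {
        long start = System.nanoTime();
        // Read through a duplicate so the shared buffer's position is untouched.
        ByteBuffer view = buf.duplicate();
        view.position(offset);
        view.get(dst, 0, dst.length);
        record(System.nanoTime() - start);
    }

    /** Stores one latency sample, growing the array when it is full. */
    public void record(long nanos) {
        if (count == samples.length) {
            samples = Arrays.copyOf(samples, count * 2);
        }
        samples[count++] = nanos;
    }

    /** Nearest-rank percentile over all samples, for p in (0, 100]. */
    public long percentile(double p) {
        if (count == 0) {
            throw new IllegalStateException("no samples recorded");
        }
        long[] sorted = Arrays.copyOf(samples, count);
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * count / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmap-timer", ".bin");
        try {
            Files.write(file, new byte[1 << 20]); // 1 MiB of zeros as stand-in data
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
                MMapReadTimer timer = new MMapReadTimer();
                byte[] dst = new byte[4096];
                for (int off = 0; off + dst.length <= buf.capacity(); off += dst.length) {
                    timer.timedRead(buf, off, dst);
                }
                System.out.println("median ns: " + timer.percentile(50));
                System.out.println("p99 ns:    " + timer.percentile(99));
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The numbers only hint at cache behavior: a read that has to fault pages in from disk should take far longer than one served from the OS page cache, so a wide gap between the median and the high percentiles is the kind of signal a user might look for.  The timer itself adds overhead, which is part of why this would need care in a hot path.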


> Add jvm-wide JMX statistics for Solr
> ------------------------------------
>
>                 Key: SOLR-9731
>                 URL: https://issues.apache.org/jira/browse/SOLR-9731
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Erick Erickson
>
> The statistics that can currently be gathered via JMX tend to be core-specific, making monitoring "how is the Solr node doing" harder than it needs to be. This JIRA is about exploring what it would take for instance-wide statistics to be JMX-enabled.
> I'm imagining cumulative stats like:
> > How many Solr<->Solr communications errors have there been?
> > How many Solr<->ZK communication errors have there been?
> > How many full synchronizations have happened across all replicas?
> > Operations people, fill in your favorite health monitoring bit here.
> What do people think? Is JMX even the right thing? We have an admin end-point for gathering information, but that's not as "operations friendly".
> I'm open to any suggestions for how/where to implement this, whether there are any huge "gotchas", bottleneck concerns, etc.
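
For reference, the kind of cumulative, node-wide counters listed in the description could be exposed through a plain JDK MXBean along the lines of the sketch below.  This is only an illustration under assumptions: NodeStatsDemo, NodeStatsMXBean, NodeStats, the record* methods and the "org.example.solr:type=NodeStats" object name are all hypothetical and do not exist in Solr.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.LongAdder;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical sketch of JVM-wide counters published over JMX; not Solr code.
public class NodeStatsDemo {

    /** JMX exposes each getter on an MXBean interface as a read-only attribute. */
    public interface NodeStatsMXBean {
        long getSolrToSolrErrors();
        long getSolrToZkErrors();
        long getFullSyncs();
    }

    /** One instance per JVM, shared by all cores. LongAdder keeps increments cheap. */
    public static class NodeStats implements NodeStatsMXBean {
        private final LongAdder solrToSolrErrors = new LongAdder();
        private final LongAdder solrToZkErrors = new LongAdder();
        private final LongAdder fullSyncs = new LongAdder();

        public void recordSolrToSolrError() { solrToSolrErrors.increment(); }
        public void recordSolrToZkError() { solrToZkErrors.increment(); }
        public void recordFullSync() { fullSyncs.increment(); }

        @Override public long getSolrToSolrErrors() { return solrToSolrErrors.sum(); }
        @Override public long getSolrToZkErrors() { return solrToZkErrors.sum(); }
        @Override public long getFullSyncs() { return fullSyncs.sum(); }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Assumed object name, chosen only for this example.
        ObjectName name = new ObjectName("org.example.solr:type=NodeStats");

        NodeStats stats = new NodeStats();
        server.registerMBean(stats, name);

        stats.recordSolrToZkError();

        // Any JMX client (jconsole, a monitoring agent) would read the same attribute.
        Object value = server.getAttribute(name, "SolrToZkErrors");
        System.out.println("SolrToZkErrors = " + value);

        server.unregisterMBean(name);
    }
}
```

Because the bean is registered once on the platform MBean server rather than per core, an operations tool would have a single place to poll for node health.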



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org