Posted to solr-dev@lucene.apache.org by "Shawn Smith (JIRA)" <ji...@apache.org> on 2010/03/30 18:48:27 UTC

[jira] Updated: (SOLR-1855) Script to monitor Solr health including replication status

     [ https://issues.apache.org/jira/browse/SOLR-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shawn Smith updated SOLR-1855:
------------------------------

    Attachment: checksolr

I've attached a first-pass implementation of this script: !checksolr!.  It's essentially the script we use in our production environment to monitor Solr health, so it's not completely generic, but it should be a decent start:
* bash script tested only on Linux
* depends on curl, xmllint, and xmlstarlet (the curl, libxml2, and xmlstarlet packages)
* assumes the URL structure of the default multi-core Solr configuration (http://<host>:<port>/solr/admin/cores, .../solr/<core>/admin/ping, .../solr/<core>/replication?command=details)
* checks slave replication health assuming Solr 1.4 Java replication
* dynamically determines the set of Solr cores, so it's useful in a multi-core deployment where the set of cores may change relatively often
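
A minimal sketch of the URL layout and core discovery described in the list above. It is not taken from the attached checksolr: the function names are hypothetical, and the XPath assumes the core admin response nests one {{<lst>}} per core under {{<lst name="status">}}.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not taken from checksolr.

# Build the three URLs listed above from host, port and (where needed) core.
cores_url()       { printf 'http://%s:%s/solr/admin/cores\n' "$1" "$2"; }
ping_url()        { printf 'http://%s:%s/solr/%s/admin/ping\n' "$1" "$2" "$3"; }
replication_url() { printf 'http://%s:%s/solr/%s/replication?command=details\n' "$1" "$2" "$3"; }

# Print one core name per line from a core admin response read on stdin.
# Assumed layout: <response><lst name="status"><lst name="core0">...
list_cores() {
    xmlstarlet sel -t -m '/response/lst[@name="status"]/lst' -v '@name' -n
}

# Fetch the core list from a live server (needs curl and xmlstarlet).
discover_cores() {
    local host=$1 port=$2
    curl -sf "$(cores_url "$host" "$port")" | list_cores
}
```

Because the core names come from the server on every run, adding or removing a core needs no change to the monitoring configuration.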

Example usage:
{noformat}
$ ./checksolr -?
Usage:
    checksolr [OPTIONS]

Options:
    --help | -h
        Print the brief help message and exit.

    --man
        Print the manual page and exit.

    --host | -H HOST
        Check this host instead of localhost.

    --port | -P Port
        Use this port instead of the default(8983) to connect.

    --diff | -D Time difference between now and when solr last replicated
        Use this option to set the maximum difference in seconds between the
        time when the solr slave replicated and now.

    --slave
        Perform slave checks on the host instead of ping tests.

$ ./checksolr --host solrmaster1
Core "core0" returned "OK".
Core "core1" returned "OK".
Core "core2" returned "OK".
$ echo $?
0

$ ./checksolr --slave --host solrslave1
Core "core0" is up to date.
Core "core1" is up to date.
Core "core2" is having trouble replicating.
$ echo $?
1
{noformat}
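
The {{--diff}} check boils down to comparing the age of the last replication against a threshold. A minimal sketch of that comparison follows. It is not the code from checksolr: the function names are hypothetical, and where the replication timestamp comes from in the {{replication?command=details}} response is left out on purpose.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, not taken from checksolr.

# Convert a timestamp string to epoch seconds, read as UTC.
# Relies on GNU date (-d), which is what Linux systems ship.
to_epoch() {
    date -u -d "$1" +%s
}

# Seconds elapsed between the last replication and "now" (both epoch seconds).
replication_age() {
    local now=$1 replicated_at=$2
    echo $(( now - replicated_at ))
}

# Return 0 if the slave replicated within max_diff seconds, 1 otherwise.
is_fresh() {
    local now=$1 replicated_at=$2 max_diff=$3
    [ "$(replication_age "$now" "$replicated_at")" -le "$max_diff" ]
}
```

A caller would pass {{$(date +%s)}} as the current time and the value given to {{--diff}} as {{max_diff}}, and report the core as having trouble replicating when {{is_fresh}} fails.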

> Script to monitor Solr health including replication status
> ----------------------------------------------------------
>
>                 Key: SOLR-1855
>                 URL: https://issues.apache.org/jira/browse/SOLR-1855
>             Project: Solr
>          Issue Type: New Feature
>          Components: replication (java)
>    Affects Versions: 1.4
>            Reporter: Shawn Smith
>         Attachments: checksolr
>
>
> It would be useful to have a simple monitor script that checks the health of all cores on a Solr server.
> # Call the "ping" command and verify success.
> # On replication slaves, check for replication failures.
> The script should return a non-zero exit code if any serious errors are discovered.  This should make it easy to plug the script into a monitoring framework (Nagios, etc.).
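
A minimal sketch of the behaviour requested above: ping every core, print one line per core, and return non-zero if any core is unhealthy. It is an illustration rather than the attached script. The function names and the {{HOST}}/{{PORT}} variables are hypothetical, and the XPath assumes the ping response carries a top-level {{<str name="status">OK</str>}}.

```shell
#!/usr/bin/env bash
# Sketch only: names are hypothetical, not taken from checksolr.

HOST=${HOST:-localhost}
PORT=${PORT:-8983}

# Print the status text from one core's ping response.
# Assumed layout: <response><str name="status">OK</str></response>
ping_core() {
    local core=$1
    curl -sf "http://${HOST}:${PORT}/solr/${core}/admin/ping" |
        xmlstarlet sel -t -v '/response/str[@name="status"]'
}

# Run a status command for each core and report the result.
# $1 is the name of a function that prints a core's status;
# the remaining arguments are core names.
# Returns 1 if any core's status is not "OK", 0 otherwise.
check_cores() {
    local get_status=$1 rc=0 core status
    shift
    for core in "$@"; do
        status=$("$get_status" "$core") || status="ERROR"
        echo "Core \"$core\" returned \"$status\"."
        [ "$status" = "OK" ] || rc=1
    done
    return "$rc"
}
```

Calling {{check_cores ping_core core0 core1}} and exiting with its return code gives a monitoring framework the zero or non-zero result it expects. Passing the status function by name keeps the loop testable without a running Solr instance.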
