Posted to dev@lucene.apache.org by "Ryan Zezeski (JIRA)" <ji...@apache.org> on 2013/03/02 01:05:13 UTC

[jira] [Updated] (SOLR-4509) Disable Stale Check - Distributed Search (Performance)

     [ https://issues.apache.org/jira/browse/SOLR-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan Zezeski updated SOLR-4509:
-------------------------------

    Attachment: baremetal-stale-nostale-throughput.svg
                baremetal-stale-nostale-throughput.dat
                baremetal-stale-nostale-med-latency.svg
                baremetal-stale-nostale-med-latency.dat

I have some new results from a different cluster.  The short story is
that I still see improvement from removing the stale check, just not
as dramatic as on my SmartOS cluster.  Throughput improved by 108-120%
and there was a 0-5ms delta in latency.

What I take from this is that the benefits of removing the stale check
will vary depending on the number of nodes, hardware, query, and load.
In theory, removing the stale check should never hurt, since removing
blocking syscalls should only help.  But I totally understand being
cautious with a change like this.  Personally, I'd like to see at least
one other person confirm a non-negligible difference before I bother to
make this patch more acceptable.  Best to let this ticket stir a while,
I suppose.

## Cluster Specs

All nodes are running on baremetalcloud, so this time they are truly
different physical machines with no virtualization involved.

* 8 nodes/shards
* 1 x 2.66GHz Woodcrest E5150 (2 cores)
* 2GB DDR2-667
* 73GB SAS 10k RPM
* Ubuntu 12.04
* Oracle JDK: Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)
* 512MB max heap
* Example schema

## Bench Runner

* 1 node
* 2 x 2.66GHz Woodcrest E5150
* 8GB DDR2-667
* Using Basho Bench as load gen

## Queries


All queries hit all shards.  All queries were single-term queries
except for alpha, which is a conjunction.  The numbers listed are the
number of documents matching each term query.

* alpha: 100K, 100K, 0
* lima: 1
* mike: 10
* november: 100
* oscar: 1K
* papa: 10K
* quebec: 100K

Attached are the aggregate data (.dat) and the corresponding plots
(.svg).  The data was aggregated from raw data collected by Basho
Bench (and this raw data is itself an aggregate of all events at 10s
intervals).  E.g., the median latency reported here is actually the
mean of the median latencies calculated against all events in each 10s
period.  What I'm saying is, it's a rollup of a rollup, so while there
are 2 decimals of precision, those numbers are not actually that
precise.  But this should be good for ballpark figures (if you're a
stats geek please let me know if I'm committing a sin here).
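
To make the "rollup of a rollup" caveat concrete, here is a minimal
Python sketch comparing the mean of per-interval medians with the
median over all raw events.  Everything in it is an assumption made
for illustration: the latency values are synthetic, the `rollup`
helper is hypothetical, and intervals are cut by event count rather
than by 10s wall-clock windows as Basho Bench does.  None of it
reflects the attached data.

```python
import random
import statistics


def rollup(latencies_ms, events_per_interval):
    """Return the median of each consecutive fixed-size chunk of events.

    Chunking by event count is a simplification; a real load generator
    would bucket events by time window instead.
    """
    return [
        statistics.median(latencies_ms[i:i + events_per_interval])
        for i in range(0, len(latencies_ms), events_per_interval)
    ]


random.seed(42)  # fixed seed so the synthetic data is reproducible

# Synthetic latencies (ms): nine "normal" intervals around 20ms,
# followed by one slow interval around 200ms.
latencies = [random.gauss(20, 2) for _ in range(900)]
latencies += [random.gauss(200, 10) for _ in range(100)]

interval_medians = rollup(latencies, 100)

# What the aggregated .dat-style figure would report.
mean_of_medians = statistics.mean(interval_medians)

# What you would get with access to every raw event.
true_median = statistics.median(latencies)

print(f"mean of interval medians: {mean_of_medians:.2f} ms")
print(f"median of all events:     {true_median:.2f} ms")
```

In this contrived case a single slow interval pulls the mean of
medians well above the overall median, which is why the aggregated
figures are best read as ballpark numbers rather than exact medians.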

There is a big delta in latency for the mike benchmark but I'm
chalking that up to an anomaly for the time being.

                
> Disable Stale Check - Distributed Search (Performance)
> ------------------------------------------------------
>
>                 Key: SOLR-4509
>                 URL: https://issues.apache.org/jira/browse/SOLR-4509
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>         Environment: 5 node SmartOS cluster (all nodes living in same global zone - i.e. same physical machine)
>            Reporter: Ryan Zezeski
>            Priority: Minor
>         Attachments: baremetal-stale-nostale-med-latency.dat, baremetal-stale-nostale-med-latency.svg, baremetal-stale-nostale-throughput.dat, baremetal-stale-nostale-throughput.svg, IsStaleTime.java, SOLR-4509.patch
>
>
> By disabling the Apache HTTP Client stale check I've witnessed a 2-4x increase in throughput and a latency reduction of over 100ms.  This patch was made in the context of a project I'm leading, called Yokozuna, which relies on distributed search.
> Here's the patch on Yokozuna: https://github.com/rzezeski/yokozuna/pull/26
> Here's a write-up I did on my findings: http://www.zinascii.com/2013/solr-distributed-search-and-the-stale-check.html
> I'm happy to answer any questions or make changes to the patch to make it acceptable.
