Posted to dev@lucene.apache.org by mike anderson <sa...@gmail.com> on 2010/09/24 01:06:26 UTC

distributed search on duplicate shards

Hi all,

My company is currently running a distributed Solr cluster with about 15
shards. We occasionally find that one shard will be relatively slow and thus
hold up the entire response. To remedy this we thought it might be useful to
have a system such that:

1. We can duplicate each shard, and thus have "sets" of shards, each with
the same index
2. We can pass in these sets of shards along with the query (for instance,
if "!" is the delimiter, shards=solr1a!solr1b,solr2a!solr2b; see the parsing
sketch after this list)
3. The request goes out to /all/ shards (unlike load balancing in Solr
Cloud)
4. The first shard from a set (solr1a, solr1b) to successfully return is
honored, and the other requests (solr1b, if solr1a responds first, for
instance) are removed/ignored
5. The response is completed and returned as soon as one shard from each set
responds
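
For illustration, here is a minimal sketch of splitting that parameter into
shard sets. The ShardSetParser class and everything in it are made up for
this message; the "!" delimiter is just the convention proposed above, not
an existing Solr parameter.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShardSetParser {

  // Split the proposed parameter into sets of equivalent shards, e.g.
  // "solr1a!solr1b,solr2a!solr2b" -> [[solr1a, solr1b], [solr2a, solr2b]].
  static List<List<String>> parse(String shardsParam) {
    List<List<String>> shardSets = new ArrayList<>();
    for (String set : shardsParam.split(",")) {
      shardSets.add(Arrays.asList(set.split("!")));
    }
    return shardSets;
  }

  public static void main(String[] args) {
    // Prints [[solr1a, solr1b], [solr2a, solr2b]]
    System.out.println(parse("solr1a!solr1b,solr2a!solr2b"));
  }
}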


I've written a patch to accomplish this, but have a few questions:

1. What are the known disadvantages to such a strategy? (we've thought of a
few, like sets being out of sync, but they don't bother us too much)
2. What would this type of feature be called? This way I can open a Jira
ticket for it.
3. Is there a preferred way to do this? My current patch (which I can post
soon) works in the HTTPClient portion of SearchHandler. I keep a hash map of
the shard sets and cancel the Future<ShardResponse>'s in the corresponding
set when each response comes back.
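
To make the cancellation idea concrete, here is a rough, self-contained
sketch of "first member of each set wins, cancel the rest". The
ShardResponse class below, the fake latency, and the one-set-at-a-time loop
are placeholders for illustration only; the actual patch works inside
SearchHandler's HttpClient code and handles all sets concurrently.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class RedundantShardRequestSketch {

  // Stand-in for Solr's ShardResponse, just enough for this sketch.
  static final class ShardResponse {
    final String shard;
    ShardResponse(String shard) { this.shard = shard; }
  }

  public static void main(String[] args) throws Exception {
    // One inner list per shard set, e.g. parsed from "solr1a!solr1b,solr2a!solr2b".
    List<List<String>> shardSets = Arrays.asList(
        Arrays.asList("solr1a", "solr1b"),
        Arrays.asList("solr2a", "solr2b"));

    ExecutorService pool = Executors.newCachedThreadPool();
    try {
      for (List<String> set : shardSets) {
        CompletionService<ShardResponse> completion =
            new ExecutorCompletionService<>(pool);
        List<Future<ShardResponse>> inFlight = new ArrayList<>();

        // Fire the same (fake) request at every member of the set.
        for (String shard : set) {
          inFlight.add(completion.submit(() -> {
            Thread.sleep((long) (Math.random() * 200)); // pretend HTTP latency
            return new ShardResponse(shard);
          }));
        }

        // Honor whichever member of the set answers first ...
        ShardResponse first = completion.take().get();
        // ... and cancel its siblings (a no-op for the one that finished).
        for (Future<ShardResponse> f : inFlight) {
          f.cancel(true);
        }
        System.out.println("using response from " + first.shard);
      }
    } finally {
      pool.shutdownNow();
    }
  }
}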

Thanks in advance,
Mike

P.S. I'd like to write a test for this feature but it wasn't clear from the
distributed test how to do so. Could somebody point me in the right
direction (an existing test, perhaps) for how to accomplish this?

Fwd: distributed search on duplicate shards

Posted by mike anderson <sa...@gmail.com>.
Just wanted to poke this since it got buried under a dozen or so Jira
updates. I also sent it to the deprecated list, though I think it should
have forwarded.

-mike


Re: distributed search on duplicate shards

Posted by mike anderson <sa...@gmail.com>.
Thanks for the feedback. I ended up posting a patch to JIRA
(SOLR-2132 <https://issues.apache.org/jira/browse/SOLR-2132>), although
I've made a few changes since that patch. In our initial tests we've already
seen a 10% improvement in the 90th-percentile response time, which
translates to a 50% improvement in the average time.

It would be nice to know more about the current plans for SolrCloud and its
future development roadmap. I've seen a few threads on here asking for more
information, but it doesn't seem like a popular subject. I'll keep an eye on
it though.

Cheers,
Mike



Re: distributed search on duplicate shards

Posted by Chris Hostetter <ho...@fucit.org>.
: 4. The first shard from a set (solr1a, solr1b) to successfully return is
: honored, and the other requests (solr1b, if solr1a responds first, for
: instance) are removed/ignored
: 5. The response is completed and returned as soon as one shard from each set
: responds

It seems like a useful feature to me ... I know some folks who have
(non-Solr/Lucene-based) custom search infrastructures that do roughly
the same thing.

: 1. What are the known disadvantages to such a strategy? (we've thought of a
: few, like sets being out of sync, but they don't bother us too much)

You wind up burning a lot of CPU, but that's not a disadvantage so much as
it is a trade-off: the whole point of doing something like this is that
you'd rather burn CPU (and waste network IO) in order to improve your
worst-case latency.

: 2. What would this type of feature be called? This way I can open a Jira
: ticket for it.

No idea ... "redundant shard requests" comes to mind.

: 3. Is there a preferred way to do this? My current patch (which I can post
: soon) works in the HTTPClient portion of SearchHandler. I keep a hash map of
: the shard sets and cancel the Future<ShardResponse>'s in the corresponding
: set when each response comes back.
	...
: P.S. I'd like to write a test for this feature but it wasn't clear from the
: distributed test how to do so. Could somebody point me in the right
: direction (an existing test, perhaps) for how to accomplish this?

I don't really have a good answer for either of those questions, but the
one thing I can suggest is that you take a look at the SolrCloud branch
and think about how this functionality would integrate with it (both in
terms of implementation and in how SolrCloud unit tests work).

As you mentioned, the current approach in SolrCloud is to load balance
against identical shards on multiple nodes in the cluster, but that's not
contradictory with your idea: they can work in conjunction with each other
(i.e., imagine "shard1" has four physical instances: "shard1Ax", "shard1Ay",
"shard1Bq" and "shard1Bp" ... a request for "shard1" could trigger two
"redundant parallel shard requests" for "shard1A" and "shard1B", and each
of those requests could then load balance between the respective
underlying physical shards).
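
To sketch that layering: the RedundantGroupPicker class, the topology map,
and the random pick below are all made up to match the example above, not
actual SolrCloud code.

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.stream.Collectors;

public class RedundantGroupPicker {

  // Made-up topology matching the example above: a logical shard maps to
  // redundant groups, and each group maps to its physical instances.
  static final Map<String, Map<String, List<String>>> TOPOLOGY = new HashMap<>();
  static {
    Map<String, List<String>> shard1 = new HashMap<>();
    shard1.put("shard1A", Arrays.asList("shard1Ax", "shard1Ay"));
    shard1.put("shard1B", Arrays.asList("shard1Bq", "shard1Bp"));
    TOPOLOGY.put("shard1", shard1);
  }

  static final Random RANDOM = new Random();

  // Load-balancing step: pick one physical instance from each redundant
  // group. The returned targets would then all be queried in parallel,
  // and whichever answers first is the response used for "shard1".
  static List<String> targetsFor(String logicalShard) {
    return TOPOLOGY.get(logicalShard).values().stream()
        .map(instances -> instances.get(RANDOM.nextInt(instances.size())))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    System.out.println(targetsFor("shard1")); // e.g. [shard1Ay, shard1Bq]
  }
}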



-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss      ...  Stump The Chump!


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org