Posted to solr-user@lucene.apache.org by Stephen Lewis <sd...@gmail.com> on 2018/03/02 18:43:35 UTC

Performance Implications of Different Routing Schemes

Hello!

I'm wondering what information you may be able to provide on performance
implications of implicit routing vs. composite ID routing. In particular,
I'm curious what the horizontal scaling behavior may be of implicit routing
or composite ID routing with and without the "/<bits>" param appended on.

I've been following this documentation
<https://lucene.apache.org/solr/guide/6_6/shards-and-indexing-data-in-solrcloud.html>,
and a few other blogs/articles I've seen around the web (including on
Lucidworks). Many of these discuss the techniques and general philosophy
of the different document routing schemes, but I haven't been able to
find a "Big O" assessment so far by searching online. I'm aware that any
particular workload really needs a Sizing Exercise
<https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/>
to fully understand its implications, but I'm hoping to plan high-level
architecture beyond the scale I can currently foresee.

A relatively simple assessment I've done below leads me to believe the
following is likely the case: if we have S shards and B as our "/bits"
param, then resource usage would Big O scale as follows (note: Previously
I've received the advice that any shard should be capped at a max of 120M
documents, which is where the cap on docs/shard-key comes from)

   - Implicit routing
      - One Read: O(S)
         - hits horizontal scaling limit eventually as S grows
         - No cap on docs per shard key (no shard key)
   - Composite ID routing, no bits param
      - One Read: O(1)
         - no horizontal scaling limit as S grows
         - Docs on a shard key capped at 120 million
   - Composite ID routing with bits param
      - One Read: O(2^B)
         - no horizontal scaling limit as S grows for fixed B
         - Docs on a shard key capped at 120 million * 2^B


So my questions: Is this "big O" analysis about correct? Does Solr have the
ability to scale horizontally on implicit routing despite what my simple
analysis would suggest? Are there other considerations here you can
enlighten me on?

I would guess the answer to the second question is "no" because otherwise
it wouldn't seem to me that composite ID routing would add much concrete
value. But perhaps there are some other factors I've yet to consider.

Thanks for your time and help! Looking forward to hearing back :)

Cheers,
Stephen

Re: Performance Implications of Different Routing Schemes

Posted by Stephen Lewis <ia...@stephen-lewis.net>.
Hi!

Thank you for your thoughtful response Shawn! I have a couple of follow up
questions and points to check my understanding on.

Thanks for explaining my misunderstanding on implicit routing. So to repeat
back and check my understanding: with implicit routing, placement is never
decided by a hash; either the indexing request itself says which shard each
document belongs to, or you specify the router.field parameter at
collection creation time, in which case Solr places each document on the
shard named by that field's value. Correct?
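
For concreteness, this is the kind of create call I have in mind for the
implicit case, using the documented Collections API parameters (the
collection, shard, and field names here are made up):

    http://localhost:8983/solr/admin/collections?action=CREATE
        &name=mycollection
        &router.name=implicit
        &shards=shard1,shard2,shard3
        &router.field=shard_key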

So let me focus a bit more on the composite ID router. The options
available here are:

   - single prepend routing (tenant1!id1)
   - multi prepend routing (tenant1!piece1!id1)
   - num-bits prepend routing (tenant1/B!id1)

I think the first two are relatively straightforward; the ask is for the
application layer to supply one or two prepends, and Solr will then find an
appropriate shard to host the document based on a hash of the prepend(s).
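
As a concrete sketch of the first two (SolrJ; the client setup, collection
name, and IDs are all hypothetical):

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class CompositeIdIndexing {
        public static void main(String[] args) throws Exception {
            try (SolrClient client =
                     new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
                // Single prepend: everything hashed for tenant1 lands together.
                SolrInputDocument single = new SolrInputDocument();
                single.addField("id", "tenant1!id1");
                client.add("mycollection", single);

                // Multi prepend: piece1 narrows placement within tenant1's range.
                SolrInputDocument multi = new SolrInputDocument();
                multi.addField("id", "tenant1!piece1!id1");
                client.add("mycollection", multi);

                client.commit("mycollection");
            }
        }
    }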

I'm very interested, though, by the num-bits prepend. (By the way, I never
found an agreed-upon name for this, so let me know if there is something
standard I should call it.) Originally when I wrote I had a
misunderstanding here, but I believe I've understood it fully now. If B is
your "bits" param, then a tenant will be spread out over a 1/(2^B) fraction
of the shards; so if B = 0, any shard may end up hosting the doc; if B = 1,
half of the shards may be the one to host the doc; if B = 2, one quarter of
the shards may be the one to host the doc; and so on.

I was still a bit uncertain about the mechanism until I looked deeper into this
documentation
<https://lucene.apache.org/solr/guide/6_6/shards-and-indexing-data-in-solrcloud.html>
and this article
<https://lucidworks.com/2013/06/13/solr-cloud-document-routing/>: "[The
composite ID router] creates a composite 32 bit hash by taking 16 bits from
the shard key’s hash and 16 bits from the document id’s hash.... When a
tenant is too large to fit on a single shard it can be spread across
multiple shards by specifying the number of bits to use from the shard key.
The syntax for this is: shard_key/num!document_id. The /num is the number
of bits from the shard key to use in the composite hash." So I think this
means that first the hashes of each are computed, and then the bits are
taken from the resulting hash values. Is that correct? I believe that
means that if num = 16, then that would be the same as omitting the /num
param. I believe it also implies that if the number of shards is not a
power of 2, there will be some irregularity in how many shards a tenant
spans; e.g., for a collection with 11 shards and num = 2, a tenant's range
covers 11/2^2 = 2.75 shards' worth of the hash ring, so every tenant will
live on at least 3 shards, and some may end up living on 4 depending on
exactly how the ranges line up.
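
To make sure I'm reading that right, here is a tiny sketch of my
understanding (not Solr's actual code; I'm using String.hashCode() as a
stand-in for the MurmurHash it really uses):

    public class CompositeHashSketch {
        // My reading: take the top numBits of the shard key's hash and the
        // remaining 32 - numBits bits of the document id's hash. With the
        // default numBits = 16 this is the 16/16 split the article describes.
        static int compositeHash(int shardKeyHash, int docIdHash, int numBits) {
            if (numBits <= 0) return docIdHash;       // key contributes nothing
            if (numBits >= 32) return shardKeyHash;   // key determines everything
            int keyMask = -1 << (32 - numBits);       // top numBits set to 1
            return (shardKeyHash & keyMask) | (docIdHash & ~keyMask);
        }

        public static void main(String[] args) {
            // With num = 2, only the top 2 bits are fixed by the tenant, so
            // the tenant's docs span 1/2^2 = 1/4 of the 32-bit hash ring.
            int h = compositeHash("tenant1".hashCode(), "id1".hashCode(), 2);
            System.out.println(Integer.toBinaryString(h));
        }
    }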

Some of what implicit routing offers could be quite desirable if a
document's shard routing ever needs to be updated. Specifically, the ID of
the document could remain constant even when the routing key changes, which
I believe would allow an in-place update to a document that changes its
host shard. For example, if I create a collection using implicit routing
and router.field=shard_key, then insert a document with id=1 and
shard_key=1, and later insert a document with id=1 and shard_key=2, the
original document sent with shard_key = 1 would be automatically deleted on
insert of the new document. Can you confirm whether this is true? Or would
the original document with id = 1 and shard_key = 1 not necessarily be
deleted?
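
In code, the scenario I'm asking about would look like this (a sketch
assuming an implicit-router collection whose shards are named "shard1" and
"shard2"; all names here are made up):

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class RouteChangeScenario {
        public static void main(String[] args) throws Exception {
            try (SolrClient client =
                     new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
                SolrInputDocument v1 = new SolrInputDocument();
                v1.addField("id", "1");
                v1.addField("shard_key", "shard1");   // lands on shard1
                client.add("mycollection", v1);

                SolrInputDocument v2 = new SolrInputDocument();
                v2.addField("id", "1");               // same id...
                v2.addField("shard_key", "shard2");   // ...different shard
                client.add("mycollection", v2);

                client.commit("mycollection");
                // The question: is the copy on shard1 deleted automatically,
                // or do two copies of id=1 now exist?
            }
        }
    }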

The only drawback I then see of using implicit routing is that there is no
equivalent to the num-bits prepend: if you want to spread the documents
associated with a given tenant across more than one shard but fewer than
all of them, it falls to the developer to update the shard key. Is this
correct, or is there an equivalent notion to num-bits for implicit routing?

There's one other thing I want to drill down on a bit more with resource
usage:


> Query performance should be about the same for any routing type.  It does
> look like when you use compositeId and actually implement shard keys, you
> can limit queries to those shards, but a *general* query is going to hit
> all shards.

Currently I have experience targeting one shard at a time with queries. My
goal with this architecture is to be a little more flexible, and instead to
keep the number of shards a given query has to hit approximately constant
even as the user base and the Solr cloud grow. I believe that will keep CPU
sprawl to a minimum; more on that below.

> If your query rate is very low (or shards are distributed across a lot
> of hardware that has significant spare CPU capacity) performance isn't
> going to be dramatically different for a query that hits 2 shards versus
> one that hits 6 shards.  If your query rate is high, then more shards
> probably will be noticeably slower than fewer shards.

Generally I want to plan for a relatively high query rate, but my
experience shows that the Solr clouds I've built before have very low CPU
usage (spikes go to about 15%, steady state much lower), so assuming there
is headroom at one shard per query is a very reasonable starting point for
me right now. I guess what I'm asking is: say the number of shards hosting
a query goes from 1 to 2 for every shard key, tenant, and query. Will the
CPU usage double? How about from 2 to 4; should I expect it to double
again? My expectation is "yes", but I wanted to see what the Solr community
thinks here.

Thanks as well for explaining more on the vertical cap of a shard: I got
the approximate 120M guidance previously and found it to be shockingly
accurate on a couple of production clouds I worked on, so I may have
over-generalized a bit :)

One other question I have: what is the best way to route queries? I've seen
both the _route_ param and the shard.keys param. Is there a reason to
choose one over the other?
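
For reference, this is how I've been using _route_ (a sketch with
hypothetical names; my understanding is that the older shard.keys param
takes the same kind of value):

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class RoutedQuery {
        public static void main(String[] args) throws Exception {
            try (SolrClient client =
                     new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
                SolrQuery q = new SolrQuery("*:*");
                // Restrict the query to the shard(s) owning tenant1's range.
                q.set("_route_", "tenant1!");
                QueryResponse rsp = client.query("mycollection", q);
                System.out.println("hits: " + rsp.getResults().getNumFound());
            }
        }
    }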

Thanks again so much for your thoughtful response! Looking forward to
hearing back.

Cheers,
Stephen

On Fri, Mar 2, 2018 at 4:51 PM, Shawn Heisey <ap...@elyograg.org> wrote:

> On 3/2/2018 11:43 AM, Stephen Lewis wrote:
> > I'm wondering what information you may be able to provide on performance
> > implications of implicit routing vs. composite ID routing. In particular,
> > I'm curious what the horizontal scaling behavior may be of implicit
> routing
> > or composite ID routing with and without the "/<bits>" param appended on.
>
> The hash calculations should probably introduce so little overhead that
> you'd never notice it.
>
> I once implemented a hash algorithm using two hash classes built into
> Java.  I'm pretty sure that it was NOT a fast implementation ... and it
> could calculate over a million hashes per second on input strings of
> about 20 characters.
>
> The hash algorithm used by CompositeId (one of the MurmurHash
> implementations) is supposed to be one of the fastest algorithms in the
> world.  Unless your uniqueId field values are extremely huge, I really
> doubt that hash calculation is a significant source of overhead.
>
> The use of implicit doesn't automatically mean there's no overhead for
> routing.  The implicit router can still redirect documents to different
> shards, it just does it explicitly, usually with a shard name in a
> particular field, rather than by hash calculation.
>
> > A relatively simple assessment I've done below leads me to believe the
> > following is likely the case: if we have S shards and B as our "/bits"
> > param, then resource usage would Big O scale as follows (note: Previously
> > I've received the advice that any shard should be capped at a max of 120M
> > documents, which is where the cap on docs/shard-key comes from)
>
> Query performance should be about the same for any routing type.  It
> does look like when you use compositeId and actually implement shard
> keys, you can limit queries to those shards, but a *general* query is
> going to hit all shards.
>
> If your query rate is very low (or shards are distributed across a lot
> of hardware that has significant spare CPU capacity) performance isn't
> going to be dramatically different for a query that hits 2 shards versus
> one that hits 6 shards.  If your query rate is high, then more shards
> probably will be noticeably slower than fewer shards.
>
> For the maximum docs to allow per shard:  It depends.  For some indexes
> and use cases, a million documents per shard might be way too big.  For
> others, 500 million per shard might have incredible performance.  There
> are no hard rules about this.  It's entirely dependent on what you're
> actually doing.
>
> Thanks,
> Shawn
>
>


-- 
www.stephen-lewis.net

Re: Performance Implications of Different Routing Schemes

Posted by Shawn Heisey <ap...@elyograg.org>.
On 3/2/2018 11:43 AM, Stephen Lewis wrote:
> I'm wondering what information you may be able to provide on performance
> implications of implicit routing vs. composite ID routing. In particular,
> I'm curious what the horizontal scaling behavior may be of implicit routing
> or composite ID routing with and without the "/<bits>" param appended on.

The hash calculations should probably introduce so little overhead that
you'd never notice it.

I once implemented a hash algorithm using two hash classes built into
Java.  I'm pretty sure that it was NOT a fast implementation ... and it
could calculate over a million hashes per second on input strings of
about 20 characters.

The hash algorithm used by CompositeId (one of the MurmurHash
implementations) is supposed to be one of the fastest algorithms in the
world.  Unless your uniqueId field values are extremely huge, I really
doubt that hash calculation is a significant source of overhead.

The use of implicit doesn't automatically mean there's no overhead for
routing.  The implicit router can still redirect documents to different
shards, it just does it explicitly, usually with a shard name in a
particular field, rather than by hash calculation.

> A relatively simple assessment I've done below leads me to believe the
> following is likely the case: if we have S shards and B as our "/bits"
> param, then resource usage would Big O scale as follows (note: Previously
> I've received the advice that any shard should be capped at a max of 120M
> documents, which is where the cap on docs/shard-key comes from)

Query performance should be about the same for any routing type.  It
does look like when you use compositeId and actually implement shard
keys, you can limit queries to those shards, but a *general* query is
going to hit all shards.

If your query rate is very low (or shards are distributed across a lot
of hardware that has significant spare CPU capacity) performance isn't
going to be dramatically different for a query that hits 2 shards versus
one that hits 6 shards.  If your query rate is high, then more shards
probably will be noticeably slower than fewer shards.

For the maximum docs to allow per shard:  It depends.  For some indexes
and use cases, a million documents per shard might be way too big.  For
others, 500 million per shard might have incredible performance.  There
are no hard rules about this.  It's entirely dependent on what you're
actually doing.

Thanks,
Shawn