Posted to solr-user@lucene.apache.org by Manuel Le Normand <ma...@gmail.com> on 2013/04/07 11:28:10 UTC

Slow qTime for distributed search

Hello,
After a benchmark session at small scale I moved to full scale on 16
quad-core servers.
Observations at small scale gave me an excellent qTime (about 150 ms) with up
to 2 servers, showing my searching thread was mainly CPU-bound. My query
set is not faceted.
Growing to full scale (with the same config, schema and number of docs per
shard) I sharded my collection into 48 shards and added a replica for each.
Since then I have seen a major performance deterioration: my qTime went up to
700 ms. The servers are under a much smaller load, and the network does not
show any difficulties. I understand that merging the responses and waiting
for the slowest shard should increase my small-scale qTime, so I checked
shards.info=true and observed that each shard was taking much longer, while
when querying specific shards directly (shards=shard1,shard2...shard12) I get
much better results for each shard's qTime and for the total qTime.
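The two diagnostics above boil down to two request parameters. The sketch below only builds the query URLs; the host and collection names are made up, and it assumes Solr's shards.info and shards parameters:

```python
from urllib.parse import urlencode

# Hypothetical host and collection; adjust to your own deployment.
base = "http://solr-host:8983/solr/collection2/select"

# Ask the coordinator to include a per-shard timing/hit breakdown.
diag_url = base + "?" + urlencode({"q": "text:foo", "shards.info": "true"})

# Run the same query against an explicit subset of shards.
subset_url = base + "?" + urlencode(
    {"q": "text:foo", "shards": "shard1,shard2,shard12"}
)

print(diag_url)
print(subset_url)
```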

Keeping the same config, how come the number of shards affects the qTime of
each shard?

How can I overcome this issue?

Thanks,
Manu

Re: Slow qTime for distributed search

Posted by Furkan KAMACI <fu...@gmail.com>.
Manuel Le Normand, I am sorry but I want to learn something. You said you
have 40 dedicated servers. What are your total document count, total
document size, and total shard size?

2013/4/11 Manuel Le Normand <ma...@gmail.com>

> Hi,
> We have different working hours, sorry for the reply delay. Your assumed
> numbers are right, about 25-30 KB per doc, giving a total of 15 GB per
> shard; there are two shards per server (+2 slaves that should do no work
> normally).
> An average query has about 30 conditions (OR and AND mixed), most of them
> textual and a small part on dateTime. They are only simple queries (no
> facets, filters etc.), as the set is taken from the actual query set of my
> enterprise, which works with an old search engine.
>
> As we said, if the shards in collection1 and collection2 have the same
> number of docs each (and the same RAM & CPU per shard), it is apparently
> not a slow-IO issue, right? So not having my whole index cached doesn't
> seem to be the bottleneck. Moreover, I do store the fields, but my query
> set requests only the IDs and rarely snippets, so I'd assume that the
> plenty of RAM I'd give the OS wouldn't make any difference, as these *.fdt
> files don't need to get cached.
>
> The conclusion I come to is that the merging is the problem, and the only
> way of outsmarting it is to distribute to far fewer shards, meaning I'd be
> back to a few million docs per shard, where qTime grows roughly linearly
> with the number of docs per shard. Though the latter should improve if I
> give much more RAM per server.
>
> I'll try tweaking my schema a bit and making better use of the Solr caches
> (filter queries, for example), but something tells me the problem might be
> elsewhere. My main clue is that merging seems a simple CPU task, and tests
> show that even with a small number of responses it takes a long time
> (and clearly the merging task on a few docs should be very short).
>
>
> On Wed, Apr 10, 2013 at 2:50 AM, Shawn Heisey <so...@elyograg.org> wrote:
>
> > On 4/9/2013 3:50 PM, Furkan KAMACI wrote:
> >
> >> Hi Shawn;
> >>
> >> You say that:
> >>
> >> *... your documents are about 50KB each.  That would translate to an
> >> index that's at least 25GB*
> >>
> >> I know we cannot give an exact size, but what is the approximate ratio
> >> of document size / index size, in your experience?
> >>
> >
> > If you store the fields, that is actual size plus a small amount of
> > overhead.  Starting with Solr 4.1, stored fields are compressed.  I
> > believe that it uses LZ4 compression.  Some people store all fields, some
> > people store only a few or one - an ID field.  The size of stored fields
> > does have an impact on how much OS disk cache you need, but not as much
> > as the other parts of an index.
> >
> > It's been my experience that termvectors take up almost as much space as
> > stored data for the same fields, and sometimes more.  Starting with Solr
> > 4.2, termvectors are also compressed.
> >
> > Adding docValues (new in 4.2) to the schema will also make the index
> > larger.  The requirements here are similar to stored fields.  I do not
> > know whether this data gets compressed, but I don't think it does.
> >
> > As for the indexed data, this is where I am less clear about the storage
> > ratios, but I think you can count on it needing almost as much space as
> > the original data.  If the schema uses types or filters that produce a
> > lot of information, the indexed data might be larger than the original
> > input.  Examples of data explosions in a schema: trie fields with a
> > non-zero precisionStep, the edgengram filter, the shingle filter.
> >
> > Thanks,
> > Shawn
> >
> >
>

Re: Slow qTime for distributed search

Posted by Manuel Le Normand <ma...@gmail.com>.
Hi,
We have different working hours, sorry for the reply delay. Your assumed
numbers are right, about 25-30 KB per doc, giving a total of 15 GB per shard;
there are two shards per server (+2 slaves that should do no work normally).
An average query has about 30 conditions (OR and AND mixed), most of them
textual and a small part on dateTime. They are only simple queries (no
facets, filters etc.), as the set is taken from the actual query set of my
enterprise, which works with an old search engine.

As we said, if the shards in collection1 and collection2 have the same
number of docs each (and the same RAM & CPU per shard), it is apparently not
a slow-IO issue, right? So not having my whole index cached doesn't seem to
be the bottleneck. Moreover, I do store the fields, but my query set
requests only the IDs and rarely snippets, so I'd assume that the plenty of
RAM I'd give the OS wouldn't make any difference, as these *.fdt files don't
need to get cached.

The conclusion I come to is that the merging is the problem, and the only
way of outsmarting it is to distribute to far fewer shards, meaning I'd be
back to a few million docs per shard, where qTime grows roughly linearly
with the number of docs per shard. Though the latter should improve if I
give much more RAM per server.

I'll try tweaking my schema a bit and making better use of the Solr caches
(filter queries, for example), but something tells me the problem might be
elsewhere. My main clue is that merging seems a simple CPU task, and tests
show that even with a small number of responses it takes a long time
(and clearly the merging task on a few docs should be very short).


On Wed, Apr 10, 2013 at 2:50 AM, Shawn Heisey <so...@elyograg.org> wrote:

> On 4/9/2013 3:50 PM, Furkan KAMACI wrote:
>
>> Hi Shawn;
>>
>> You say that:
>>
>> *... your documents are about 50KB each.  That would translate to an index
>> that's at least 25GB*
>>
>> I know we cannot give an exact size, but what is the approximate ratio of
>> document size / index size, in your experience?
>>
>
> If you store the fields, that is actual size plus a small amount of
> overhead.  Starting with Solr 4.1, stored fields are compressed.  I believe
> that it uses LZ4 compression.  Some people store all fields, some people
> store only a few or one - an ID field.  The size of stored fields does have
> an impact on how much OS disk cache you need, but not as much as the other
> parts of an index.
>
> It's been my experience that termvectors take up almost as much space as
> stored data for the same fields, and sometimes more.  Starting with Solr
> 4.2, termvectors are also compressed.
>
> Adding docValues (new in 4.2) to the schema will also make the index
> larger.  The requirements here are similar to stored fields.  I do not know
> whether this data gets compressed, but I don't think it does.
>
> As for the indexed data, this is where I am less clear about the storage
> ratios, but I think you can count on it needing almost as much space as the
> original data.  If the schema uses types or filters that produce a lot of
> information, the indexed data might be larger than the original input.
>  Examples of data explosions in a schema: trie fields with a non-zero
> precisionStep, the edgengram filter, the shingle filter.
>
> Thanks,
> Shawn
>
>

Re: Slow qTime for distributed search

Posted by Shawn Heisey <so...@elyograg.org>.
On 4/9/2013 3:50 PM, Furkan KAMACI wrote:
> Hi Shawn;
>
> You say that:
>
> *... your documents are about 50KB each.  That would translate to an index
> that's at least 25GB*
>
> I know we cannot give an exact size, but what is the approximate ratio of
> document size / index size, in your experience?

If you store the fields, that is actual size plus a small amount of 
overhead.  Starting with Solr 4.1, stored fields are compressed.  I 
believe that it uses LZ4 compression.  Some people store all fields, 
some people store only a few or one - an ID field.  The size of stored 
fields does have an impact on how much OS disk cache you need, but not 
as much as the other parts of an index.

It's been my experience that termvectors take up almost as much space as 
stored data for the same fields, and sometimes more.  Starting with Solr 
4.2, termvectors are also compressed.

Adding docValues (new in 4.2) to the schema will also make the index 
larger.  The requirements here are similar to stored fields.  I do not 
know whether this data gets compressed, but I don't think it does.

As for the indexed data, this is where I am less clear about the storage 
ratios, but I think you can count on it needing almost as much space as 
the original data.  If the schema uses types or filters that produce a 
lot of information, the indexed data might be larger than the original 
input.  Examples of data explosions in a schema: trie fields with a 
non-zero precisionStep, the edgengram filter, the shingle filter.
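As a rough illustration of those heuristics, one could sketch a back-of-envelope estimator. The ratios below are guesses distilled from the paragraphs above, not measured constants:

```python
def estimate_index_gb(original_gb, stored=True, termvectors=False,
                      docvalues=False):
    """Very rough index-size guess from the heuristics in this thread."""
    size = original_gb               # indexed data: ~ same as the original
    if stored:
        size += original_gb          # stored fields: original + small overhead
    if termvectors:
        size += original_gb          # termvectors: ~ as large as stored data
    if docvalues:
        size += original_gb          # docValues: similar to stored fields
    return size

print(estimate_index_gb(25))                    # 50: indexed + stored
print(estimate_index_gb(25, termvectors=True))  # 75: + termvectors
```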

Thanks,
Shawn


Re: Slow qTime for distributed search

Posted by Furkan KAMACI <fu...@gmail.com>.
Hi Shawn;

You say that:

*... your documents are about 50KB each.  That would translate to an index
that's at least 25GB*

I know we cannot give an exact size, but what is the approximate ratio of
document size / index size, in your experience?


2013/4/9 Shawn Heisey <so...@elyograg.org>

> On 4/9/2013 2:10 PM, Manuel Le Normand wrote:
>
>> Thanks for replying.
>> My config:
>>
>>     - 40 dedicated servers, dual-core each
>>     - Running the Tomcat servlet container on Linux
>>     - 12 GB RAM per server, split evenly between OS and Solr
>>     - Complex queries (up to 30 conditions on different fields), 1 qps rate
>>
>> Sharding my index was done for two reasons, based on tests with 2 servers
>> (4 shards):
>>
>>     1. As the index grew above a few million docs, qTime rose greatly,
>>     while sharding the index into smaller pieces (about 0.5M docs) gave
>>     way better results, so I bound every shard to 0.5M docs.
>>     2. Tests showed I was CPU-bound during queries. As I have a low qps
>>     rate (emphasis: lower than the expected qTime) and as a query runs
>>     single-threaded on each shard, it made sense to dedicate a CPU to
>>     each shard.
>>
>> For the same number of docs per shard I do expect a rise in total qTime,
>> for two reasons:
>>
>>     1. The response should wait for the slowest shard
>>     2. Merging the responses from 40 different shards takes time
>>
>> What I understand from your explanation is that it's the merging that
>> takes time, and as qTime ends only after the second retrieval phase, the
>> qTime on each shard will be longer. Meaning that during a significant
>> part of the first query phase (right after the [id,score] pairs are
>> retrieved), all CPUs are idle except the response-merger thread running
>> on a single CPU. I thought of the merge as a simple sort of [id,score],
>> far simpler than an additional 300 ms of CPU time.
>>
>> Why would a RAM increase improve my performance, if it's a
>> "response-merge" (CPU resource) bottleneck?
>>
>
> If you have not tweaked the Tomcat configuration, that can lead to
> problems, but if your total query volume is really only one query per
> second, this is probably not a worry for you.  A Tomcat connector can be
> configured with a maxThreads parameter.  The recommended value there is
> 10000, but Tomcat defaults to 200.
>
> You didn't include the index sizes.  There's half a million docs per
> shard, but I don't know what that translates to in terms of MB or GB of
> disk space.
>
> On another email thread you mention that your documents are about 50KB
> each.  That would translate to an index that's at least 25GB, possibly
> more.  That email thread also says that optimization for you takes an hour,
> further indications that you've got some really big indexes.
>
> You're saying that you have given 6GB out of the 12GB to Solr, leaving
> only 6GB for the OS and caching.  Ideally you want to have enough RAM to
> cache the entire index, but in reality you can usually get away with
> caching between half and two thirds of the index.  Exactly what ratio works
> best is highly dependent on your schema.
>
> If my numbers are even close to right, then you've got a lot more index on
> each server than available RAM.  Based on what I can deduce, you would want
> 24 to 48GB of RAM per server.  If my numbers are wrong, then this estimate
> is wrong.
>
> I would be interested in seeing your queries.  If the complexity can be
> expressed as filter queries that get re-used a lot, the filter cache can be
> a major boost to performance.  Solr's caches in general can make a big
> difference.  There is no guarantee that caches will help, of course.
>
> Thanks,
> Shawn
>
>

Re: Slow qTime for distributed search

Posted by Shawn Heisey <so...@elyograg.org>.
On 4/9/2013 2:10 PM, Manuel Le Normand wrote:
> Thanks for replying.
> My config:
>
>     - 40 dedicated servers, dual-core each
>     - Running the Tomcat servlet container on Linux
>     - 12 GB RAM per server, split evenly between OS and Solr
>     - Complex queries (up to 30 conditions on different fields), 1 qps rate
>
> Sharding my index was done for two reasons, based on tests with 2 servers
> (4 shards):
>
>     1. As the index grew above a few million docs, qTime rose greatly, while
>     sharding the index into smaller pieces (about 0.5M docs) gave way better
>     results, so I bound every shard to 0.5M docs.
>     2. Tests showed I was CPU-bound during queries. As I have a low qps rate
>     (emphasis: lower than the expected qTime) and as a query runs single-threaded
>     on each shard, it made sense to dedicate a CPU to each shard.
>
> For the same number of docs per shard I do expect a rise in total qTime,
> for two reasons:
>
>     1. The response should wait for the slowest shard
>     2. Merging the responses from 40 different shards takes time
>
> What I understand from your explanation is that it's the merging that takes
> time, and as qTime ends only after the second retrieval phase, the qTime on
> each shard will be longer. Meaning that during a significant part of the
> first query phase (right after the [id,score] pairs are retrieved), all CPUs
> are idle except the response-merger thread running on a single CPU. I thought
> of the merge as a simple sort of [id,score], far simpler than an additional
> 300 ms of CPU time.
>
> Why would a RAM increase improve my performance, if it's a
> "response-merge" (CPU resource) bottleneck?

If you have not tweaked the Tomcat configuration, that can lead to 
problems, but if your total query volume is really only one query per 
second, this is probably not a worry for you.  A Tomcat connector can be 
configured with a maxThreads parameter.  The recommended value there is 
10000, but Tomcat defaults to 200.

You didn't include the index sizes.  There's half a million docs per 
shard, but I don't know what that translates to in terms of MB or GB of 
disk space.

On another email thread you mention that your documents are about 50KB 
each.  That would translate to an index that's at least 25GB, possibly 
more.  That email thread also says that optimization for you takes an 
hour, further indications that you've got some really big indexes.

You're saying that you have given 6GB out of the 12GB to Solr, leaving 
only 6GB for the OS and caching.  Ideally you want to have enough RAM to 
cache the entire index, but in reality you can usually get away with 
caching between half and two thirds of the index.  Exactly what ratio 
works best is highly dependent on your schema.

If my numbers are even close to right, then you've got a lot more index 
on each server than available RAM.  Based on what I can deduce, you 
would want 24 to 48GB of RAM per server.  If my numbers are wrong, then 
this estimate is wrong.
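
To make the arithmetic behind that estimate explicit, here is the same reasoning with the per-server numbers reported elsewhere in this thread (15 GB per shard, two shards per server, a 6 GB heap); treat it as a rough heuristic, not a formula:

```python
# Numbers taken from this thread; adjust for your own deployment.
shard_size_gb = 15
shards_per_server = 2
solr_heap_gb = 6

index_gb = shard_size_gb * shards_per_server    # 30 GB of index per server

# Rule of thumb from above: cache between half and two-thirds of the
# index to get by; ideally cache all of it.
low_gb = solr_heap_gb + index_gb / 2            # minimum comfortable
high_gb = solr_heap_gb + index_gb               # whole index cached

print(f"suggested RAM per server: {low_gb:.0f}-{high_gb:.0f} GB")
```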

I would be interested in seeing your queries.  If the complexity can be 
expressed as filter queries that get re-used a lot, the filter cache can 
be a major boost to performance.  Solr's caches in general can make a 
big difference.  There is no guarantee that caches will help, of course.

Thanks,
Shawn


Re: Slow qTime for distributed search

Posted by Manuel Le Normand <ma...@gmail.com>.
Thanks for replying.
My config:

   - 40 dedicated servers, dual-core each
   - Running the Tomcat servlet container on Linux
   - 12 GB RAM per server, split evenly between OS and Solr
   - Complex queries (up to 30 conditions on different fields), 1 qps rate

Sharding my index was done for two reasons, based on tests with 2 servers
(4 shards):

   1. As the index grew above a few million docs, qTime rose greatly, while
   sharding the index into smaller pieces (about 0.5M docs) gave way better
   results, so I bound every shard to 0.5M docs.
   2. Tests showed I was CPU-bound during queries. As I have a low qps rate
   (emphasis: lower than the expected qTime) and as a query runs single-threaded
   on each shard, it made sense to dedicate a CPU to each shard.

For the same number of docs per shard I do expect a rise in total qTime,
for two reasons:

   1. The response should wait for the slowest shard
   2. Merging the responses from 40 different shards takes time

What I understand from your explanation is that it's the merging that takes
time, and as qTime ends only after the second retrieval phase, the qTime on
each shard will be longer. Meaning that during a significant part of the
first query phase (right after the [id,score] pairs are retrieved), all CPUs
are idle except the response-merger thread running on a single CPU. I thought
of the merge as a simple sort of [id,score], far simpler than an additional
300 ms of CPU time.
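
The mental model of the merge described above -- a k-way merge of per-shard [id,score] lists already sorted by score -- can be sketched as follows (toy data, not Solr internals):

```python
import heapq

# Each shard returns its hits sorted by descending score.
shard_hits = [
    [(0.9, "a"), (0.4, "f")],   # shard 1
    [(0.8, "c"), (0.3, "d")],   # shard 2
    [(0.6, "e"), (0.1, "b")],   # shard 3
]

# k-way merge: O(n log k) comparisons for n hits across k shards,
# which is why the merge itself should be cheap on small result sets.
merged = list(heapq.merge(*shard_hits, key=lambda hit: -hit[0]))
print([doc_id for _, doc_id in merged])  # ['a', 'c', 'e', 'f', 'd', 'b']
```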

Why would a RAM increase improve my performance, if it's a
"response-merge" (CPU resource) bottleneck?

Thanks in advance,
Manu


On Mon, Apr 8, 2013 at 10:19 PM, Shawn Heisey <so...@elyograg.org> wrote:

> On 4/8/2013 12:19 PM, Manuel Le Normand wrote:
>
>> It seems that sharding my collection to many shards slowed down
>> unreasonably, and I'm trying to investigate why.
>>
> >> First, I created "collection1" - a 4 shards*replicationFactor=1
> >> collection on 2 servers. Second, I created "collection2" - a 48
> >> shards*replicationFactor=2 collection on 24 servers, keeping the same
> >> config and the same number of documents per shard.
>>
>
> The primary reason to use shards is for index size, when your index is so
> big that a single index cannot give you reasonable performance. There are
> also sometimes performance gains when you break a smaller index into
> shards, but there is a limit.
>
> > Going from 2 shards to 3 shards will have more of an impact than going
> from 8 shards to 9 shards.  At some point, adding shards makes things
> slower, not faster, because of the extra work required for combining
> multiple queries into one result response.  There is no reasonable way to
> predict when that will happen.
>
>> Observations showed the following:
>>
>>     1. Total qTime for the same query set is 5 times higher in collection2
>>     (150 ms -> 700 ms)
>>     2. Adding the shards.info=true param to a collection2 query shows
>>     that each shard is much slower than each shard was in collection1
>>     (about 4 times slower)
>>     3. Querying only specific shards on collection2 (by adding the
>>     shards=shard1,shard2...shard12 param) gave me a much better qTime per
>>     shard (only 2 times higher than in collection1)
>>     4. I have a low qps rate, thus I don't suspect the replication factor
>>     of being the major cause of this.
>>     5. The avg. CPU load on the servers during querying was much higher in
>>     collection1 than in collection2, and I didn't catch any other
>>     bottleneck.
>>
>
> A distributed query actually consists of up to two queries per shard. The
> first query just requests the uniqueKey field, not the entire document.  If
> you are sorting the results, then the sort field(s) are also requested,
> otherwise the only additional information requested is the relevance score.
>  The results are compiled into a set of unique keys, then a second query is
> sent to the proper shards requesting specific documents.
>
>
>> Q:
>> 1. Why does the number of shards affect the qTime of each shard?
>> 2. How can I reduce the qTime of each shard back down?
>>
>
> With more shards, it takes longer for the first phase to compile the
> results, so the second phase (document retrieval) gets delayed, and the
> QTime goes up.
>
> One way to reduce the total time is to reduce the number of shards.
>
> You haven't said anything about how complex your queries are, your index
> size(s), or how much RAM you have on each server and how it is allocated.
>  Can you provide this information?
>
> Getting good performance out of Solr requires plenty of RAM in your OS
> disk cache.  Query times of 150 to 700 milliseconds seem very high, which
> could be due to query complexity or a lack of server resources (especially
> RAM), or possibly both.
>
> Thanks,
> Shawn
>
>

Re: Slow qTime for distributed search

Posted by Shawn Heisey <so...@elyograg.org>.
On 4/8/2013 12:19 PM, Manuel Le Normand wrote:
> It seems that sharding my collection to many shards slowed down
> unreasonably, and I'm trying to investigate why.
>
> First, I created "collection1" - a 4 shards*replicationFactor=1 collection
> on 2 servers. Second, I created "collection2" - a 48 shards*replicationFactor=2
> collection on 24 servers, keeping the same config and the same number of
> documents per shard.

The primary reason to use shards is for index size, when your index is 
so big that a single index cannot give you reasonable performance. 
There are also sometimes performance gains when you break a smaller 
index into shards, but there is a limit.

Going from 2 shards to 3 shards will have more of an impact than going 
from 8 shards to 9 shards.  At some point, adding shards makes things 
slower, not faster, because of the extra work required for combining 
multiple queries into one result response.  There is no reasonable way 
to predict when that will happen.

> Observations showed the following:
>
>     1. Total qTime for the same query set is 5 times higher in collection2
>     (150 ms -> 700 ms)
>     2. Adding the shards.info=true param to a collection2 query shows
>     that each shard is much slower than each shard was in collection1 (about
>     4 times slower)
>     3. Querying only specific shards on collection2 (by adding the
>     shards=shard1,shard2...shard12 param) gave me a much better qTime per shard
>     (only 2 times higher than in collection1)
>     4. I have a low qps rate, thus I don't suspect the replication factor
>     of being the major cause of this.
>     5. The avg. CPU load on the servers during querying was much higher in
>     collection1 than in collection2, and I didn't catch any other bottleneck.

A distributed query actually consists of up to two queries per shard. 
The first query just requests the uniqueKey field, not the entire 
document.  If you are sorting the results, then the sort field(s) are 
also requested, otherwise the only additional information requested is 
the relevance score.  The results are compiled into a set of unique 
keys, then a second query is sent to the proper shards requesting 
specific documents.
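
The two phases described above can be modeled in a few lines (a toy coordinator over in-memory "shards", not Solr's actual implementation):

```python
import heapq

# Each "shard" maps doc id -> (score, stored document).
shards = {
    "shard1": {"a": (0.9, "doc a"), "b": (0.2, "doc b")},
    "shard2": {"c": (0.7, "doc c"), "d": (0.5, "doc d")},
}

def distributed_search(shards, rows=2):
    # Phase 1: shards return only (score, id); the coordinator merges
    # everything and keeps the global top `rows`.
    candidates = [
        (score, doc_id, shard_name)
        for shard_name, docs in shards.items()
        for doc_id, (score, _) in docs.items()
    ]
    winners = heapq.nlargest(rows, candidates)

    # Phase 2: fetch full documents only from the shards that own a winner.
    return [(doc_id, shards[name][doc_id][1]) for _, doc_id, name in winners]

print(distributed_search(shards))  # [('a', 'doc a'), ('c', 'doc c')]
```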

> Q:
> 1. Why does the number of shards affect the qTime of each shard?
> 2. How can I reduce the qTime of each shard back down?

With more shards, it takes longer for the first phase to compile the 
results, so the second phase (document retrieval) gets delayed, and the 
QTime goes up.

One way to reduce the total time is to reduce the number of shards.

You haven't said anything about how complex your queries are, your index 
size(s), or how much RAM you have on each server and how it is 
allocated.  Can you provide this information?

Getting good performance out of Solr requires plenty of RAM in your OS 
disk cache.  Query times of 150 to 700 milliseconds seem very high, 
which could be due to query complexity or a lack of server resources 
(especially RAM), or possibly both.

Thanks,
Shawn


Re: Slow qTime for distributed search

Posted by Manuel Le Normand <ma...@gmail.com>.
After taking a look at what I wrote earlier, I will try to rephrase it
more clearly.

It seems that sharding my collection to many shards slowed down
unreasonably, and I'm trying to investigate why.

First, I created "collection1" - a 4 shards*replicationFactor=1 collection
on 2 servers. Second, I created "collection2" - a 48 shards*replicationFactor=2
collection on 24 servers, keeping the same config and the same number of
documents per shard.
Observations showed the following:

   1. Total qTime for the same query set is 5 times higher in collection2
   (150 ms -> 700 ms)
   2. Adding the shards.info=true param to a collection2 query shows
   that each shard is much slower than each shard was in collection1 (about 4
   times slower)
   3. Querying only specific shards on collection2 (by adding the
   shards=shard1,shard2...shard12 param) gave me a much better qTime per shard
   (only 2 times higher than in collection1)
   4. I have a low qps rate, thus I don't suspect the replication factor
   of being the major cause of this.
   5. The avg. CPU load on the servers during querying was much higher in
   collection1 than in collection2, and I didn't catch any other bottleneck.

Q:
1. Why does the number of shards affect the qTime of each shard?
2. How can I reduce the qTime of each shard back down?

Thanks,
Manu