Posted to solr-user@lucene.apache.org by feroz_kh <fe...@yahoo.com> on 2012/08/13 20:17:59 UTC

Solr Index linear growth - Performance degradation.

We have 4 shards with a 14GB index on each of them.
Each shard has a master and 3 slaves (each of them with 32GB RAM).

We expect the index size to double or triple in the near future.
So we tried merging our indexes so that each shard has a 28GB index, and we
also increased the RAM on each slave to 48GB.

We made these changes locally and tested by sending the same 10K realistic
queries to a server with the 14GB index and to one with the 28GB index.
We found that:
1. For the server with the 14GB index (48GB RAM): search time was 480ms,
number of index hits: 3.8GB
2. For the server with the 28GB index (48GB RAM): search time was 900ms,
number of index hits: 7.2GB.

So having the whole index in RAM doesn't help sustain search-time
performance: search time grew linearly, roughly doubling when the index
size was doubled.
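A quick check of the two runs above, as a sketch. It assumes the "index hits" figures are document counts in millions (later in the thread the "GB" is called a typo and the total is given as 7,200,000) and treats the reported search times as comparable between runs. Both ratios come out at roughly 1.9, i.e. search time tracked the number of matching documents rather than the index size as such.

```python
# Reported figures from the two runs. "hits" is read as millions of matching
# documents summed over all 10K queries (an assumption based on the thread).
runs = {
    "14GB index": {"search_time_ms": 480, "hits_millions": 3.8},
    "28GB index": {"search_time_ms": 900, "hits_millions": 7.2},
}


def scaling_ratios(small, large):
    """Return (search-time ratio, hits ratio) between two benchmark runs."""
    time_ratio = large["search_time_ms"] / small["search_time_ms"]
    hits_ratio = large["hits_millions"] / small["hits_millions"]
    return time_ratio, hits_ratio


def ms_per_million_hits(run):
    """Search time normalized by the number of matching documents."""
    return run["search_time_ms"] / run["hits_millions"]


if __name__ == "__main__":
    t, h = scaling_ratios(runs["14GB index"], runs["28GB index"])
    print(f"search time x{t:.2f}, hits x{h:.2f}")
    for name, run in runs.items():
        print(f"{name}: {ms_per_million_hits(run):.1f} ms per million hits")
```

The per-hit cost is nearly identical in both runs, which is what Erick's later point about scoring twice as many documents would predict.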

We were hoping to keep just 4 shards, but it looks like we now have to add
another shard, or another slave to each shard.

Is there a way we can configure our servers so that performance isn't
affected even when the index size doubles or triples?

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Index-linear-growth-Performance-degradation-tp4000934.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr Index linear growth - Performance degradation.

Posted by feroz_kh <fe...@gmail.com>.
The queries were extracted from the production logs.

Re: Solr Index linear growth - Performance degradation.

Posted by feroz_kh <fe...@gmail.com>.
These are simple search queries, and the test client is multithreaded.
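For reference, a minimal sketch of a multithreaded replay harness of the kind described. The `search` callable, the thread count, and the choice of mean/median/p95 are assumptions for illustration, not the poster's actual test client; in practice `search` would send the query to a Solr slave over HTTP.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def replay(queries, search, threads=8):
    """Run every query through `search` from a thread pool and return
    per-query wall-clock latencies in milliseconds (in input order)."""
    def timed(q):
        start = time.perf_counter()
        search(q)
        return (time.perf_counter() - start) * 1000.0

    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(timed, queries))


def summarize(latencies_ms):
    """Mean, median and an approximate 95th percentile of the latencies."""
    ordered = sorted(latencies_ms)
    p95 = ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": p95,
    }


if __name__ == "__main__":
    # Hypothetical stand-in for a real Solr request.
    def fake_search(q):
        time.sleep(0.001 * len(q.split()))

    queries = ["die owner klasse muss sich aus der",
               "but more important parents need to keep"]
    print(summarize(replay(queries, fake_search, threads=2)))
```

Reporting a percentile alongside the mean helps show whether a slowdown affects all queries or only a tail of expensive ones.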

Re: Solr Index linear growth - Performance degradation.

Posted by Alexey Serba <as...@gmail.com>.
>10K queries
How do you generate these queries? I.e., is this a single- or
multi-threaded application?

Can you provide the full queries you send to the Solr servers and your
solrconfig request handler configuration? Do you use function queries,
grouping, faceting, etc.?


On Tue, Aug 14, 2012 at 10:31 AM, feroz_kh <fe...@gmail.com> wrote:
> It's 7,200,000 hits == number of documents found by all 10K queries.
> We're running the RHEL Tikanga release.

Re: Solr Index linear growth - Performance degradation.

Posted by feroz_kh <fe...@gmail.com>.
index hits == total number of documents found by the search queries.

Re: Solr Index linear growth - Performance degradation.

Posted by feroz_kh <fe...@gmail.com>.
It's 7.2G hits ("GB" was a typo).
This is the total number of index hits, calculated by summing the
"numFound" attribute of each Solr query response.
We're running the RHEL Tikanga release.
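A minimal sketch of that calculation, assuming the standard Solr XML response format, where the main result element looks like `<result name="response" numFound="..." start="...">`. The helper names are hypothetical; with `wt=json` the same value sits at `response.numFound`.

```python
import xml.etree.ElementTree as ET


def num_found(xml_text):
    """Return numFound from the main <result name="response"> element."""
    root = ET.fromstring(xml_text)
    result = root.find("result[@name='response']")
    if result is None:
        raise ValueError("no <result name='response'> element found")
    return int(result.get("numFound"))


def total_hits(xml_responses):
    """Sum numFound over many Solr XML responses."""
    return sum(num_found(x) for x in xml_responses)


if __name__ == "__main__":
    sample = [
        '<response><lst name="responseHeader"><int name="status">0</int></lst>'
        '<result name="response" numFound="420" start="0"/></response>',
        '<response><lst name="responseHeader"><int name="status">0</int></lst>'
        '<result name="response" numFound="300" start="0"/></response>',
    ]
    print(total_hits(sample))  # 720
```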

Re: Solr Index linear growth - Performance degradation.

Posted by Lance Norskog <go...@gmail.com>.
How many documents does each search find? What does this mean: "number
of index hits: 7.2GB."

Above a threshold, the more memory you give Java, the more time it
spends on garbage collection. You want to start with very little memory
and gradually increase the heap size until the program stops using it
all, and then add maybe 10%. The operating system is better at managing
memory than Java, and it is faster to leave the full index data in the
OS disk buffers. It is counterintuitive, but true.
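A rough memory budget for the setup in this thread, as a sketch: it subtracts the JVM heap (plus an optional allowance for everything else, an assumption) from physical RAM to estimate what remains for the OS page cache. With 48GB of RAM and a 24GB heap, at most 24GB is left, which covers a 14GB index but not a 28GB one; shrinking the heap frees more of it for the cache.

```python
def page_cache_headroom_gb(total_ram_gb, heap_gb, other_gb=0.0):
    """RAM left for the OS page cache after the JVM heap and other usage.

    other_gb is a rough allowance (an assumption) for PermGen, thread stacks,
    the OS itself and any other processes on the box.
    """
    return total_ram_gb - heap_gb - other_gb


def index_fits_in_cache(index_gb, total_ram_gb, heap_gb, other_gb=0.0):
    """True if the whole index could sit in the page cache at once."""
    return index_gb <= page_cache_headroom_gb(total_ram_gb, heap_gb, other_gb)


if __name__ == "__main__":
    for heap_gb in (24, 8):
        for index_gb in (14, 28):
            fits = index_fits_in_cache(index_gb, 48, heap_gb, other_gb=2)
            print(f"48GB RAM, {heap_gb}GB heap, {index_gb}GB index: fits={fits}")
```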

Another problem you will find is 'Large Pages'. This is an OS tuning
parameter, not a Java or Solr setting. You did not say which OS you
use, but here is an explanation for Linux:
http://lwn.net/Articles/423584/
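On Linux kernels with transparent huge page support, the current mode can be read from sysfs. The sketch below assumes the mainline path `/sys/kernel/mm/transparent_hugepage/enabled`; older or vendor kernels may use a different path or lack the setting entirely. The file lists the available modes with the active one in brackets, e.g. `always [madvise] never`.

```python
from pathlib import Path

# Mainline location (assumption: may differ or be absent on vendor kernels).
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text):
    """Return the bracketed (active) choice from a line such as
    'always [madvise] never', or None if nothing is bracketed."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


if __name__ == "__main__":
    if THP_ENABLED.exists():
        print("THP mode:", active_thp_mode(THP_ENABLED.read_text()))
    else:
        print("No transparent huge page setting found at", THP_ENABLED)
```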

On Mon, Aug 13, 2012 at 6:16 PM, feroz_kh <fe...@yahoo.com> wrote:
> 1. So we have about 24GB (-Xmx24576m) assigned to the JVM, which is half of
> the total memory of 48GB RAM. (If that's what you meant, and if I am getting
> that right?)
> 2. The *.fdt and *.fdx files are around 300MB and 50MB respectively, so
> that's definitely less than 5%.
> Do you see a problem there?
>
> Is there a way we can force or tune things so that the response time
> remains constant, or doesn't degrade a lot (i.e. almost doubling), when
> the index size is doubled?
> Or is there nothing we can do about it?



-- 
Lance Norskog
goksron@gmail.com

Re: Solr Index linear growth - Performance degradation.

Posted by feroz_kh <fe...@gmail.com>.
It looks like reducing the JVM heap allocation did help lower the
response time to some extent.
Thanks for the pointer.

Re: Solr Index linear growth - Performance degradation.

Posted by Erick Erickson <er...@gmail.com>.
Instant reactions:

1> that's probably too much memory. Try, as Lance said, 1/2 of your
memory. Uwe Schindler wrote an excellent blog post about this issue as it
relates to MMapDirectory:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

2> You've doubled the number of docs on the server and you're seeing a
doubling of the response time, right? On average, the number of
documents that have to be scored has also doubled, so if everything is
in memory (which it sounds like it may well be), this isn't particularly
surprising.

3> One note of caution: saying "a 14GB index" (or "a 28GB index") isn't
very meaningful. The *.fdt and *.fdx files in your index directory are
where the verbatim copy of the data is stored for those fields where
stored="true" in your schema. The contents of these files are almost
totally irrelevant to the memory requirements for searching. I've seen
these files range from < 5% of the index to over 80%.
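To see what share of an index is stored-field data, one can total the sizes of the `*.fdt` and `*.fdx` files in the index directory. A minimal sketch, assuming the non-compound file format where stored fields live in their own `.fdt`/`.fdx` files; the function name and the example path are hypothetical.

```python
import os
import sys

# Lucene stored-field data (.fdt) and the pointer file into it (.fdx).
STORED_FIELD_EXTS = (".fdt", ".fdx")


def stored_field_fraction(index_dir):
    """Fraction of the bytes in index_dir that belong to stored-field files."""
    stored = total = 0
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if not os.path.isfile(path):
            continue
        size = os.path.getsize(path)
        total += size
        if name.endswith(STORED_FIELD_EXTS):
            stored += size
    return stored / total if total else 0.0


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python stored_fraction.py /path/to/solr/data/index (hypothetical)
    frac = stored_field_fraction(sys.argv[1])
    print(f"stored fields: {frac:.1%} of the index directory")
```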

Best
Erick

On Mon, Aug 13, 2012 at 4:40 PM, feroz_kh <fe...@yahoo.com> wrote:
> Here's a short list of the queries:
> -------------------------------------------
> parallel zur xml beschreibungsdatei gibt es eine
> die verbindung zwischen beiden sei ten geschieht
> die owner klasse muss sich aus der
> benutzer ein oder mehrere lieblingsfarben ausw hlen kann
> found sample questions at http bjs ojp
> but more important parents need to keep
> -----------------------------------
> Here's the JVM memory assignment:
> -Xms24576m -Xmx24576m -XX:NewSize=6168m -XX:MaxNewSize=6168m
> -XX:MaxPermSize=1024m
> I believe that's enough assigned there...
> ---------------------------------
> I'm not adding new documents here; I'm just testing Solr index search
> against the existing indexes.
> For the 14GB index, around 14GB of RAM cache gets filled.
> For the 28GB index, around 28GB of RAM cache gets filled.
> The Document cache size is 200MB max and initial 20MB.

Re: Solr Index linear growth - Performance degradation.

Posted by Lance Norskog <go...@gmail.com>.
How much RAM do you assign to the JVM? The JVM should be allocated
maybe 1/2GB more than it needs to run "comfortably". Also, how large
are your caches?
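To relate the heap flags posted in this thread to these numbers, here is a small sketch that parses HotSpot size flags such as `-Xmx24576m` (suffixes k, m, g, taken as multiples of 1024). It shows that `-Xmx24576m` is exactly 24GiB, half of a 48GB machine. The helper names are hypothetical.

```python
import re

# HotSpot size suffixes are binary multiples.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def jvm_size_bytes(flag, prefix):
    """Parse a size flag such as '-Xmx24576m' (prefix '-Xmx') into bytes."""
    m = re.fullmatch(re.escape(prefix) + r"(\d+)([kKmMgG]?)", flag)
    if m is None:
        raise ValueError(f"not a {prefix}<size> flag: {flag!r}")
    return int(m.group(1)) * _UNITS[m.group(2).lower()]


def max_heap_gib(jvm_args):
    """Return the -Xmx value from a JVM argument string in GiB, or None."""
    for arg in jvm_args.split():
        if arg.startswith("-Xmx"):
            return jvm_size_bytes(arg, "-Xmx") / 1024 ** 3
    return None


if __name__ == "__main__":
    args = ("-Xms24576m -Xmx24576m -XX:NewSize=6168m -XX:MaxNewSize=6168m "
            "-XX:MaxPermSize=1024m")
    print(max_heap_gib(args))  # 24.0
```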

How large are the documents? How many search terms are there? If you
add more documents, are there new search terms?


-- 
Lance Norskog
goksron@gmail.com