Posted to solr-user@lucene.apache.org by Bictor Man <bi...@gmail.com> on 2011/09/26 15:53:35 UTC

drastic performance decrease with 20 cores

Hi everyone,

Sorry if this issue has been discussed before, but I'm new to the list.

I have a solr (3.4) instance running with 20 cores (around 4 million docs
each).
The instance is allocated 13GB on a 16GB RAM server. If I run several sets
of queries sequentially against each of the cores, I/O access goes very high,
and so does the system load, while CPU usage always stays low.
It takes almost 1 hour to complete the set of queries.

If I stop solr and restart it with 6GB allocated and 10 cores, after a bit
the I/O access goes down and the CPU goes up, taking only around 5 minutes
to complete all sets of queries.

Meaning that for me it is MUCH more performant to have 2 solr instances
running with half the data and half the memory each than a single instance
with all the data and memory.

It would be even way faster to have 1 instance with half the cores/memory,
run the queries, shut it down, start a new instance and repeat the process
than to have one big instance running everything.

Furthermore, if I take the 20-core/13GB instance, unload 10 of the cores,
trigger the garbage collector and run the sets of queries again, the
behavior remains slow, taking around 30 minutes.

Am I missing something here? Does solr change its caching policy depending
on the number of cores at startup, or something similar?

Any hints will be very appreciated.

Thanks,
Victor

RE: how to implement a query like " like '%pattern%' "

Posted by "Rode González (libnova)" <ro...@libnova.es>.
Hi Tomás.

It seems that yes, using q = "word1 word2" over a tokenized field appears
to work. I will do some additional testing.
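As a sketch, such a phrase query over a tokenized field would look
roughly like this (host and field name are illustrative):

```shell
# Illustrative phrase query; %22 and %20 are the URL-encoded quote and space
curl 'http://localhost:8983/solr/select?q=text:%22word1%20word2%22'
```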

thanks a lot,

rode.





Re: how to implement a query like " like '%pattern%' "

Posted by Tomás Fernández Löbbe <to...@gmail.com>.
If you need those kinds of searches then you should probably not be using
the KeywordTokenizerFactory. Is there any reason why you can't switch to a
WhitespaceTokenizer, for example? Then you could use a simple phrase query
for your search case. If you still need everything as a single token, you
could use a copyField to duplicate the field and have them both.
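A minimal sketch of what that schema change could look like (field and
type names are illustrative, not from your schema):

```xml
<!-- Illustrative schema.xml fragment: a whitespace-tokenized field for
     phrase queries, plus an untokenized copy for exact whole-value match -->
<fieldType name="text_ws" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="phrase" type="text_ws" indexed="true" stored="true"/>
<field name="phrase_exact" type="string" indexed="true" stored="true"/>
<copyField source="phrase" dest="phrase_exact"/>
```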

Are those acceptable options for you?

Tomás


Re: how to implement a query like " like '%pattern%' "

Posted by Chris Hostetter <ho...@fucit.org>.
: References:
:     <CA...@mail.gmail.com>
: In-Reply-To:
:     <CA...@mail.gmail.com>
: Subject: how to implemente a query like " like '%pattern%' "

https://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message; instead, start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to, so your question is "hidden" in that thread and gets less 
attention.   It also makes following discussions in the mailing list 
archives particularly difficult.



-Hoss

how to implement a query like " like '%pattern%' "

Posted by "Rode González (libnova)" <ro...@libnova.es>.
Hi all.

How can we do a query similar to 'like'?


If I have this phrase as a single token in the index: "This phrase has various words" (using KeywordTokenizerFactory)
and I want an exact match of "phrase has various" or "various words", for instance...

How can I do this?

Thanks a lot.

Rode.





Re: drastic performance decrease with 20 cores

Posted by Otis Gospodnetic <ot...@yahoo.com>.
The following should help with size estimation:

http://search-lucene.com/?q=estimate+memory&fc_project=Solr

http://issues.apache.org/jira/browse/LUCENE-3435

I'll just add that with that much RAM you'll be more than fine.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



Re: drastic performance decrease with 20 cores

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Tue, 2011-09-27 at 02:43 +0200, Bictor Man wrote:
> thanks for your replies. indeed the filesystem caching seems to be the
> difference. sadly I can't add more memory and the 6GB/20core combination
> doesn't work. so I'll just try to tweak it as much as I can.

A (better) alternative to more memory is an SSD. Since SSDs are roughly 100
times faster than spinning drives for random I/O, the cache-miss penalty is
much lower. Consequently, free memory for the OS cache does not matter as
much, and warm-up requirements are lower.


Re: drastic performance decrease with 20 cores

Posted by Bictor Man <bi...@gmail.com>.
Hi guys,

thanks for your replies. Indeed, the filesystem caching seems to be the
difference. Sadly I can't add more memory, and the 6GB/20-core combination
doesn't work, so I'll just try to tweak it as much as I can.

thanks a lot.



Re: drastic performance decrease with 20 cores

Posted by François Schiettecatte <fs...@gmail.com>.
You have not said how big your index is but I suspect that allocating 13GB for your 20 cores is starving the OS of memory for caching file data. Have you tried 6GB with 20 cores? I suspect you will see the same performance as 6GB & 10 cores.

Generally it is better to allocate just enough memory to SOLR to run optimally rather than as much as possible. 'Just enough' depends as well. You will need to try out different allocations and see where the sweet spot is.
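As a sketch, the heap allocation is set with the usual JVM flags when
starting Solr; for the Jetty start that ships with Solr 3.x it would look
roughly like this (the 6GB value is just the example from this thread):

```shell
# Illustrative: start Solr with a fixed 6GB heap, leaving the rest
# of the machine's RAM to the OS page cache for index files
java -Xms6g -Xmx6g -jar start.jar
```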

Cheers

François

