Posted to solr-user@lucene.apache.org by Jesús Martín García <jm...@cesca.cat> on 2011/10/17 12:19:23 UTC

millions of records problem

Hi,

I've got 500 million documents in Solr, each with the same number
of fields and similar size. The version of Solr I'm using is 1.4.1,
with Lucene 2.9.3.

I don't have the option to use shards, so the whole index has to live on
a single machine...

The size of the index is about 50 GB and the machine has 8 GB of RAM...
Everything is working, but the searches are very slow, although I have
tried different configurations in solrconfig.xml, such as:

- configuring a firstSearcher listener with the most-used searches as warm-up queries
- configuring the caches (query, filter and document) with large sizes (both are sketched below)...

but everything is still slow, so do you have any ideas to speed up the
searches without the penalty of using much more RAM?
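
For reference, here is a minimal sketch of what those two settings look
like in a Solr 1.4 solrconfig.xml; the cache sizes and warm-up queries
below are placeholders, not my real values:

    <!-- warm each new searcher with the most frequent queries -->
    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">a frequent query</str>
          <str name="fq">a popular filter</str>
        </lst>
      </arr>
    </listener>

    <!-- enlarged caches; every entry lives on the Java heap,
         so these trade RAM for speed -->
    <filterCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="1024"/>
    <queryResultCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="512"/>
    <documentCache class="solr.LRUCache" size="16384" initialSize="4096"/>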

Thanks in advance,

Jesús

-- 
.......................................................................
       __
     /   /       Jesús Martín García
C E / S / C A   Tècnic de Projectes
   /__ /         Centre de Serveis Científics i Acadèmics de Catalunya

Gran Capità, 2-4 (Edifici Nexus) · 08034 Barcelona
T. 93 551 6213 · F. 93 205 6979 · jmartin@cesca.cat
.......................................................................


Re: millions of records problem

Posted by Tom Gullo <sp...@gmail.com>.
Getting a solid-state drive might help.


Re: millions of records problem

Posted by Vadim Kisselmann <v....@googlemail.com>.
Hi,
a number of relevant questions have already been asked.
I have another one:
what type of docs do you have? Do you add new docs every day, or is the
number of docs stable (500 million)?
What about replication?

Regards Vadim



Re: millions of records problem

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi Jesús,

Others have already asked a number of relevant questions. If I had to guess, I'd guess this is simply a disk IO issue, but of course there may be room for improvement without getting more RAM or SSDs, so tell us more about your queries, the disk IO you are seeing, etc.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



Re: millions of records problem

Posted by Nick Veenhof <ni...@gmail.com>.
You could use the CommonGrams technique? I'm currently reading up on it:
http://khaidoan.wikidot.com/solr-common-gram-filter
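
As I read it, the trick is to glue very common terms to their neighbours
as single tokens at index time, so phrase queries involving those terms
get much cheaper. A sketch of what the field type might look like in
schema.xml, assuming Solr 1.4's CommonGrams factories; the type name and
words file are made up:

    <fieldType name="text_commongrams" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- emit pairs like "the_quick" alongside the single terms -->
        <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt" ignoreCase="true"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- at query time, keep only the pair token where a common word is involved -->
        <filter class="solr.CommonGramsQueryFilterFactory" words="commonwords.txt" ignoreCase="true"/>
      </analyzer>
    </fieldType>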



Re: millions of records problem

Posted by Jan Høydahl <ja...@cominvent.com>.
Hi,

What exactly do you mean by "slow" search? 1 s? 10 s?
Which operating system, how many CPUs, which servlet container, and how much RAM have you allocated to your JVM (-Xmx)?
What kind and size of docs? Your numbers indicate about 100 bytes per doc.
What kind of searches? Facets? Sorting? Wildcards?
Have you tried to "slim down" your schema by setting indexed="false" and stored="false" wherever possible?

My first thought is that it's really impressive if you've managed to get 500 million docs into one index with only 8 GB of RAM! I would expect that to fail, or in the best case be very slow. If you have a beefy server I'd first try putting in 64 GB of RAM, slimming down your schema, and perhaps even switching to Solr 4.0 (trunk), which is more RAM efficient.
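
To illustrate the slimming-down point, here is a hypothetical schema.xml
fragment (invented field names) where each field only indexes or stores
what is actually needed:

    <!-- searched and returned to the client -->
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <!-- searched but never displayed: stored="false" shrinks the stored-fields files -->
    <field name="body" type="text" indexed="true" stored="false"/>
    <!-- displayed but never searched: indexed="false" keeps it out of the inverted index -->
    <field name="payload" type="string" indexed="false" stored="true"/>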

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com
