Posted to solr-user@lucene.apache.org by Matthieu Labour <ma...@strateer.com> on 2009/12/20 21:38:42 UTC

solr perf

Hi,
I have a Solr instance in which I created 700 cores, one core per user of my
application.
The total size of the indexed data on disk is 35GB, with Solr cores ranging
from 100KB and a few documents to 1.2GB and 50 000 documents.
Both searching and indexing seem very slow.
This is running on an EC2 extra large instance (6 CPU, 15GB memory, RAID 0 disk).
I would appreciate any tips, articles, etc. on what to do
to understand and improve performance.
Thank you

Re: solr perf

Posted by Licinio Fernández Maurelo <li...@gmail.com>.
Not bad advice ;-)

2009/12/20 Walter Underwood <wu...@wunderwood.org>

> Here is an idea. Don't make one core per user.  Use a field with a user id.
>
> wunder
>
> On Dec 20, 2009, at 12:38 PM, Matthieu Labour wrote:
>
> > Hi,
> > I have a Solr instance in which I created 700 cores, one core per user of my
> > application.
> > The total size of the indexed data on disk is 35GB, with Solr cores ranging
> > from 100KB and a few documents to 1.2GB and 50 000 documents.
> > Both searching and indexing seem very slow.
> > This is running on an EC2 extra large instance (6 CPU, 15GB memory, RAID 0 disk).
> > I would appreciate any tips, articles, etc. on what to do
> > to understand and improve performance.
> > Thank you
>
>


-- 
Lici
~Java Developer~

Re: solr perf

Posted by Walter Underwood <wu...@wunderwood.org>.
Here is an idea. Don't make one core per user.  Use a field with a user id.
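
A minimal sketch of what that suggestion could look like in a single shared core. The field name "user_id" and the example value are assumptions made for illustration; nothing in the thread specifies them.

```xml
<!-- schema.xml (sketch): "user_id" is a hypothetical field name -->
<field name="user_id" type="string" indexed="true" stored="true" required="true"/>

<!-- Each search is then restricted to one user with a filter query, e.g.
     /select?q=some+text&fq=user_id:42 -->
```

Every document is indexed with the owning user's id, and the application adds the filter query on each request so that users only see their own documents.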

wunder

On Dec 20, 2009, at 12:38 PM, Matthieu Labour wrote:

> Hi,
> I have a Solr instance in which I created 700 cores, one core per user of my
> application.
> The total size of the indexed data on disk is 35GB, with Solr cores ranging
> from 100KB and a few documents to 1.2GB and 50 000 documents.
> Both searching and indexing seem very slow.
> This is running on an EC2 extra large instance (6 CPU, 15GB memory, RAID 0 disk).
> I would appreciate any tips, articles, etc. on what to do
> to understand and improve performance.
> Thank you


Re: SynonymFilterFactory parseRules

Posted by Kevin Jackson <fo...@gmail.com>.
Hi

> I am still looking at "synonyms", and the possibility of having synonyms loaded via another mechanism than reading from a text file.

I am also looking at creating a DBSynonymFilterFactory which will
allow us to load the synonyms from a db.

I haven't done much apart from getting solr-trunk and creating the
class - I was going to work on it a little over the holidays as my
holidays project.

Would you like to collaborate?  Not sure how we'd manage it, but there
are enough ways of sharing code now that we could work something out?

>
> A few questions:
> Does anyone know why, in SynonymFilterFactory, the method "parseRules" is package private (actually I'm not sure of the terminology - what I mean is, I can't call this method from outside the package, so I can't use this method if I extend SynonymFilterFactory).
>
> And why are "parseRules" and several other methods static?
>
> Also, where does the "ResourceLoader" supplied to the "inform" method come from? Or, who is it that instantiates the resource-loader and the filter-factory, and calls the "inform" method. Can I influence this at all - for instance, can I inject my own "ResourceLoader" into this call?
>
> Thanks very much for your help.
> Peter
>

Thanks,
Kev

SynonymFilterFactory parseRules

Posted by "Peter A. Kirk" <pk...@alpha-solutions.dk>.
Hi

I am still looking at "synonyms", and the possibility of having synonyms loaded via another mechanism than reading from a text file.

A few questions:
Does anyone know why, in SynonymFilterFactory, the method "parseRules" is package private (actually I'm not sure of the terminology - what I mean is, I can't call this method from outside the package, so I can't use this method if I extend SynonymFilterFactory).

And why are "parseRules" and several other methods static?

Also, where does the "ResourceLoader" supplied to the "inform" method come from? Or, who is it that instantiates the resource-loader and the filter-factory, and calls the "inform" method. Can I influence this at all - for instance, can I inject my own "ResourceLoader" into this call?

Thanks very much for your help.
Peter
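
For context, a sketch of how the stock factory is typically declared in schema.xml. The type name "text_syn" is hypothetical. Solr appears to create the factory from a declaration like this when it loads the schema, and the file named in "synonyms" is then read through the ResourceLoader handed to "inform". Whether a custom loader can be injected through configuration is not shown here and is left open.

```xml
<!-- schema.xml (sketch): "text_syn" is a hypothetical type name -->
<fieldType name="text_syn" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- "synonyms" names the rules file resolved via the ResourceLoader -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```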

Re: query log

Posted by Erik Hatcher <er...@gmail.com>.
On Dec 20, 2009, at 7:24 PM, Peter A. Kirk wrote:
> Where is the "HTTP 304" feature enabled and disabled? Or how can I  
> at least ensure that my logger always gets the request, however Solr  
> responds?

It's configured in solrconfig.xml. I generally recommend turning it
off during development, though I've gotten used to hitting shift-refresh
in my browser to force a request with no cache headers sent.
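
A sketch of the relevant solrconfig.xml fragment, assuming the stock layout in which HTTP caching is controlled by the httpCaching element inside requestDispatcher. An existing requestDispatcher element should be edited in place rather than duplicated.

```xml
<!-- solrconfig.xml (sketch): ask Solr never to answer with HTTP 304 -->
<requestDispatcher handleSelect="true">
  <!-- never304="true" makes Solr ignore the If-None-Match and
       If-Modified-Since request headers and always send a full response -->
  <httpCaching never304="true"/>
</requestDispatcher>
```

With this setting every request should reach the request handler, and therefore any component configured in it, regardless of the cache headers the browser sends.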

	Erik


RE: query log

Posted by "Peter A. Kirk" <pk...@alpha-solutions.dk>.
Hi, thanks for the reply.

Yes, at the moment the queries are coming from a browser (Internet Explorer). Actually, I'm testing with the little webapp that comes with the Solr download.

Where is the "HTTP 304" feature enabled and disabled? Or how can I at least ensure that my logger always gets the request, however Solr responds?


Med venlig hilsen / Best regards

Peter Kirk
E-mail: mailto:pk@alpha-solutions.dk


-----Original Message-----
From: Erik Hatcher [mailto:erik.hatcher@gmail.com] 
Sent: 21. december 2009 12:26
To: solr-user@lucene.apache.org
Subject: Re: query log

Where are the queries coming from?  A browser?  I bet you've got the  
HTTP 304 feature enabled and your client is sending etag/last-modified  
headers, causing Solr to respond with a 304 response and short circuit.

	Erik

On Dec 20, 2009, at 5:46 PM, Peter A. Kirk wrote:

> Hi, I'd like to write a "Component" that can write to a simple log  
> with query data for every submitted query.
>
> So far I have written a simple Component and configured it to be  
> called in the "standard" requestHandler. However, I have noticed  
> that it is not always called. It's as if some queries are cached -  
> and the standard request handler is not called at all. How do I  
> write EVERY query request to my log?
> 	
>
>
> <searchComponent name="OccurrenceLogger"
> class="my.handler.component.TestFileLogger" />
>
> <requestHandler name="standard" class="solr.SearchHandler"  
> default="true">
>
>    <arr name="first-components">
>      <str>OccurrenceLogger</str>
>    </arr>
>
>
>
>
> Med venlig hilsen / Best regards
>
> Peter Kirk
> E-mail: mailto:pk@alpha-solutions.dk
>


Re: query log

Posted by Erik Hatcher <er...@gmail.com>.
Where are the queries coming from?  A browser?  I bet you've got the  
HTTP 304 feature enabled and your client is sending etag/last-modified  
headers, causing Solr to respond with a 304 response and short circuit.

	Erik

On Dec 20, 2009, at 5:46 PM, Peter A. Kirk wrote:

> Hi, I'd like to write a "Component" that can write to a simple log  
> with query data for every submitted query.
>
> So far I have written a simple Component and configured it to be  
> called in the "standard" requestHandler. However, I have noticed  
> that it is not always called. It's as if some queries are cached -  
> and the standard request handler is not called at all. How do I  
> write EVERY query request to my log?
> 	
>
>
> <searchComponent name="OccurrenceLogger"
> class="my.handler.component.TestFileLogger" />
>
> <requestHandler name="standard" class="solr.SearchHandler"  
> default="true">
>
>    <arr name="first-components">
>      <str>OccurrenceLogger</str>
>    </arr>
>
>
>
>
> Med venlig hilsen / Best regards
>
> Peter Kirk
> E-mail: mailto:pk@alpha-solutions.dk
>


Re: query log

Posted by Chris Hostetter <ho...@fucit.org>.
: Subject: query log
: References: <83...@mail.gmail.com>
: In-Reply-To: <83...@mail.gmail.com>

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking



-Hoss


query log

Posted by "Peter A. Kirk" <pk...@alpha-solutions.dk>.
Hi, I'd like to write a "Component" that can write to a simple log with query data for every submitted query.

So far I have written a simple Component and configured it to be called in the "standard" requestHandler. However, I have noticed that it is not always called. It's as if some queries are cached - and the standard request handler is not called at all. How do I write EVERY query request to my log?
	


<searchComponent name="OccurrenceLogger"
class="my.handler.component.TestFileLogger" />

<requestHandler name="standard" class="solr.SearchHandler" default="true">

    <arr name="first-components">
      <str>OccurrenceLogger</str>
    </arr>

</requestHandler>




Med venlig hilsen / Best regards

Peter Kirk
E-mail: mailto:pk@alpha-solutions.dk


Re: solr perf

Posted by didier deshommes <df...@gmail.com>.
Have you tried loading Solr cores as you need them and unloading
those that are not being used? I wish I could help more, but I don't know
many people running that many cores.
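
A sketch of the multicore setup this would rely on, assuming a solr.xml with the core admin handler enabled. The core name "user42" and its directory are hypothetical.

```xml
<!-- solr.xml (sketch): "user42" is a hypothetical core name -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="user42" instanceDir="user42"/>
  </cores>
</solr>

<!-- With adminPath set, cores can be managed over HTTP, for example:
     /solr/admin/cores?action=CREATE&name=user42&instanceDir=user42
     /solr/admin/cores?action=UNLOAD&core=user42 -->
```

The application would create or load a user's core on first use and unload it after a period of inactivity, so that only the active cores hold memory and open files.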

didier

On Sun, Dec 20, 2009 at 2:38 PM, Matthieu Labour <ma...@strateer.com> wrote:
> Hi,
> I have a Solr instance in which I created 700 cores, one core per user of my
> application.
> The total size of the indexed data on disk is 35GB, with Solr cores ranging
> from 100KB and a few documents to 1.2GB and 50 000 documents.
> Both searching and indexing seem very slow.
> This is running on an EC2 extra large instance (6 CPU, 15GB memory, RAID 0 disk).
> I would appreciate any tips, articles, etc. on what to do
> to understand and improve performance.
> Thank you
>