Posted to solr-user@lucene.apache.org by Raghuveer Kancherla <ra...@aplopio.com> on 2009/12/01 10:47:56 UTC

Re: Retrieving large num of docs

Hi Hoss/Andrew,
I think I solved the problem of retrieving 300 docs per request for now. The
problem was that I was storing 2 moderately large multivalued text fields
though I was not retrieving them during search time.  I reindexed all my
data without storing these fields. Now the response time (time for Solr to
return the http response) is very close to the QTime Solr is showing in the
logs.
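
For readers hitting the same issue: the change amounts to flipping stored="true" to stored="false" on the large fields in schema.xml and reindexing. A hypothetical sketch (field names and type are placeholders, not taken from this thread):

```xml
<!-- schema.xml (Solr 1.4): keep the large multivalued fields searchable
     but stop storing their raw values; field names are illustrative. -->
<field name="bigTextA" type="text" indexed="true" stored="false" multiValued="true"/>
<field name="bigTextB" type="text" indexed="true" stored="false" multiValued="true"/>
```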

Thanks for all the help,
Raghu


On Mon, Nov 30, 2009 at 11:37 AM, Raghuveer Kancherla <
raghuveer.kancherla@aplopio.com> wrote:

> Thanks Hoss,
> In my previous mail, I was measuring the system time difference between
> sending a (http) request and receiving a response. This was being run on a
> (different) client machine
>
> Like you suggested, I tried to time the response on the server itself as
> follows:
>
> $ /usr/bin/time -p curl -sS -o solr.out "
> http://localhost:1212/solr/select/?rows=300&q=%28ResumeAllText%3A%28%28%28%22java+j2ee%22+%28java+j2ee%29%29%29%5E4%29%5E1.0%29&start=0&wt=python
> "
> real 3.49
>
> user 0.00
> sys 0.00
>
> The query time in solr log shows me Qtime=600
> size of solr.out is 843 kB.
>
> As you've mentioned, Solr shouldn't give these kinds of numbers for 300
> docs, and we're quite perplexed as to what's going on.
>
> Thanks,
> Raghu
>
>
>
>
> On Mon, Nov 30, 2009 at 6:00 AM, Chris Hostetter <hossman_lucene@fucit.org
> > wrote:
>
>>
>> : I am using Solr1.4 for searching through half a million documents. The
>> : problem is, I want to retrieve nearly 200 documents for each search
>> query.
>> : The query time in Solr logs is showing 0.02 seconds and I am fairly
>> happy
>> : with that. However Solr is taking a long time (4 to 5 secs) to return
>> the
>> : results (I think it is because of the number of docs I am requesting). I
>> : tried returning only the id's (unique key) without any other stored
>> fields,
>> : but it is not helping me improve the response times (time to return the
>> id's
>> : of matching documents).
>>
>> What exactly does your request URL look like, and how exactly are you
>> timing the total response time?
>>
>> 200 isn't a very big number for the rows param -- people who want to get
>> 100K documents back in their response at a time may have problems, but 200
>> is not that big.
>>
>> so like i said: how exactly are you timing things?
>>
>> My guess: it's more likely that network overhead or the performance of
>> your client code (reading the data off the wire) is causing your timing
>> code to seem slow, than it is that Solr is taking 5 seconds to write out
>> those document IDs.
>>
>> I suspect if you try hitting the same exact URL using curl via localhost,
>> you'll see the total response time be a lot less than 5 seconds.
>>
>> Here's an example of a query that asks solr to return *every* field from
>> 500 documents, in the XML format.  And these are not small documents...
>>
>> $ /usr/bin/time -p curl -sS -o /tmp/solr.out "
>> http://localhost:5051/solr/select/?q=doctype:product&version=2.2&start=0&rows=500&indent=on
>> "
>> real 0.07
>> user 0.00
>> sys 0.00
>> [chrish@c18-ssa-so-dfll-qry1 ~]$ du -sh /tmp/solr.out
>> 1.6M    /tmp/solr.out
>>
>> ...that's 1.6 MB of 500 Solr documents with all of their fields in
>> verbose XML format (including indenting) fetched in 70ms.
>>
>> If it's taking 5 seconds for you to get just the ids of 200 docs, you've
>> got a problem somewhere and i'm 99% certain it's not in Solr.
>>
>> what does a similar "time curl" command for your URL look like when you
>> run it on your solr server?
>>
>>
>> -Hoss
>>
>>
>

Re: Retrieving large num of docs

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Strange.  Did you ever figure out the source of the performance difference?

 Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch



----- Original Message ----
> From: Raghuveer Kancherla <ra...@aplopio.com>
> To: solr-user@lucene.apache.org
> Sent: Sat, December 5, 2009 12:05:49 PM
> Subject: Re: Retrieving large num of docs
> 
> Hi Otis,
> I think my experiments are not conclusive about reduction in search time. I
> was playing around with various configurations to reduce the time to
> retrieve documents from Solr. I am sure that making the two multi valued
> text fields from stored to un-stored, retrieval time (query time + time to
> load the stored fields) became very fast. I was expecting the
> lazyfieldloading setting in solrconfig to take care of this but apparently
> it is not working as expected.
> 
> Out of curiosity, I removed these 2 fields from the index (this time I am
> not even indexing them) and my search time got better (10 times better).
> However, I am still trying to isolate the reason for the search time
> reduction. It may be either because of 2 less fields to search in or because
> of the reduction in size of the index, or maybe something else. I am not
> sure if enableLazyFieldLoading has any part in explaining this.
> 
> - Raghu
> 
> 
> 
> On Fri, Dec 4, 2009 at 3:07 AM, Otis Gospodnetic 
> > wrote:
> 
> > Hm, hm, interesting.  I was looking into something like this the other day
> > (BIG indexed+stored text fields).  After seeing enableLazyFieldLoading=true
> > in solrconfig and after seeing "fl" didn't include those big fields, I
> > thought "hm, so Lucene/Solr will not be pulling those large fields from disk,
> > OK".
> >
> > You are saying that this may not be true based on your experiment?
> > And what I'm calling your "experiment" means that you reindexed the same
> > data, but without the 2 multi-valued text fields... and that was the only
> > change you made and got a circa 10x search performance improvement?
> >
> > Sorry for repeating your words, just trying to confirm and understand.
> >
> > Thanks,
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
> >
> >
> >
> > ----- Original Message ----
> > > From: Raghuveer Kancherla 
> > > To: solr-user@lucene.apache.org
> > > Sent: Thu, December 3, 2009 8:43:16 AM
> > > Subject: Re: Retrieving large num of docs
> > >
> > > Hi Hoss,
> > >
> > > I was experimenting with various queries to solve this problem and in one
> > > such test I remember that requesting only the ID did not change the
> > > retrieval time. To be sure, I tested it again using the curl command
> > today
> > > and it confirms my previous observation.
> > >
> > > Also, enableLazyFieldLoading setting is set to true in my solrconfig.
> > >
> > > Another general observation (off topic) is that having a moderately large
> > > multi valued text field (~200 entries) in the index seems to slow down
> > the
> > > search significantly. I removed the 2 multi valued text fields from my
> > index
> > > and my search got ~10 times faster. :)
> > >
> > > - Raghu
> > >
> > >
> > > On Thu, Dec 3, 2009 at 2:14 AM, Chris Hostetter wrote:
> > >
> > > >
> > > > : I think I solved the problem of retrieving 300 docs per request for
> > now.
> > > > The
> > > > : problem was that I was storing 2 moderately large multivalued text
> > fields
> > > > : though I was not retrieving them during search time.  I reindexed all
> > my
> > > > : data without storing these fields. Now the response time (time for
> > Solr
> > > > to
> > > > : return the http response) is very close to the QTime Solr is showing
> > in
> > > > the
> > > >
> > > > Hmmm....
> > > >
> > > > two comments:
> > > >
> > > > 1) the example URL from your previous mail...
> > > >
> > > > : >
> > > >
> > >
> > 
> http://localhost:1212/solr/select/?rows=300&q=%28ResumeAllText%3A%28%28%28%22java+j2ee%22+%28java+j2ee%29%29%29%5E4%29%5E1.0%29&start=0&wt=python
> > > >
> > > > ...doesn't match your earlier statement that you are only returning
> > > > the id field (there is no "fl" param in that URL) ... are you certain
> > > > you weren't returning those large stored fields in the response?
> > > >
> > > > 2) assuming you were actually using an fl param to limit the fields,
> > make
> > > > sure you have this setting in your solrconfig.xml...
> > > >
> > > >    <enableLazyFieldLoading>true</enableLazyFieldLoading>
> > > >
> > > > ..that should make it pretty fast to return only a few fields of each
> > > > document, even if you do have some jumbo stored fields that aren't
> > > > being returned.
> > > >
> > > >
> > > >
> > > > -Hoss
> > > >
> > > >
> >
> >


Re: Retrieving large num of docs

Posted by Raghuveer Kancherla <ra...@aplopio.com>.
Hi Otis,
I think my experiments are not conclusive about reduction in search time. I
was playing around with various configurations to reduce the time to
retrieve documents from Solr. I am sure that once I changed the two
multi-valued text fields from stored to un-stored, retrieval time (query
time + time to load the stored fields) became very fast. I was expecting
the enableLazyFieldLoading setting in solrconfig to take care of this, but
apparently it is not working as expected.

Out of curiosity, I removed these 2 fields from the index (this time I am
not even indexing them) and my search time got better (10 times better).
However, I am still trying to isolate the reason for the search time
reduction. It may be either because of 2 less fields to search in or because
of the reduction in size of the index, or maybe something else. I am not
sure if enableLazyFieldLoading has any part in explaining this.
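
One way to pin down where the time goes is to compare the client-side wall clock against the QTime Solr reports inside the response. A rough sketch using the wt=python writer from the queries above (the host, port, and URL are assumptions, not from this thread):

```python
import ast
import time
import urllib.request

# Assumed endpoint; adjust to your own Solr instance.
SOLR_URL = "http://localhost:8983/solr/select/?q=*:*&rows=300&fl=id&wt=python"

def parse_wt_python(body: str) -> dict:
    # Solr's wt=python writer emits a Python literal dict;
    # ast.literal_eval parses it without executing any code.
    return ast.literal_eval(body)

def time_query(url: str) -> tuple:
    """Return (client wall-clock ms, server-reported QTime ms)."""
    start = time.time()
    body = urllib.request.urlopen(url).read().decode("utf-8")
    elapsed_ms = (time.time() - start) * 1000.0
    resp = parse_wt_python(body)
    return elapsed_ms, resp["responseHeader"]["QTime"]
```

If the wall clock far exceeds QTime, the extra cost is in loading stored fields, response writing, or the network, not in the query itself.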

- Raghu



On Fri, Dec 4, 2009 at 3:07 AM, Otis Gospodnetic <otis_gospodnetic@yahoo.com
> wrote:

> Hm, hm, interesting.  I was looking into something like this the other day
> (BIG indexed+stored text fields).  After seeing enableLazyFieldLoading=true
> in solrconfig and after seeing "fl" didn't include those big fields, I
> thought "hm, so Lucene/Solr will not be pulling those large fields from disk,
> OK".
>
> You are saying that this may not be true based on your experiment?
> And what I'm calling your "experiment" means that you reindexed the same
> data, but without the 2 multi-valued text fields... and that was the only
> change you made and got a circa 10x search performance improvement?
>
> Sorry for repeating your words, just trying to confirm and understand.
>
> Thanks,
> Otis
> --
> Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
>
>
>
> ----- Original Message ----
> > From: Raghuveer Kancherla <ra...@aplopio.com>
> > To: solr-user@lucene.apache.org
> > Sent: Thu, December 3, 2009 8:43:16 AM
> > Subject: Re: Retrieving large num of docs
> >
> > Hi Hoss,
> >
> > I was experimenting with various queries to solve this problem and in one
> > such test I remember that requesting only the ID did not change the
> > retrieval time. To be sure, I tested it again using the curl command
> today
> > and it confirms my previous observation.
> >
> > Also, enableLazyFieldLoading setting is set to true in my solrconfig.
> >
> > Another general observation (off topic) is that having a moderately large
> > multi valued text field (~200 entries) in the index seems to slow down
> the
> > search significantly. I removed the 2 multi valued text fields from my
> index
> > and my search got ~10 times faster. :)
> >
> > - Raghu
> >
> >
> > On Thu, Dec 3, 2009 at 2:14 AM, Chris Hostetter wrote:
> >
> > >
> > > : I think I solved the problem of retrieving 300 docs per request for
> now.
> > > The
> > > : problem was that I was storing 2 moderately large multivalued text
> fields
> > > : though I was not retrieving them during search time.  I reindexed all
> my
> > > : data without storing these fields. Now the response time (time for
> Solr
> > > to
> > > : return the http response) is very close to the QTime Solr is showing
> in
> > > the
> > >
> > > Hmmm....
> > >
> > > two comments:
> > >
> > > 1) the example URL from your previous mail...
> > >
> > > : >
> > >
> >
> http://localhost:1212/solr/select/?rows=300&q=%28ResumeAllText%3A%28%28%28%22java+j2ee%22+%28java+j2ee%29%29%29%5E4%29%5E1.0%29&start=0&wt=python
> > >
> > > ...doesn't match your earlier statement that you are only returning the
> > > id field (there is no "fl" param in that URL) ... are you certain you
> > > weren't returning those large stored fields in the response?
> > >
> > > 2) assuming you were actually using an fl param to limit the fields,
> make
> > > sure you have this setting in your solrconfig.xml...
> > >
> > >    <enableLazyFieldLoading>true</enableLazyFieldLoading>
> > >
> > > ..that should make it pretty fast to return only a few fields of each
> > > document, even if you do have some jumbo stored fields that aren't
> > > being returned.
> > >
> > >
> > >
> > > -Hoss
> > >
> > >
>
>

Re: Retrieving large num of docs

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hm, hm, interesting.  I was looking into something like this the other day (BIG indexed+stored text fields).  After seeing enableLazyFieldLoading=true in solrconfig and after seeing "fl" didn't include those big fields, I thought "hm, so Lucene/Solr will not be pulling those large fields from disk, OK".

You are saying that this may not be true based on your experiment?
And what I'm calling your "experiment" means that you reindexed the same data, but without the 2 multi-valued text fields... and that was the only change you made and got a circa 10x search performance improvement?

Sorry for repeating your words, just trying to confirm and understand.

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch



----- Original Message ----
> From: Raghuveer Kancherla <ra...@aplopio.com>
> To: solr-user@lucene.apache.org
> Sent: Thu, December 3, 2009 8:43:16 AM
> Subject: Re: Retrieving large num of docs
> 
> Hi Hoss,
> 
> I was experimenting with various queries to solve this problem and in one
> such test I remember that requesting only the ID did not change the
> retrieval time. To be sure, I tested it again using the curl command today
> and it confirms my previous observation.
> 
> Also, enableLazyFieldLoading setting is set to true in my solrconfig.
> 
> Another general observation (off topic) is that having a moderately large
> multi valued text field (~200 entries) in the index seems to slow down the
> search significantly. I removed the 2 multi valued text fields from my index
> and my search got ~10 times faster. :)
> 
> - Raghu
> 
> 
> On Thu, Dec 3, 2009 at 2:14 AM, Chris Hostetter wrote:
> 
> >
> > : I think I solved the problem of retrieving 300 docs per request for now.
> > The
> > : problem was that I was storing 2 moderately large multivalued text fields
> > : though I was not retrieving them during search time.  I reindexed all my
> > : data without storing these fields. Now the response time (time for Solr
> > to
> > : return the http response) is very close to the QTime Solr is showing in
> > the
> >
> > Hmmm....
> >
> > two comments:
> >
> > 1) the example URL from your previous mail...
> >
> > : >
> > 
> http://localhost:1212/solr/select/?rows=300&q=%28ResumeAllText%3A%28%28%28%22java+j2ee%22+%28java+j2ee%29%29%29%5E4%29%5E1.0%29&start=0&wt=python
> >
> > ...doesn't match your earlier statement that you are only returning the id
> > field (there is no "fl" param in that URL) ... are you certain you weren't
> > returning those large stored fields in the response?
> >
> > 2) assuming you were actually using an fl param to limit the fields, make
> > sure you have this setting in your solrconfig.xml...
> >
> >    <enableLazyFieldLoading>true</enableLazyFieldLoading>
> >
> > ..that should make it pretty fast to return only a few fields of each
> > document, even if you do have some jumbo stored fields that aren't being
> > returned.
> >
> >
> >
> > -Hoss
> >
> >


Re: Retrieving large num of docs

Posted by Raghuveer Kancherla <ra...@aplopio.com>.
Hi Hoss,

I was experimenting with various queries to solve this problem and in one
such test I remember that requesting only the ID did not change the
retrieval time. To be sure, I tested it again using the curl command today
and it confirms my previous observation.

Also, enableLazyFieldLoading setting is set to true in my solrconfig.

Another general observation (off topic) is that having a moderately large
multi valued text field (~200 entries) in the index seems to slow down the
search significantly. I removed the 2 multi valued text fields from my index
and my search got ~10 times faster. :)

- Raghu


On Thu, Dec 3, 2009 at 2:14 AM, Chris Hostetter <ho...@fucit.org> wrote:

>
> : I think I solved the problem of retrieving 300 docs per request for now.
> The
> : problem was that I was storing 2 moderately large multivalued text fields
> : though I was not retrieving them during search time.  I reindexed all my
> : data without storing these fields. Now the response time (time for Solr
> to
> : return the http response) is very close to the QTime Solr is showing in
> the
>
> Hmmm....
>
> two comments:
>
> 1) the example URL from your previous mail...
>
> : >
> http://localhost:1212/solr/select/?rows=300&q=%28ResumeAllText%3A%28%28%28%22java+j2ee%22+%28java+j2ee%29%29%29%5E4%29%5E1.0%29&start=0&wt=python
>
> ...doesn't match your earlier statement that you are only returning the id
> field (there is no "fl" param in that URL) ... are you certain you weren't
> returning those large stored fields in the response?
>
> 2) assuming you were actually using an fl param to limit the fields, make
> sure you have this setting in your solrconfig.xml...
>
>    <enableLazyFieldLoading>true</enableLazyFieldLoading>
>
> ..that should make it pretty fast to return only a few fields of each
> document, even if you do have some jumbo stored fields that aren't being
> returned.
>
>
>
> -Hoss
>
>

Re: Retrieving large num of docs

Posted by Chris Hostetter <ho...@fucit.org>.
: I think I solved the problem of retrieving 300 docs per request for now. The
: problem was that I was storing 2 moderately large multivalued text fields
: though I was not retrieving them during search time.  I reindexed all my
: data without storing these fields. Now the response time (time for Solr to
: return the http response) is very close to the QTime Solr is showing in the

Hmmm....

two comments:

1) the example URL from your previous mail...

: > http://localhost:1212/solr/select/?rows=300&q=%28ResumeAllText%3A%28%28%28%22java+j2ee%22+%28java+j2ee%29%29%29%5E4%29%5E1.0%29&start=0&wt=python

...doesn't match your earlier statement that you are only returning the id 
field (there is no "fl" param in that URL) ... are you certain you weren't 
returning those large stored fields in the response?
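
As a concrete illustration of what adding the fl param looks like, here is a sketch that builds a /select URL restricted to the unique key field. The base URL and query are patterned on the example earlier in this thread but are illustrative, not exact:

```python
from urllib.parse import urlencode

# Sketch: ask Solr for only the id field by adding an fl param.
# Base URL and query string are placeholders modeled on the thread's example.
params = {
    "q": 'ResumeAllText:((("java j2ee" (java j2ee)))^4)^1.0',
    "fl": "id",     # return only the unique key, not the big stored fields
    "rows": 300,
    "start": 0,
    "wt": "python",
}
url = "http://localhost:1212/solr/select/?" + urlencode(params)
```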

2) assuming you were actually using an fl param to limit the fields, make 
sure you have this setting in your solrconfig.xml...

    <enableLazyFieldLoading>true</enableLazyFieldLoading>

..that should make it pretty fast to return only a few fields of each 
document, even if you do have some jumbo stored fields that aren't being 
returned.



-Hoss