Posted to solr-user@lucene.apache.org by Raghuveer Kancherla <ra...@aplopio.com> on 2009/12/17 10:52:11 UTC

payload queries running slow

Hi,
With help from the group here, I have been able to set up a search
application with payloads enabled. However, there is a noticeable increase
in query response times with payloads as compared to the same queries
without payloads. I am also seeing a lot more disk I/O (I have a 7200 rpm
disk) and comparatively lower CPU usage.

I am guessing this is because of the use of PayloadTermQuery and
PayloadNearQuery, both of which are built on SpanQuery (they extend
SpanTermQuery and SpanNearQuery). SpanQueries read the positions index,
which will be much larger than the index accessed by a simple TermQuery.
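
To make the size argument above concrete, here is a toy sketch in plain Java. It is not Lucene's file format or API; the class name, method names, and sample documents are all made up for illustration. It only shows that a doc-level lookup touches one entry per matching document, while a position-aware lookup touches one entry per occurrence, and each occurrence may additionally carry a payload.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy in-memory model (an assumption for illustration, not Lucene's index layout).
public class PostingsSizeSketch {

    /** Builds term -> (docId -> positions of the term in that document). */
    public static Map<String, Map<Integer, List<Integer>>> build(String[] docs) {
        Map<String, Map<Integer, List<Integer>>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.length; docId++) {
            String[] tokens = docs[docId].toLowerCase().split("\\s+");
            for (int pos = 0; pos < tokens.length; pos++) {
                index.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                     .computeIfAbsent(docId, d -> new ArrayList<>())
                     .add(pos);
            }
        }
        return index;
    }

    /** Entries a doc-level lookup visits: one per document containing the term. */
    public static int docEntries(Map<String, Map<Integer, List<Integer>>> index, String term) {
        Map<Integer, List<Integer>> postings = index.get(term);
        return postings == null ? 0 : postings.size();
    }

    /** Entries a position-aware lookup visits: one per occurrence of the term. */
    public static int positionEntries(Map<String, Map<Integer, List<Integer>>> index, String term) {
        Map<Integer, List<Integer>> postings = index.get(term);
        if (postings == null) {
            return 0;
        }
        int total = 0;
        for (List<Integer> positions : postings.values()) {
            total += positions.size();
        }
        return total;
    }

    public static void main(String[] args) {
        String[] docs = {
            "java developer java architect",
            "senior java developer",
            "python developer"
        };
        Map<String, Map<Integer, List<Integer>>> index = build(docs);
        System.out.println("doc entries for 'java': " + docEntries(index, "java"));
        System.out.println("position entries for 'java': " + positionEntries(index, "java"));
    }
}
```

For the sample documents, "java" has 2 doc-level entries and 3 position-level entries. In a real corpus where terms repeat many times per document the gap grows accordingly, which would be consistent with the extra disk I/O described above.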

Is there any way of making this system faster without having to distribute
the index? My index is only about 1 GB (~200k documents and only one field
to search in). I am seeing average query times as high as 2 seconds.

Any pointers on directions I can experiment in would also be very
helpful.

I looked at the HathiTrust digital library articles. The methods described
there are about avoiding reads of the positions index (converting
PhraseQueries to TermQueries). That will not work in my case because I still
have to read the positions index to get the payload information during
scoring. Let me know if my understanding is incorrect.


Thanks,
-Raghu

Re: payload queries running slow

Posted by Grant Ingersoll <gs...@gmail.com>.
On Dec 20, 2009, at 3:41 AM, Raghuveer Kancherla wrote:

> Hi Grant,
> My queries are about 5 times slower when using payloads as compared to
> queries that dont use payloads on the same index. I have not done any
> profiling yet, I am trying out lucid gaze now.

How do they compare to just doing SpanQueries?  It would be interesting to see all three:
1. "Normal" queries
2. Span Queries
3. Payloads
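
One way to run the three-way comparison above is a small timing harness like the sketch below. It is plain Java with no Lucene dependency: the class name, the method names, and the placeholderSearch method are hypothetical stand-ins, and in a real test each Supplier would wrap the actual search call for a TermQuery, a SpanQuery, and a payload query against the same warmed searcher.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical harness for comparing mean latency of several query variants.
public class QueryTimingSketch {

    /** Runs the query `warmup` times untimed, then `runs` times timed; returns mean latency in ms. */
    public static double meanMillis(Supplier<?> query, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) {
            query.get();
        }
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            query.get();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1e6 / runs;
    }

    /** Times every named query with the same warm-up and run counts, keeping insertion order. */
    public static Map<String, Double> compare(Map<String, Supplier<?>> queries, int warmup, int runs) {
        Map<String, Double> results = new LinkedHashMap<>();
        for (Map.Entry<String, Supplier<?>> e : queries.entrySet()) {
            results.put(e.getKey(), meanMillis(e.getValue(), warmup, runs));
        }
        return results;
    }

    /** Placeholder workload; replace with the real search call in an actual test. */
    static long placeholderSearch(int work) {
        long sum = 0;
        for (int i = 0; i < work; i++) {
            sum += i % 7;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Supplier<?>> queries = new LinkedHashMap<>();
        queries.put("1. normal query", () -> placeholderSearch(10_000));
        queries.put("2. span query", () -> placeholderSearch(50_000));
        queries.put("3. payload query", () -> placeholderSearch(100_000));
        compare(queries, 100, 1000).forEach(
            (name, ms) -> System.out.printf("%-18s %.4f ms%n", name, ms));
    }
}
```

Using the same warm-up and run counts for all three variants keeps the numbers comparable; for a disk-bound index it would also be worth repeating the runs with a cold OS cache, since the two cases can differ a lot.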


> I do all the load testing after warming up.
> Since my index is small ~1 GB, was wondering if a ramDirectory will help
> instead of the default Directory implementation for the indexReader?
> 

I suppose it might, but it probably won't make that big of a difference on a properly warmed index.



Re: payload queries running slow

Posted by Raghuveer Kancherla <ra...@aplopio.com>.
Hi Grant,
My queries are about 5 times slower when using payloads as compared to
queries that don't use payloads on the same index. I have not done any
profiling yet; I am trying out LucidGaze now.
I do all the load testing after warming up.
Since my index is small (~1 GB), I was wondering whether a RAMDirectory would
help instead of the default Directory implementation for the IndexReader?

Thanks,
Raghu




Re: payload queries running slow

Posted by Grant Ingersoll <gs...@apache.org>.
On Dec 17, 2009, at 4:52 AM, Raghuveer Kancherla wrote:

> Hi,
> With help from the group here, I have been able to set up a search
> application with payloads enabled. However, there is a noticeable increase
> in query response times with payloads as compared to the same queries
> without payloads. I am also seeing a lot more disk IO (I have a 7200 rpm
> disk) and comparatively lesser cpu usage.
> 
> I am guessing this is because of the use of payloadTermQuery and
> payloadNearQuery  both of which extend SpanQuery formats. SpanQueries read
> the positions index which will be much larger than the index accessed by a
> simple TermQuery.
> 
> Is there any way of making this system faster without having to distribute
> the index. My index size is hardly 1GB (~200k documents and only one field
> to search in). I am experiencing query times as high as 2 seconds (average).
> 
> Any indications on the direction in which I can experiment will also be very
> helpful.
> 

Yeah, payloads are going to be slower, but how much slower are they for you? Are you warming up those queries?  

Also, have you done any profiling?


> I looked at HathiTrust digital library articles. The methods indicated there
> talk about avoiding reading the positions index (converting PhraseQueries to
> TermQueries). That will not work in my case because, I still have to read
> the positions index to get the payload information during scoring. Let me
> know if my understanding is incorrect.
> 
> 
> Thanks,
> -Raghu

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search