Posted to solr-user@lucene.apache.org by Alok Bhandari <al...@gmail.com> on 2012/06/22 08:00:02 UTC

How can I optimize Sorting on multiple text fields

Hello,

Our requirement is as follows: on the Solr side we have indexed data for
multiple customers, and each customer has at least a million documents.
After executing a search, the end user wants to sort the results in a data
grid on fields such as subject, title, and date.

Since sorting on text fields is costly, what optimisation can I do for
that? I am considering the following options:

1) Create a custom cache that holds, for each customer, the list of documents
in sorted order for each field we want to sort on, so that when a sort
request comes from the user I can return the list from the cache.

2) Use the filter query cache, with the customer ID criterion added, so that
each time I get the docs from the filter cache.

Can anybody tell me whether this is a good approach, or whether there is a
better way of doing this?
I am using Solr 3.6.

Thanks in advance.


--
View this message in context: http://lucene.472066.n3.nabble.com/How-can-I-optimize-Sorting-on-multiple-text-fields-tp3990874.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: How can I optimize Sorting on multiple text fields

Posted by Erick Erickson <er...@gmail.com>.
First, you can't sort on text fields, assuming that by "text" you're
talking about tokenized fields.

Typically, people use copyField to copy these fields into a non-tokenized
field, usually normalized (for instance, lowercased), and sort on _that_.
Note that you're not _displaying_ the sort field, just sorting on it.
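
A rough illustration of why the sort copy is usually normalized. This is a
Python sketch, not Solr code; the sample titles and the sort_key helper are
made up for the example. Raw string order puts uppercase letters ahead of
lowercase ones, while a lowercased key gives the order users expect:

```python
# Hypothetical titles; in Solr these would be the values of the displayed field.
titles = ["apple", "Banana", "cherry"]


def sort_key(value: str) -> str:
    """Normalize a value roughly the way a lowercased sort field would
    (assumption for this sketch: trim whitespace, then lowercase)."""
    return value.strip().lower()


# Code-point order: "B" (66) sorts before "a" (97).
raw_order = sorted(titles)

# Case-insensitive order: the original values are kept, only the key changes.
normalized_order = sorted(titles, key=sort_key)

print(raw_order)         # ['Banana', 'apple', 'cherry']
print(normalized_order)  # ['apple', 'Banana', 'cherry']
```

The displayed values stay untouched; only the key used for ordering is
normalized, which mirrors keeping a separate display field and sort field.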

Then, before doing some fancy custom stuff, I'd just try sorting on these
fields. There's no problem specifying multiple sort fields, as in
&sort=field1 asc,field2 desc

If you include a filter query to restrict the search to one customer,
then of course you'll get only the results viewable by that customer.
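
Putting the two together, a request could be assembled along these lines.
This is only a sketch in Python: q, fq, sort, start and rows are standard
Solr request parameters, but the host, port, the customer_id field and the
subject_sort field are assumptions made up for the example.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, customer_id, sort_fields, start=0, rows=10):
    """Build a select URL with a per-customer filter query and a multi-field sort.

    sort_fields is a list of (field, direction) pairs,
    e.g. [("subject_sort", "asc"), ("date", "desc")].
    """
    params = [
        ("q", query),
        # Hypothetical field name; use whatever field holds the customer ID.
        ("fq", f"customer_id:{customer_id}"),
        # Multiple sort clauses are comma-separated: "field1 asc,field2 desc".
        ("sort", ",".join(f"{field} {direction}" for field, direction in sort_fields)),
        ("start", start),
        ("rows", rows),
    ]
    return f"{base_url}/select?{urlencode(params)}"


url = build_select_url(
    "http://localhost:8983/solr",  # assumed host and port
    "subject:invoice",
    42,
    [("subject_sort", "asc"), ("date", "desc")],
)
print(url)
```

Keeping the customer restriction in fq rather than in q means it is cached
separately from the main query and can be reused across that customer's
searches.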

Best
Erick


Re: How can I optimize Sorting on multiple text fields

Posted by Erick Erickson <er...@gmail.com>.
see: http://solr.pl/en/2011/07/18/deep-paging-problem/
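
Roughly, the problem is that to return the page at &start=N the search still
has to collect and order the top N+rows matches, so the work grows with the
offset even though only one page comes back. A toy Python sketch of that
effect follows; it models the idea with heapq and is not Solr's
implementation, and the page function and sample data are invented for the
example.

```python
import heapq
import random


def page(sort_values, start, rows):
    """Return one page of sorted values plus the number of entries kept.

    To serve the range [start, start + rows) the collector has to retain
    the top start + rows entries, not just the rows that are returned.
    """
    kept = heapq.nsmallest(start + rows, sort_values)
    return kept[start:start + rows], len(kept)


random.seed(0)
# Stand-in for one sort value per matching document.
values = [random.random() for _ in range(200_000)]

first_page, kept_first = page(values, start=0, rows=250)
deep_page, kept_deep = page(values, start=100_000, rows=250)

print(kept_first)  # 250
print(kept_deep)   # 100250
```

Both calls return 250 values, but the deep page had to keep about 400 times
as many entries to get there.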

Best
Erick

On Tue, Jun 26, 2012 at 6:37 AM, Alok Bhandari
<al...@gmail.com> wrote:
> Hello Erick,
>
> Thanks for your prompt reply. I have tried the options you suggested,
> but no luck this time.
>
> I am facing the following issues:
>
> 1) author is a text field, and the search returns about 1 million results.
> When I sort on author with &start=0, the results come back quickly, but as
> the &start value grows to, say, 100000, returning 250 records takes nearly
> 30 sec.
>
> 2) Even if I issue the same query and change only the &start parameter to
> higher values, performance keeps degrading.
>
> Any inputs on this are appreciated.

Re: How can I optimize Sorting on multiple text fields

Posted by Alok Bhandari <al...@gmail.com>.
Hello Erick,

Thanks for your prompt reply. I have tried the options you suggested,
but no luck this time.

I am facing the following issues:

1) author is a text field, and the search returns about 1 million results.
When I sort on author with &start=0, the results come back quickly, but as
the &start value grows to, say, 100000, returning 250 records takes nearly
30 sec.

2) Even if I issue the same query and change only the &start parameter to
higher values, performance keeps degrading.

Any inputs on this are appreciated.



Re: How can I optimize Sorting on multiple text fields

Posted by Erick Erickson <er...@gmail.com>.
That's already done for you by the queryResultCache in solrconfig.xml.
The "size" parameter is essentially the number of queries for which
results are stored.

Two related parameters in solrconfig.xml, queryResultWindowSize and
queryResultMaxDocsCached, control how many entries are stored for
each query. These entries are quite small, just the document IDs.

When you ask for documents _outside_ your range (i.e. let's say you have
queryResultMaxDocsCached=50 and specify &start=100) then a _new_ entry
is put in the cache that will contain the IDs of the 100th-150th documents
in the sorted result list.

If you're talking about storing the sort order for use across users in some
kind of global sense, the queryResultCache doesn't do that. But before trying
this I'd really pin down whether sorting is the source of any problems you're
having. Once _anyone_ sorts on a field, the values for that field will be in
the lower-level caches and sorting should be quite fast. So make sure you test
first by comparing the queries with and without sorting (and make sure you
don't measure the first few sorts that fill up the caches, that's what
autowarming is all about) before expending the time and effort going down
this path.
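
One way to run that comparison, sketched in Python. The time_query helper
and the two stand-in query functions are hypothetical; in a real test they
would issue the same HTTP request with and without &sort=. The point is
only that the warm-up runs are executed but left out of the measurement.

```python
import statistics
import time


def time_query(run_query, warmup=3, repeats=10):
    """Return the median wall-clock seconds of run_query(), ignoring warm-up runs."""
    for _ in range(warmup):
        run_query()  # fills the caches; deliberately not measured
    samples = []
    for _ in range(repeats):
        started = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - started)
    return statistics.median(samples)


# Stand-ins for "send the request without / with &sort=..."; replace with real calls.
def query_without_sort():
    return list(range(50_000))


def query_with_sort():
    return sorted(range(50_000), key=lambda n: -n)


unsorted_s = time_query(query_without_sort)
sorted_s = time_query(query_with_sort)
print(f"without sort: {unsorted_s:.4f}s, with sort: {sorted_s:.4f}s")
```

The median is used rather than the mean so that one slow outlier (a GC
pause, a cold cache) does not dominate the comparison.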

Best
Erick

On Mon, Jun 25, 2012 at 1:08 AM, Alok Bhandari
<al...@gmail.com> wrote:
> Thanks for the inputs.
>
> Erick, yes, I was referring to the String data type. The reason I ask is
> that a single customer has multiple users, and each user may apply
> different search criteria before sorting on a field, so caching the sorted
> results might improve performance and the user experience.

Re: How can I optimize Sorting on multiple text fields

Posted by Alok Bhandari <al...@gmail.com>.
Thanks for the inputs.

Erick, yes, I was referring to the String data type. The reason I ask is
that a single customer has multiple users, and each user may apply
different search criteria before sorting on a field, so caching the sorted
results might improve performance and the user experience.


Re: How can I optimize Sorting on multiple text fields

Posted by Amit Jha <sh...@gmail.com>.

On 22-Jun-2012, at 11:30, Alok Bhandari <al...@gmail.com> wrote:

> Hello,
> 
> Our requirement is as follows: on the Solr side we have indexed data for
> multiple customers, and each customer has at least a million documents.
> After executing a search, the end user wants to sort the results in a data
> grid on fields such as subject, title, and date.
> 
You can try a client-side library that renders a data-table view; these typically support sorting on each column. Check out the YUI DataTable. If I have understood your case correctly, this will also save you some queries.

