You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Joe Tang <jo...@workmetro.com> on 2007/04/10 22:16:17 UTC

How to sort on a tokenised field?

My task is to index lots of documents with different fields. Some of the
fields are tokenized and are going to be sorted later on when a list of
result set is need to particular field. Unfortunately, Lucene complains
about sort on a tokenized field.

So is there any way to get around of it?

Thanks in advance!!
-- 
View this message in context: http://www.nabble.com/How-to-sort-on-a-tokenised-field--tf3555450.html#a9927597
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: How to sort on a tokenised field?

Posted by Chris Hostetter <ho...@fucit.org>.
: The worse solution is to have another duplicated field which is un-tokenized
: but it is not scalable when we have lots of fields need to be searchable.

That is really the only solution that exists in in Lucene at the moment.

Typically the number of fields people want to sort on isn't that big
(even if "lots" is 10 it's still feasible to duplicate those 10 fields)
and this appraoch works very well -- people tend to run into more problems
relating to the size of the FieldCache neeeded for each sort field because
they have too many documents long before they have to worry about having
too many sort fields.

There is a fairly low impact patch for people who want to sort on
the "stored" string value of a field if you'd like to try it, but you
should search the archives for discussion about it's tradeoffs.

https://issues.apache.org/jira/browse/LUCENE-769




-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: How to sort on a tokenised field?

Posted by Joe Tang <jo...@workmetro.com>.
I understand what you are trying to say about the problem of sorting a
tokenized field. 

The reason why i try to sort a tokenized field is that I need to have a 
field to be both sortable and searchable in different time. Searchable field
requires tokenized field while sortable field requires un-tokenized field. 

The worse solution is to have another duplicated field which is un-tokenized
but it is not scalable when we have lots of fields need to be searchable.

And that's why I was asking for a good solution other than a quick fix.



Erick Erickson wrote:
> 
> Lucene sorting is intended to sort documents relative to each other.
> So it makes no sense to allow sorts on tokenized fields in the
> Lucene context. Imagine the separate tokens in a field for doc1 of
> a, c and e, and for doc2 b, d and f. Where should doc1 go in
> relation to doc2 when sorting on that field? There's no good
> *general* answer that I've been able to see.
> 
> So I suspect you really want to do something that's not
> document sorting, and if you'd make a clearer statement of
> what you're trying to accomplish I'm sure you'd get better
> answers than mine.
> 
> Best
> Erick
> 
> On 4/10/07, Joe Tang <jo...@workmetro.com> wrote:
>>
>>
>> My task is to index lots of documents with different fields. Some of the
>> fields are tokenized and are going to be sorted later on when a list of
>> result set is need to particular field. Unfortunately, Lucene complains
>> about sort on a tokenized field.
>>
>> So is there any way to get around of it?
>>
>> Thanks in advance!!
>> --
>> View this message in context:
>> http://www.nabble.com/How-to-sort-on-a-tokenised-field--tf3555450.html#a9927597
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
> 
> 

-- 
View this message in context: http://www.nabble.com/How-to-sort-on-a-tokenised-field--tf3555450.html#a9928177
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: How to sort on a tokenised field?

Posted by Erick Erickson <er...@gmail.com>.
Lucene sorting is intended to sort documents relative to each other.
So it makes no sense to allow sorts on tokenized fields in the
Lucene context. Imagine the separate tokens in a field for doc1 of
a, c and e, and for doc2 b, d and f. Where should doc1 go in
relation to doc2 when sorting on that field? There's no good
*general* answer that I've been able to see.

So I suspect you really want to do something that's not
document sorting, and if you'd make a clearer statement of
what you're trying to accomplish I'm sure you'd get better
answers than mine.

Best
Erick

On 4/10/07, Joe Tang <jo...@workmetro.com> wrote:
>
>
> My task is to index lots of documents with different fields. Some of the
> fields are tokenized and are going to be sorted later on when a list of
> result set is need to particular field. Unfortunately, Lucene complains
> about sort on a tokenized field.
>
> So is there any way to get around of it?
>
> Thanks in advance!!
> --
> View this message in context:
> http://www.nabble.com/How-to-sort-on-a-tokenised-field--tf3555450.html#a9927597
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>