Posted to dev@lucene.apache.org by Uwe Schindler <uw...@thetaphi.de> on 2009/03/02 18:34:47 UTC

Sorting and multi-term fields again

Yesterday I updated https://issues.apache.org/jira/browse/LUCENE-1372 (I added
a "relates to" link), but my comment was not posted to this list (maybe
relation updates in JIRA issues are not posted to java-dev?).

While working with Shalin Shekhar Mangar on SOLR-940 (where sorting of
TrieRange fields is needed), I thought about the issue again. Maybe we could
change FieldCache to put only the very first term from a field of the
document into the cache, enabling sorting against this field. If possible,
this would be very nice and, in my opinion, better than the idea proposed in
the issue.

For TrieRange the order proposed in this issue would also be fine, as the
highest precision field terms always come before the lower precision ones in
the global TermEnum. So if FieldCache sorted against the *first* term
and did not hit any exceptions in the searcher when more terms than documents
are available, I would also be happy.
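
To make the "first term wins" idea concrete, here is a minimal sketch in plain
Java. It is not Lucene code and does not use the FieldCache API: the sorted
map stands in for the global term enumeration, and the class and method names
(FirstTermCacheSketch, buildFirstTermCache) are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model, NOT Lucene code: a TreeMap (term -> doc IDs) stands in
// for the global, sorted term enumeration of one field.
class FirstTermCacheSketch {

    /** Fills cache[doc] with the first term, in enumeration order, that contains doc. */
    static String[] buildFirstTermCache(TreeMap<String, List<Integer>> postings, int maxDoc) {
        String[] cache = new String[maxDoc];
        for (Map.Entry<String, List<Integer>> entry : postings.entrySet()) {
            for (int doc : entry.getValue()) {
                // Only the first term seen for a document is kept; later terms
                // for the same document are ignored instead of overwriting it.
                if (cache[doc] == null) {
                    cache[doc] = entry.getKey();
                }
            }
        }
        return cache;
    }

    public static void main(String[] args) {
        TreeMap<String, List<Integer>> postings = new TreeMap<>();
        postings.put("apple", Arrays.asList(0, 2));
        postings.put("mango", Arrays.asList(1, 2));
        postings.put("zebra", Arrays.asList(0));
        // Three terms, three documents, two documents with more than one term.
        System.out.println(Arrays.toString(buildFirstTermCache(postings, 3)));
        // prints [apple, mango, apple]
    }
}
```

Because a slot is written at most once, the number of terms no longer has to
match the number of documents in this model.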

Any comments or ideas?

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de



---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Sorting and multi-term fields again

Posted by Earwin Burrfoot <ea...@gmail.com>.
My opinion is that if you want to enable sorting on multi-term fields,
you need a pluggable selection policy. I can see someone wanting the
biggest or smallest term to represent a document when sorting, or maybe a
function of the terms.
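
A rough sketch of what such a policy could look like, in plain Java. Nothing
here is an existing Lucene interface; TermSelector and the three policies are
hypothetical names used only to illustrate the idea.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch, NOT a Lucene API: a pluggable policy that picks the
// single value representing a document from all terms of a multi-term field.
class SelectionPolicySketch {

    /** Assumed interface: maps the document's terms to one sort value. */
    interface TermSelector {
        String select(List<String> terms);
    }

    // Smallest term (in String order) represents the document.
    static final TermSelector SMALLEST = terms -> Collections.min(terms);

    // Biggest term (in String order) represents the document.
    static final TermSelector BIGGEST = terms -> Collections.max(terms);

    // Example of "a function of the terms": all distinct terms, sorted and joined.
    static final TermSelector JOINED = terms -> String.join(" ", new TreeSet<>(terms));

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("pear", "apple", "zebra");
        System.out.println(SMALLEST.select(terms)); // prints apple
        System.out.println(BIGGEST.select(terms));  // prints zebra
        System.out.println(JOINED.select(terms));   // prints apple pear zebra
    }
}
```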

On Mon, Mar 2, 2009 at 20:34, Uwe Schindler <uw...@thetaphi.de> wrote:
> I updated yesterday https://issues.apache.org/jira/browse/LUCENE-1372 (added
> a relates), but my comment was not posted to this list (maybe relation
> updates in JIRA issues will not post to java-dev?).
>
> When working with Shalin Shekhar Mangar on SOLR-940 (where sorting of
> TrieRange fields is needed), I again thought about the issue. Maybe we could
> change FieldCache to only put the very first term from a field of the
> document into the cache, enabling sorting against this field. If possible,
> this would be very nice and in my opinion better than the idea proposed in
> the issue.
>
> For TrieRange the order proposed in this issue would also be fine, as the
> highest precision field terms are always before the lower precision ones in
> the global TermEnum, so if FieldCache would sort against the *first* term
> and not hit any exceptions in searcher when more terms than documents are
> available, I would also be happy.
>
> Any comments or ideas?



-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785



Re: Sorting and multi-term fields again

Posted by Chris Hostetter <ho...@fucit.org>.
: TrieRange fields is needed), I again thought about the issue. Maybe we could
: change FieldCache to only put the very first term from a field of the
: document into the cache, enabling sorting against this field. If possible,
: this would be very nice and in my opinion better than the idea proposed in
: the issue.

in the fairly common case of tokenized fields, the "first" term found
during enumeration isn't necessarily (or even frequently) the "first"
term in the pre-tokenized string ... so this doesn't help people very
much.
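
a small plain-Java illustration of that mismatch (not Lucene code; whitespace
splitting plus lowercasing is only a stand-in for a real analyzer, and the
class and method names are made up for this sketch):

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.TreeSet;

// Plain-Java model, NOT Lucene code: terms are enumerated in sorted order,
// which is unrelated to the order of the tokens in the original text.
class TokenOrderSketch {

    // Stand-in for an analyzer: lowercase and split on whitespace.
    static String[] tokenize(String text) {
        return text.trim().toLowerCase(Locale.ROOT).split("\\s+");
    }

    /** First token as it appears in the original string. */
    static String firstTokenInText(String text) {
        return tokenize(text)[0];
    }

    /** First term a sorted term enumeration would reach for this text. */
    static String firstTermInEnumeration(String text) {
        return new TreeSet<>(Arrays.asList(tokenize(text))).first();
    }

    public static void main(String[] args) {
        String title = "The Quick Brown Fox";
        System.out.println(firstTokenInText(title));       // prints the
        System.out.println(firstTermInEnumeration(title)); // prints brown
    }
}
```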

the recommended solution in the tokenized case is to have a "duplicate"
non-tokenized field -- that seems like the best solution in the
non-tokenized case as well (where the caller is consciously choosing to add
multiple Field instances with the same fieldName to a Document)... pick
which Field value represents the value you want used during sorting, and
add that value to the document using an alternate fieldName.
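
sketched in plain Java (a map stands in for a Document here, this is not the
Lucene Document/Field API, and the "_sort" suffix and the names below are just
assumptions for the example), the approach amounts to something like:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model, NOT the Lucene API: a document is a map from field name
// to its values. The "_sort" suffix is an assumed naming convention.
class SortFieldSketch {

    /** Returns a copy of doc with one chosen value stored under fieldName + "_sort". */
    static Map<String, List<String>> withSortField(
            Map<String, List<String>> doc, String fieldName, String sortValue) {
        List<String> values = doc.get(fieldName);
        if (values == null || !values.contains(sortValue)) {
            throw new IllegalArgumentException(
                    "sort value must be one of the values of field " + fieldName);
        }
        Map<String, List<String>> copy = new LinkedHashMap<>(doc);
        // The single-valued companion field is what sorting would run against.
        copy.put(fieldName + "_sort", Arrays.asList(sortValue));
        return copy;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("author", Arrays.asList("Smith", "Jones"));
        // The caller decides which value represents the document when sorting.
        System.out.println(withSortField(doc, "author", "Jones"));
        // prints {author=[Smith, Jones], author_sort=[Jones]}
    }
}
```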

I've never encountered any serious objection to this approach.


-Hoss

