Posted to java-user@lucene.apache.org by fujian <fu...@nokia.com> on 2010/06/09 17:35:58 UTC

sort field should not be tokenized?


Hello,

I'm using Lucene 2.9, and when reading the javadoc for the Sort class I noticed
it says "The field must be indexed, but should not be tokenized".

But I tried sorting on a tokenized field, and it works too. I'm just wondering
what the difference between tokenized and untokenized fields is in terms of sorting.
Why do the javadoc and "Lucene in Action" both say that the sort field
should not be tokenized?

Thanks,
-Fujian


-- 
View this message in context: http://lucene.472066.n3.nabble.com/sort-field-should-not-be-tokenized-tp882569p882569.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: sort field should not be tokenized?

Posted by fujian <fu...@nokia.com>.

Thanks, Erick, for the detailed explanation. Now I understand what Ian meant.

-Fujian
-- 
View this message in context: http://lucene.472066.n3.nabble.com/sort-field-should-not-be-tokenized-tp882569p884107.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.



Re: sort field should not be tokenized?

Posted by Erick Erickson <er...@gmail.com>.
Consider analyzing the input "the fox is in his den" on
whitespace, without removing stopwords. You'd have
the terms:
the
fox
is
in
his
den

What does it mean to sort on this field? Which term
should be used?
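A small plain-Java sketch of that question (no Lucene involved): once a value is split into several terms, a sort needs some rule for picking one term per document. The class name and the three rules below (whole value, smallest term, largest term) are hypothetical, chosen only to show that different rules can give different orders; the sketch does not claim to show which rule, if any, Lucene itself applies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

// Illustration only: hypothetical rules for turning a multi-term value into one sort key.
public class SortKeyAmbiguity {

    // Whitespace "analysis": split on runs of whitespace, nothing else.
    static List<String> whitespaceTerms(String value) {
        return Arrays.asList(value.trim().split("\\s+"));
    }

    // Rule 1: don't tokenize; the whole value is the sort key.
    static String wholeValue(String value) {
        return value;
    }

    // Rule 2: use the smallest term (String.compareTo order).
    static String smallestTerm(String value) {
        return Collections.min(whitespaceTerms(value));
    }

    // Rule 3: use the largest term.
    static String largestTerm(String value) {
        return Collections.max(whitespaceTerms(value));
    }

    // Sort a copy of the documents by the key the given rule produces.
    static List<String> order(List<String> docs, Function<String, String> key) {
        List<String> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparing(key));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("the fox is in his den", "a zebra");
        System.out.println(order(docs, SortKeyAmbiguity::wholeValue));   // [a zebra, the fox is in his den]
        System.out.println(order(docs, SortKeyAmbiguity::smallestTerm)); // [a zebra, the fox is in his den]
        System.out.println(order(docs, SortKeyAmbiguity::largestTerm));  // [the fox is in his den, a zebra]
    }
}
```

The smallest term of "the fox is in his den" is "den" and the largest is "the", so the same document lands before or after "a zebra" depending purely on which rule is assumed.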

What if you remove stopwords? What about casing?
Or any of a myriad of other things you might do
with an analyzer.
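To illustrate the stopword and casing point, here is a hypothetical sketch, again plain Java rather than Lucene: the "analyzer" lowercases, splits on whitespace and drops a tiny made-up stopword list, and the sort key is assumed to be the first surviving term. The same three values come out in a different order than a raw, case-sensitive comparison of the untokenized values.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustration only: the analyzer and the "first surviving term" rule are assumptions.
public class AnalysisChangesOrder {

    // Hypothetical stopword list, just big enough for the example.
    static final Set<String> STOPWORDS = Set.of("the", "a", "is", "in", "his");

    // Lowercase, split on whitespace, drop stopwords.
    static List<String> analyze(String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.trim().split("\\s+")) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!STOPWORDS.contains(lower)) {
                terms.add(lower);
            }
        }
        return terms;
    }

    // Sort key after analysis: the first surviving term ("" if none survive).
    static String firstAnalyzedTerm(String value) {
        List<String> terms = analyze(value);
        return terms.isEmpty() ? "" : terms.get(0);
    }

    // Untokenized: compare raw values (String.compareTo is case-sensitive).
    static List<String> sortRaw(List<String> docs) {
        List<String> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.naturalOrder());
        return sorted;
    }

    // Tokenized: compare by the first term the hypothetical analyzer kept.
    static List<String> sortAnalyzed(List<String> docs) {
        List<String> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparing(AnalysisChangesOrder::firstAnalyzedTerm));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("The zebra", "apple", "Banana");
        System.out.println(sortRaw(docs));      // [Banana, The zebra, apple]
        System.out.println(sortAnalyzed(docs)); // [apple, Banana, The zebra]
    }
}
```

Uppercase letters compare before lowercase ones in the raw comparison, while the analyzed keys are "zebra", "apple" and "banana"; change the stopword list or the casing step and the order changes again.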

So sorting on a tokenized field *can* work, but the
results will be "interesting". If you happen to have
a field that only ever tokenizes to a single term, you'll
probably get the expected results, but it'll be pretty
fragile.
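The single-term case can be checked directly. This hypothetical helper, assuming plain whitespace splitting rather than a real Lucene analyzer, reports whether every value of a field yields exactly one term. Under that assumption the term and the trimmed value coincide, so the sort is unambiguous, but one new value containing a space is enough to break it.

```java
import java.util.List;

// Illustration only: whitespace splitting stands in for a real analyzer.
public class SingleTermCheck {

    // Number of whitespace-separated terms in a value (0 for blank input).
    static int termCount(String value) {
        String trimmed = value.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // True if each value tokenizes to exactly one term.
    static boolean allSingleTerm(List<String> values) {
        for (String v : values) {
            if (termCount(v) != 1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allSingleTerm(List.of("alpha", "beta", "gamma")));     // true
        System.out.println(allSingleTerm(List.of("alpha", "beta", "gamma ray"))); // false
    }
}
```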

HTH
Erick


Re: sort field should not be tokenized?

Posted by Ian Lea <ia...@gmail.com>.
Sorting on tokenized fields can work, but may not necessarily do what
you expect, depending on your requirements and how the field is
tokenized.

--
Ian.

