Posted to java-user@lucene.apache.org by "Zhang, Lisheng" <Li...@BroadVision.com> on 2011/02/27 01:59:48 UTC
Lucene search result produced wrong result (due to java Collation)?
Hi,
Today I noticed that a Lucene sort sometimes produces a strange order for plain
English names, for example (String ASC):
l yy
liu yu
I traced this through the Lucene source code, and it seems to be a problem with the
Java English Collator (I set Locale.ENGLISH on the SortField). Below I reproduce the
issue with a trivial pure-Java program:
/////
import java.util.Locale;
import java.text.Collator;

public class T1 {
    public static void main(String[] argv) {
        String s1 = "l yy";
        String s2 = "liu yu";
        //s1 = "l";
        //s2 = "liu";
        Collator col1 = Collator.getInstance(Locale.US);
        System.out.println("COLL_RES =" + col1.compare(s1, s2));
        System.out.println("STRI_RES =" + s1.compareTo(s2));
    }
}
/////
The result is:
COLL_RES =1
STRI_RES =-73
I tested different Java versions and got the same result. Maybe I missed something
trivial, but the test above is really simple?
Thanks very much for your help, Lisheng
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
RE: Lucene search result produced wrong result (due to java Collation)?
Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Mon, 2011-02-28 at 22:44 +0100, Zhang, Lisheng wrote:
> Very sorry I made a typo, what I meant to say is that lucene sort produced wrong
> result in English names (String ASC):
>
> liu yu
> l yy
The standard Java Collator ignores whitespace. It can be hacked, but you
will have to write your own implementation to get Lucene to sort in the
desired way. FieldComparatorSource is a good place to start.
A code snippet demonstrating the Collator-hack:
public void testJavaStandardCollator() throws Exception {
    // Default collator: the space is ignored, so "liu yu" sorts before "l yy".
    java.text.Collator javaC =
        java.text.Collator.getInstance(new Locale("EN"));
    assertTrue("Spaces should be ignored per default",
               javaC.compare("liu yu", "l yy") < 0);
    // Re-declare the space as a significant character ordered just before '_'.
    java.text.RuleBasedCollator adjustedC = new java.text.RuleBasedCollator(
        ((java.text.RuleBasedCollator) javaC).getRules().
            replace("<'\u005f'", "<' '<'\u005f'"));
    assertTrue("Spaces should be significant inside strings after adjust",
               adjustedC.compare("liu yu", "l yy") > 0);
}
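For anyone who wants to try the same rule edit outside a unit test, here is a minimal standalone sketch in plain Java (no Lucene). The class name SpaceAwareSort and its two helper methods are made up for illustration. It assumes, as the test above does, that Collator.getInstance returns a RuleBasedCollator and that the default rules contain the "<'\u005f'" entry; neither is guaranteed on every JVM.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of Lucene or the JDK.
public class SpaceAwareSort {

    // Builds a collator in which ' ' is a significant character placed
    // just before '_' (same rule edit as in the test above).
    // Assumption: the default collator is a RuleBasedCollator.
    static Collator spaceAwareCollator() throws ParseException {
        RuleBasedCollator base =
            (RuleBasedCollator) Collator.getInstance(Locale.ENGLISH);
        String rules = base.getRules().replace("<'\u005f'", "<' '<'\u005f'");
        return new RuleBasedCollator(rules);
    }

    // Returns a sorted copy of the names; the input list is left untouched.
    static List<String> sorted(List<String> names, Collator collator) {
        List<String> copy = new ArrayList<String>(names);
        Collections.sort(copy, collator);
        return copy;
    }

    public static void main(String[] args) throws ParseException {
        List<String> names = Arrays.asList("liu yu", "l yy");
        System.out.println("default : "
            + sorted(names, Collator.getInstance(Locale.ENGLISH)));
        System.out.println("adjusted: "
            + sorted(names, spaceAwareCollator()));
    }
}
```

Going by the results reported in this thread, the default collator should keep "liu yu" first, while the adjusted one should put "l yy" first. To use the adjusted collator for Lucene sorting you would still need to wrap it in your own FieldComparatorSource, which this sketch does not attempt.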
RE: Lucene search result produced wrong result (due to java Collation)?
Posted by "Zhang, Lisheng" <Li...@BroadVision.com>.
Hi,
Very sorry, I made a typo. What I meant to say is that the Lucene sort produced the
wrong order for English names (String ASC):
liu yu
l yy
(previously I listed them the other way round). The problem is in the Java Collator
that Lucene uses (I can reproduce it with the sample code from my original message).
Thanks very much for your help, Lisheng