Posted to dev@lucene.apache.org by George Aroush <ge...@aroush.net> on 2006/12/14 03:20:26 UTC
Locale string compare: Java vs. C#
Hi folks,
Over at Lucene.Net, I have run into two NUnit tests that fail with
Lucene.Net (C#) but pass with Lucene (Java). The two failing tests
are TestInternationalMultiSearcherSort and TestInternationalSort.
After several hours of investigation, I narrowed the problem to what I
believe is a difference in the way Java and .NET implement compare.
The code in question is this method (found in FieldSortedHitQueue.java):
public final int compare (final ScoreDoc i, final ScoreDoc j) {
return collator.compare (index[i.doc], index[j.doc]);
}
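For context, here is a minimal sketch of how that comparison drives the sort order. The class name, the sortDocs helper, and the index array are hypothetical stand-ins for the per-document term cache; this is not the actual FieldSortedHitQueue code:

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class CollatorSortSketch {
    // Hypothetical stand-in for the field cache: index[doc] holds the
    // sort term of document number 'doc'.
    static Integer[] sortDocs(String[] index, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        Integer[] docs = new Integer[index.length];
        for (int d = 0; d < docs.length; d++) {
            docs[d] = d;
        }
        // Same shape as compare(ScoreDoc, ScoreDoc) above: look up each
        // document's term and let the collator decide the order.
        Arrays.sort(docs, (i, j) -> collator.compare(index[i], index[j]));
        return docs;
    }

    public static void main(String[] args) {
        String[] index = { "banana", "apple", "cherry" };
        // prints [1, 0, 2]
        System.out.println(Arrays.toString(sortDocs(index, Locale.US)));
    }
}
```

Since the sort order comes entirely from the sign of collator.compare, a collator that flips the sign for one pair of terms will swap those two documents in the result list.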
To demonstrate the compare problem (Java vs. .NET) I created this simple code
both in Java and C#:
// Java code: you get back 1 for 'diff'
String s1 = "H\u00D8T";
String s2 = "HUT";
Collator collator = Collator.getInstance (Locale.US);
int diff = collator.compare(s1, s2);
// C# code: you get back -1 for 'res'
string s1 = "H\u00D8T";
string s2 = "HUT";
System.Globalization.CultureInfo locale = new System.Globalization.CultureInfo("en-US");
System.Globalization.CompareInfo collator = locale.CompareInfo;
int res = collator.Compare(s1, s2);
Java will give me back a 1 while .NET gives me back -1.
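For anyone who wants to try the Java side, here is a self-contained version that should compile and run as-is. The class and method names are made up for illustration. The collator result may depend on the JDK version and its locale data, so no expected value is given for it; the plain String.compareTo call is included only as a code-point baseline:

```java
import java.text.Collator;
import java.util.Locale;

public class CollatorRepro {
    // Compare two strings with the default collator for en-US.
    static int compareUs(String s1, String s2) {
        Collator collator = Collator.getInstance(Locale.US);
        return collator.compare(s1, s2);
    }

    public static void main(String[] args) {
        String s1 = "H\u00D8T";
        String s2 = "HUT";

        // Locale-aware comparison, the call under discussion.
        System.out.println("JDK " + System.getProperty("java.version")
                + " collator result: " + compareUs(s1, s2));

        // Code-point comparison for reference: U+00D8 is numerically
        // greater than 'U' (U+0055), so this value is positive.
        System.out.println("String.compareTo result: " + s1.compareTo(s2));
    }
}
```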
So, what I am trying to figure out is: which one is doing the right thing? Or
am I missing additional calls before I can compare?
My goal is to understand why the difference exists, so that I can judge how
serious this issue is and then either find a fix for it or just document it
as a language difference between Java and .NET.
Btw, this is based on Lucene 2.0 for both Java and C# Lucene.
Regards,
-- George Aroush
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org
Re: Locale string compare: Java vs. C#
Posted by Chuck Williams <ch...@manawiz.com>.
Surprising, but it looks to me like a bug in Java's collation rules for
en-US. According to the chart at
http://developer.mimer.com/collations/charts/UCA_latin.htm, \u00D8
(which is Latin Capital Letter O With Stroke) should sort before U,
implying -1 is the correct result. Java is returning 1 for all
strengths of the collator. Maybe there is some other subtlety with this
character...
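One way to check the "all strengths" observation is a small loop over the four strength constants that java.text.Collator defines. This is only a sketch; the class name and the helper method are made up for illustration, and the printed values may vary with the JDK's locale data:

```java
import java.text.Collator;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class CollatorStrengths {
    // Run the same comparison once per collator strength and collect
    // the results keyed by the strength's name.
    static Map<String, Integer> compareAtAllStrengths(String s1, String s2, Locale locale) {
        int[] strengths = {
            Collator.PRIMARY, Collator.SECONDARY, Collator.TERTIARY, Collator.IDENTICAL
        };
        String[] names = { "PRIMARY", "SECONDARY", "TERTIARY", "IDENTICAL" };

        Map<String, Integer> results = new LinkedHashMap<>();
        for (int k = 0; k < strengths.length; k++) {
            // Use a fresh collator per strength so the settings do not interact.
            Collator collator = Collator.getInstance(locale);
            collator.setStrength(strengths[k]);
            results.put(names[k], collator.compare(s1, s2));
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, Integer> results =
                compareAtAllStrengths("H\u00D8T", "HUT", Locale.US);
        for (Map.Entry<String, Integer> e : results.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

If the sign is the same at every strength, the two letters differ at the primary level in that collator's rules, which would mean the ordering comes from where the rules place the letter rather than from an accent or case tie-break.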
Chuck