Posted to dev@lucenenet.apache.org by Martyn Braithwaite <mb...@openroad.ca> on 2010/07/23 21:45:53 UTC
Sort by 2 fields or concatenated data
I am having trouble sorting results correctly.
I am using Lucene.Net 2.9.2.
I have a field that contains LastName|FirstName data and is un-tokenized.
This field can contain diacritic characters.
I have tried the following:
1. Using the SortField(name, locale, reversed) constructor
2. Creating a custom FieldComparatorSource (this splits on the | and
   tries to sort by the last name portion first and then the first name if
   the last name is the same)
3. Adding 2 SortFields, one for last name and one for first name
In the end they all produce the same result, which is:
YÀNG|YING
YATES|AISLIN
YATES|ALEXANDREA
YATES|BRUCE
YATES|DOREEN
YATES|GAYELORD
YATES|LAVONE
YATES|LIZZIE
YATES|PRUNELLA
YATES|RONALD
*YÀTES|RONA*
YATES|TANZI
YATES|TATUM
YATES|TRISTAN
YATES|WELDON
YÀTES|XAVIOR
But if I run the code from the custom FieldComparator using a List<String>
Sort with a simple IComparer, I get:
YÀNG|YING
YATES|AISLIN
YATES|ALEXANDREA
YATES|BRUCE
YATES|DOREEN
YATES|GAYELORD
YATES|LAVONE
YATES|LIZZIE
YATES|PRUNELLA
YATES|RONALD
YATES|TANZI
YATES|TATUM
YATES|TRISTAN
YATES|WELDON
*YÀTES|RONA*
YÀTES|XAVIOR
This is the correct alphabetical order.
Is it possible to get Lucene to sort the results the same way as the
IComparer? How should I store the data in the index if I want to sort
LastName / FirstName?
Thanks for any help.
RE: Sort by 2 fields or concatenated data
Posted by Martyn Braithwaite <mb...@openroad.ca>.
Thanks for the replies. After further investigation I have been able to
track this down to an extra sort that was occurring further along the code
path, after the results had been retrieved from Lucene.
I have deleted this extra sort and set Lucene to sort first by LastName
and then by FirstName (as 2 SortFields). This combination is producing the
correct results.
cheers
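
For anyone landing on this thread later, here is a minimal sketch of the
two-SortField setup described above. It assumes the names are indexed in two
separate un-tokenized fields; the field names "LastName" and "FirstName" and
the NameSort helper are hypothetical and do not come from the original code.

```csharp
using System.Globalization;
using Lucene.Net.Search;

public static class NameSort
{
    // "LastName" and "FirstName" are hypothetical field names. Each field is
    // assumed to be indexed un-tokenized, so it holds one term per document.
    public static Sort Create(CultureInfo culture, bool descending)
    {
        return new Sort(new SortField[]
        {
            // Primary key: last name, compared using the culture's rules.
            new SortField("LastName", culture, descending),
            // Secondary key: only decides the order when last names tie.
            new SortField("FirstName", culture, descending)
        });
    }
}
```

The resulting Sort would then be passed to the searcher, for example
searcher.Search(query, null, 50, NameSort.Create(CultureInfo.CurrentUICulture, false)).
Any sorting applied to the hits afterwards has to use the same rules,
otherwise it will undo the order Lucene produced.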
RE: Sort by 2 fields or concatenated data
Posted by Digy <di...@gmail.com>.
* You make case-insensitive comparisons in "StringSplitComparer" but
case-sensitive comparisons in the overridden "Compare".
That may be the problem.
* Why don't you just compare the strings as a whole instead of splitting
them into two parts and then comparing those parts separately? You would get
the same results.
DIGY
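
To illustrate the first point, here is a small sketch of the two kinds of
comparison. The class name CaseCheck, the fixed "en-US" culture and the sample
strings are assumptions made for the illustration; the thread's code uses
CultureInfo.CurrentUICulture and upper-case data.

```csharp
using System;
using System.Globalization;

public static class CaseCheck
{
    public static void Main()
    {
        // Fixed culture so the sketch does not depend on the machine's settings.
        CultureInfo culture = new CultureInfo("en-US");
        CompareInfo collator = culture.CompareInfo;

        // Case-sensitive, like collator.Compare(...) in the overridden Compare.
        int sensitive = collator.Compare("Yates", "YATES");

        // Case-insensitive, like String.Compare(..., true, culture)
        // in StringSplitComparer.
        int insensitive = String.Compare("Yates", "YATES", true, culture);

        // Passing CompareOptions.IgnoreCase makes the collator agree
        // with the case-insensitive String.Compare call.
        int aligned = collator.Compare("Yates", "YATES", CompareOptions.IgnoreCase);

        Console.WriteLine(sensitive != 0);   // the two strings differ by case
        Console.WriteLine(insensitive == 0); // case is ignored
        Console.WriteLine(aligned == 0);     // case is ignored here as well
    }
}
```

If the two comparers are meant to produce the same order, they need to use
the same options on both sides.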
RE: Sort by 2 fields or concatenated data
Posted by Martyn Braithwaite <mb...@openroad.ca>.
This is not a complete code sample, but it might give you a clue as to what
I am doing.
For the simple scenario, the data is stored in the index as
LastName|FirstName in an un-tokenized field.
I add one SortField like this:
    sortByFields.Add(new SortField("KeywordSortLastName",
        CultureInfo.CurrentUICulture, false));
Based on the data in the field, this seems to obey the correct
alphabetical rules: I took the same data, created a quick C# scenario
where I did a list sort of the strings, and the results matched.
For the more involved scenario, I took the Comparator that is used for the
locale version and created a new one. I just quickly added the code to
split the data and sort on last name first and then, if required, first name.
e.g.
The sort field is added like this:
    sortByFields.Add(new SortField("KeywordSortLastName",
        new LastNameFirstNameComparatorSource(), sortDescending));
Then I created:
public class LastNameFirstNameComparatorSource : FieldComparatorSource
{
    public override FieldComparator NewComparator(string fieldname,
        int numHits, int sortPos, bool reversed)
    {
        return new LastNameFirstNameComparator(numHits, fieldname,
            CultureInfo.CurrentUICulture);
    }
}
Then, using the class StringComparatorLocale : FieldComparator as the
template, I copied it and changed both of the following methods:
public override int Compare(int slot1, int slot2)
{
    String val1 = values[slot1];
    String val2 = values[slot2];
    if (val1 == null)
    {
        if (val2 == null)
        {
            return 0;
        }
        return -1;
    }
    if (val2 == null)
    {
        return 1;
    }
    string[] val1Split = val1.Split(new[] { '|' });
    string[] val2Split = val2.Split(new[] { '|' });
    string val1FirstName = val1Split[1];
    string val1LastName = val1Split[0];
    string val2FirstName = val2Split[1];
    string val2LastName = val2Split[0];
    if (!val1LastName.Equals(val2LastName, StringComparison.CurrentCulture))
    {
        return collator.Compare(val1LastName, val2LastName);
    }
    return collator.Compare(val1FirstName, val2FirstName);
}

public override int CompareBottom(int doc)
{
    System.String val2 = currentReaderValues[doc];
    if (bottom == null)
    {
        if (val2 == null)
        {
            return 0;
        }
        return -1;
    }
    else if (val2 == null)
    {
        return 1;
    }
    string[] val1Split = this.bottom.Split(new[] { '|' });
    string[] val2Split = val2.Split(new[] { '|' });
    string val1FirstName = val1Split[1];
    string val1LastName = val1Split[0];
    string val2FirstName = val2Split[1];
    string val2LastName = val2Split[0];
    if (!val1LastName.Equals(val2LastName, StringComparison.CurrentCulture))
    {
        return collator.Compare(val1LastName, val2LastName);
    }
    return collator.Compare(val1FirstName, val2FirstName);
}
The rest of this class is identical to the implementation in the core
Lucene code.
However, if I create an IComparer like this:
public class StringSplitComparer : IComparer<String>
{
    public int Compare(string x, string y)
    {
        if (x == null)
        {
            if (y == null)
            {
                return 0;
            }
            return -1;
        }
        if (y == null)
        {
            return 1;
        }
        string[] val1Split = x.Split(new[] { '|' });
        string[] val2Split = y.Split(new[] { '|' });
        string val1FirstName = val1Split[1];
        string val1LastName = val1Split[0];
        string val2FirstName = val2Split[1];
        string val2LastName = val2Split[0];
        if (!val1LastName.Equals(val2LastName, StringComparison.CurrentCulture))
        {
            return String.Compare(val1LastName, val2LastName, true,
                CultureInfo.CurrentUICulture);
        }
        return String.Compare(val1FirstName, val2FirstName, true,
            CultureInfo.CurrentUICulture);
    }
}
Then the IComparer sorts the values correctly, whereas the Comparator
still shows the results as if it were sorting the concatenated string.
Hopefully this helps.
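
The difference between sorting the concatenated string and sorting the two
parts separately can be reproduced outside Lucene. The sketch below is only an
illustration: the class SplitVersusWhole, the helper CompareSplit and the fixed
"en-US" culture are assumptions, and CompareSplit simplifies the logic above by
testing the result of the last-name comparison instead of calling Equals. It
also assumes every value contains exactly one | separator.

```csharp
using System;
using System.Globalization;

public static class SplitVersusWhole
{
    // Compares "LastName|FirstName" values by last name, then by first name.
    public static int CompareSplit(CompareInfo collator, string x, string y)
    {
        string[] a = x.Split('|');
        string[] b = y.Split('|');
        int byLastName = collator.Compare(a[0], b[0]);
        // The first name is only consulted when the last names compare equal.
        return byLastName != 0 ? byLastName : collator.Compare(a[1], b[1]);
    }

    public static void Main()
    {
        CompareInfo collator = new CultureInfo("en-US").CompareInfo;
        string accented = "YÀTES|RONA";
        string plain = "YATES|TANZI";

        // Whole-string comparison of the concatenated values.
        Console.WriteLine(Math.Sign(collator.Compare(accented, plain)));

        // Last name first, then first name.
        Console.WriteLine(Math.Sign(CompareSplit(collator, accented, plain)));
    }
}
```

Culture-aware comparison typically looks at base letters across the whole
string before it considers accents, so the whole-string call is likely to be
decided by R versus T in the first names, while the split call is decided by
the accent in the last name. That would explain the two positions of
YÀTES|RONA in the lists above.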
RE: Sort by 2 fields or concatenated data
Posted by Digy <di...@gmail.com>.
Is it possible to show the case with a small code fragment?
DIGY