Posted to dev@lucenenet.apache.org by Martyn Braithwaite <mb...@openroad.ca> on 2010/07/23 21:45:53 UTC

Sort by 2 fields or concatenated data

I am having trouble with sorting results correctly.



I am using Lucene.Net 2.9.2



I have a field that contains LastName|FirstName data and is un-tokenized.
 This field can contain diacritic characters.



I have tried the following.



1. Using the SortField(name, locale, reversed) constructor.

2. Creating a custom FieldComparatorSource (this splits on the | and
tries to sort by the last name portion first, and then by the first name if
the last name is the same).

3. Adding 2 SortFields, one for last name and one for first name.



In the end they all produce the same result which is:



YÀNG|YING
YATES|AISLIN
YATES|ALEXANDREA
YATES|BRUCE
YATES|DOREEN
YATES|GAYELORD
YATES|LAVONE
YATES|LIZZIE
YATES|PRUNELLA
YATES|RONALD
*YÀTES|RONA*
YATES|TANZI
YATES|TATUM
YATES|TRISTAN
YATES|WELDON
YÀTES|XAVIOR



But if I run the code from the custom FieldComparator using a List<String>
Sort with a simple IComparer, I get:



YÀNG|YING
YATES|AISLIN
YATES|ALEXANDREA
YATES|BRUCE
YATES|DOREEN
YATES|GAYELORD
YATES|LAVONE
YATES|LIZZIE
YATES|PRUNELLA
YATES|RONALD
YATES|TANZI
YATES|TATUM
YATES|TRISTAN
YATES|WELDON
*YÀTES|RONA*
YÀTES|XAVIOR



This is the correct alphabetical order.



Is it possible to get Lucene to sort the results the same way as the
IComparer? How should I store the data in the index if I want to sort
LastName / FirstName?



Thanks for any help.

RE: Sort by 2 fields or concatenated data

Posted by Martyn Braithwaite <mb...@openroad.ca>.
Thanks for the replies. After further investigation I tracked this down to
an extra sort that was occurring further along the code path, after the
results had been retrieved from Lucene.

I have deleted this extra sort and set Lucene to sort by LastName and then
FirstName (as 2 SortFields); this combination produces the correct results.
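
As an illustration of the ordering that two sort keys give (last name first,
first name only as a tie-breaker), here is a minimal sketch in plain .NET
using LINQ rather than Lucene. The Person and TwoKeySortSketch types and the
SortNames method are hypothetical names made up for this example, and it
assumes last and first names are held as separate values:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical record type: last and first name kept as separate values.
public sealed class Person
{
    public readonly string LastName;
    public readonly string FirstName;

    public Person(string lastName, string firstName)
    {
        LastName = lastName;
        FirstName = firstName;
    }
}

public static class TwoKeySortSketch
{
    // Orders by last name, then by first name, using one culture-aware,
    // case-insensitive comparer for both keys.
    public static List<string> SortNames(IEnumerable<Person> people, CultureInfo culture)
    {
        StringComparer comparer = StringComparer.Create(culture, true);
        return people
            .OrderBy(p => p.LastName, comparer)   // primary key
            .ThenBy(p => p.FirstName, comparer)   // tie-breaker only
            .Select(p => p.LastName + "|" + p.FirstName)
            .ToList();
    }
}
```

Because the first name is consulted only when the last names compare as
equal, a YÀTES entry cannot land in the middle of the YATES entries whenever
the culture treats the two spellings as different.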

cheers


RE: Sort by 2 fields or concatenated data

Posted by Digy <di...@gmail.com>.
* You make case-insensitive comparisons in "StringSplitComparer" but
case-sensitive ones in the overridden "Compare".
That may be the problem.
* Why don't you just compare the strings as a whole instead of splitting
them into two parts and then comparing those parts separately? You would get
the same results.
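
One way to rule out the first point is to route every comparison through a
single comparer, so the case-sensitivity setting is chosen in exactly one
place. The following is only a sketch in plain .NET; LastThenFirstComparer is
a hypothetical name, and it assumes values of the form LastName|FirstName:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical comparer: last name first, then first name, with one
// CompareInfo and one CompareOptions value used for both parts.
public sealed class LastThenFirstComparer : IComparer<string>
{
    private readonly CompareInfo compareInfo;
    private readonly CompareOptions options;

    public LastThenFirstComparer(CultureInfo culture, bool ignoreCase)
    {
        compareInfo = culture.CompareInfo;
        options = ignoreCase ? CompareOptions.IgnoreCase : CompareOptions.None;
    }

    public int Compare(string x, string y)
    {
        // Nulls sort first, matching the null handling in the thread's code.
        if (ReferenceEquals(x, y)) return 0;
        if (x == null) return -1;
        if (y == null) return 1;

        // Split into at most two parts: last name and the remainder.
        string[] a = x.Split(new[] { '|' }, 2);
        string[] b = y.Split(new[] { '|' }, 2);

        int byLastName = compareInfo.Compare(a[0], b[0], options);
        if (byLastName != 0) return byLastName;

        // A value without a separator is treated as having an empty first name.
        string firstA = a.Length > 1 ? a[1] : string.Empty;
        string firstB = b.Length > 1 ? b[1] : string.Empty;
        return compareInfo.Compare(firstA, firstB, options);
    }
}
```

The same instance could then back both a List<String> sort and the body of a
custom FieldComparator, so the two code paths cannot drift apart.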

DIGY



RE: Sort by 2 fields or concatenated data

Posted by Martyn Braithwaite <mb...@openroad.ca>.
This is not a complete code sample but might give you a clue to what I am
doing.

For the simple scenario, the data is stored in the index as
LastName|FirstName in an un-tokenized field.

I add one SortField like:

sortByFields.Add(new SortField("KeywordSortLastName", CultureInfo.CurrentUICulture, false));

Based on the data in the field, this seems to obey the correct
alphabetical rules: I took the same data, did a quick C# list sort of the
strings, and the results matched.

For the more involved scenario I took the Comparator that is used for the
locale version and created a new one. I just quickly added the code to
split the data and sort on last name first and then, if required, first name.

e.g.

Sort field added like

sortByFields.Add(new SortField("KeywordSortLastName", new LastNameFirstNameComparatorSource(), sortDescending));

then created

public class LastNameFirstNameComparatorSource : FieldComparatorSource
{
    public override FieldComparator NewComparator(string fieldname, int numHits, int sortPos, bool reversed)
    {
        return new LastNameFirstNameComparator(numHits, fieldname, CultureInfo.CurrentUICulture);
    }
}

Then, using the class StringComparatorLocale:FieldComparator as the
template, I copied it and changed both of the following methods:

public override int Compare(int slot1, int slot2)
{
    String val1 = values[slot1];
    String val2 = values[slot2];

    // Null values sort before non-null values.
    if (val1 == null)
    {
        if (val2 == null)
        {
            return 0;
        }
        return -1;
    }
    if (val2 == null)
    {
        return 1;
    }

    // Each value is expected to have the form LastName|FirstName.
    string[] val1Split = val1.Split(new[] { '|' });
    string[] val2Split = val2.Split(new[] { '|' });

    string val1FirstName = val1Split[1];
    string val1LastName = val1Split[0];

    string val2FirstName = val2Split[1];
    string val2LastName = val2Split[0];

    // Compare on last name first; fall back to first name on a tie.
    if (!val1LastName.Equals(val2LastName, StringComparison.CurrentCulture))
    {
        return collator.Compare(val1LastName, val2LastName);
    }

    return collator.Compare(val1FirstName, val2FirstName);
}

public override int CompareBottom(int doc)
{
    System.String val2 = currentReaderValues[doc];

    // Null values sort before non-null values.
    if (bottom == null)
    {
        if (val2 == null)
        {
            return 0;
        }
        return -1;
    }
    else if (val2 == null)
    {
        return 1;
    }

    // Each value is expected to have the form LastName|FirstName.
    string[] val1Split = this.bottom.Split(new[] { '|' });
    string[] val2Split = val2.Split(new[] { '|' });

    string val1FirstName = val1Split[1];
    string val1LastName = val1Split[0];

    string val2FirstName = val2Split[1];
    string val2LastName = val2Split[0];

    // Compare on last name first; fall back to first name on a tie.
    if (!val1LastName.Equals(val2LastName, StringComparison.CurrentCulture))
    {
        return collator.Compare(val1LastName, val2LastName);
    }

    return collator.Compare(val1FirstName, val2FirstName);
}

The rest of this class is identical to the implementation in the core
Lucene code.

However, if I create an IComparer like:

public class StringSplitComparer : IComparer<String>
{
    public int Compare(string x, string y)
    {
        // Null values sort before non-null values.
        if (x == null)
        {
            if (y == null)
            {
                return 0;
            }
            return -1;
        }
        if (y == null)
        {
            return 1;
        }

        // Each value is expected to have the form LastName|FirstName.
        string[] val1Split = x.Split(new[] { '|' });
        string[] val2Split = y.Split(new[] { '|' });

        string val1FirstName = val1Split[1];
        string val1LastName = val1Split[0];

        string val2FirstName = val2Split[1];
        string val2LastName = val2Split[0];

        // Case-insensitive comparison on last name, then on first name.
        if (!val1LastName.Equals(val2LastName, StringComparison.CurrentCulture))
        {
            return String.Compare(val1LastName, val2LastName, true, CultureInfo.CurrentUICulture);
        }

        return String.Compare(val1FirstName, val2FirstName, true, CultureInfo.CurrentUICulture);
    }
}

Then the IComparer sorts the values correctly, whereas the Comparator
still shows the results as if it were sorting the concatenated string.

Hopefully this helps.



RE: Sort by 2 fields or concatenated data

Posted by Digy <di...@gmail.com>.
Is it possible to show the case with a small code fragment?

DIGY
