Posted to solr-user@lucene.apache.org by Vasu Y <vy...@gmail.com> on 2016/07/25 16:59:34 UTC

Sorting - uppercase value first or lowercase value

Hi,
 We index our objects into Solr and let users sort by different
fields. The sort field type is defined in schema.xml as follows:

    <fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory" />
      </analyzer>
    </fieldType>

For a field of type "lowercase", if we have the field values APPLES,
ZUCCHINI, banana, BANANA, apples, zucchini and sort in ascending order,
Solr returns the results in the following order:
APPLES, apples, BANANA, banana, ZUCCHINI, zucchini.

We also have another tool that displays the same information from a
database, in the following order:
apples, APPLES, banana, BANANA, zucchini, ZUCCHINI

That tool uses the SQL query "select column1 from table1 order
by UPPER(column1) asc".

I could either change the SQL query to "select column1 from table1 order by
LOWER(column1) asc" or change the Solr definition to use
solr.UpperCaseFilterFactory instead of solr.LowerCaseFilterFactory so that
both applications sort the same way.

But in general, when we sort a collection of string values, what is
the correct sort order? Should an upper-case value ("APPLE") come before
a lower-case value ("apple"), or the other way around (lower case before
upper case), when sorting in ascending order?

Thanks,
Vasu

Re: Sorting - uppercase value first or lowercase value

Posted by Erick Erickson <er...@gmail.com>.
Well, since the ASCII upper-case codes are smaller than the lower-case ones,
i.e.
A = 0x41
a = 0x61

upper case before lower case is correct IMO.
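
To illustrate the code-point argument, here is a minimal Python sketch.
Python is used only for illustration and the variable names are made up.
It compares the raw strings, which is an assumption: a field analyzed with
LowerCaseFilterFactory sorts on the lower-cased form instead, so there
"APPLES" and "apples" end up with the same sort value.

```python
# Field values from the original question, in the order they were listed.
values = ["APPLES", "ZUCCHINI", "banana", "BANANA", "apples", "zucchini"]

# ASCII code points: every upper-case letter is smaller than every
# lower-case letter.
code_upper_a = ord("A")  # 0x41
code_lower_a = ord("a")  # 0x61

# A plain code-point sort therefore puts ALL upper-case words first,
# not just the upper-case variant of each word.
raw_sorted = sorted(values)

# Sorting on the lower-cased form makes case variants compare as equal.
same_key = "APPLES".lower() == "apples".lower()

print(hex(code_upper_a), hex(code_lower_a))
print(raw_sorted)
print(same_key)
```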

But you're being fooled by the "tiebreaker", I'd guess,
along with (I suppose) a small number of test docs. When
two docs have the same sort value, the internal Lucene
doc ID is used to break the tie. I suspect it just happens
that you've indexed your docs with all the upper-case
versions first in your test set and all the lower-case
versions second. If I'm right, and you reverse
the sort order, the docs will still appear upper-case first.

Try interleaving upper and lower case values and I think you'll
see them mixed in the result, i.e.
doc1: APPLE
doc2: apple
doc3: APPLE
doc4: apple
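
The tiebreak behaviour can be sketched roughly as follows. This is a
simulation, not Lucene code: the sort_docs helper and the (doc_id, value)
tuples are hypothetical, and it assumes ties are always broken by ascending
doc ID, whichever direction the sort runs.

```python
def sort_docs(docs, descending=False):
    """Sort (doc_id, value) pairs on the lower-cased value.

    Docs with equal sort values keep ascending doc_id order, mimicking
    a doc-ID tiebreak. Python's sort is stable, also with reverse=True.
    """
    by_id = sorted(docs, key=lambda d: d[0])
    return sorted(by_id, key=lambda d: d[1].lower(), reverse=descending)


# Interleaved upper- and lower-case values, as in the example above.
docs = [(1, "APPLE"), (2, "apple"), (3, "APPLE"), (4, "apple")]

ascending = [value for _, value in sort_docs(docs)]
descending = [value for _, value in sort_docs(docs, descending=True)]

print(ascending)
print(descending)
```

All four docs share the sort value "apple", so in this sketch both
directions return them in doc-ID order, with upper and lower case mixed.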

Best,
Erick
