Posted to solr-user@lucene.apache.org by Vasu Y <vy...@gmail.com> on 2016/08/29 16:44:06 UTC

Unicode collation - Sorting text for multiple languages

Hi,
I was reading the Unicode Collation page on the Wiki (
http://wiki.apache.org/solr/UnicodeCollation#Sorting_text_for_multiple_languages
) and it seems to suggest the following:
use the Unicode "default" collator, rather than defining a collated field
for each language and populating it with copyField, in order to avoid or
minimize the increase in disk and indexing costs.

I didn't quite understand how using the "default" collator would avoid or
minimize the increase in disk and indexing costs, compared with defining
collated fields for each language.
I thought the only difference between the two is having to define n
CollationField definitions (one per language) versus a single
CollationField for the default/ROOT locale in schema.xml. Either way, we
will have to use <copyField> to copy from the analyzed field to the
collation field for each language.
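
To make the comparison concrete, here is a rough sketch of the two setups as
I understand them. The field and type names (title, title_sort_de,
collatedGERMAN and so on) and the text_general type are hypothetical, chosen
only for illustration, and I am assuming solr.CollationField with its
language and strength attributes, where an empty language selects the ROOT
locale:

```xml
<!-- Option A: one collated field type and one sort field per language.
     All names below are hypothetical. -->
<fieldType name="collatedGERMAN" class="solr.CollationField"
           language="de" strength="primary"/>
<fieldType name="collatedFRENCH" class="solr.CollationField"
           language="fr" strength="primary"/>

<field name="title" type="text_general" indexed="true" stored="true"/>
<field name="title_sort_de" type="collatedGERMAN" indexed="true" stored="false"/>
<field name="title_sort_fr" type="collatedFRENCH" indexed="true" stored="false"/>

<copyField source="title" dest="title_sort_de"/>
<copyField source="title" dest="title_sort_fr"/>

<!-- Option B: a single collated field using the default/ROOT locale
     (assumption: language="" selects ROOT). -->
<fieldType name="collatedROOT" class="solr.CollationField"
           language="" strength="primary"/>

<field name="title_sort" type="collatedROOT" indexed="true" stored="false"/>

<copyField source="title" dest="title_sort"/>
```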

I would appreciate any insight into this.

Thanks,
Vasu