Posted to java-user@lucene.apache.org by Christoph Kaser <lu...@iconparc.de> on 2012/11/06 15:04:19 UTC
Using DocValues with CollationKeyAnalyzer
Hi all,
for best performance, I use a SortedBytesDocValuesField to sort results.
I would like to use an ICUCollationKeyAnalyzer for this field, so sorting
occurs in a "natural" order.
However, it seems as if the SortedBytesDocValuesField does not use an
analyzer, but expects a BytesRef which is stored "as-is".
Is this correct?
So far, this is the best I could come up with:
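import com.ibm.icu.text.Collator;
import com.ibm.icu.text.RawCollationKey;
import com.ibm.icu.util.ULocale;
import org.apache.lucene.document.SortedBytesDocValuesField;
import org.apache.lucene.util.BytesRef;

Collator collator = Collator.getInstance(new ULocale(column.locale));
collator.setStrength(Collator.SECONDARY);
// Passing null lets ICU allocate a new RawCollationKey.
RawCollationKey key = collator.getRawCollationKey("field_value", null);
// Only the first key.size bytes of key.bytes are valid.
BytesRef bytes = new BytesRef(key.bytes, 0, key.size);
SortedBytesDocValuesField sortfield = new SortedBytesDocValuesField("sort_field", bytes);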
So I don't use the analyzer, but instead "simulate" its behaviour.
Is there another way, or is SortedBytesDocValuesField meant to be used
like that?
Best Regards,
Christoph Kaser
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Using DocValues with CollationKeyAnalyzer
Posted by Robert Muir <rc...@gmail.com>.
Hi Christoph: in my opinion, (ICU)Collation should actually be
implemented as DocValues, just as you propose. That is, we'd deprecate
the Analyzer and just offer (ICU)CollationFields that provide an easy
way to do this, so you would just add one of these to your Lucene
Document.
I started a prototype/discussion on this issue:
https://issues.apache.org/jira/browse/LUCENE-4035 (the patch is likely
a little out of date)
But I think your code here is correct!
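
For readers following along, here is a minimal sketch of why storing collation key bytes "as-is" gives a natural sort order. It is not the code from this thread. To stay dependency-free it uses the JDK's java.text.Collator as a stand-in for the ICU collator, and it assumes the stored bytes are compared in unsigned byte order. The class and method names (CollationKeySketch, sortKey, compareUnsigned, sortByKey) are hypothetical and exist only for this illustration.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CollationKeySketch {

    // Builds a binary sort key for a value. The JDK collator is used here
    // as a stand-in (assumption) for ICU's Collator/RawCollationKey.
    public static byte[] sortKey(String value, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.SECONDARY);
        return collator.getCollationKey(value).toByteArray();
    }

    // Compares two byte arrays as unsigned bytes; if one array is a prefix
    // of the other, the shorter one sorts first.
    public static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Sorts values purely by comparing their key bytes; no collator is
    // consulted at sort time, only when the keys are built.
    public static List<String> sortByKey(List<String> values, Locale locale) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort((x, y) -> compareUnsigned(sortKey(x, locale), sortKey(y, locale)));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("Zebra", "apple", "Banana");
        // Plain byte order of the raw strings would put "Banana" and "Zebra"
        // before "apple"; ordering by collation key puts "apple" first.
        System.out.println(sortByKey(values, Locale.ENGLISH));
    }
}
```

The point of the sketch is that all locale-specific work happens once, when the key is computed. After that, ordering is a plain byte comparison, which is why a field that stores bytes without analysis can still sort "naturally".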