Posted to java-user@lucene.apache.org by Christoph Kaser <lu...@iconparc.de> on 2012/11/06 15:04:19 UTC

Using DocValues with CollationKeyAnalyzer

Hi all,

for best performance, I use a SortedBytesDocValuesField to sort results.
I would like to use an ICUCollationKeyAnalyzer for this field, so that
sorting occurs in a "natural" order.
However, it seems that SortedBytesDocValuesField does not use an
analyzer, but expects a BytesRef, which is stored "as-is".
Is this correct?

So far, this is the best I could come up with:

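import com.ibm.icu.text.Collator;
import com.ibm.icu.text.RawCollationKey;
import com.ibm.icu.util.ULocale;
import org.apache.lucene.document.SortedBytesDocValuesField;
import org.apache.lucene.util.BytesRef;

Collator collator = Collator.getInstance(new ULocale(column.locale));
collator.setStrength(Collator.SECONDARY);
// Binary sort key for the field value; only the first key.size bytes are valid
RawCollationKey key = collator.getRawCollationKey("field_value", null);
BytesRef bytes = new BytesRef(key.bytes, 0, key.size);
SortedBytesDocValuesField sortfield = new SortedBytesDocValuesField("sort_field", bytes);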

So I don't use the analyzer, but instead "simulate" its behaviour.
Is there another way, or is SortedBytesDocValuesField meant to be used 
like that?

Best Regards,
Christoph Kaser

  
	


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Using DocValues with CollationKeyAnalyzer

Posted by Robert Muir <rc...@gmail.com>.
Hi Christoph: in my opinion, (ICU)Collation should actually be
implemented as DocValues, just as you propose: e.g. we'd deprecate the
Analyzer and just offer (ICU)CollationFields that provide an easy
way to do this, so you would just add one of these to your Lucene
Document.

I started a prototype/discussion on this issue:
https://issues.apache.org/jira/browse/LUCENE-4035 (the patch is likely
a little out of date)

But I think your code here is correct!
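
For readers who want to see why storing the raw key bytes works, here is a
minimal sketch of the underlying idea. It is not the code from this thread:
it swaps in the JDK's java.text.Collator for ICU's Collator so that it runs
without extra libraries, and the class and method names are made up for
illustration. It assumes Java 9+ (for Arrays.compareUnsigned). The point it
tries to show is that comparing collation keys as unsigned bytes gives the
same order as asking the collator directly, which is what a byte-sorted
field relies on.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; uses the JDK collator as a stand-in for ICU.
class CollationKeySortSketch {

    /** Binary collation key for a value, analogous to the bytes stored in the field. */
    static byte[] sortKey(Collator collator, String value) {
        return collator.getCollationKey(value).toByteArray();
    }

    /** Sorts by comparing the key bytes as unsigned values, without calling Collator.compare. */
    static List<String> sortByKeyBytes(Locale locale, List<String> values) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.SECONDARY);
        List<String> sorted = new ArrayList<>(values);
        sorted.sort((a, b) ->
            Arrays.compareUnsigned(sortKey(collator, a), sortKey(collator, b)));
        return sorted;
    }

    /** Reference ordering: lets the collator compare the strings itself. */
    static List<String> sortByCollator(Locale locale, List<String> values) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.SECONDARY);
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(collator);
        return sorted;
    }

    public static void main(String[] args) {
        // "\u00C4pfel" is "Apfel" with an A-umlaut as its first letter.
        List<String> values =
            Arrays.asList("zebra", "Apple", "apple", "\u00C4pfel", "banana");
        System.out.println(sortByKeyBytes(Locale.GERMAN, values));
        System.out.println(sortByCollator(Locale.GERMAN, values));
    }
}
```

Both calls in main should print the same order. With ICU, the analogous
bytes would come from RawCollationKey, as in the snippet quoted in this
thread.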
