Posted to solr-user@lucene.apache.org by Thomas Fischer <fi...@aon.at> on 2014/02/19 12:45:44 UTC

Problems with ICUCollationField

Hello,

I'm migrating to Solr 4.6.1 and have problems with the ICUCollationField (apache-solr-ref-guide-4.6.pdf, pp. 31 and 100).

I consistently get the error message
Error loading class 'solr.ICUCollationField'.
even after
INFO: Adding 'file:/srv/solr4.6.1/contrib/analysis-extras/lib/icu4j-49.1.jar' to classloader
and
INFO: Adding 'file:/srv/solr4.6.1/contrib/analysis-extras/lucene-libs/lucene-analyzers-icu-4.6.1.jar' to classloader.

Am I missing something?

In Solr's Subversion repository I found
/SVN/solr/contrib/analysis-extras/src/java/org/apache/solr/schema/ICUCollationField.java
but no corresponding class in solr4.6.1's contrib folder.

Best
Thomas
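Since a jar file is just a zip archive, one quick way to find out which jar (if any) provides a class is to look for its .class entry. Below is a minimal Python sketch of that check; the helper name jars_containing and the directories you pass to it are illustrative assumptions, not anything shipped with Solr. The class name follows the org/apache/solr/schema path of the source file mentioned above.

```python
import sys
import zipfile
from pathlib import Path


def jars_containing(class_name, search_dirs):
    """Return the .jar files under search_dirs that contain class_name.

    class_name is a fully qualified Java class name, e.g.
    'org.apache.solr.schema.ICUCollationField'.
    """
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for d in search_dirs:
        # Search recursively, since contrib jars live in lib/ and lucene-libs/.
        for jar in sorted(Path(d).rglob("*.jar")):
            try:
                with zipfile.ZipFile(jar) as zf:
                    if entry in zf.namelist():
                        hits.append(jar)
            except zipfile.BadZipFile:
                # Skip files named .jar that are not valid archives.
                continue
    return hits


if __name__ == "__main__":
    # Hypothetical usage:
    #   python find_class.py /srv/solr4.6.1/dist /srv/solr4.6.1/contrib/analysis-extras
    for jar in jars_containing("org.apache.solr.schema.ICUCollationField", sys.argv[1:]):
        print(jar)
```

If the only hit is a jar that none of your lib directives load (later in this thread it turns out to be the Solr analysis-extras jar itself, solr-analysis-extras-4.6.1.jar in dist), that explains the "Error loading class" message.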


Re: Problems with ICUCollationField

Posted by Robert Muir <rc...@gmail.com>.
On Wed, Feb 19, 2014 at 10:33 AM, Thomas Fischer <fi...@aon.at> wrote:

>
> > Hmm, for standardization of text fields, collation might be a little
> > awkward.
>
> I arrived there after using custom rules for a while (see
> "RuleBasedCollator" on http://wiki.apache.org/solr/UnicodeCollation) and
> then being told
> "For better performance, less memory usage, and support for more locales,
> you can add the analysis-extras contrib and use
> ICUCollationKeyFilterFactory instead." (on the same page under "ICU
> Collation").
>
> > For your German umlauts, what do you mean by standardize? Is this to
> > achieve equivalency of e.g. oe to ö in your search terms?
>
> That is the main point, but I might also need the additional normalization
> of combined characters like
> o+  ̈ = ö and probably similar constructions for other languages (like
> Hungarian).
>

Sure, but using collation to get normalization is overkill, too. Maybe
try ICUNormalizer2Filter? This gives you better control over the
normalization anyway.
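For the combining-character case, here is a small Python sketch of what Unicode normalization does, using the standard unicodedata module as a stand-in for ICU. It approximates ICUNormalizer2Filter's nfkc_cf form (which I believe is its default) as NFKC followed by str.casefold(); that pairing is an assumption and may differ from ICU for some characters. normalize_term is a hypothetical helper name.

```python
import unicodedata


def normalize_term(term: str) -> str:
    """Approximate ICU's nfkc_cf: NFKC normalization, then Unicode case folding."""
    return unicodedata.normalize("NFKC", term).casefold()


# German: 'o' + U+0308 COMBINING DIAERESIS vs. precomposed U+00F6 'ö'
decomposed = "Ho\u0308he"
precomposed = "H\u00f6he"

# Hungarian: 'o' + U+030B COMBINING DOUBLE ACUTE ACCENT vs. precomposed U+0151 'ő'
hu_decomposed = "o\u030b"
hu_precomposed = "\u0151"

if __name__ == "__main__":
    print(decomposed == precomposed)                                  # False
    print(normalize_term(decomposed) == normalize_term(precomposed))  # True
    print(normalize_term(hu_decomposed) == hu_precomposed)            # True
```

Normalization only unifies different encodings of the same character (plus case and compatibility variants); it keeps ö distinct from o and oe, so it is a finer tool than collation or GermanNormalizationFilter.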


>
> > In that case, a simpler approach would be to put
> > GermanNormalizationFilterFactory in your chain:
> >
> http://lucene.apache.org/core/4_6_1/analyzers-common/org/apache/lucene/analysis/de/GermanNormalizationFilter.html
>
> I'll see how far I get with this, but from the description
>         • 'ä', 'ö', 'ü' are replaced by 'a', 'o', 'u', respectively.
>         • 'ae' and 'oe' are replaced by 'a', and 'o', respectively.
> this seems to be too far-reaching a reduction: while the identification
> "ä=ae" is not very serious and rarely misleading, "ä=a" might lump words
> together that shouldn't be: "Äsen" and "Asen" are quite different concepts.
>

I'm not sure that's a mainstream opinion: not only do the default German
collation rules treat these two characters as equivalent at the primary
level, but so do many German stemming algorithms. Similar arguments could
be made for 'résumé' versus 'resume' and so on. Search isn't an exact
science.
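At primary strength a collator ignores accent (secondary) and case (tertiary) differences, which is why Äsen and Asen compare equal there. The Python sketch below imitates only that effect, by stripping combining marks after NFD decomposition and case-folding; primary_key is a hypothetical toy, not the Unicode Collation Algorithm or ICU's German rules. It also shows why a collated value is useful for sorting.

```python
import unicodedata


def primary_key(text: str) -> str:
    """Toy primary-strength key: drop accent/umlaut marks and ignore case.

    Not the Unicode Collation Algorithm; it only mimics the effect that
    secondary (accent) and tertiary (case) differences are ignored.
    """
    decomposed = unicodedata.normalize("NFD", text)  # 'Ä' -> 'A' + U+0308
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


words = ["Zander", "\u00c4sen", "B\u00e4r", "Asen"]

if __name__ == "__main__":
    print(primary_key("\u00c4sen") == primary_key("Asen"))          # True
    print(primary_key("r\u00e9sum\u00e9") == primary_key("resume"))  # True
    print(sorted(words))                   # code point order puts 'Äsen' last
    print(sorted(words, key=primary_key))  # collation-like order
```

A real collator still tells the two words apart at secondary strength; ICUCollationField lets you choose the level through its strength attribute.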

Re: Problems with ICUCollationField

Posted by Thomas Fischer <fi...@aon.at>.
> Hmm, for standardization of text fields, collation might be a little
> awkward.

I arrived there after using custom rules for a while (see "RuleBasedCollator" on http://wiki.apache.org/solr/UnicodeCollation) and then being told
"For better performance, less memory usage, and support for more locales, you can add the analysis-extras contrib and use ICUCollationKeyFilterFactory instead." (on the same page under "ICU Collation").

> For your German umlauts, what do you mean by standardize? Is this to
> achieve equivalency of e.g. oe to ö in your search terms?

That is the main point, but I might also need the additional normalization of combined characters like
o+  ̈ = ö and probably similar constructions for other languages (like Hungarian).

> In that case, a simpler approach would be to put
> GermanNormalizationFilterFactory in your chain:
> http://lucene.apache.org/core/4_6_1/analyzers-common/org/apache/lucene/analysis/de/GermanNormalizationFilter.html

I'll see how far I get with this, but from the description
	• 'ä', 'ö', 'ü' are replaced by 'a', 'o', 'u', respectively.
	• 'ae' and 'oe' are replaced by 'a', and 'o', respectively.
this seems to be too far-reaching a reduction: while the identification "ä=ae" is not very serious and rarely misleading, "ä=a" might lump words together that shouldn't be: "Äsen" and "Asen" are quite different concepts.
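To see the effect described here, the two quoted rules can be sketched in Python. This assumes lowercased input (as after a LowerCaseFilter), implements only those two rules, and is not a port of Lucene's GermanNormalizationFilter, whose complete rule set is in the javadoc linked above; german_normalize is a hypothetical name.

```python
UMLAUTS = {"\u00e4": "a", "\u00f6": "o", "\u00fc": "u"}  # ä, ö, ü


def german_normalize(term: str) -> str:
    """Apply only the two quoted rules to a lowercased term:
    'ä', 'ö', 'ü' -> 'a', 'o', 'u' and 'ae', 'oe' -> 'a', 'o'."""
    out = []
    after_plain_a_or_o = False  # previous input char was a literal 'a' or 'o'
    for ch in term:
        if ch == "e" and after_plain_a_or_o:
            after_plain_a_or_o = False  # drop the 'e' of 'ae' / 'oe'
            continue
        after_plain_a_or_o = ch in ("a", "o")
        out.append(UMLAUTS.get(ch, ch))
    return "".join(out)


if __name__ == "__main__":
    for word in ["\u00c4sen", "Asen", "Aesen", "M\u00f6ller", "Moeller"]:
        print(word, "->", german_normalize(word.lower()))
```

Both Möller and Moeller end up as moller, which is the "ä=ae" kind of equivalence that is considered harmless here; the same umlaut mapping, however, also turns Äsen into asen, colliding with Asen.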

In general, the deprecation of ICUCollationKeyFilterFactory doesn't seem to have been thought through.

Thanks anyway, best
Thomas



Re: Problems with ICUCollationField

Posted by Robert Muir <rc...@gmail.com>.
Hmm, for standardization of text fields, collation might be a little
awkward.

For your German umlauts, what do you mean by standardize? Is this to
achieve equivalency of e.g. oe to ö in your search terms?

In that case, a simpler approach would be to put
GermanNormalizationFilterFactory in your chain:
http://lucene.apache.org/core/4_6_1/analyzers-common/org/apache/lucene/analysis/de/GermanNormalizationFilter.html


On Wed, Feb 19, 2014 at 9:16 AM, Thomas Fischer <fi...@aon.at> wrote:

> Thanks, that helps!
>
> I'm trying to migrate from the now-deprecated ICUCollationKeyFilterFactory,
> which I used before, to the ICUCollationField.
> Is there any description of how to achieve this?
>
> First attempts now yield
>
> ICUCollationField does not support specifying an analyzer.
>
> which makes it complicated, since I used the ICUCollationKeyFilterFactory
> to standardize my text fields (in particular because of German umlauts).
> But an ICUCollationField without a LowerCaseFilter, a WhitespaceTokenizer, a
> LetterTokenizer, etc. doesn't do me much good, I'm afraid.
> Or is this somehow wrapped into the ICUCollationField?
>
> I didn't find ICUCollationField in the Solr wiki, and there is not much
> information in the reference.
> And the hint
>
> "solr.ICUCollationField is included in the Solr analysis-extras contrib -
> see solr/contrib/analysis-extras/README.txt for instructions on which jars
> you need to add to your SOLR_HOME/lib in order to use it."
>
> is misleading insofar as this README.txt doesn't mention the
> solr-analysis-extras-4.6.1.jar in dist.
>
> Best
> Thomas

Re: Problems with ICUCollationField

Posted by Thomas Fischer <fi...@aon.at>.
Thanks, that helps!

I'm trying to migrate from the now-deprecated ICUCollationKeyFilterFactory, which I used before, to the ICUCollationField.
Is there any description of how to achieve this?

First attempts now yield

ICUCollationField does not support specifying an analyzer.

which makes it complicated, since I used the ICUCollationKeyFilterFactory to standardize my text fields (in particular because of German umlauts).
But an ICUCollationField without a LowerCaseFilter, a WhitespaceTokenizer, a LetterTokenizer, etc. doesn't do me much good, I'm afraid.
Or is this somehow wrapped into the ICUCollationField?

I didn't find ICUCollationField in the Solr wiki, and there is not much information in the reference.
And the hint

"solr.ICUCollationField is included in the Solr analysis-extras contrib - see solr/contrib/analysis-extras/README.txt for instructions on which jars you need to add to your SOLR_HOME/lib in order to use it."

is misleading insofar as this README.txt doesn't mention the solr-analysis-extras-4.6.1.jar in dist.

Best
Thomas


On 19.02.2014 at 14:27, Robert Muir wrote:

> you need the solr analysis-extras jar itself, too.


Re: Problems with ICUCollationField

Posted by Robert Muir <rc...@gmail.com>.
you need the solr analysis-extras jar itself, too.



On Wed, Feb 19, 2014 at 8:25 AM, Thomas Fischer <fi...@aon.at> wrote:

> Hello Robert,
>
> I already added
> contrib/analysis-extras/lib/
> and
> contrib/analysis-extras/lucene-libs/
> via lib directives in solrconfig; this is why the classes mentioned are
> loaded.
>
> Do you know which jar is supposed to contain the ICUCollationField?
>
> Best regards
> Thomas

Re: Problems with ICUCollationField

Posted by Thomas Fischer <fi...@aon.at>.
Hello Robert,

I already added
contrib/analysis-extras/lib/
and
contrib/analysis-extras/lucene-libs/
via lib directives in solrconfig; this is why the classes mentioned are loaded.

Do you know which jar is supposed to contain the ICUCollationField?

Best regards
Thomas



On 19.02.2014 at 13:54, Robert Muir wrote:

> you need the solr analysis-extras jar in your classpath, too.


Re: Problems with ICUCollationField

Posted by Robert Muir <rc...@gmail.com>.
you need the solr analysis-extras jar in your classpath, too.



On Wed, Feb 19, 2014 at 6:45 AM, Thomas Fischer <fi...@aon.at> wrote:

> Hello,
>
> I'm migrating to Solr 4.6.1 and have problems with the ICUCollationField
> (apache-solr-ref-guide-4.6.pdf, pp. 31 and 100).
>
> I consistently get the error message
> Error loading class 'solr.ICUCollationField'.
> even after
> INFO: Adding
> 'file:/srv/solr4.6.1/contrib/analysis-extras/lib/icu4j-49.1.jar' to
> classloader
> and
> INFO: Adding
> 'file:/srv/solr4.6.1/contrib/analysis-extras/lucene-libs/lucene-analyzers-icu-4.6.1.jar'
> to classloader.
>
> Am I missing something?
>
> In Solr's Subversion repository I found
>
> /SVN/solr/contrib/analysis-extras/src/java/org/apache/solr/schema/ICUCollationField.java
> but no corresponding class in solr4.6.1's contrib folder.
>
> Best
> Thomas
>
>