Posted to solr-user@lucene.apache.org by Jürgen Tiedemann <ju...@yahoo.de> on 2011/06/30 11:58:13 UTC

Adding german phonetic to solr

Hi all,

Does Solr support German phonetics? Searching Google for "how to add german
phonetic to solr" does not turn up good results, just lots of JIRA issues. I
also searched for "cologne phonetic". The wiki pages
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters?highlight=%28phonetic%29#solr.PhoneticFilterFactory
and http://wiki.apache.org/solr/LanguageAnalysis#German did not answer my
question either. Could someone please tell me how to do it, or where to look
for the relevant information?

Kind regards

Jürgen

Re: AW: Adding german phonetic to solr

Posted by Paul Libbrecht <pa...@hoplahup.net>.
Jürgen,

Cologne phonetics is clearly not supported yet; please read:

http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/java/org/apache/solr/analysis/PhoneticFilterFactory.java

One would need to add a line there registering the Cologne phonetic encoder, and then recompile.

It would make sense to open a JIRA issue for this.

paul
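
As an illustration of why swapping the commons-codec jar alone does not help: the "Unknown encoder" message suggests that PhoneticFilterFactory resolves the encoder attribute against a fixed table of names built into the factory, so a new class on the classpath stays invisible until that table gains an entry. The following is a minimal, stdlib-only Java sketch of that lookup pattern, not the actual Solr source. EncoderRegistrySketch, PhoneticEncoder, register and lookup are hypothetical names, and the encoders are placeholders that only tag their input instead of implementing the real algorithms.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class EncoderRegistrySketch {

    /** Hypothetical stand-in for a phonetic encoder; not the commons-codec interface. */
    interface PhoneticEncoder {
        String encode(String input);
    }

    // Table of known encoder names, keyed in upper case so lookups ignore case.
    private final Map<String, PhoneticEncoder> registry = new LinkedHashMap<>();

    EncoderRegistrySketch() {
        // Placeholder encoders: each only tags its input, none implements a real algorithm.
        register("Caverphone", s -> "caverphone(" + s + ")");
        register("Soundex", s -> "soundex(" + s + ")");
        register("Metaphone", s -> "metaphone(" + s + ")");
        register("DoubleMetaphone", s -> "doublemetaphone(" + s + ")");
        register("RefinedSoundex", s -> "refinedsoundex(" + s + ")");
    }

    /** Adds one name-to-encoder entry; this is the "one line" a new encoder needs. */
    void register(String name, PhoneticEncoder encoder) {
        registry.put(name.toUpperCase(Locale.ENGLISH), encoder);
    }

    /** Resolves a configured encoder name, failing loudly for names not in the table. */
    PhoneticEncoder lookup(String name) {
        PhoneticEncoder encoder = registry.get(name.toUpperCase(Locale.ENGLISH));
        if (encoder == null) {
            throw new IllegalArgumentException(
                "Unknown encoder: " + name + " [" + registry.keySet() + "]");
        }
        return encoder;
    }

    public static void main(String[] args) {
        EncoderRegistrySketch factory = new EncoderRegistrySketch();
        try {
            factory.lookup("ColognePhonetic"); // not registered yet, so this throws
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        // After registering the name, the same lookup succeeds.
        factory.register("ColognePhonetic", s -> "colognephonetic(" + s + ")");
        System.out.println(factory.lookup("ColognePhonetic").encode("Meier"));
    }
}
```

In the real factory, the equivalent of the register call would live in the source file linked above, which is why a recompile is needed in addition to the newer commons-codec jar.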

On 30 June 2011 at 14:24, Jürgen Tiedemann wrote:

> [...]


AW: Adding german phonetic to solr

Posted by Jürgen Tiedemann <ju...@yahoo.de>.
Hi Paul,

Thanks for the quick reply. I replaced commons-codec-1.4.jar with
commons-codec-1.5.jar to get ColognePhonetic. In schema.xml I added

<filter class="solr.PhoneticFilterFactory" encoder="ColognePhonetic" 
inject="true"/>

but then I get

org.apache.solr.common.SolrException: Unknown encoder: ColognePhonetic 
[[CAVERPHONE, SOUNDEX, METAPHONE, DOUBLEMETAPHONE, REFINEDSOUNDEX]].

How do I get PhoneticFilterFactory to know ColognePhonetic? Or is my approach 
completely wrong?

Jürgen






________________________________
From: Paul Libbrecht <pa...@hoplahup.net>
To: solr-user@lucene.apache.org
Sent: Thursday, 30 June 2011, 12:09:18
Subject: Re: Adding german phonetic to solr

[...]

Re: Adding german phonetic to solr

Posted by Paul Libbrecht <pa...@hoplahup.net>.
Jürgen,

I haven't had time to deploy it, but I have heard of "Kölner Phonetik" (Cologne phonetics), which was to be contributed to Apache Commons Codec.
It is probably still just a patch in a JIRA issue:
	https://issues.apache.org/jira/browse/CODEC-106
The contribution was posted to commons-dev on September 15th, 2010.

Making this available in Solr would be interesting, but it's a bit of work.

We have used the Double Metaphone indexer with Lucene with reasonable success in ActiveMath, but it was not as fine-grained as the Kölner analyzer, and fine granularity is really a desirable feature of a phonetic environment.
You might also want to take care of all the proper nouns, for which traditional phonetics is doomed to fail, at least if your texts contain a fair number of international names!

paul

