Posted to solr-user@lucene.apache.org by "Momo..Lelo .." <ga...@hotmail.com> on 2011/10/16 11:04:50 UTC

Implement Custom Soundex

Dear all,

Does anyone here have experience developing a custom Soundex?

If you have done this and can offer some help or share your experience, I would really appreciate it.



RE: Implement Custom Soundex

Posted by "Momo..Lelo .." <ga...@hotmail.com>.
Thank you for this information.

> Subject: Re: Implement Custom Soundex
> From: paul@hoplahup.net
> Date: Sun, 23 Oct 2011 10:58:49 +0200
> To: solr-user@lucene.apache.org

Re: Implement Custom Soundex

Posted by Paul Libbrecht <pa...@hoplahup.net>.
Momo,

If you already have the conversion from text to tokens, then all you need to do is implement a custom analyzer, deploy it inside the Solr webapp, and plug it into the schema.

Is that the part that is hard?
I thought the wiki was helpful there, but maybe some other issue is holding you back.
A catalogue of such analyzers is at:
	http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

If that is the issue, here is a one-sentence explanation: once you have a new analyzer, declare a new field type and a field that uses that analyzer; queries should go through it as well as indexing. Word A will then match word B if your analyzer converts A and B to the same token (this is how "cat" and "cats" match when using the Porter stemmer, for example).

paul
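Paul's point that indexing and querying must pass through the same analyzer can be illustrated with a minimal plain-Java model. This is a sketch, not the Lucene/Solr API: `AnalyzerModel`, `CHAIN`, `analyze`, `matches` and the toy `stripPlural` rule (a crude stand-in for a real stemmer such as Porter) are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Plain-Java model of an analysis chain (hypothetical; NOT the Lucene/Solr API).
final class AnalyzerModel {
    // Per-token filters applied in order after tokenizing,
    // much as token filters are stacked on a tokenizer.
    static final List<UnaryOperator<String>> CHAIN = new ArrayList<>();
    static {
        CHAIN.add(t -> t.toLowerCase(Locale.ROOT));
        CHAIN.add(AnalyzerModel::stripPlural);
    }

    // Toy rule: drop a trailing "s" (but not "ss") from words longer than 3 letters.
    // A crude stand-in for a real stemmer, used only for illustration.
    static String stripPlural(String t) {
        if (t.length() > 3 && t.endsWith("s") && !t.endsWith("ss")) {
            return t.substring(0, t.length() - 1);
        }
        return t;
    }

    // Tokenize on whitespace, then run every token through the filter chain.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) continue;
            String t = raw;
            for (UnaryOperator<String> f : CHAIN) t = f.apply(t);
            tokens.add(t);
        }
        return tokens;
    }

    // A query matches a document if every analyzed query token was produced at index time.
    static boolean matches(String document, String query) {
        Set<String> indexed = new HashSet<>(analyze(document)); // index time
        return indexed.containsAll(analyze(query));              // query time, same chain
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Cats sat"));        // [the, cat, sat]
        System.out.println(matches("The cat sat", "Cats")); // true
        System.out.println(matches("The cat sat", "dogs")); // false
    }
}
```

In Solr the corresponding idea is a field type whose single analyzer is used at both index and query time; a phonetic (Soundex-style) filter would take the place of `stripPlural` in such a chain.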


On 16 Oct 2011, at 14:09, Momo..Lelo .. wrote:



RE: Implement Custom Soundex

Posted by "Momo..Lelo .." <ga...@hotmail.com>.
Dear Gora,

Thank you for the quick response.

Actually, I need to do Soundex for the Arabic language. The code is already written in Java, but I
can't figure out how to implement it as a Solr filter.

Regards,



> From: gora@mimirtech.com
> Date: Sun, 16 Oct 2011 16:19:48 +0530
> Subject: Re: Implement Custom Soundex
> To: solr-user@lucene.apache.org

Re: Implement Custom Soundex

Posted by Gora Mohanty <go...@mimirtech.com>.
I presume that this is in the context of Solr and spell-checking.
We did this as an exercise for Indian-language words transliterated
into English, hooking into the open-source spell-checking library
Aspell, which provided us with a Soundex-like algorithm (the actual
algorithm is quite different, but works better than Soundex, at
least for our use case). We were quite satisfied with the results,
though unfortunately this never went into production.

I would be glad to help, though I am going to be really busy for the
next few days. Please do give us more details on your
requirements.

Regards,
Gora
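For comparison with the Soundex-like approach Gora mentions above, here is a minimal sketch of classic American Soundex in plain Java. It is the textbook English algorithm, not Aspell's algorithm and not an Arabic variant; the class name `Soundex` and the methods `code` and `encode` are illustrative. A custom Soundex (for Arabic, say) would mainly replace the letter-to-digit table in `code`, and possibly the H/W rule.

```java
import java.util.Locale;

// Classic American Soundex (illustrative sketch; English letter groups only).
final class Soundex {
    // Digit for each consonant group; '0' means "not coded" (vowels, Y, H, W).
    static char code(char c) {
        switch (c) {
            case 'B': case 'F': case 'P': case 'V':
                return '1';
            case 'C': case 'G': case 'J': case 'K':
            case 'Q': case 'S': case 'X': case 'Z':
                return '2';
            case 'D': case 'T':
                return '3';
            case 'L':
                return '4';
            case 'M': case 'N':
                return '5';
            case 'R':
                return '6';
            default:
                return '0';
        }
    }

    // Returns a 4-character code such as "R163", or "" if the word has no letters A-Z.
    static String encode(String word) {
        String s = word.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (s.isEmpty()) return "";
        StringBuilder out = new StringBuilder().append(s.charAt(0)); // keep first letter
        char prev = code(s.charAt(0));
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            char d = code(c);
            // Skip uncoded letters and repeats of the previous digit.
            if (d != '0' && d != prev) out.append(d);
            // H and W do not separate equal digits; vowels do.
            if (c != 'H' && c != 'W') prev = d;
        }
        while (out.length() < 4) out.append('0'); // pad, e.g. "L" -> "L000"
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Robert"));   // R163
        System.out.println(encode("Rupert"));   // R163
        System.out.println(encode("Ashcraft")); // A261
    }
}
```

Because "Robert" and "Rupert" share the code R163, a filter that emitted these codes at both index and query time would make the two names match, exactly as described earlier in the thread for stemmed words.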