Posted to solr-user@lucene.apache.org by Anderson vasconcelos <an...@gmail.com> on 2012/01/20 17:40:21 UTC

Phonetic search for portuguese

Hi

Are the phonetic filters (DoubleMetaphone, Metaphone, Soundex, RefinedSoundex,
Caverphone) only for English, or do they work for other languages too? Is
there a phonetic filter for Portuguese? If not, how can I implement one?

Thanks

Re: Phonetic search for portuguese

Posted by Gora Mohanty <go...@mimirtech.com>.
On Mon, Jan 23, 2012 at 9:21 AM, Anderson vasconcelos
<an...@gmail.com> wrote:
> Thanks a lot, Gora.
> I need to deliver the first release to my client on 25 January.
> With your explanation, I can better negotiate moving the delivery of
> this feature to next month, because I have other business rules to
> deliver and this feature is more complex than I thought.

OK. I have ideas on how to improve this solution, but
we can take these up at a later stage. We have tested
this solution, and I know that it works. I will also be
discussing with people here how soon we can
open-source this.

> I could help you share this solution with the Solr community. Maybe we
> can create a component on Google Code, or something like that, which
> any Solr user can use.

Yes, I have been meaning to do that forever, but work has
been intruding. We will put up something on BitBucket as
soon as possible.

Regards,
Gora

Re: Phonetic search for portuguese

Posted by Anderson vasconcelos <an...@gmail.com>.
Thanks a lot, Gora.
I need to deliver the first release to my client on 25 January.
With your explanation, I can better negotiate moving the delivery of
this feature to next month, because I have other business rules to
deliver and this feature is more complex than I thought.
I could help you share this solution with the Solr community. Maybe we
can create a component on Google Code, or something like that, which
any Solr user can use.

Re: Phonetic search for portuguese

Posted by Gora Mohanty <go...@mimirtech.com>.
On Mon, Jan 23, 2012 at 5:58 AM, Anderson vasconcelos
<an...@gmail.com> wrote:
> Hi Gora, thanks for the reply.
>
> I'm interested in seeing how you built this solution. But I don't have
> much time, and I need to deliver a solution for my client soon. If
> anyone knows another simple and fast solution, please post it on this
> thread.

What is your timeline? I will see if we can expedite the open
sourcing of this.

> Gora, could you explain how you implemented the custom filter factory
> and how you used it in Solr?
[...]

That part is quite simple, though it is possible that I have not
correctly addressed all issues for a custom FilterFactory.
Please see:
  AspellFilterFactory: http://pastebin.com/jTBcfmd1
  AspellFilter:        http://pastebin.com/jDDKrPiK

The latter loads a java_aspell library, which we created by using
SWIG to generate Java bindings for aspell, and configures
it for the language of interest.
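
As a rough illustration of that loading step (not the actual AspellFilter
code behind the pastebin links), a SWIG-generated JNI library is usually
loaded once, before any native method is called. The class name
AspellNative and the helper tryLoad are made up for this sketch; only the
library name java_aspell comes from the description above.

```java
// Hypothetical helper: loads a SWIG-generated JNI library once and
// records whether it worked, instead of letting the error escape.
class AspellNative {
    // "java_aspell" is the library name mentioned in this thread; the JVM
    // looks for libjava_aspell.so (or the platform equivalent) on
    // java.library.path.
    private static final boolean AVAILABLE = tryLoad("java_aspell");

    // Returns true if the named native library could be loaded.
    static boolean tryLoad(String name) {
        try {
            System.loadLibrary(name);
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            System.err.println("Could not load native library '" + name
                    + "': " + e.getMessage());
            return false;
        }
    }

    static boolean isAvailable() {
        return AVAILABLE;
    }

    public static void main(String[] args) {
        System.out.println("java_aspell available: " + isAvailable());
    }
}
```

Failing softly like this is only one option: a filter could check
isAvailable() and pass tokens through unchanged when the library is
missing, or it could refuse to start so the misconfiguration is noticed
early.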

Next, you will need a library that encapsulates various
aspell functionality in Java. I am afraid that this is a little
long:
  Suggest: http://pastebin.com/6NrGCVma

Finally, you will have to set up the Solr schema to use
this filter factory, e.g., one could define a new Solr
TextField type in which solr.DoubleMetaphoneFilterFactory
is replaced with
com.mimirtech.search.solr.analysis.AspellFilterFactory

We can discuss further how to set this up, but should
probably take that discussion off-list.

Regards,
Gora

Re: Phonetic search for portuguese

Posted by Anderson vasconcelos <an...@gmail.com>.
Hi Gora, thanks for the reply.

I'm interested in seeing how you built this solution. But I don't have
much time, and I need to deliver a solution for my client soon. If
anyone knows another simple and fast solution, please post it on this
thread.

Gora, could you explain how you implemented the custom filter factory
and how you used it in Solr?

Thanks


Re: Phonetic search for portuguese

Posted by Gora Mohanty <go...@mimirtech.com>.
On Sun, Jan 22, 2012 at 5:47 PM, Anderson vasconcelos
<an...@gmail.com> wrote:
> Could anyone help?
>
> Thanks
>
> 2012/1/20, Anderson vasconcelos <an...@gmail.com>:
>> Hi
>>
>> Are the phonetic filters (DoubleMetaphone, Metaphone, Soundex,
>> RefinedSoundex, Caverphone) only for English, or do they work for other
>> languages too? Is there a phonetic filter for Portuguese? If not, how
>> can I implement one?

We did this, in another context, by using the open-source aspell library to
handle the spell-checking for us. This has distinct advantages, as aspell
is well tested, handles soundslike matching better (at least IMHO), and
supports a wide variety of languages, including Portuguese.
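
To make the idea of a soundslike key concrete, here is a toy sketch in
plain Java. It is not aspell's algorithm and not the code described in
this thread: the class name PtPhoneticKey and every rewrite rule below
are simplifying assumptions about Portuguese spelling, chosen only to
show how words that sound alike can be reduced to the same key.

```java
import java.text.Normalizer;
import java.util.Locale;

// Toy phonetic key for Portuguese. All rules are illustrative assumptions.
class PtPhoneticKey {
    static String encode(String word) {
        // Compose first so that c-cedilla is a single character (\u00e7),
        // then map it to s before diacritics are stripped.
        String s = Normalizer.normalize(word, Normalizer.Form.NFC)
                .toLowerCase(Locale.ROOT)
                .replace("\u00e7", "s");
        // Strip remaining diacritics: decompose, then drop combining marks.
        s = Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{M}+", "");
        // Keep only plain letters.
        s = s.replaceAll("[^a-z]", "");
        s = s.replace("ph", "f")              // older spellings such as "pharmacia"
                .replace("ch", "x")           // treat ch and x as one sound
                .replace("qu", "k")
                .replaceAll("c(?=[ei])", "s") // soft c before e or i
                .replace("c", "k")            // any remaining c is hard
                .replaceAll("(?<![ln])h", "") // drop h except in lh and nh
                .replace("z", "s");
        // Collapse doubled letters (ss -> s, rr -> r).
        return s.replaceAll("(.)\\1+", "$1");
    }

    public static void main(String[] args) {
        String[] words = {"Luiz", "Lu\u00eds", "pharmacia", "Farm\u00e1cia"};
        for (String w : words) {
            System.out.println(w + " -> " + encode(w));
        }
    }
}
```

Indexing such a key alongside the original token lets a misspelled query
match documents whose words reduce to the same key. A real rule set would
need far more care, which is the main argument for reusing aspell's
language data instead.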

There are some drawbacks, as aspell only has C/C++ interfaces, and
hence we built Java bindings using SWIG. Also, we handled the integration
with Solr via a custom filter factory, though there are better ways to do this.
Such a project would thus have dependencies on aspell and our custom
code. If there is interest in this, we would be happy to open-source this
code; given our current schedule, this could take 2-3 weeks.

Regards,
Gora

Re: Phonetic search for portuguese

Posted by Anderson vasconcelos <an...@gmail.com>.
Could anyone help?

Thanks
