Posted to dev@opennlp.apache.org by Damiano Porta <da...@gmail.com> on 2017/07/01 18:04:19 UTC

Spelling correction

Hello everybody,
I am dealing with data normalization on very poorly written sentences with many
spelling errors.

Do you know of a good paper explaining how to build a model that fixes
this kind of problem?
I am happy to share the code if you are interested in integrating
it into OpenNLP.

Thanks
Damiano

Re: Spelling correction

Posted by Daniel Russ <da...@gmail.com>.
Damiano,

    There is a lot of research on spelling correction. Here is a paper from a group out of the National Library of Medicine:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2137159/
They also have a product called GSpell, which uses the NLM lexicon:
https://lexsrv3.nlm.nih.gov/LexSysGroup/Projects/gSpell/current/GSpell.html
It might not work for OpenNLP (it is too English-specific), but these are things to look into. I dabble in the spelling correction field but have not worked seriously in it. I'd be willing to help on this project, but I don't have a lot of time.

Daniel


> On Jul 1, 2017, at 7:20 PM, Suneel Marthi <sm...@apache.org> wrote:
> 
> You could also leverage language models for spelling correction. OpenNLP has
> a Stupid Backoff implementation: create a language model with that algorithm
> and use it for spell checks.


Re: Spelling correction

Posted by Suneel Marthi <sm...@apache.org>.
You could also leverage language models for spelling correction. OpenNLP has
a Stupid Backoff implementation: create a language model with that algorithm
and use it for spell checks.
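The language-model approach suggested here can be illustrated with a small, self-contained Python sketch of Stupid Backoff scoring. It is not OpenNLP's implementation or API: the class and helper names are hypothetical, the toy sentences are made up, and alpha = 0.4 follows the value suggested by Brants et al. (2007). A language model is mainly useful for real-word errors (for example "sea" vs. "see"), which a dictionary lookup alone cannot flag: each candidate spelling is scored in its left context and the highest-scoring one is kept.

```python
from collections import Counter


class StupidBackoffLM:
    """N-gram scorer using Stupid Backoff (Brants et al., 2007).

    Scores are relative frequencies, not normalized probabilities.
    """

    def __init__(self, sentences, order=3, alpha=0.4):
        self.order = order
        self.alpha = alpha  # backoff penalty; 0.4 is the value suggested in the paper
        self.ngrams = Counter()
        self.total_tokens = 0
        for sentence in sentences:
            tokens = sentence.lower().split()
            self.total_tokens += len(tokens)
            # Count every n-gram of length 1..order.
            for n in range(1, order + 1):
                for i in range(len(tokens) - n + 1):
                    self.ngrams[tuple(tokens[i:i + n])] += 1

    def score(self, word, context):
        """S(word | last order-1 words of context)."""
        context = tuple(w.lower() for w in context)
        if len(context) > self.order - 1:
            context = context[len(context) - (self.order - 1):]
        return self._score(word.lower(), context)

    def _score(self, word, context):
        if not context:
            # Base case: unigram relative frequency.
            return self.ngrams[(word,)] / self.total_tokens
        ngram_count = self.ngrams[context + (word,)]
        if ngram_count > 0:
            return ngram_count / self.ngrams[context]
        # Unseen n-gram: drop the leftmost context word and apply the penalty.
        return self.alpha * self._score(word, context[1:])


def best_in_context(lm, left_context, candidates):
    """Hypothetical helper: keep the candidate spelling the LM scores highest in context."""
    return max(candidates, key=lambda w: lm.score(w, left_context))


lm = StupidBackoffLM([
    "i want to eat a pizza",
    "i want to see the sea",
    "you want to see the movie",
])
print(best_in_context(lm, ["i", "want", "to"], ["sea", "see"]))  # see
```

Here "see" wins because the trigram "want to see" was observed, while "sea" has to back off twice to its unigram count and pays the alpha penalty at each step. In practice the candidate list would come from an edit-distance generator over a lexicon.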

On Sat, Jul 1, 2017 at 2:43 PM, Damiano Porta <da...@gmail.com>
wrote:

> I also read about the noisy channel model. I could work on this if you think it
> is a good approach.
>
> Damiano

Re: Spelling correction

Posted by Suneel Marthi <sm...@apache.org>.
+1

On Sat, Jul 1, 2017 at 2:43 PM, Damiano Porta <da...@gmail.com>
wrote:

> I also read about the noisy channel model. I could work on this if you think it
> is a good approach.
>
> Damiano

Re: Spelling correction

Posted by Damiano Porta <da...@gmail.com>.
I also read about the noisy channel model. I could work on this if you think it
is a good approach.

Damiano

On 1 Jul 2017 20:16, "Suneel Marthi" <su...@gmail.com> wrote:

> 'Spelling correction' has been the most popular request from audiences at my
> recent NLP talks; it would be great to have this feature in OpenNLP.
>
> I am not aware of any papers on this, but the first thing that comes to
> mind as relevant is the noisy channel model.

Re: Spelling correction

Posted by Suneel Marthi <su...@gmail.com>.
'Spelling correction' has been the most popular request from audiences at my
recent NLP talks; it would be great to have this feature in OpenNLP.

I am not aware of any papers on this, but the first thing that comes to
mind as relevant is the noisy channel model.
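The noisy channel model named here treats a misspelled token x as a corrupted version of an intended word w and picks the w that maximizes P(w) * P(x | w). Below is a minimal Python sketch under stated assumptions: the prior P(w) comes from word counts in a toy corpus, candidate generation uses the well-known single-edit approach popularized by Peter Norvig, and the error model P(x | w) is a deliberately crude, hypothetical constant (p_same, p_one_edit) rather than one learned from real misspelling data. None of these names are OpenNLP API.

```python
import re
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """Return every string one deletion, transposition, substitution or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


class NoisyChannelCorrector:
    """Picks the intended word w that maximizes P(w) * P(x | w) for an observed token x."""

    def __init__(self, corpus_text, p_same=0.95, p_one_edit=0.01):
        # Prior P(w): relative frequency of each word in a (toy) training corpus.
        self.counts = Counter(re.findall(r"[a-z]+", corpus_text.lower()))
        self.total = sum(self.counts.values())
        # Hypothetical constant error model; a real one would be learned from
        # (misspelling, correction) pairs, e.g. per-character confusion counts.
        self.p_same = p_same
        self.p_one_edit = p_one_edit

    def prior(self, word):
        return self.counts[word] / self.total

    def channel(self, observed, word):
        # P(x | w): probability of typing `observed` when `word` was intended.
        return self.p_same if observed == word else self.p_one_edit

    def correct(self, observed):
        observed = observed.lower()
        # Candidates: the token itself plus every known word one edit away.
        candidates = sorted(w for w in edits1(observed) | {observed} if w in self.counts)
        if not candidates:
            return observed  # nothing known nearby: leave the token unchanged
        return max(candidates, key=lambda w: self.prior(w) * self.channel(observed, w))


corrector = NoisyChannelCorrector("the cat sat on the mat with the hat")
print(corrector.correct("teh"))   # the
print(corrector.correct("caat"))  # cat
```

Because the error model here is a constant, the prior alone decides among one-edit candidates; swapping channel() for learned, edit-specific probabilities is the natural next step, and the same candidates could also be re-ranked by a language model that sees the surrounding words.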



On Sat, Jul 1, 2017 at 2:04 PM, Damiano Porta <da...@gmail.com>
wrote:

> Hello everybody,
> I am dealing with data normalization on very poorly written sentences with many
> spelling errors.
>
> Do you know of a good paper explaining how to build a model that fixes
> this kind of problem?
> I am happy to share the code if you are interested in integrating
> it into OpenNLP.
>
> Thanks
> Damiano
>