Posted to dev@opennlp.apache.org by Damiano Porta <da...@gmail.com> on 2017/07/01 18:04:19 UTC
Spelling correction
Hello everybody,
I am dealing with data normalization on very bad sentences with many
spelling errors.
Do you know a good paper to understand how to build a model that will fix
this kind of problem?
I can share the code without problems if you are interested in integrating
it into OpenNLP.
Thanks
Damiano
Re: Spelling correction
Posted by Daniel Russ <da...@gmail.com>.
Damiano,
There is a lot of research on spelling correction. Here is a paper from a group out of the National Library of Medicine:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2137159/
They also have a product called GSpell, which uses the NLM lexicon:
https://lexsrv3.nlm.nih.gov/LexSysGroup/Projects/gSpell/current/GSpell.html
It might not work for OpenNLP (too English-based), but these are things to look into. I have dabbled in the spelling correction field, but have not worked seriously in it. I’d be willing to help on this project, but I don’t have a lot of time.
Daniel
Re: Spelling correction
Posted by Suneel Marthi <sm...@apache.org>.
You could also leverage language models for spelling correction. OpenNLP has a
stupid-backoff implementation: create a language model with that algorithm
and use it for spell checks.
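
To make the language-model idea concrete, here is a minimal sketch of stupid-backoff scoring used to rank candidate corrections by context. It is written in plain Python and is not OpenNLP's API; the function names (train, score, best_candidate), the toy corpus, and the 0.4 back-off factor are assumptions chosen for illustration.

```python
from collections import Counter

ALPHA = 0.4  # assumed back-off factor; 0.4 is the value commonly used with this scheme


def train(sentences, max_n=3):
    """Count every n-gram up to max_n in a list of tokenized sentences."""
    counts = Counter()
    total = 0  # total number of tokens, used for the unigram fallback
    for tokens in sentences:
        total += len(tokens)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts, total


def score(counts, total, context, word):
    """Stupid-backoff score of `word` after `context` (a relative score, not a probability)."""
    context = tuple(context)
    factor = 1.0
    while context:
        ngram = context + (word,)
        if counts[ngram] > 0:
            # Seen n-gram: relative frequency, discounted once per back-off step.
            return factor * counts[ngram] / counts[context]
        context = context[1:]  # drop the oldest context word and back off
        factor *= ALPHA
    return factor * counts[(word,)] / total


def best_candidate(counts, total, context, candidates):
    """Pick the candidate correction that the language model scores highest."""
    return max(candidates, key=lambda w: score(counts, total, context, w))


corpus = [
    ["i", "like", "green", "tea"],
    ["i", "like", "green", "apples"],
    ["we", "like", "green", "tea"],
]
counts, total = train(corpus)
print(best_candidate(counts, total, ["like", "green"], ["tea", "ten", "apples"]))
```

The candidate list would come from a separate step (for example, dictionary words within a small edit distance of the misspelled token); the language model only ranks them.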
Re: Spelling correction
Posted by Suneel Marthi <sm...@apache.org>.
+1
On Sat, Jul 1, 2017 at 2:43 PM, Damiano Porta <da...@gmail.com>
wrote:
> I also read about Noisy channel. I could work on this if you think it is
> good.
>
> Damiano
Re: Spelling correction
Posted by Damiano Porta <da...@gmail.com>.
I also read about the noisy channel model. I could work on this if you think it is
a good approach.
Damiano
Re: Spelling correction
Posted by Suneel Marthi <su...@gmail.com>.
'Spelling correction' has been the most popular request from the audience at my
recent NLP talks; it would be great to have this feature in OpenNLP.
I am not aware of any papers on this, but the first relevant thing that comes to
mind is the noisy channel model.
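
For readers unfamiliar with the noisy channel idea, here is a minimal sketch in plain Python (not OpenNLP code). It picks the candidate c that maximizes P(c) * P(typed | c), where P(c) comes from word frequencies and the channel model is a deliberately crude assumption: the typed word is correct with probability 1 - p_typo, otherwise it is one edit away from the intended word. The names edits1, correct, and p_typo, and the toy counts, are hypothetical.

```python
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word, word_counts, p_typo=0.05):
    """Return the candidate c maximizing P(c) * P(word | c)."""
    total = sum(word_counts.values())
    scores = {}
    if word in word_counts:
        # The typed word is itself a known word: channel probability 1 - p_typo.
        scores[word] = (word_counts[word] / total) * (1 - p_typo)
    for cand in edits1(word):
        if cand != word and cand in word_counts:
            # Known word one edit away: channel probability p_typo (assumed uniform).
            scores[cand] = (word_counts[cand] / total) * p_typo
    # Unknown word with no known neighbor: leave it unchanged.
    return max(scores, key=scores.get) if scores else word


word_counts = Counter({"spelling": 10, "spewing": 1, "correction": 5})
print(correct("speling", word_counts))
```

A real system would estimate the channel model from observed misspellings (for example, per-character confusion counts) and would usually combine it with a contextual language model rather than unigram frequencies.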