Posted to user@uima.apache.org by Hugues de Mazancourt <hu...@mazancourt.com> on 2016/12/02 09:02:44 UTC

Re: New dictionary annotator

Thanks for this contribution.

Do you have any plans to make the lookup accent-insensitive? Or do you know of a component that would do the job?
I’m currently using ConceptMapper outside of Ruta and MARKTABLE from within Ruta, but neither handles accents correctly (btw, ConceptMapper is *very* slow at loading resources, which can be a problem).

My point is: I have lists containing entries like « événement », and I would like text such as « EVENEMENT » or even « évènement » to match that list. Lowercasing the text is not a solution: case mapping only pairs « é » with its uppercase « É », which has nothing to do with « e », so « EVENEMENT » lowercases to « evenement » and still misses the entry. I guess you have the same problem with Latvian.
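
For illustration, one common way to get this kind of matching in Java is to fold both the dictionary entries and the text before comparing them: decompose to Unicode NFD, strip the combining marks, then lowercase. This is only a sketch under assumptions: the class `AccentFolding` and its `fold` method are made-up names, not part of ConceptMapper, Ruta or the new annotator, and the accented letters are written as `\u` escapes so the file compiles whatever the source encoding.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not taken from any UIMA component.
class AccentFolding {

    /** Returns an accent-free, lowercase key for dictionary lookup. */
    static String fold(String s) {
        // NFD splits a letter such as e-acute into 'e' + a combining accent.
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // \p{M} matches the combining marks left behind by the decomposition.
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        // Locale.ROOT keeps the case mapping independent of the JVM locale.
        return stripped.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Dictionary holds the folded form of "evenement" with two acute accents.
        Set<String> dictionary = new HashSet<>();
        dictionary.add(fold("\u00e9v\u00e9nement"));

        // Unaccented capitals, the grave-accent spelling, and an unrelated word.
        for (String token : Arrays.asList("EVENEMENT", "\u00e9v\u00e8nement", "av\u00e8nement")) {
            System.out.println(fold(token) + " in dictionary: " + dictionary.contains(fold(token)));
        }
    }
}
```

Letters that have no canonical decomposition (for example « ø » or the ligature « œ ») pass through `fold` unchanged, so a real component would probably need an extra mapping table for them.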

Best,


Hugues de Mazancourt
http://about.me/mazancourt




> On 30 Nov 2016, at 15:38, Donatas Remeika <do...@gmail.com> wrote:
> 
> Hi,
> 
> Just wanted to let you know that we have created a new (probably yet
> another) dictionary annotator.
> 
> The reasons for creating it were:
> - Quite often we used Ruta in our pipelines only because of its MARKTABLE
> action, which can set several features on an annotation
> - Sometimes dictionaries contain duplicate entries with different features,
> and we need to create an annotation for each entry
> - We wanted to be able to use a custom tokenizer for dictionary entries
> (the default is a whitespace tokenizer)
> 
> It was inspired by both DKPro dictionary-annotator and Ruta MARKTABLE. Big
> thanks to their developers!
> 
> Code with examples can be found at
> https://github.com/tokenmill/dictionary-annotator
> 
> BTW, does anyone know of a ConceptMapper alternative that is more
> uimaFIT-friendly?
> 
> Best regards,
> Donatas
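
As a rough illustration of two of the reasons listed in the announcement above (duplicate entries that each produce their own annotation, and a pluggable tokenizer with whitespace as the default), here is a small Java sketch. It is not the code from the tokenmill repository: `DictionarySketch`, `Entry`, `Match` and every method name are hypothetical, features are plain string maps, and matches are reported as token indices rather than the character offsets a UIMA annotation would carry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not the tokenmill implementation.
class DictionarySketch {

    /** One dictionary row: the phrase plus the feature values it carries. */
    static final class Entry {
        final String phrase;
        final Map<String, String> features;
        Entry(String phrase, Map<String, String> features) {
            this.phrase = phrase;
            this.features = features;
        }
    }

    /** A hit covering tokens [begin, end) for exactly one dictionary entry. */
    static final class Match {
        final int begin;
        final int end;
        final Entry entry;
        Match(int begin, int end, Entry entry) {
            this.begin = begin;
            this.end = end;
            this.entry = entry;
        }
    }

    private final Function<String, List<String>> tokenizer;
    // Tokenized phrase -> every entry sharing that phrase (duplicates are kept).
    private final Map<List<String>, List<Entry>> index = new HashMap<>();
    private int longestEntry = 0;

    DictionarySketch(Function<String, List<String>> tokenizer) {
        this.tokenizer = tokenizer;
    }

    /** Default tokenizer: split on runs of whitespace. */
    static List<String> whitespace(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    void add(String phrase, Map<String, String> features) {
        List<String> key = tokenizer.apply(phrase);
        index.computeIfAbsent(key, k -> new ArrayList<>()).add(new Entry(phrase, features));
        longestEntry = Math.max(longestEntry, key.size());
    }

    /** Returns one match per entry, so a duplicated phrase yields several matches. */
    List<Match> lookup(String text) {
        List<String> tokens = tokenizer.apply(text);
        List<Match> matches = new ArrayList<>();
        for (int begin = 0; begin < tokens.size(); begin++) {
            int limit = Math.min(tokens.size(), begin + longestEntry);
            for (int end = begin + 1; end <= limit; end++) {
                List<Entry> hits = index.get(tokens.subList(begin, end));
                if (hits == null) continue;
                for (Entry e : hits) matches.add(new Match(begin, end, e));
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        DictionarySketch dict = new DictionarySketch(DictionarySketch::whitespace);
        dict.add("New York", Collections.singletonMap("type", "city"));
        dict.add("New York", Collections.singletonMap("type", "state"));
        for (Match m : dict.lookup("flights to New York")) {
            System.out.println(m.begin + "-" + m.end + " " + m.entry.features);
        }
    }
}
```

Passing a different `Function<String, List<String>>` to the constructor swaps the tokenizer for both the dictionary entries and the text being searched.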

Re: New dictionary annotator

Posted by Hugues de Mazancourt <hu...@mazancourt.com>.

Great. Let me know if you need a beta tester!

— Hugues


> On 2 Dec 2016, at 10:37, Donatas Remeika <do...@gmail.com> wrote:
> 
> During the next week :)
> 
> Donatas

Re: New dictionary annotator

Posted by Donatas Remeika <do...@gmail.com>.

During the next week :)

Donatas

On Fri, Dec 2, 2016 at 11:32 AM Hugues de Mazancourt <hu...@mazancourt.com>
wrote:

> Cool!
> Any idea how far off that near future is?
> ;-)
>
> — Hugues

Re: New dictionary annotator

Posted by Hugues de Mazancourt <hu...@mazancourt.com>.

Cool!
Any idea how far off that near future is?
;-)

— Hugues



> On 2 Dec 2016, at 10:26, Donatas Remeika <do...@gmail.com> wrote:
> 
> Hi Hugues,
> 
> Thanks for the feedback. Accent-insensitive matching is indeed a needed
> feature. I will implement it in the near future.
> 
> Best regards,
> Donatas Remeika

Re: New dictionary annotator

Posted by Donatas Remeika <do...@gmail.com>.

Hi Hugues,

Thanks for the feedback. Accent-insensitive matching is indeed a needed
feature. I will implement it in the near future.

Best regards,
Donatas Remeika
