
Posted to dev@ctakes.apache.org by John Green <jo...@gmail.com> on 2014/08/21 21:08:50 UTC

Acronym annotator

Are there any acronym annotators and disambiguators? What are people doing
in production elsewhere? I'm learning the heart of cTAKES and UIMA by the
numbers right now, and I think writing an annotator of my own will be the
best way to solidify the information. If no one has done it already, I
thought I'd write a simple acronym annotator and disambiguator. The
disambiguation would just be co-occurrence over a lookup window across a
private corpus I have access to, e.g., word1 word2 word3 acronym1 word4
word5 word6. I would provide specificity by excluding words that tend to
occur frequently across the different expansions of the same
abbreviation.
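
A minimal sketch of the idea described above, in Python rather than as a UIMA annotator. It assumes a small set of examples already labeled with the intended expansion; the function names, the "RA" sentences, the window size of 3, and the 0.5 sharing threshold are all hypothetical choices for illustration, not anything taken from cTAKES.

```python
from collections import Counter, defaultdict


def context_window(tokens, index, size=3):
    """Lowercased tokens within `size` positions on either side of `index`."""
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return [t.lower() for t in left + right]


def build_profiles(examples, size=3):
    """Count context words per sense.

    `examples` is an iterable of (tokens, acronym_index, sense) tuples.
    """
    profiles = defaultdict(Counter)
    for tokens, index, sense in examples:
        profiles[sense].update(context_window(tokens, index, size))
    return profiles


def shared_words(profiles, max_fraction=0.5):
    """Words seen with more than `max_fraction` of the senses.

    These say little about which sense is meant, so they are excluded.
    """
    senses_per_word = Counter()
    for counts in profiles.values():
        senses_per_word.update(counts.keys())
    limit = max_fraction * len(profiles)
    return {w for w, n in senses_per_word.items() if n > limit}


def disambiguate(tokens, index, profiles, size=3, max_fraction=0.5):
    """Return the best-scoring sense, or None if no context word matches."""
    if not profiles:
        return None
    ignored = shared_words(profiles, max_fraction)
    context = [w for w in context_window(tokens, index, size) if w not in ignored]
    scores = {sense: sum(counts[w] for w in context)
              for sense, counts in profiles.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Hypothetical labeled examples for the abbreviation "RA".
EXAMPLES = [
    ("patient with RA on methotrexate for joint pain".split(), 2, "rheumatoid arthritis"),
    ("history of RA with joint swelling".split(), 2, "rheumatoid arthritis"),
    ("the RA is dilated on the echo".split(), 1, "right atrium"),
    ("catheter in the RA on echo today".split(), 3, "right atrium"),
]

profiles = build_profiles(EXAMPLES)
print(disambiguate("RA on methotrexate".split(), 0, profiles))
print(disambiguate("echo shows dilated RA".split(), 3, profiles))
```

In this toy data the word "on" appears next to both expansions, so it is dropped before scoring; without that exclusion the first query would be a tie between the two senses.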

But if someone has already done it and is planning on releasing it, I'd hate
to reinvent the wheel...

JG

Re: Acronym annotator

Posted by John Green <jo...@gmail.com>.

Wow, what a goldmine, thanks! 


JG

On Fri, Aug 22, 2014 at 8:31 AM, Koola, Jejo David
<je...@vanderbilt.edu> wrote:

> You might be interested in: https://sbmi.uth.edu/ccb/resources/abbreviation.htm

Re: Acronym annotator

Posted by "Koola, Jejo David" <je...@vanderbilt.edu>.

You might be interested in: https://sbmi.uth.edu/ccb/resources/abbreviation.htm




Re: Acronym annotator

Posted by John Green <jo...@gmail.com>.

Thanks for the tip! I'm so new to this field. By the by, Vijay, your semantic similarity paper was outstanding; I enjoyed reading it very much.

Well, it doesn't have to be the best, just OK. It's mostly because I don't see one in the project now, and it's a project for me to solidify the process of making my own first annotator. I would like it to be useful though, so I'll look into the literature.




JG

On Fri, Aug 22, 2014 at 7:12 AM, vijay garla <vn...@gmail.com> wrote:

> This is a type of word sense disambiguation; there is a lot of literature
> on this subject. Co-occurrence is one way of doing it, not necessarily the
> best; you need a ton of annotated data for it to work well.

Re: Acronym annotator

Posted by vijay garla <vn...@gmail.com>.

This is a type of word sense disambiguation; there is a lot of literature
on this subject. Co-occurrence is one way of doing it, not necessarily the
best; you need a ton of annotated data for it to work well.

