Posted to user@uima.apache.org by "Kline, Larry D" <La...@USONCOLOGY.COM> on 2012/12/13 22:32:36 UTC

ConceptMapper and stemming

I am about to try using the Porter stemmer with the ConceptMapper and
wonder if anyone has any experience with this.  Any suggestions,
caveats, etc. would be most welcome.

 

A couple of questions:

* I presume I will need to stem the lookup dictionary when I build it.
  Or can I do that at some other point in the pipeline?

* I plan to use the Lucene implementation of the Porter stemmer and wrap
  it with a class that implements the interface required by
  ConceptMapper. Unless someone knows of a version of the Porter stemmer
  that already implements that interface?

* Will I also need to stem the stop-words dictionary?

* I see the following comment preceding the stem() method in
  TokenNormalizer. I assume it is not accurate, because the default
  stemmer does not appear to be a Porter stemmer implementation:

     * If the stemming flag is true, then return the stemmed form of the
     * supplied word using the Porter stemmer.

* Is there anything else I should be aware of, such as how this might
  affect the search strategy?

* Is it possible to get the stemmed form of the word/phrase that
  matched? For instance, could it be copied to the token?

* Does anyone have experience with stemming medical terms? I would be
  running this against clinical notes typed by a physician about a
  patient. My dictionary was built from SNOMED concepts. Will stemming
  even help?
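On the wrapper question, here is a minimal sketch of the idea, assuming the ConceptMapper Stemmer interface boils down to a single String-in, String-out stem() method (later in this thread stem() is described as taking only a String). The Stemmer interface below is a stand-in, not the real one. To stay self-contained it implements only Step 1a of the Porter algorithm (plural suffixes) rather than delegating to Lucene; a real wrapper would call a complete Porter implementation inside stem().

```java
// Hypothetical stand-in for ConceptMapper's Stemmer interface: all we model
// is that stem() takes a single String and returns its stemmed form.
interface Stemmer {
    String stem(String token);
}

// Implements only Step 1a of the Porter algorithm (plural suffixes).
// A real wrapper would delegate to a complete Porter implementation here.
class PorterStep1aStemmer implements Stemmer {
    @Override
    public String stem(String token) {
        if (token.endsWith("sses")) {
            return token.substring(0, token.length() - 2); // caresses -> caress
        }
        if (token.endsWith("ies")) {
            return token.substring(0, token.length() - 2); // ponies -> poni
        }
        if (token.endsWith("ss")) {
            return token;                                  // caress -> caress
        }
        if (token.endsWith("s")) {
            return token.substring(0, token.length() - 1); // cats -> cat
        }
        return token;
    }
}

class StemmerDemo {
    public static void main(String[] args) {
        Stemmer stemmer = new PorterStep1aStemmer();
        for (String w : new String[] {"caresses", "ponies", "caress", "cats", "lesions"}) {
            System.out.println(w + " -> " + stemmer.stem(w));
        }
    }
}
```

Stems such as "poni" are not real words, which is fine as long as dictionary entries and document tokens go through the same stem() call: they only need to agree with each other.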

 

Thanks,

Larry Kline

The contents of this electronic mail message and any attachments are confidential, possibly privileged and intended for the addressee(s) only.
Only the addressee(s) may read, disseminate, retain or otherwise use this message. If received in error, please immediately inform the sender and then delete this message without disclosing its contents to anyone.

Re: ConceptMapper and stemming

Posted by Michael Tanenblatt <sl...@park-slope.net>.
Well, you know, ConceptMapper is an open source project, so you (or someone) could extend it…
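One hypothetical shape such an extension could take: a sub-interface that adds a part-of-speech argument, with a default method so existing single-argument callers keep working. PosAwareStemmer, TableLemmatizer and the table contents are illustrative assumptions, not part of ConceptMapper; the tiny table stands in for a real lemmatizer such as BioLemmatizer, and the tags are Penn Treebank tags chosen for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical: the String-only interface described in the thread.
interface Stemmer {
    String stem(String token);
}

// Hypothetical extension: callers that know a token's part of speech pass it in;
// existing callers keep using the single-argument method unchanged.
interface PosAwareStemmer extends Stemmer {
    String stem(String token, String pos);

    @Override
    default String stem(String token) {
        return stem(token, null); // no POS available
    }
}

// Toy lemmatizer backed by a table keyed on "word/TAG" (Penn Treebank tags).
class TableLemmatizer implements PosAwareStemmer {
    private final Map<String, String> lemmas = new HashMap<>();

    TableLemmatizer() {
        lemmas.put("left/VBD", "leave");     // "the patient left the clinic"
        lemmas.put("lesions/NNS", "lesion");
    }

    @Override
    public String stem(String token, String pos) {
        String word = token.toLowerCase(Locale.ROOT);
        if (pos != null) {
            String lemma = lemmas.get(word + "/" + pos);
            if (lemma != null) {
                return lemma;
            }
        }
        return word; // unknown word or missing POS: return the lower-cased form
    }
}
```

Code that has POS information (for example from an upstream tagger) would call stem(token, pos); everything else keeps calling stem(token) and gets the POS-less behavior.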


On Dec 21, 2012, at 1:01 PM, "Kline, Larry D" <La...@USONCOLOGY.COM> wrote:

> Thanks for the link to BioLemmatizer. I tried it but the problem with it
> is that in order to get accurate results you need to know the part of
> speech of the word you wish to lemmatize.  But ConceptMapper requires
> one to implement the Stemmer interface which allows you to pass only a
> String to the stem method.  No part of speech.
> 
> Larry


RE: ConceptMapper and stemming

Posted by "Kline, Larry D" <La...@USONCOLOGY.COM>.
Thanks for the link to BioLemmatizer. I tried it but the problem with it
is that in order to get accurate results you need to know the part of
speech of the word you wish to lemmatize.  But ConceptMapper requires
one to implement the Stemmer interface which allows you to pass only a
String to the stem method.  No part of speech.
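Short of changing the interface, a POS-dependent lemmatizer can be placed behind the String-only method by always supplying one fixed tag. The sketch below assumes a hypothetical PosLemmatizer interface (this is not BioLemmatizer's real API) and uses a noun tag as the default, on the assumption that SNOMED concept names are mostly noun phrases. The cost is that verb readings are never reduced, as the toy example shows.

```java
// Hypothetical stand-in for ConceptMapper's String-only Stemmer interface.
interface Stemmer {
    String stem(String token);
}

// Hypothetical shape of a POS-dependent lemmatizer; NOT BioLemmatizer's real API.
interface PosLemmatizer {
    String lemmatize(String word, String pos);
}

// Adapter that satisfies the String-only interface by always passing one
// fixed POS tag to the underlying lemmatizer.
class FixedPosAdapter implements Stemmer {
    private final PosLemmatizer lemmatizer;
    private final String defaultPos;

    FixedPosAdapter(PosLemmatizer lemmatizer, String defaultPos) {
        this.lemmatizer = lemmatizer;
        this.defaultPos = defaultPos;
    }

    @Override
    public String stem(String token) {
        return lemmatizer.lemmatize(token, defaultPos);
    }
}

class AdapterDemo {
    public static void main(String[] args) {
        // Toy lemmatizer: only knows that past-tense "left" is a form of "leave".
        PosLemmatizer toy = (w, pos) -> ("left".equals(w) && "VBD".equals(pos)) ? "leave" : w;
        Stemmer nounDefault = new FixedPosAdapter(toy, "NN");
        System.out.println(nounDefault.stem("left")); // prints "left": the verb reading is lost
    }
}
```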

Larry

-----Original Message-----
From: Renaud Richardet [mailto:renaud.richardet@gmail.com] 
Sent: Tuesday, December 18, 2012 7:40 AM
To: user@uima.apache.org
Subject: Re: ConceptMapper and stemming

Hi Larry,

> *         I presume I will need to stem the lookup dictionary when I
> build it.  Or can I do that at some other point in the pipeline?
ConceptMapper will do that for you at initialize().


> *         Does anyone have experience with stemming medical terms?  I
> would be running this against clinical notes typed by a physician 
> about a patient.  My dictionary was built from SNOMED concepts.  Will 
> stemming even help?

There is a dedicated stemmer (actually, a lemmatizer) for the biomedical
domain; you might want to take a look at it:
http://biolemmatizer.sourceforge.net/

-- Renaud


Re: ConceptMapper and stemming

Posted by Renaud Richardet <re...@gmail.com>.
Hi Larry,

> *         I presume I will need to stem the lookup dictionary when I
> build it.  Or can I do that at some other point in the pipeline?
ConceptMapper will do that for you at initialize().
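The underlying requirement is that the dictionary and the document text pass through the same stemmer; otherwise stemmed tokens never match unstemmed entries. Below is a minimal single-token sketch of that idea (not ConceptMapper's code, which also handles multi-word entries). It assumes stop words are checked after stemming, in which case the stop list has to be stemmed as well. StemmedLookup, the toy stemmer and the concept id are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical single-token sketch: stem dictionary keys and stop words once at
// construction ("initialize"), then stem each document token at lookup time.
class StemmedLookup {
    private final UnaryOperator<String> stemmer;
    private final Map<String, String> entries = new HashMap<>(); // stemmed term -> concept id
    private final Set<String> stopWords = new HashSet<>();       // kept in stemmed form

    StemmedLookup(UnaryOperator<String> stemmer, Map<String, String> dictionary, Set<String> stops) {
        this.stemmer = stemmer;
        dictionary.forEach((term, id) -> entries.put(stemmer.apply(term), id));
        stops.forEach(w -> stopWords.add(stemmer.apply(w)));
    }

    // Concept id for one token, or null if it is a stop word or not in the dictionary.
    String lookup(String token) {
        String key = stemmer.apply(token);
        if (stopWords.contains(key)) {
            return null;
        }
        return entries.get(key);
    }
}

class LookupDemo {
    public static void main(String[] args) {
        // Toy stemmer: lower-case, then drop one trailing "s" (a stand-in for Porter).
        UnaryOperator<String> toy = w -> {
            String s = w.toLowerCase(Locale.ROOT);
            return s.endsWith("s") ? s.substring(0, s.length() - 1) : s;
        };
        Map<String, String> dict = new HashMap<>();
        dict.put("lesion", "C-0001"); // hypothetical concept id
        StemmedLookup lookup = new StemmedLookup(toy, dict, Set.of("was"));
        System.out.println(lookup.lookup("Lesions")); // C-0001
        System.out.println(lookup.lookup("was"));     // null (stop word)
    }
}
```

If the stop list were stored unstemmed, the token "was" would be compared as "wa" against "was" and slip through the stop-word filter.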


> *         Does anyone have experience with stemming medical terms?  I
> would be running this against clinical notes typed by a physician about
> a patient.  My dictionary was built from SNOMED concepts.  Will stemming
> even help?

There is a dedicated stemmer (actually, a lemmatizer) for the
biomedical domain; you might want to take a look at it:
http://biolemmatizer.sourceforge.net/

-- Renaud