Posted to dev@ctakes.apache.org by Bruce Tietjen <br...@perfectsearchcorp.com> on 2014/10/08 17:37:41 UTC

Differences in MedicationMention annotations on subsequent processing runs

I have encountered a situation in which the cTakes clinical pipeline output
differs between multiple runs on the same text with the same configuration.

The following snippets from a single document are sufficient to demonstrate
the issue:

 a gentle curve going into. irrigated with Bacitracin.


The source of the difference is that the DictionaryLookupAnnotator uses a
map to filter out duplicate annotations for a single document location:

    // used to prevent duplicate hits
    // key = hit begin,end key (java.lang.String)
    // val = Set of MetaDataHit objects
    private Map<String,Set<MetaDataHit>> iv_dupMap = new HashMap<>();


This map is shared by both the umls_ms_2011ab lookup and the
umls_ms_2011an_rxnorm lookup.

If both dictionaries contain the same term, the order of dictionary lookup
execution determines the output. If the rxnorm lookup runs first, then a
MedicationMention annotation for Bacitracin appears in the final output. If
the standard umls lookup runs first, then there is no MedicationMention
annotation for Bacitracin.
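
For illustration, the order dependence described above can be reproduced
with a small stand-alone Java sketch. This is not the cTAKES code: the class
SharedDupFilterDemo, the annotate method, and the span key "10,20" are all
hypothetical. The only assumptions are that both lookups report a hit for
the same begin,end span and that the first hit recorded in the shared filter
wins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the lookup loop; NOT the cTAKES implementation.
class SharedDupFilterDemo {

    // Runs the named lookups in order against ONE shared duplicate filter.
    // Assumption: every dictionary reports a hit for the same begin,end span.
    // Returns the names of the dictionaries whose hit survived the filter.
    static List<String> annotate(List<String> dictionaryOrder) {
        Set<String> seenSpans = new HashSet<>();  // shared across all dictionaries
        List<String> survivors = new ArrayList<>();
        String spanKey = "10,20";                 // made-up begin,end key
        for (String dictionary : dictionaryOrder) {
            // Set.add returns false when the span was already recorded,
            // so a later dictionary's hit for that span is dropped.
            if (seenSpans.add(spanKey)) {
                survivors.add(dictionary);
            }
        }
        return survivors;
    }

    public static void main(String[] args) {
        System.out.println(annotate(Arrays.asList("rxnorm", "umls"))); // prints [rxnorm]
        System.out.println(annotate(Arrays.asList("umls", "rxnorm"))); // prints [umls]
    }
}
```

Whichever dictionary is listed first claims the span, so the rxnorm hit (the
one that would become a MedicationMention) survives only when rxnorm runs
first.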

I will attach the output from the subsequent runs. (Hopefully the
attachment will make it through the system)

Is this expected behavior? If not, what would be the expected behavior?

IMAT Solutions <http://imatsolutions.com>
Bruce Tietjen
Senior Software Engineer
Mobile: 801.634.1547
bruce.tietjen@imatsolutions.com

RE: Differences in MedicationMention annotations on subsequent processing runs

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
> DictionaryLookupAnnotator which is a container for the dictionaries and it iterates through the list of lookup dictionaries

I am confused.  The new dictionary-lookup-fast has neither this class nor multiple dictionaries.  The umls and rxnorm are in the same database table and lookup is performed in one swoop.  Could you please send a copy of your pipeline xmls to me directly (instead of bombing the group) with something other than an .xml extension (they get blocked)?



Re: Differences in MedicationMention annotations on subsequent processing runs

Posted by Bruce Tietjen <br...@perfectsearchcorp.com>.
Sorry, my mistake, it was still running the old dictionary lookups.

Since your earlier question, I have been trying to get the lookup-fast to
work and have not yet been successful.

I made the following change to AggregatePlaintextUMLSProcessor.xml:

<!--
    <delegateAnalysisEngine key="DictionaryLookupAnnotatorDB">
      <import location="../../../ctakes-dictionary-lookup/desc/analysis_engine/DictionaryLookupAnnotatorUMLS.xml"/>
    </delegateAnalysisEngine>
-->

    <delegateAnalysisEngine key="DictionaryLookupAnnotatorDB">
      <import location="../../../ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml"/>
    </delegateAnalysisEngine>



But I've been getting the following exception and trying to figure out why:

Caused by: org.apache.uima.resource.ResourceInitializationException: Could not access the resource data at file:org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml.
    at org.apache.uima.resource.impl.DataResource_impl.initialize(DataResource_impl.java:127)
    at org.apache.uima.util.SimpleResourceFactory.produceResource(SimpleResourceFactory.java:123)
    ... 31 more
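
One thing that may be worth ruling out is whether the resource named in the
exception is visible on the classpath at all. The sketch below is only a
diagnostic aid, not part of cTAKES or UIMA: the class ResourceCheck and its
isOnClasspath method are hypothetical, and it assumes that the relative
file: URL in the descriptor ends up being resolved from the classpath, which
may not be the whole story for a given UIMA setup.

```java
import java.net.URL;

// Hypothetical diagnostic helper; not part of cTAKES or UIMA.
class ResourceCheck {

    // Returns true when the named resource is visible to the system class loader.
    static boolean isOnClasspath(String resourcePath) {
        URL url = ClassLoader.getSystemResource(resourcePath);
        return url != null;
    }

    public static void main(String[] args) {
        // Path copied from the exception message above.
        String path = "org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml";
        System.out.println(path + " on classpath: " + isOnClasspath(path));
    }
}
```

Run it with the same classpath as the pipeline. If it reports false, the
module or dictionary artifact that carries the file is probably not on that
classpath.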






RE: Differences in MedicationMention annotations on subsequent processing runs

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
I just ran the -fast lookup on an example containing bacitracin in four sentences, once as the first word and once as the last.  In ten of ten runs, all four bacitracin mentions were discovered.

Did you completely replace the dictionary lookup with the following?
    <delegateAnalysisEngine key="DictionaryLookupAnnotatorDB">
      <import location="../../../ctakes-dictionary-lookup-fast/desc/analysis_engine/UmlsLookupAnnotator.xml"/>
    </delegateAnalysisEngine>
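
A repeat-run check like the ten-run test mentioned above can be automated
with a few lines of Java. This is a generic sketch, not something from the
cTAKES code base: the class RepeatRunCheck, the distinctOutputs method, and
the fake run in main are hypothetical, and a real check would plug in a
supplier that runs the pipeline on the document and returns its annotations
in a form with a meaningful equals and hashCode.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical harness for checking run-to-run determinism.
class RepeatRunCheck {

    // Invokes the run the given number of times and counts distinct outputs.
    // A result of 1 means every run produced an equal output.
    static <T> int distinctOutputs(Supplier<T> run, int runs) {
        Set<T> outputs = new HashSet<>();
        for (int i = 0; i < runs; i++) {
            outputs.add(run.get());
        }
        return outputs.size();
    }

    public static void main(String[] args) {
        // Stand-in for one pipeline run; the annotation strings are made up.
        Supplier<List<String>> fakeRun =
                () -> Arrays.asList("bacitracin@0-10", "bacitracin@40-50");
        System.out.println(distinctOutputs(fakeRun, 10)); // prints 1
    }
}
```

Any result greater than 1 would indicate that the same input produced
different annotations on different runs.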



Re: Differences in MedicationMention annotations on subsequent processing runs

Posted by Bruce Tietjen <br...@perfectsearchcorp.com>.
I tried the dictionary-lookup-fast module and the behavior is the same. I
did have to run it a number of times before the timing was right to
reproduce the issue. With the older lookup, the chances were about 50/50 as
to which dictionary ran first. With dictionary-lookup-fast, it seems more
like 70/30, with the standard umls lookup more likely to run first. That
means that most of the time there is no MedicationMention annotation for
Bacitracin. (See attached.)

The code with the issue is the DictionaryLookupAnnotator, which is a
container for the dictionaries and iterates through the list of lookup
dictionaries, so that part of the code path does not seem to have changed.

In the past, the rxNorm dictionary was a Lucene search, so I'm guessing it
behaved a little differently than it does now that both are JDBC.

The fact that the filter is at this location seems to indicate that it may
have been intended to apply across all dictionaries. On the other hand, it
appears to mask out the lookups for the different dictionaries, resulting
in some annotations not being made.

So the real question is how the filter should work: should the annotation
filtering be per lookup dictionary, or across all dictionaries? Or is there
something wrong elsewhere that causes this?

I lean towards having the filter function per dictionary. This may risk
duplicate annotations, but that would probably be better than missing the
annotation altogether.
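
To make the per-dictionary option concrete, here is a minimal stand-alone
Java sketch. It is not a patch to the annotator: the class
PerDictionaryDupFilterDemo, the annotate method, and the span key "10,20"
are hypothetical, and it assumes that each lookup can be identified by a
name so that the duplicate filter can keep one set of seen spans per
dictionary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a per-dictionary duplicate filter; NOT cTAKES code.
class PerDictionaryDupFilterDemo {

    // Each entry in dictionaryOrder stands for one hit on the same span,
    // reported by the named dictionary. Returns the dictionaries whose hit
    // survived, in the order the hits were processed.
    static List<String> annotate(List<String> dictionaryOrder) {
        // key = dictionary name, value = spans already seen for that dictionary
        Map<String, Set<String>> seenByDictionary = new HashMap<>();
        List<String> survivors = new ArrayList<>();
        String spanKey = "10,20"; // made-up begin,end key
        for (String dictionary : dictionaryOrder) {
            Set<String> seen =
                    seenByDictionary.computeIfAbsent(dictionary, k -> new HashSet<>());
            // Only a repeat from the SAME dictionary is treated as a duplicate.
            if (seen.add(spanKey)) {
                survivors.add(dictionary);
            }
        }
        return survivors;
    }

    public static void main(String[] args) {
        System.out.println(annotate(Arrays.asList("umls", "rxnorm")));         // prints [umls, rxnorm]
        System.out.println(annotate(Arrays.asList("rxnorm", "umls")));         // prints [rxnorm, umls]
        System.out.println(annotate(Arrays.asList("umls", "umls", "rxnorm"))); // prints [umls, rxnorm]
    }
}
```

With this layout the rxnorm hit survives in either order, at the cost of two
hits on the same span, which is the duplicate-annotation risk mentioned
above.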








RE: Differences in MedicationMention annotations on subsequent processing runs

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Bruce,

With Pei's help I just updated the SourceForge repo with the cTakes dictionaries.  Check out the artifact ctakes-resources-snomed-rword-hsqldb-2011ab.

Sean


RE: Differences in MedicationMention annotations on subsequent processing runs

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Good point ...
I tried to check in to sourceforge but had problems.  I will try again right now ...

Building a custom dictionary is possible with the DictionaryTool in cTakes sandbox, but that is a different rabbit hole.


Re: Differences in MedicationMention annotations on subsequent processing runs

Posted by Bruce Tietjen <br...@perfectsearchcorp.com>.
If I understand correctly, I would need new dictionary resources to run the
rare word lookup method.

Where can I find the necessary dictionaries, or how do I build them?


Bruce Tietjen
Senior Software Engineer
Mobile: 801.634.1547
bruce.tietjen@imatsolutions.com

On Wed, Oct 8, 2014 at 9:46 AM, Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

>  Hi Bruce,
>
> I would venture to say that this is neither expected nor desired.
>
>
>
> Before you fix it (or in addition to a fix), try to run with the new
> dictionary lookup.   It will have a different behavior, and it will be the
> default dictionary lookup in future releases of cTakes – making fixes to
> the current module slightly less urgent.
>
>
>
> Sean

RE: Differences in MedicationMention annotations on subsequent processing runs

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Bruce,
I would venture to say that this is neither expected nor desired.

Before you fix it (or in addition to a fix), try running with the new dictionary lookup. It behaves differently, and it will be the default dictionary lookup in future releases of cTakes, which makes fixes to the current module slightly less urgent.

Sean

From: Bruce Tietjen [mailto:bruce.tietjen@perfectsearchcorp.com]
Sent: Wednesday, October 08, 2014 11:38 AM
To: dev@ctakes.apache.org
Subject: Differences in MedicationMention annotations on subsequent processing runs


I have encountered a situation in which the cTakes clinical pipeline output differs between multiple runs on the same text with the same configuration.
The following snippets from a single document are sufficient to demonstrate the issue:

 a gentle curve going into. irrigated with Bacitracin.

The source of the difference is that the DictionaryLookupAnnotator uses a map to filter out duplicate annotations for a single document location:
    // used to prevent duplicate hits
    // key = hit begin,end key (java.lang.String)
    // val = Set of MetaDataHit objects
    private Map<String,Set<MetaDataHit>> iv_dupMap = new HashMap<>();

This map is shared between the umls_ms_2011ab lookup and the umls_ms_2011an_rxnorm lookup.

If both dictionaries contain the same term, the order of dictionary lookup execution determines the output. If the rxnorm lookup runs first, then a MedicationMention annotation for Bacitracin appears in the final output. If the standard umls lookup runs first, then there is no MedicationMention annotation for Bacitracin.
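
The order dependence described above can be illustrated with a small, self-contained sketch. This is a simplified model and not the cTakes implementation: the class SharedDupMapSketch, the Hit type, the dictionary labels, and the character offsets are all hypothetical, and the sketch assumes that a hit is simply dropped when its begin,end key has already been recorded. It also shows one possible remedy, scoping the key by dictionary, purely as an assumption about how such a fix might look.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of the behavior described above; NOT the cTakes code.
// All class, method, and dictionary names here are hypothetical.
public class SharedDupMapSketch {

    /** A dictionary hit: the covered span plus the dictionary that produced it. */
    public static final class Hit {
        final int begin;
        final int end;
        final String dictionary;

        public Hit(int begin, int end, String dictionary) {
            this.begin = begin;
            this.end = end;
            this.dictionary = dictionary;
        }
    }

    /** Key shared by all dictionaries: the span only, so the first hit wins. */
    public static String spanKey(Hit hit) {
        return hit.begin + "," + hit.end;
    }

    /** Key scoped to one dictionary, so lookups cannot suppress each other. */
    public static String scopedKey(Hit hit) {
        return hit.dictionary + ":" + hit.begin + "," + hit.end;
    }

    /** Returns the dictionaries whose hits survive deduplication, in arrival order. */
    public static List<String> survivors(List<Hit> hitsInLookupOrder, boolean scopeByDictionary) {
        Set<String> seen = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (Hit hit : hitsInLookupOrder) {
            String key = scopeByDictionary ? scopedKey(hit) : spanKey(hit);
            if (seen.add(key)) { // add() returns false if the key was already present
                kept.add(hit.dictionary);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // "Bacitracin" at hypothetical offsets 42..52, found by both dictionaries.
        Hit umls = new Hit(42, 52, "umls");
        Hit rxnorm = new Hit(42, 52, "rxnorm");

        System.out.println(survivors(Arrays.asList(rxnorm, umls), false)); // [rxnorm]
        System.out.println(survivors(Arrays.asList(umls, rxnorm), false)); // [umls]
        System.out.println(survivors(Arrays.asList(umls, rxnorm), true));  // [umls, rxnorm]
    }
}
```

With the shared span-only key, whichever lookup runs first decides which hit is kept, which matches the MedicationMention appearing only when the rxnorm lookup runs first. With the dictionary-scoped key, both hits survive regardless of order; whether that is the desired behavior for the real annotator is a separate question.
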
I will attach the output from the subsequent runs. (Hopefully the attachment will make it through the system)

Is this expected behavior? If not, what would be the expected behavior?

Bruce Tietjen
Senior Software Engineer
Mobile: 801.634.1547
bruce.tietjen@imatsolutions.com