Posted to dev@ctakes.apache.org by "Mullane, Sean *HS" <SP...@hscmail.mcc.virginia.edu> on 2016/12/07 20:29:42 UTC

RE: Allergy Annotator

I'm reviving this thread with reference to negation detection. I previously posted about this to the User list but this is probably a more appropriate venue.

The way the sentences are split on ":" makes the negation annotator miss negation in lists of this form:

Hyperlipidemia:  Yes
Hypercholesterolemia:  No
Chronic Renal Insufficiency:  N/A

I tried reversing the order and removing the colons, and found that the negation for Hypercholesterolemia is detected in this form:

Yes Hyperlipidemia
No Hypercholesterolemia
N/A Chronic Renal Insufficiency

Our notes have quite a few places with this sort of list where good negation detection is important, but I haven't had very good results. The sentence segmenter sees this as 12 separate sentences, but I would think the proper behavior would be to treat it as 6 sentences (breaking sentences on line breaks but not on colons). I see previous discussion on the list about the sentence segmenter breaking on newlines, but little regarding colons. I would think that in most cases it would be more useful not to break on ":". Or is there an overriding reason for the current behavior?
If changing the sentence segmenter isn't an option, is there a different way to configure the negation detection annotator that would avoid this issue?

Thanks,
Sean



Hi,

I am interested in the design decision of the sentence detector.

Why does it split a sentence of the form "WORD1: WORD2 WORD3." into two
sentences "WORD1:" and "WORD2 WORD3."? Do other components of cTAKES require
such a sentence splitting?

It would seem to me that it should remain one sentence. For example, the smoking
status detector has its own SentenceAdjuster that merges some such sentences
back into one because of this design.
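To make the merging idea concrete, here is a minimal, hypothetical sketch in plain Java. It is not the smoking-status SentenceAdjuster and uses no cTAKES types. Sentences are modeled as sorted, non-overlapping {begin, end} character offsets. A sentence ending in ":" is joined with the next one only when no line break lies between them, which is the "break on line breaks but not on colons" behavior asked for at the top of this thread. The class name `ColonSentenceMerger` is illustrative.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not the smoking-status SentenceAdjuster): merges a
 * sentence that ends with ':' into the sentence that follows it, unless a
 * line break separates them. Sentences are sorted {begin, end} offsets.
 */
public final class ColonSentenceMerger {

   public static List<int[]> merge( final String text, final List<int[]> sentences ) {
      final List<int[]> merged = new ArrayList<>();
      int[] pending = null;  // a "KEY:" sentence still waiting for its value
      for ( final int[] sentence : sentences ) {
         int[] current = sentence;
         if ( pending != null ) {
            final String gap = text.substring( pending[ 1 ], current[ 0 ] );
            if ( gap.indexOf( '\n' ) < 0 ) {
               // Same line: one span from the start of "KEY:" to the end of "VALUE"
               current = new int[] { pending[ 0 ], current[ 1 ] };
            } else {
               // Value is on a later line: keep the key as its own sentence
               merged.add( pending );
            }
            pending = null;
         }
         if ( text.substring( current[ 0 ], current[ 1 ] ).trim().endsWith( ":" ) ) {
            pending = current;
         } else {
            merged.add( current );
         }
      }
      if ( pending != null ) {
         merged.add( pending );  // trailing "KEY:" with no value
      }
      return merged;
   }

   public static void main( final String[] args ) {
      final String text = "Hyperlipidemia:  Yes\nHypercholesterolemia:  No\n";
      // Spans as a detector that splits on ':' might produce them
      final List<int[]> split = new ArrayList<>();
      split.add( new int[] { 0, 15 } );   // "Hyperlipidemia:"
      split.add( new int[] { 17, 20 } );  // "Yes"
      split.add( new int[] { 21, 42 } );  // "Hypercholesterolemia:"
      split.add( new int[] { 44, 46 } );  // "No"
      for ( final int[] s : merge( text, split ) ) {
         System.out.println( "[" + text.substring( s[ 0 ], s[ 1 ] ) + "]" );
      }
   }
}
```

For the two-line example, `main` prints `[Hyperlipidemia:  Yes]` and `[Hypercholesterolemia:  No]`. In a real pipeline, the merged spans would still have to replace the existing Sentence annotations before negation detection runs. How best to do that inside cTAKES is left open here.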

Thanks, Tomasz

________________________________________
From: Finan, Sean [Sean...@childrens.harvard.edu]
Sent: Friday, July 10, 2015 3:20 PM
To: de...@ctakes.apache.org
Subject: RE: Allergy Annotator

Hi Tom,

It is exactly because the sentence detector splits "KEY:" from "VALUE" that I
didn't suggest using sentences. Instead, I would just iterate over the whole
CAS collection of medication events and attempt to match allergy phrases
("allergic to medication") against the text of the note spanning from event.begin-15 to
event.end+15, or whatever window size you prefer.
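Here is a minimal sketch of the character-window approach, under some assumptions. The mention offsets are assumed to come from a medication annotation (for example a MedicationMention's begin and end). The cue regex, the window size, and the class name `AllergyWindowMatcher` are illustrative choices, not cTAKES settings. The window is clamped to the document bounds and ignores sentence boundaries entirely.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of the character-window idea: report whether an
 * allergy cue occurs within a fixed number of characters of a mention,
 * ignoring sentence boundaries. Cue pattern and window size are assumptions.
 */
public final class AllergyWindowMatcher {

   // Illustrative cues: "allergic", "allergy", "allergies"
   private static final Pattern ALLERGY_CUE
         = Pattern.compile( "\\ballerg(?:ic|y|ies)\\b", Pattern.CASE_INSENSITIVE );

   /** True if a cue occurs in text[begin - window, end + window). */
   public static boolean hasAllergyCue( final String text, final int begin, final int end, final int window ) {
      final int from = Math.max( 0, begin - window );          // clamp to document start
      final int to = Math.min( text.length(), end + window );  // clamp to document end
      return ALLERGY_CUE.matcher( text.substring( from, to ) ).find();
   }

   public static void main( final String[] args ) {
      final String note = "ALLERGIES: PENICILLIN, WHEAT\nMEDICATIONS: ASPIRIN 81 MG DAILY";
      for ( final String drug : new String[] { "PENICILLIN", "WHEAT", "ASPIRIN" } ) {
         final int begin = note.indexOf( drug );
         System.out.println( drug + " -> " + hasAllergyCue( note, begin, begin + drug.length(), 15 ) );
      }
   }
}
```

With a 15-character window, PENICILLIN in "ALLERGIES: PENICILLIN, WHEAT" is flagged but WHEAT is not, because the header begins 23 characters before WHEAT. Widening the window fixes that case, but it also raises the chance of picking up a cue from a neighboring line.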

Sean

-----Original Message-----
From: Tom Devel [mailto:deve...@gmail.com]
Sent: Friday, July 10, 2015 4:12 PM
To: de...@ctakes.apache.org
Subject: Re: Allergy Annotator

Sean and Dima, these are great suggestions, thanks so far.

Sean, when looping over medication events as you say, I can see how it is
possible to take the textspan.Sentence of this MedicationMention, and then do a
regex check for the phrase structure as Dima said.

But instead of textspan.Sentence, you mention "see any is included in a phrase".
What cTAKES/UIMA class is related to this?

If I used textspan.Sentence, it would work for "The patient is allergic to
penicillin.", but cTAKES splits "ALLERGIES: PENICILLIN, WHEAT" into two
sentences, so the MedicationMentions here would not be in the same sentence as
the word "ALLERGIES".

Thanks again, Tom

On Fri, Jul 10, 2015 at 2:12 PM, Finan, Sean <Sean...@childrens.harvard.edu> wrote:

Hi Dima, Tom,

I was thinking the same as Dima's first solution: iterate through the medication events and see if any is included in a phrase as mentioned in Tom's original email. Each phrase structure would have to be specified beforehand. However, assigning appropriate CUIs would require having a lookup table for each medication allergy. I think that would be the simplest solution.

Sean

-----Original Message-----
From: Dligach, Dmitriy [mailto:Dmit...@childrens.harvard.edu]
Sent: Friday, July 10, 2015 2:50 PM
To: cTAKES Developer list
Subject: Re: Allergy Annotator

Hi Tom,

If the patterns are pretty simple, you could just add a few rules on top of the cTAKES dictionary lookup output. Something of the kind "allergic to <medication>" or "allergies: <medication1>, <medication2>, <substance1>, ...".
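Here is a sketch of how those two rules could sit on top of dictionary-lookup output, under these assumptions:

- Mentions arrive as {begin, end} offsets (in cTAKES they might be taken from MedicationMention annotations).
- An "allergies:" header applies to the rest of its line.
- "allergic to" must directly precede the mention.

The class name `AllergyRules`, the method `findAllergens`, and the 40-character look-back are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical rule layer over dictionary-lookup output. Mentions are
 * {begin, end} character offsets, e.g. taken from medication annotations.
 */
public final class AllergyRules {

   // Rule 1: an "allergies:" header applies to the rest of its line
   private static final Pattern LIST_HEADER
         = Pattern.compile( "\\ballergies\\s*:", Pattern.CASE_INSENSITIVE );
   // Rule 2: "allergic to" directly before the mention
   private static final Pattern ALLERGIC_TO
         = Pattern.compile( "\\ballergic\\s+to\\s+$", Pattern.CASE_INSENSITIVE );
   // Arbitrary look-back bound for rule 2
   private static final int LOOK_BACK = 40;

   public static List<int[]> findAllergens( final String text, final List<int[]> mentions ) {
      // Character ranges from each "allergies:" header to the end of its line
      final List<int[]> listRegions = new ArrayList<>();
      final Matcher header = LIST_HEADER.matcher( text );
      while ( header.find() ) {
         final int lineEnd = text.indexOf( '\n', header.end() );
         listRegions.add( new int[] { header.end(), lineEnd < 0 ? text.length() : lineEnd } );
      }
      final List<int[]> allergens = new ArrayList<>();
      for ( final int[] mention : mentions ) {
         boolean inList = false;
         for ( final int[] region : listRegions ) {
            if ( mention[ 0 ] >= region[ 0 ] && mention[ 1 ] <= region[ 1 ] ) {
               inList = true;
               break;
            }
         }
         final String before = text.substring( Math.max( 0, mention[ 0 ] - LOOK_BACK ), mention[ 0 ] );
         if ( inList || ALLERGIC_TO.matcher( before ).find() ) {
            allergens.add( mention );
         }
      }
      return allergens;
   }

   public static void main( final String[] args ) {
      final String text = "ALLERGIES: PENICILLIN, WHEAT\n"
                          + "MEDICATIONS: ASPIRIN 81 MG DAILY\n"
                          + "The patient is allergic to codeine.";
      final List<int[]> mentions = new ArrayList<>();
      for ( final String word : new String[] { "PENICILLIN", "WHEAT", "ASPIRIN", "codeine" } ) {
         final int begin = text.indexOf( word );
         mentions.add( new int[] { begin, begin + word.length() } );
      }
      for ( final int[] allergen : findAllergens( text, mentions ) ) {
         System.out.println( text.substring( allergen[ 0 ], allergen[ 1 ] ) );
      }
   }
}
```

On the sample text, the method returns PENICILLIN, WHEAT and codeine, but not ASPIRIN. Because the list rule works on lines rather than on Sentence annotations, the colon split does not affect it.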

If these patterns are hard to express as rules, you should consider a machine learning based sequence labeling route (e.g. something similar to the cTAKES chunker).

Dima

-- Dmitriy (Dima) Dligach, Ph.D. Boston Children's Hospital and Harvard Medical School (617) 651-0397

On Jul 10, 2015, at 13:40, Tom Devel <deve...@gmail.com> wrote:

Sean,

It would be a wider net, such that if an allergy is mentioned in the clinical note, it is captured in the corresponding IdentifiedAnnotation (or alternatively, if the IdentifiedAnnotation class should not be changed with a new attribute, in a separate allergy annotation).

This annotator would of course have to run after the clinical pipeline has run and discovered all IdentifiedAnnotations.

I am familiar with writing UIMA/cTAKES annotators, but not sure how a new ML method could be integrated here for detecting allergies. Do you have any thoughts about how to approach this in general?

Thanks, Tom

On Fri, Jul 10, 2015 at 11:54 AM, Finan, Sean <Sean...@childrens.harvard.edu> wrote:

Hi Tom,

Are you interested in catching all allergies or just a few specific allergies for a study? If you are only concerned with a few then there is a (possibly) simple solution. If you are interested in throwing a wider net then I think that a new module would need to be created; does anybody reading this have an ML or regex style module?

Sean

-----Original Message-----
From: Tom Devel [mailto:deve...@gmail.com]
Sent: Friday, July 10, 2015 12:42 PM
To: de...@ctakes.apache.org
Subject: Allergy Annotator

Hi,

I would like to use/extend cTAKES to detect allergies.

In the cTAKES publication (2010),

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2995668/

there is the mention that: "Allergies to a given medication are handled by setting the negation attribute of that medication to 'is negated'."

However, in a post here in 2014 (RE: Allergy Indication) it is said that cTAKES does not have a module for allergy discovery.

1. What is the current status of allergy detection in cTAKES?

2. I did some testing: while cTAKES discovers concepts about allergies ("wheat allergy" is found as C0949570), using "ALLERGIES: PENICILLIN, WHEAT" or "The patient is allergic to penicillin." does not give the penicillin or wheat annotations an allergy status.

How would I go about detecting these allergy mentions?

Thanks, Tom


RE: Allergy Annotator

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Sean,

You can still use the original negation detector. To do so, just point to NegationAnnotator.xml in your descriptor XML or, if you are using uimaFIT to create your pipeline, add ContextAnnotator.class to your AggregateBuilder.

I hope this helps you start,
Pgh Sean

-----Original Message-----
From: Mullane, Sean *HS [mailto:SPM9R@hscmail.mcc.virginia.edu] 
Sent: Thursday, December 08, 2016 5:16 PM
To: dev@ctakes.apache.org
Subject: RE: Allergy Annotator

Sean,

Thanks for sending that over; it looks like a great start. I will have to spend some time getting acquainted with the dev aspects of cTAKES, as I have so far mostly been a user of it.

Can you tell me if the information under the "NegationAnnotator.xml" and "Updating Negex Patterns" headings on this page is still current?

Thanks,
Sean

-----Original Message-----
From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu] 
Sent: Wednesday, December 07, 2016 3:49 PM
To: dev@ctakes.apache.org
Subject: RE: Allergy Annotator

Hi Sean,

Even with a change to your sentence detection you may need to change your negation annotator.

As a quick change, you can add an annotator to deal specifically with your situation.  It can be simpler or more elaborate, but something like this:


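import java.util.Collection;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.ctakes.core.util.DocumentIDAnnotationUtil;
import org.apache.ctakes.typesystem.type.constants.CONST;
import org.apache.ctakes.typesystem.type.textsem.DiseaseDisorderMention;
import org.apache.ctakes.typesystem.type.textsem.IdentifiedAnnotation;
import org.apache.ctakes.typesystem.type.textsem.SignSymptomMention;
import org.apache.ctakes.typesystem.type.textspan.Sentence;
import org.apache.log4j.Logger;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
import org.apache.uima.fit.util.JCasUtil;
import org.apache.uima.jcas.JCas;

// The class name is illustrative; run it after dictionary lookup has created the mentions.
final public class ListNegationAnnotator extends JCasAnnotator_ImplBase {

   static private final Logger LOGGER = Logger.getLogger( "ListNegationAnnotator" );

   // ": No", ": Negative", ": None", ": Not seen", ... following the mention (case-insensitive)
   static private final Pattern NEGATIVE_PATTERN
         = Pattern.compile( "(?:\\s?:\\s*)(?:NEGATIVE|(?:NO\\.?\\b)|NONE|(?:NOT (?:SEEN|PRESENT|INDICATED|FOUND|DISCOVERED)?))",
         Pattern.CASE_INSENSITIVE );

   /**
    * Finds list-style negations
    * {@inheritDoc}
    */
   @Override
   public void process( final JCas jcas ) throws AnalysisEngineProcessException {
      LOGGER.info( "Starting Processing" );
      final Collection<DiseaseDisorderMention> diseases = JCasUtil.select( jcas, DiseaseDisorderMention.class );
      if ( !diseases.isEmpty() ) {
         processType( jcas, diseases );
      }
      final Collection<SignSymptomMention> findings = JCasUtil.select( jcas, SignSymptomMention.class );
      if ( !findings.isEmpty() ) {
         processType( jcas, findings );
      }
      LOGGER.info( "Finished Processing" );
   }

   static private void processType( final JCas jcas, final Collection<? extends IdentifiedAnnotation> annotations ) {
      final String docText = jcas.getDocumentText();
      for ( IdentifiedAnnotation annotation : annotations ) {
         String window;
         final int annotationEnd = annotation.getEnd();
         // Fallback window: up to 60 characters after the mention
         final int maxEnd = Math.min( docText.length(), annotationEnd + 60 );
         final List<Sentence> covering = JCasUtil.selectCovering( jcas, Sentence.class, annotation );
         if ( covering == null || covering.isEmpty() ) {
            LOGGER.warn( "Identified Annotation spans not within a Sentence : " + annotation.getCoveredText() );
            window = docText.substring( annotationEnd, maxEnd );
         } else if ( covering.size() > 1 ) {
            LOGGER.warn( DocumentIDAnnotationUtil.getDocumentID( jcas ) );
            LOGGER.warn( "Identified Annotation spans " + covering.size() + " Sentences : " + annotation.getCoveredText() );
            final int sentencesEnd = covering.stream().mapToInt( Sentence::getEnd ).max().orElse( maxEnd );
            window = docText.substring( annotationEnd, sentencesEnd );
         } else {
            // The window stops at the end of the covering Sentence, so the value after ":"
            // is only seen if the sentence detector keeps "KEY:  VALUE" together.
            window = docText.substring( annotationEnd, covering.get( 0 ).getEnd() );
         }
         final Matcher matcher = NEGATIVE_PATTERN.matcher( window );
         if ( matcher.find() ) {
            annotation.setPolarity( CONST.NE_POLARITY_NEGATION_PRESENT );
         }
      }
   }
}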
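To see what NEGATIVE_PATTERN accepts, the sketch below applies the same expression from the annotator above to hand-written windows. Each window is the text that would follow a term if the sentence detector kept "KEY:  VALUE" in one sentence. The class name `NegativePatternDemo` and the sample windows are illustrative.

```java
import java.util.regex.Pattern;

/** Applies the list-negation regex to sample windows that follow a mention. */
public final class NegativePatternDemo {

   // Same expression as NEGATIVE_PATTERN in the annotator
   static final Pattern NEGATIVE_PATTERN
         = Pattern.compile( "(?:\\s?:\\s*)(?:NEGATIVE|(?:NO\\.?\\b)|NONE|(?:NOT (?:SEEN|PRESENT|INDICATED|FOUND|DISCOVERED)?))",
         Pattern.CASE_INSENSITIVE );

   public static boolean isNegated( final String window ) {
      return NEGATIVE_PATTERN.matcher( window ).find();
   }

   public static void main( final String[] args ) {
      final String[] windows = { ":  Yes", ":  No", ":  N/A", ": none", ": not seen", ": Normal" };
      for ( final String window : windows ) {
         System.out.println( "'" + window + "' -> " + isNegated( window ) );
      }
   }
}
```

The windows ": No", ": none" and ": not seen" match, while ": Yes", ": N/A" and ": Normal" do not; the `\b` after NO keeps "Normal" from matching. If the detector instead ends the sentence at the colon, as described in this thread, the annotator's window is just ":" and nothing matches. That is why the sentence detection still matters here.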

RE: Allergy Annotator

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Sean,

Even with a change to your sentence detection you may need to change your negation annotator.

As a quick change, you can add an annotator to deal specifically with your situation.  It can be simpler or more elaborate, but something like this:


   static private final Pattern NEGATIVE_PATTERN
         = Pattern.compile( "(?:\\s?:\\s*)(?:NEGATIVE|(?:NO\\.?\\b)|NONE|(?:NOT (?:SEEN|PRESENT|INDICATED|FOUND|DISCOVERED)?))",
         Pattern.CASE_INSENSITIVE );

   /**
    * Finds list-style negations
    * {@inheritDoc}
    */
   @Override
   public void process( final JCas jcas ) throws AnalysisEngineProcessException {
      LOGGER.info( "Starting Processing" );
      final Collection<DiseaseDisorderMention> diseases = JCasUtil.select( jcas, DiseaseDisorderMention.class );
      if ( !diseases.isEmpty() ) {
         processType( jcas, diseases );
      }
      final Collection<SignSymptomMention> findings = JCasUtil.select( jcas, SignSymptomMention.class );
      if ( !findings.isEmpty() ) {
         processType( jcas, findings );
      }
      LOGGER.info( "Finished Processing" );
   }

   static private void processType( final JCas jcas, final Collection<? extends IdentifiedAnnotation> annotations ) {
      final String docText = jcas.getDocumentText();
      for ( IdentifiedAnnotation annotation : annotations ) {
         String window;
         final int annotationEnd = annotation.getEnd();
         final int maxEnd = Math.min( docText.length(), annotationEnd + 60 );
         final List<Sentence> covering = JCasUtil.selectCovering( jcas, Sentence.class, annotation );
         if ( covering == null || covering.isEmpty() ) {
            LOGGER.warn( "Identified Annotation spans not within a Sentence : " + annotation.getCoveredText() );
            window = docText.substring( annotationEnd, maxEnd );
         } else if ( covering.size() > 1 ) {
            LOGGER.warn( DocumentIDAnnotationUtil.getDocumentID( jcas ) );
            LOGGER.warn( "Identified Annotation spans " + covering.size() + " Sentences : " + annotation.getCoveredText() );
            final int sentencesEnd = covering.stream().mapToInt( Sentence::getEnd ).max().orElse( maxEnd );
            window = docText.substring( annotationEnd, sentencesEnd );
//            covering.stream().map( Sentence::getCoveredText ).forEach( LOGGER::warn );
         } else {
            window = docText.substring( annotationEnd, covering.get( 0 ).getEnd() );
         }
         final Matcher matcher = NEGATIVE_PATTERN.matcher( window );
         if ( matcher.find() ) {
            annotation.setPolarity( CONST.NE_POLARITY_NEGATION_PRESENT );
         }
      }
   }

-----Original Message-----
From: Mullane, Sean *HS [mailto:SPM9R@hscmail.mcc.virginia.edu] 
Sent: Wednesday, December 07, 2016 3:30 PM
To: 'Tomasz Oliwa'
Cc: 'dev@ctakes.apache.org'
Subject: RE: Allergy Annotator

I'm reviving this thread with reference to negation detection. I previously posted about this to the User list but this is probably a more appropriate venue.

The way the sentences are split on ":" makes the negation annotator miss negation in lists of this form:

Hyperlipidemia:  Yes
Hypercholesterolemia:  No
Chronic Renal Insufficiency:  N/A

I tried reversing order and removing ":"s and found that the negation for Hypercholesterolemia is detected when in this form:

Yes Hyperlipidemia
No Hypercholesterolemia
N/A Chronic Renal Insufficiency

Our notes have quite a few places with this sort of list where good negation detection is important, but I haven't had very good results. The sentence segmenter sees this as 12 separate sentences, but I would think the proper behavior would be to treat it as 6 sentences (breaking sentences on line breaks but not on colons). I see previous discussion on the list about the sentence segmenter breaking on newlines, but little regarding colons. I would think that in most cases it would be more useful not to break on ":". Or is there an overriding reason for the current behavior?
If changing the sentence segmenter isn't an option, is there a different way to configure the negation detection annotator that would avoid this issue?

Thanks,
Sean
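The colon split described above could also be repaired after sentence detection, in the spirit of the SentenceAdjuster that Tomasz mentions below. The idea is to merge a sentence that ends in ":" with the following sentence whenever no line break separates them, so that "Hypercholesterolemia:" and "No" share one sentence. The plain-Java sketch below assumes sentences are given as sorted, non-overlapping [begin, end) offsets. The class and record names are hypothetical, and this is not the SentenceAdjuster code, whose API is not shown in this thread. In a pipeline, the merged spans would have to replace the Sentence annotations before the negation annotator runs.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical post-processing step: joins a sentence ending in ':' with the next
 * sentence when both sit on the same line, e.g. "Hypercholesterolemia:" + "No".
 */
final class ColonSentenceMerger {

   /** A [begin, end) character span standing in for a Sentence annotation. */
   record Span( int begin, int end ) {
   }

   private ColonSentenceMerger() {
   }

   /** Sentences must be sorted by begin offset and must not overlap. */
   static List<Span> mergeColonSplits( final String text, final List<Span> sentences ) {
      final List<Span> merged = new ArrayList<>();
      for ( Span next : sentences ) {
         if ( !merged.isEmpty() ) {
            final Span previous = merged.get( merged.size() - 1 );
            final boolean endsWithColon
                  = text.substring( previous.begin(), previous.end() ).trim().endsWith( ":" );
            // Only merge across spaces or tabs, never across a line break.
            final boolean sameLine = text.substring( previous.end(), next.begin() ).indexOf( '\n' ) < 0;
            if ( endsWithColon && sameLine ) {
               merged.set( merged.size() - 1, new Span( previous.begin(), next.end() ) );
               continue;
            }
         }
         merged.add( next );
      }
      return merged;
   }
}
```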



Hi,

I am interested in the design decision of the sentence detector.

Why does it split a sentence of the form "WORD1: WORD2 WORD3." into two sentences "WORD1:" and "WORD2 WORD3."? Do other components of cTAKES require such a sentence splitting?

It would seem to me that it should remain one sentence. For example, the smoking status detector has its own SentenceAdjuster that merges some such sentences back into one because of this design.

Thanks, Tomasz

________________________________________
From: Finan, Sean [Sean...@childrens.harvard.edu]
Sent: Friday, July 10, 2015 3:20 PM
To: de...@ctakes.apache.org
Subject: RE: Allergy Annotator

Hi Tom,

It is exactly because the sentence detector splits "KEY:" from "VALUE" that I didn't suggest using sentences. Instead, I would just iterate over the whole cas collection of medication events and attempt to match allergy phrases ("allergic to medication") against the note text spanning from event.begin-15 to event.end+15, or whatever window size you prefer.

Sean
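Sean's window idea can be sketched in plain Java. The sketch assumes the medication mentions are available as [begin, end) offsets; in cTAKES these would come from the MedicationMention annotations via getBegin() and getEnd(). The class, the record, and the "allerg" cue are hypothetical simplifications. A real rule set would use the specific allergy phrases you care about.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical sketch: flag medication mentions that have an allergy cue nearby. */
final class AllergyWindowMatcher {

   /** [begin, end) offsets of a medication mention. */
   record Mention( int begin, int end ) {
   }

   // Assumed cue: matches "allergy", "allergies", "allergic" in any case.
   private static final Pattern ALLERGY_CUE = Pattern.compile( "allerg", Pattern.CASE_INSENSITIVE );

   private AllergyWindowMatcher() {
   }

   /** Text from begin - size to end + size, clipped to the document bounds. */
   static String window( final String text, final Mention mention, final int size ) {
      final int begin = Math.max( 0, mention.begin() - size );
      final int end = Math.min( text.length(), mention.end() + size );
      return text.substring( begin, end );
   }

   /** Mentions whose surrounding window contains an allergy cue. */
   static List<Mention> allergyMentions( final String text, final List<Mention> medications, final int size ) {
      final List<Mention> allergic = new ArrayList<>();
      for ( Mention mention : medications ) {
         if ( ALLERGY_CUE.matcher( window( text, mention, size ) ).find() ) {
            allergic.add( mention );
         }
      }
      return allergic;
   }
}
```

On the list example "ALLERGIES: PENICILLIN, WHEAT", the 15-character window from the email flags only PENICILLIN. A 25-character window also reaches WHEAT. Wider windows catch longer lists but also pick up unrelated cues from neighbouring lines, which is the trade-off behind "whatever window size you prefer".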

-----Original Message-----
From: Tom Devel [mailto:deve...@gmail.com]
Sent: Friday, July 10, 2015 4:12 PM
To: de...@ctakes.apache.org
Subject: Re: Allergy Annotator

Sean and Dima, these are great suggestions, thanks so far.

Sean, when looping over medication events as you say, I can see how it is possible to take the textspan.Sentence of this MedicationMention, and then do a regex check for the phrase structure as Dima said.

But instead of textspan.Sentence, you mention "see if any is included in a phrase".
What cTAKES/UIMA class is related to this?

Because if I used textspan.Sentence, it would work for "The patient is allergic to penicillin.", but cTAKES splits "ALLERGIES: PENICILLIN, WHEAT" into two sentences, so the MedicationMentions here would not be in the same sentence as the word "ALLERGIES".

Thanks again, Tom

On Fri, Jul 10, 2015 at 2:12 PM, Finan, Sean <Sean...@childrens.harvard.edu> wrote:

Hi Dima, Tom,

I was thinking the same as Dima's first solution. Iterate through the medication events and see if any is included in a phrase as mentioned in Tom's original email. Each phrase structure would have to be specified beforehand. However, assigning appropriate CUIs would require having a lookup table for each medication allergy. I think that would be the simplest solution.

Sean

-----Original Message-----
From: Dligach, Dmitriy [mailto:Dmit...@childrens.harvard.edu]
Sent: Friday, July 10, 2015 2:50 PM
To: cTAKES Developer list
Subject: Re: Allergy Annotator

Hi Tom,

If the patterns are pretty simple, you could just add a few rules on top of the cTAKES dictionary lookup output, something like "allergic to <medication>" or "allergies: <medication1>, <medication2>, <substance1>, ...".

If these patterns are hard to express as rules, you should consider a machine-learning-based sequence labeling route (e.g. something similar to the cTAKES chunker).

Dima
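Dima's two example rules can be sketched with regular expressions. This plain-Java version works on raw text and returns the allergen strings. It is a simplification of what he describes: a real annotator would apply the rules on top of the dictionary lookup output, keeping only matches that overlap a medication or substance annotation. The class name and both patterns are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical rule set for "allergies: a, b, c" and "allergic to x". */
final class AllergyRules {

   // A line starting with "Allergies:" followed by a comma-separated list.
   private static final Pattern ALLERGY_LIST = Pattern.compile( "(?im)^\\s*allergies\\s*:\\s*(.+)$" );

   // "allergic to" followed by words, up to the next '.', ',' or ';' or the end of the text.
   private static final Pattern ALLERGIC_TO
         = Pattern.compile( "(?i)\\ballergic\\s+to\\s+([a-z][a-z\\- ]*?)\\s*(?:[.,;]|$)" );

   private AllergyRules() {
   }

   static List<String> allergens( final String text ) {
      final List<String> found = new ArrayList<>();
      final Matcher list = ALLERGY_LIST.matcher( text );
      while ( list.find() ) {
         for ( String item : list.group( 1 ).split( "," ) ) {
            // Drop trailing sentence punctuation and surrounding whitespace.
            final String cleaned = item.replaceAll( "[.;]+\\s*$", "" ).trim();
            if ( !cleaned.isEmpty() ) {
               found.add( cleaned );
            }
         }
      }
      final Matcher phrase = ALLERGIC_TO.matcher( text );
      while ( phrase.find() ) {
         found.add( phrase.group( 1 ).trim() );
      }
      return found;
   }
}
```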

-- Dmitriy (Dima) Dligach, Ph.D. Boston Children's Hospital and Harvard Medical School (617) 651-0397

On Jul 10, 2015, at 13:40, Tom Devel <deve...@gmail.com> wrote:

Sean,

It would be a wider net, such that if an allergy is mentioned in the clinical note, this is captured in the corresponding IdentifiedAnnotation (or alternatively, if the IdentifiedAnnotation class should not be changed with a new attribute, in a separate allergy annotation).

This annotator would, of course, have to run after the clinical pipeline has run and discovered all IdentifiedAnnotations.

I am familiar with writing UIMA/cTAKES annotators, but not sure how a new ML method could be integrated here for detecting allergies. Do you have any thoughts about how to approach this in general?

Thanks, Tom

On Fri, Jul 10, 2015 at 11:54 AM, Finan, Sean <Sean...@childrens.harvard.edu> wrote:

Hi Tom,

Are you interested in catching all allergies or just a few specific allergies for a study? If you are only concerned with a few then there is a (possibly) simple solution. If you are interested in throwing a wider net then I think that a new module would need to be created; does anybody reading this have an ML or regex style module?

Sean

-----Original Message-----
From: Tom Devel [mailto:deve...@gmail.com]
Sent: Friday, July 10, 2015 12:42 PM
To: de...@ctakes.apache.org
Subject: Allergy Annotator

Hi,

I would like to use/extend cTAKES to detect allergies.

In the cTAKES publication (2010)

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2995668/

there is a mention that: "Allergies to a given medication are handled by setting the negation attribute of that medication to 'is negated'."

However, in a post here in 2014 (RE: Allergy Indication) it is said that cTAKES does not have a module for allergy discovery.

1. What is the current status of allergy detection in cTAKES?

2. I did some testing. While cTAKES discovers concepts about allergies ("wheat allergy" is found as C0949570), neither "ALLERGIES: PENICILLIN, WHEAT" nor "The patient is allergic to penicillin." gives the penicillin or wheat annotations an allergy status.

How would I go about detecting these allergy mentions?

Thanks, Tom


RE: Allergy Annotator

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Shyam,

I just checked an example into 3.2.3 trunk.
org.apache.ctakes.examples.pipeline.ProcessLinesClinicalRunner.java

It reads one of the example files and logs some simple entity properties. The code for the collection reader:

         // Add the Lines from File reader
         final File inputFile = FileLocator.locateFile( INPUT_FILE_PATH );
         final CollectionReader linesFromFileReader
               = CollectionReaderFactory.createReader( LinesFromFileCollectionReader.class,
               LinesFromFileCollectionReader.PARAM_INPUT_FILE_NAME, inputFile.getAbsolutePath() );

Sean
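The point of switching to a collection reader, as discussed in the quoted messages below, is that the pipeline (including its dictionary and models) is initialized once and reused for every line, instead of being rebuilt for each CSV row. With uimaFIT this corresponds to one of two approaches. The first is to call SimplePipeline.runPipeline( reader, description ) once. The second is to create the engine once with AnalysisEngineFactory.createEngine( description ) and then, for each line, call jcas.reset(), jcas.setDocumentText( ... ) and engine.process( jcas ). The sketch below shows only the underlying pattern in plain Java. Pipeline is a hypothetical stand-in for the cTAKES aggregate, not a cTAKES type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Sketch of "initialize once, process many": the expensive build happens outside the loop. */
final class ReusePipeline {

   /** Hypothetical stand-in for an analysis pipeline that is costly to build but cheap to run. */
   interface Pipeline {
      List<String> process( String document );
   }

   private ReusePipeline() {
   }

   static List<List<String>> processAll( final Supplier<Pipeline> factory, final List<String> documents ) {
      // Build once: with cTAKES, this is where dictionaries and models would be loaded.
      final Pipeline pipeline = factory.get();
      final List<List<String>> results = new ArrayList<>();
      for ( String document : documents ) {
         // Reuse the same instance for every document instead of rebuilding it per line.
         results.add( pipeline.process( document ) );
      }
      return results;
   }
}
```

For the 100,000-line CSV discussed below, this should move model and dictionary loading out of the per-line cost. The time that remains per line is the annotation itself.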

-----Original Message-----
From: Ks Sunder [mailto:shyam769@gmail.com] 
Sent: Monday, January 16, 2017 1:23 AM
To: dev@ctakes.apache.org
Subject: Re: Allergy Annotator

Thank you, Sean,

Could you please share a LinesFromFileCollectionReader example?



regards,
shyam k.
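LinesFromFileCollectionReader treats every newline as a document separator, as Sean notes further down in this thread. A CSV whose note column may contain commas or line breaks inside quoted fields therefore has to be flattened first. The plain-Java sketch below assumes the note text is in column index 4, as in the quoted code, and that the file has no header row. The class is hypothetical, and the parser handles only basic RFC 4180 quoting. A CSV library, such as the CSVReader used in the quoted code, would be the sturdier choice for real data.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical converter: one CSV column in, one document per line out. */
final class CsvColumnToLines {

   private CsvColumnToLines() {
   }

   /** Minimal RFC 4180-style parser: quoted fields, doubled quotes, newlines inside quotes. */
   static List<List<String>> parse( final String csv ) {
      final List<List<String>> rows = new ArrayList<>();
      List<String> row = new ArrayList<>();
      final StringBuilder field = new StringBuilder();
      boolean inQuotes = false;
      for ( int i = 0; i < csv.length(); i++ ) {
         final char c = csv.charAt( i );
         if ( inQuotes ) {
            if ( c == '"' ) {
               if ( i + 1 < csv.length() && csv.charAt( i + 1 ) == '"' ) {
                  field.append( '"' );  // escaped quote ("")
                  i++;
               } else {
                  inQuotes = false;     // closing quote
               }
            } else {
               field.append( c );       // anything, including newlines, inside quotes
            }
         } else if ( c == '"' ) {
            inQuotes = true;
         } else if ( c == ',' ) {
            row.add( field.toString() );
            field.setLength( 0 );
         } else if ( c == '\n' ) {
            row.add( field.toString() );
            field.setLength( 0 );
            rows.add( row );
            row = new ArrayList<>();
         } else if ( c != '\r' ) {
            field.append( c );
         }
      }
      // Last row when the input does not end with a newline.
      if ( field.length() > 0 || !row.isEmpty() ) {
         row.add( field.toString() );
         rows.add( row );
      }
      return rows;
   }

   /** The chosen column of every row, internal line breaks flattened, one document per line. */
   static String toLines( final String csv, final int column ) {
      final StringBuilder out = new StringBuilder();
      for ( List<String> row : parse( csv ) ) {
         if ( column >= row.size() ) {
            continue;  // short row: nothing to process
         }
         final String text = row.get( column ).replaceAll( "\\s*\\R\\s*", " " ).trim();
         // Same filter as the quoted code's nextLine[4].length() > 1 check.
         if ( text.length() > 1 ) {
            out.append( text ).append( '\n' );
         }
      }
      return out.toString();
   }
}
```

The result could then be written with Files.writeString( Path.of( "notes.txt" ), CsvColumnToLines.toLines( csv, 4 ) ), and that path passed as LinesFromFileCollectionReader.PARAM_INPUT_FILE_NAME. The file name "notes.txt" is only an example.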

On Fri, Jan 13, 2017 at 8:19 PM, Finan, Sean < Sean.Finan@childrens.harvard.edu> wrote:

> Hi Shyam,
>
> I'm not sure what the [4] is doing in your nextLine String processing.
>
> That aside, are you seeing the pipeline being initiated multiple times?
> This could be the problem.
>
> Your file reader looks nice, but as I advised in my last email, give 
> LinesFromFileCollectionReader a try.  Instead of creating a new cas 
> object and initializing the pipeline once per line, this will allow 
> ctakes to reuse a single cas object and initialize the pipeline only once.
>
> Sean
>
> -----Original Message-----
> From: Ks Sunder [mailto:shyam769@gmail.com]
> Sent: Friday, January 13, 2017 1:11 AM
> To: dev@ctakes.apache.org
> Subject: Re: Allergy Annotator
>
> Thank you, Sean,
>
> I have written Java code to read the CSV file, and for the cTAKES UMLS
> dictionary lookup I am using the function below.
>
>
>  public AnalysisEngineDescription getUMLPipeline()
>        throws ResourceInitializationException, URISyntaxException {
>    AggregateBuilder builder = new AggregateBuilder();
>    builder.add(SimpleSegmentAnnotator.createAnnotatorDescription());
>    builder.add(SentenceDetector.createAnnotatorDescription());
>    builder.add(TokenizerAnnotatorPTB.createAnnotatorDescription());
>    builder.add(POSTagger.createAnnotatorDescription());
>    builder.add(ClinicalPipelineFactory.getNpChunkerPipeline());
>    builder.add(LvgAnnotator.createAnnotatorDescription());
>
>      try {
>         builder.add( AnalysisEngineFactory.createEngineDescription(
>               DefaultJCasTermAnnotator.class,
>               AbstractJCasTermAnnotator.PARAM_WINDOW_ANNOT_PRP,
>               "org.apache.ctakes.typesystem.type.textspan.Sentence",
>               JCasTermAnnotator.DICTIONARY_DESCRIPTOR_KEY,
>               ExternalResourceFactory.createExternalResourceDescription(
>                     FileResourceImpl.class,
>                     FileLocator.locateFile( "org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml" ) ) ) );
>      } catch ( FileNotFoundException e ) {
>         e.printStackTrace();
>         throw new ResourceInitializationException( e );
>      }
>
>    return builder.createAggregateDescription();
>  }
>
>
> Next, I call this function from here:
>
>
>
>  reader = new CSVReader(new FileReader(ExelReadJava.NarrativeFile));
>  String [] nextLine;
>  int lineNumber = 0;
>
>
>  while ((nextLine = reader.readNext()) != null) {
>    lineNumber++;
>    System.out.println("Line # " + lineNumber);
>
>     //UML code start
>     try {
>        if ( nextLine[4].length() > 1 ) {
>
>           final JCas jcas = JCasFactory.createJCas();
>           jcas.setDocumentText( nextLine[4] );
>           SimplePipeline.runPipeline( jcas, pipelineTesting.getUMLPipeline() );
>
>           for ( IdentifiedAnnotation entity : JCasUtil.select( jcas, IdentifiedAnnotation.class ) ) {
>              if ( entity.getOntologyConceptArr() != null ) {
>                 add.append( entity.getCoveredText() + "," );
>              }
>           }
>
>
> This function works properly, but processing takes 40 seconds per line.
> How can I decrease the processing time?
>
> I have 1 lakh (100,000) records (lines) in a CSV file.
>
> Please give me a solution and an example.
>
>
>
>
>
> regards,
> shyam k.
>
> On Thu, Jan 12, 2017 at 8:48 PM, Finan, Sean < 
> Sean.Finan@childrens.harvard.edu> wrote:
>
> > Hi Shyam,
> >
> > Have a look at the LinesFromFileCollectionReader class in ctakes-core.
> > It doesn't use csv files, but instead treats every newline character 
> > as a separator.
> >
> > Sean
> >
> > -----Original Message-----
> > From: Ks Sunder [mailto:shyam769@gmail.com]
> > Sent: Wednesday, January 11, 2017 1:29 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: Allergy Annotator
> >
> > Hi All,
> >
> > My scenario is: read the string content from a CSV file and find the
> > medical terms in that content using the cTAKES UMLS dictionary.
> >
> > As per your suggestion, I tried to find a CollectionReader in
> > ctakes-core, but I didn't find a clear solution. Please give me a
> > solution and an example.
> >
> >
> > regards,
> > shyam k.
> >
> > On Thu, Dec 22, 2016 at 9:16 PM, Finan, Sean < 
> > Sean.Finan@childrens.harvard.edu> wrote:
> >
> > > Hi Shyam,
> > >
> > > I think that the key to your first question
> > > >   how can execute the single function to run all this jobs in short time...
> > > is in your code here:
> > >
> > > 1       final JCas jcas = JCasFactory.createJCas();
> > > 2       jcas.setDocumentText( nextLine[0] );
> > > 3       SimplePipeline.runPipeline(jcas, getUMLPipeline());
> > >
> > > What you probably want to do is replace lines #1 and #2 with a
> > > CollectionReader, and then in #3 use a different SimplePipeline
> > > call that runs the pipeline using the CollectionReader instead of
> > > a static cas.
> > >
> > > There are commonly used CollectionReaders in ctakes-core.  The 
> > > most widely applicable is probably the FileTreeReader*, which 
> > > reads a tree of ascii files.  If you have some other source of 
> > > text data then look around the code for something that might fit 
> > > and let the devlist know if you can't find anything that fits your needs.
> > >
> > > I don't understand your second question:
> > > > how can i find sentence vised Dictionary words from string, give me a solution for this..
> > > Can you rephrase it and post to the devlist again?
> > >
> > > * one advantage that the FileTreeReader has is that it stores 
> > > metadata on the input file tree placement, which can then be 
> > > reproduced by output file writers like the html writer.
> > >
> > > Sean
> > >
> > >
> > > -----Original Message-----
> > > From: Ks Sunder [mailto:shyam769@gmail.com]
> > > Sent: Thursday, December 22, 2016 2:33 AM
> > > To: dev@ctakes.apache.org
> > > Subject: Re: Allergy Annotator
> > >
> > > Hi All,
> > >
> > > I have done the below code for finding medical terms from String 
> > > information.
> > >
> > > Step 1:
> > > public static AnalysisEngineDescription getUMLPipeline()
> > >       throws ResourceInitializationException, URISyntaxException {
> > >    AggregateBuilder builder = new AggregateBuilder();
> > >    builder.add(SimpleSegmentAnnotator.createAnnotatorDescription());
> > >    builder.add(SentenceDetector.createAnnotatorDescription());
> > >    builder.add(TokenizerAnnotatorPTB.createAnnotatorDescription());
> > >    builder.add(POSTagger.createAnnotatorDescription());
> > >    builder.add(ClinicalPipelineFactory.getNpChunkerPipeline());
> > >    builder.add(LvgAnnotator.createAnnotatorDescription());
> > >
> > >      try {
> > >         builder.add( AnalysisEngineFactory.createEngineDescription(
> > >               DefaultJCasTermAnnotator.class,
> > >               AbstractJCasTermAnnotator.PARAM_WINDOW_ANNOT_PRP,
> > >               "org.apache.ctakes.typesystem.type.textspan.Sentence",
> > >               JCasTermAnnotator.DICTIONARY_DESCRIPTOR_KEY,
> > >               ExternalResourceFactory.createExternalResourceDescription(
> > >                     FileResourceImpl.class,
> > >                     FileLocator.locateFile( "org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml" ) ) ) );
> > >      } catch ( FileNotFoundException e ) {
> > >         e.printStackTrace();
> > >         throw new ResourceInitializationException( e );
> > >      }
> > >
> > >    return builder.createAggregateDescription();
> > >  }
> > > Step 2:
> > >
> > > final JCas jcas = JCasFactory.createJCas();
> > > jcas.setDocumentText( nextLine[0] );
> > > SimplePipeline.runPipeline( jcas, getUMLPipeline() );
> > >
> > > for ( IdentifiedAnnotation entity : JCasUtil.select( jcas, IdentifiedAnnotation.class ) ) {
> > >    if ( entity.getOntologyConceptArr() != null ) {
> > >       add.append( entity.getCoveredText() + "," );
> > >    }
> > > }
> > >
> > >
> > >
> > >
> > >
> > > It's working fine.
> > >
> > > But I have two questions:
> > >
> > > 1. In step 1, I add the annotators one by one, and loading all of them
> > > takes a long time.
> > >    how can execute the single function to run all this jobs in short time...
> > >
> > > 2. how can i find sentence vised Dictionary words from string, 
> > > give me a solution for this..
> > >
> > >
> > > Please give me solutions for these issues.
> > >
> > >
> > >
> > > regards,
> > > shyam k.
> > >
>

Re: Allergy Annotator

Posted by Ks Sunder <sh...@gmail.com>.
Thank you, Sean,

Could you please share a LinesFromFileCollectionReader example?



regards,
shyam k.

On Fri, Jan 13, 2017 at 8:19 PM, Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Shyam,
>
> I'm not sure what the [4] is doing in your nextLine String processing.
>
> That aside, are you seeing the pipeline being initiated multiple times?
> This could be the problem.
>
> Your file reader looks nice, but as I advised in my last email, give
> LinesFromFileCollectionReader a try.  Instead of creating a new cas object
> and initializing the pipeline once per line, this will allow ctakes to
> reuse a single cas object and initialize the pipeline only once.
>
> Sean
>
> -----Original Message-----
> From: Ks Sunder [mailto:shyam769@gmail.com]
> Sent: Friday, January 13, 2017 1:11 AM
> To: dev@ctakes.apache.org
> Subject: Re: Allergy Annotator
>
> Thank you, Sean,
>
> I have written Java code to read the CSV file, and for the cTAKES UMLS
> dictionary lookup I am using the function below.
>
>
>  public AnalysisEngineDescription getUMLPipeline()
>        throws ResourceInitializationException, URISyntaxException {
>    AggregateBuilder builder = new AggregateBuilder();
>    builder.add(SimpleSegmentAnnotator.createAnnotatorDescription());
>    builder.add(SentenceDetector.createAnnotatorDescription());
>    builder.add(TokenizerAnnotatorPTB.createAnnotatorDescription());
>    builder.add(POSTagger.createAnnotatorDescription());
>    builder.add(ClinicalPipelineFactory.getNpChunkerPipeline());
>    builder.add(LvgAnnotator.createAnnotatorDescription());
>
>      try {
>         builder.add( AnalysisEngineFactory.createEngineDescription(
>               DefaultJCasTermAnnotator.class,
>               AbstractJCasTermAnnotator.PARAM_WINDOW_ANNOT_PRP,
>               "org.apache.ctakes.typesystem.type.textspan.Sentence",
>               JCasTermAnnotator.DICTIONARY_DESCRIPTOR_KEY,
>               ExternalResourceFactory.createExternalResourceDescription(
>                     FileResourceImpl.class,
>                     FileLocator.locateFile( "org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml" ) ) ) );
>      } catch ( FileNotFoundException e ) {
>         e.printStackTrace();
>         throw new ResourceInitializationException( e );
>      }
>
>    return builder.createAggregateDescription();
>  }
>
>
> Next, I call this function from here:
>
>
>
>  reader = new CSVReader(new FileReader(ExelReadJava.NarrativeFile));
>  String [] nextLine;
>  int lineNumber = 0;
>
>
>  while ((nextLine = reader.readNext()) != null) {
>    lineNumber++;
>    System.out.println("Line # " + lineNumber);
>
>     //UML code start
>     try {
>        if ( nextLine[4].length() > 1 ) {
>
>           final JCas jcas = JCasFactory.createJCas();
>           jcas.setDocumentText( nextLine[4] );
>           SimplePipeline.runPipeline( jcas, pipelineTesting.getUMLPipeline() );
>
>           for ( IdentifiedAnnotation entity : JCasUtil.select( jcas, IdentifiedAnnotation.class ) ) {
>              if ( entity.getOntologyConceptArr() != null ) {
>                 add.append( entity.getCoveredText() + "," );
>              }
>           }
>
>
> This function works properly, but processing takes 40 seconds per line.
> How can I decrease the processing time?
>
> I have 1 lakh (100,000) records (lines) in a CSV file.
>
> Please give me a solution and an example.
>
>
>
>
>
> regards,
> shyam k.
>
> On Thu, Jan 12, 2017 at 8:48 PM, Finan, Sean <
> Sean.Finan@childrens.harvard.edu> wrote:
>
> > Hi Shyam,
> >
> > Have a look at the LinesFromFileCollectionReader class in ctakes-core.
> > It doesn't use csv files, but instead treats every newline character
> > as a separator.
> >
> > Sean
> >
> > -----Original Message-----
> > From: Ks Sunder [mailto:shyam769@gmail.com]
> > Sent: Wednesday, January 11, 2017 1:29 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: Allergy Annotator
> >
> > Hi All,
> >
> > My scenario is: read the string content from a CSV file and find the
> > medical terms in that content using the cTAKES UMLS dictionary.
> >
> > As per your suggestion, I tried to find a CollectionReader in ctakes-core,
> > but I didn't find a clear solution. Please give me a solution and an
> > example.
> >
> >
> > regards,
> > shyam k.
> >
> > On Thu, Dec 22, 2016 at 9:16 PM, Finan, Sean <
> > Sean.Finan@childrens.harvard.edu> wrote:
> >
> > > Hi Shyam,
> > >
> > > I think that the key to your first question
> > > >   how can execute the single function to run all this jobs in short time...
> > > is in your code here:
> > >
> > > 1       final JCas jcas = JCasFactory.createJCas();
> > > 2       jcas.setDocumentText( nextLine[0] );
> > > 3       SimplePipeline.runPipeline(jcas, getUMLPipeline());
> > >
> > > What you probably want to do is replace lines #1 and #2 with a
> > > CollectionReader, and then in #3 use a different SimplePipeline call
> > > that runs the pipeline using the CollectionReader instead of a
> > > static cas.
> > >
> > > There are commonly used CollectionReaders in ctakes-core.  The most
> > > widely applicable is probably the FileTreeReader*, which reads a
> > > tree of ascii files.  If you have some other source of text data
> > > then look around the code for something that might fit and let the
> > > devlist know if you can't find anything that fits your needs.
> > >
> > > I don't understand your second question:
> > > > how can i find sentence vised Dictionary words from string, give me a solution for this..
> > > Can you rephrase it and post to the devlist again?
> > >
> > > * one advantage that the FileTreeReader has is that it stores
> > > metadata on the input file tree placement, which can then be
> > > reproduced by output file writers like the html writer.
> > >
> > > Sean
> > >
> > >
> > > -----Original Message-----
> > > From: Ks Sunder [mailto:shyam769@gmail.com]
> > > Sent: Thursday, December 22, 2016 2:33 AM
> > > To: dev@ctakes.apache.org
> > > Subject: Re: Allergy Annotator
> > >
> > > Hi All,
> > >
> > > I have done the below code for finding medical terms from String
> > > information.
> > >
> > > Step 1:
> > > public static AnalysisEngineDescription getUMLPipeline()
> > >       throws ResourceInitializationException, URISyntaxException {
> > >    AggregateBuilder builder = new AggregateBuilder();
> > >    builder.add(SimpleSegmentAnnotator.createAnnotatorDescription());
> > >    builder.add(SentenceDetector.createAnnotatorDescription());
> > >    builder.add(TokenizerAnnotatorPTB.createAnnotatorDescription());
> > >    builder.add(POSTagger.createAnnotatorDescription());
> > >    builder.add(ClinicalPipelineFactory.getNpChunkerPipeline());
> > >    builder.add(LvgAnnotator.createAnnotatorDescription());
> > >
> > >      try {
> > >         builder.add( AnalysisEngineFactory.createEngineDescription(
> > >               DefaultJCasTermAnnotator.class,
> > >               AbstractJCasTermAnnotator.PARAM_WINDOW_ANNOT_PRP,
> > >               "org.apache.ctakes.typesystem.type.textspan.Sentence",
> > >               JCasTermAnnotator.DICTIONARY_DESCRIPTOR_KEY,
> > >               ExternalResourceFactory.createExternalResourceDescription(
> > >                     FileResourceImpl.class,
> > >                     FileLocator.locateFile( "org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml" ) ) ) );
> > >      } catch ( FileNotFoundException e ) {
> > >         e.printStackTrace();
> > >         throw new ResourceInitializationException( e );
> > >      }
> > >
> > >    return builder.createAggregateDescription();
> > >  }
> > > Step 2:
> > >
> > > final JCas jcas = JCasFactory.createJCas();
> > > jcas.setDocumentText( nextLine[0] );
> > > SimplePipeline.runPipeline( jcas, getUMLPipeline() );
> > >
> > > for ( IdentifiedAnnotation entity : JCasUtil.select( jcas, IdentifiedAnnotation.class ) ) {
> > >    if ( entity.getOntologyConceptArr() != null ) {
> > >       add.append( entity.getCoveredText() + "," );
> > >    }
> > > }
> > >
> > >
> > >
> > >
> > >
> > > It's working fine.
> > >
> > > But I have two questions:
> > >
> > > 1. In step 1, I add the annotators one by one, and loading all of them
> > > takes a long time.
> > >    how can execute the single function to run all this jobs in short time...
> > >
> > > 2. how can i find sentence vised Dictionary words from string, give
> > > me a solution for this..
> > >
> > >
> > > ...please give me a solutions for this issues....
> > >
> > >
> > >
> > > regards,
> > > shyam k.
> > >
> > > On Thu, Dec 8, 2016 at 1:59 AM, Mullane, Sean *HS <
> > > SPM9R@hscmail.mcc.virginia.edu> wrote:
> > >
> > > > I'm reviving this thread with reference to negation detection. I
> > > > previously posted about this to the User list but this is probably
> > > > a more appropriate venue.
> > > >
> > > > The way the sentences are split on ":" makes the negation
> > > > annotator miss negation in lists of this form:
> > > >
> > > > Hyperlipidemia:  Yes
> > > > Hypercholesterolemia:  No
> > > > Chronic Renal Insufficiency:  N/A
> > > >
> > > > I tried reversing the order and removing the colons, and found that
> > > > the negation for Hypercholesterolemia is detected in this form:
> > > >
> > > > Yes Hyperlipidemia
> > > > No Hypercholesterolemia
> > > > N/A Chronic Renal Insufficiency
> > > >
> > > > Our notes have quite a few places with this sort of list where
> > > > good negation detection is important, but I haven't had very good
> > > > results. The sentence segmenter sees this as 12 separate
> > > > sentences, but I would think the proper behavior would be to treat
> > > > this as 6 sentences (breaking sentences on line breaks but not on
> > > > colons). I see previous discussion on the list about the sentence
> > > > segmenter breaking on newlines, but little regarding colons. I
> > > > would think that in most cases it would be more useful not to break
> > > > on ":". Or is there an overriding reason for the current behavior?
> > > > If changing the sentence segmenter isn't an option, is there a
> > > > different way to configure the negation detection annotator that
> > > > would avoid this issue?
> > > >
> > > > Thanks,
> > > > Sean
> > > >
> > > >
> > > >
> > > > Hi,
> > > >
> > > > I am interested in the design decision of the sentence detector.
> > > >
> > > > Why does it split a sentence of the form "WORD1: WORD2 WORD3."
> > > > into two sentences "WORD1:" and "WORD2 WORD3."? Do other
> > > > components of cTAKES require such a sentence splitting?
> > > >
> > > > It would seem to me that it should remain one sentence. For
> > > > example, the smoking status detector has its own SentenceAdjuster
> > > > that merges some such sentences back into one because of this design.
> > > >
> > > > Thanks, Tomasz
> > > >
> > > > ________________________________________
> > > > From: Finan, Sean [Sean...@childrens.harvard.edu]
> > > > Sent: Friday, July 10, 2015 3:20 PM
> > > > To: de...@ctakes.apache.org
> > > > Subject: RE: Allergy Annotator
> > > >
> > > > Hi Tom,
> > > >
> > > > It is exactly because the sentence detector splits "KEY:" from "VALUE"
> > > > that I didn't suggest using sentences. Instead, I would just iterate
> > > > over the whole cas collection of medication events and attempt to
> > > > match allergy phrases ("allergic to medication") against the text of
> > > > the note spanning from event.begin-15 to event.end+15, or whatever
> > > > window size you prefer.
> > > >
> > > > Sean
> > > >
> > > > -----Original Message-----
> > > > From: Tom Devel [mailto:deve...@gmail.com]
> > > > Sent: Friday, July 10, 2015 4:12 PM
> > > > To: de...@ctakes.apache.org
> > > > Subject: Re: Allergy Annotator
> > > >
> > > > Sean and Dima, these are great suggestions, thanks so far.
> > > >
> > > > Sean, when looping over medication events as you say, I can see
> > > > how it is possible to take the textspan.Sentence of this
> > > > MedicationMention and then do a regex check for the phrase
> > > > structure, as Dima said.
> > > >
> > > > But instead of textspan.Sentence, you mention "see if any is
> > > > included in a phrase". What cTAKES/UIMA class is related to this?
> > > >
> > > > Because if I use textspan.Sentence, it would work for "The
> > > > patient is allergic to penicillin.", but cTAKES splits
> > > > "ALLERGIES: PENICILLIN, WHEAT" into two sentences, so the
> > > > MedicationMentions here would not be in the same sentence as the
> > > > word "ALLERGIES".
> > > >
> > > > Thanks again, Tom
> > > >
> > > > On Fri, Jul 10, 2015 at 2:12 PM, Finan, Sean <
> > > > Sean...@childrens.harvard.edu>
> > > > wrote:
> > > >
> > > > Hi Dima, Tom,
> > > >
> > > > I was thinking the same as Dima's first solution. Iterate through
> > > > the medication events and see if any is included in a phrase as
> > > > mentioned in Tom's original email. Each phrase structure would
> > > > have to be specified beforehand. However, assigning appropriate
> > > > CUIs would require a lookup table for each medication allergy. I
> > > > think that would be the simplest solution.
> > > >
> > > > Sean
> > > >
> > > > -----Original Message-----
> > > > From: Dligach, Dmitriy [mailto:Dmit...@childrens.harvard.edu]
> > > > Sent: Friday, July 10, 2015 2:50 PM
> > > > To: cTAKES Developer list
> > > > Subject: Re: Allergy Annotator
> > > >
> > > > Hi Tom,
> > > >
> > > > If the patterns are pretty simple, you could just add a few rules
> > > > on top of the cTAKES dictionary lookup output, something of the
> > > > kind "allergic to <medication>" or "allergies: <medication1>,
> > > > <medication2>, <substance1>, ...".
> > > >
> > > > If these patterns are hard to express as rules, you should
> > > > consider a machine learning based sequence labeling route (e.g.
> > > > something similar to the cTAKES chunker).
> > > >
> > > > Dima
> > > >
> > > > -- Dmitriy (Dima) Dligach, Ph.D. Boston Children's Hospital and
> > > > Harvard Medical School (617) 651-0397
> > > >
> > > > On Jul 10, 2015, at 13:40, Tom Devel <deve...@gmail.com> wrote:
> > > >
> > > > Sean,
> > > >
> > > > It would be a wider net, such that if an allergy is mentioned in
> > > > the clinical note, this is captured in the corresponding
> > > > IdentifiedAnnotation (or alternatively, if the
> > > > IdentifiedAnnotation class should not be changed with a new
> > > > attribute, in a separate allergy annotation).
> > > >
> > > > This annotator would then, of course, have to run after the clinical
> > > > pipeline has run and discovered all IdentifiedAnnotations.
> > > >
> > > > I am familiar with writing UIMA/cTAKES annotators, but I am not sure
> > > > how a new ML method could be integrated here for detecting
> > > > allergies. Do you have any thoughts about how to approach this in
> > > > general?
> > > >
> > > > Thanks, Tom
> > > >
> > > > On Fri, Jul 10, 2015 at 11:54 AM, Finan, Sean <Sean...@childrens.harvard.edu> wrote:
> > > >
> > > > Hi Tom,
> > > >
> > > > Are you interested in catching all allergies or just a few
> > > > specific allergies for a study? If you are only concerned with a
> > > > few then there is a
> > > > (possibly) simple solution. If you are interested in throwing a
> > > > wider net then I think that a new module would need to be created;
> > > > does anybody reading this have an ML or regex style module?
> > > >
> > > > Sean
> > > >
> > > > -----Original Message-----
> > > > From: Tom Devel [mailto:deve...@gmail.com]
> > > > Sent: Friday, July 10, 2015 12:42 PM
> > > > To: de...@ctakes.apache.org
> > > > Subject: Allergy Annotator
> > > >
> > > > Hi,
> > > >
> > > > I would like to use/extend cTAKES to detect allergies.
> > > >
> > > > In the cTAKES publication (2010),
> > > >
> > > > http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2995668/
> > > >
> > > > there is the statement: "Allergies to a given medication are
> > > > handled by setting the negation attribute of that medication to
> > > > 'is negated'."
> > > >
> > > > However, in a post here in 2014 (RE: Allergy Indication) it is
> > > > said that cTAKES does not have a module for allergy discovery.
> > > >
> > > > 1. What is the current status of allergy detection in cTAKES?
> > > >
> > > > 2. I did some testing. While cTAKES discovers concepts about
> > > > allergies ("wheat allergy" is found as C0949570), using "ALLERGIES:
> > > > PENICILLIN, WHEAT" or "The patient is allergic to penicillin."
> > > > does not give the penicillin or wheat annotations an allergy status.
> > > >
> > > > How would I go about detecting these allergy mentions?
> > > >
> > > > Thanks, Tom
> > > >
> > > >
> > >
> >
>
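The colon problem in the quoted messages above can be worked around after sentence detection by merging a sentence that ends in ":" with the sentence that follows it on the same line, in the spirit of the SentenceAdjuster Tomasz mentions. The Java below is a minimal sketch on plain character offsets, assuming the sentence spans are sorted and non-overlapping; `ColonSentenceMerger` and `Span` are hypothetical names, not cTAKES classes, and writing the merged spans back into the CAS as `Sentence` annotations before the negation annotator runs is not shown.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical post-processing step (not a cTAKES class): merges a sentence
 * ending in ':' with the next sentence when both sit on the same line.
 */
class ColonSentenceMerger {

    /** A sentence span over the document text, [begin, end). */
    record Span(int begin, int end) {}

    /** Assumes spans are sorted by offset and do not overlap. */
    static List<Span> mergeKeyValueSentences(String text, List<Span> sentences) {
        List<Span> merged = new ArrayList<>();
        int i = 0;
        while (i < sentences.size()) {
            Span current = sentences.get(i);
            // Absorb following sentences while the current span ends with ':'
            // and no line break separates it from the next span.
            while (i + 1 < sentences.size()
                    && text.substring(current.begin(), current.end()).strip().endsWith(":")
                    && text.substring(current.end(), sentences.get(i + 1).begin()).indexOf('\n') < 0) {
                current = new Span(current.begin(), sentences.get(i + 1).end());
                i++;
            }
            merged.add(current);
            i++;
        }
        return merged;
    }

    public static void main(String[] args) {
        String text = "Hyperlipidemia:  Yes\nHypercholesterolemia:  No\n";
        // Spans as a detector that splits on ':' might produce them.
        List<Span> split = List.of(
                new Span(0, 15), new Span(17, 20),
                new Span(21, 42), new Span(44, 46));
        for (Span s : mergeKeyValueSentences(text, split)) {
            System.out.println("[" + text.substring(s.begin(), s.end()) + "]");
        }
        // prints:
        // [Hyperlipidemia:  Yes]
        // [Hypercholesterolemia:  No]
    }
}
```

Because the rule stops at line breaks, each line of the Hyperlipidemia list becomes one span, and a one-line "ALLERGIES: PENICILLIN, WHEAT" would likewise stay together, while a heading alone on its own line is left unmerged.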

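Sean Finan's window idea and Dima Dligach's "allergies: <medication1>, <medication2>" rule from the quoted 2015 messages can be combined in a few lines. The Java below is a minimal sketch over plain offsets: `Mention` stands in for the span of a MedicationMention (or any IdentifiedAnnotation), the regexes are illustrative assumptions rather than a vetted allergy grammar, and the 15-character window is the example size from Sean's message. The list rule is what catches "WHEAT" in "ALLERGIES: PENICILLIN, WHEAT", which sits too far from the header for a 15-character window.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical rule-based allergy context check (not part of cTAKES). */
class AllergyContextSketch {

    /** Offsets of a mention, e.g. copied from a MedicationMention: [begin, end). */
    record Mention(int begin, int end) {}

    // "allergic", "allergy", "allergies" as whole words, any case.
    private static final Pattern ALLERGY_WORD =
            Pattern.compile("\\ballerg(?:ic|y|ies)\\b", Pattern.CASE_INSENSITIVE);
    // An "ALLERGIES:" header; group 1 is the rest of that line (the list body).
    private static final Pattern ALLERGY_LIST =
            Pattern.compile("\\ballergies\\s*:([^\\n]*)", Pattern.CASE_INSENSITIVE);

    /** Window rule: an allergy word occurs within `window` characters of the mention. */
    static boolean allergyWordNearby(String text, Mention m, int window) {
        int from = Math.max(0, m.begin() - window);
        int to = Math.min(text.length(), m.end() + window);
        return ALLERGY_WORD.matcher(text.substring(from, to)).find();
    }

    /** List rule: the mention lies inside the body of an "ALLERGIES: a, b, c" line. */
    static boolean insideAllergyList(String text, Mention m) {
        Matcher matcher = ALLERGY_LIST.matcher(text);
        while (matcher.find()) {
            if (m.begin() >= matcher.start(1) && m.end() <= matcher.end(1)) {
                return true;
            }
        }
        return false;
    }

    static boolean looksLikeAllergy(String text, Mention m) {
        return allergyWordNearby(text, m, 15) || insideAllergyList(text, m);
    }

    /** Demo helper: the first (case-sensitive) occurrence of `word` as a mention. */
    static Mention mentionOf(String text, String word) {
        int begin = text.indexOf(word);
        return new Mention(begin, begin + word.length());
    }

    public static void main(String[] args) {
        String note = "ALLERGIES: PENICILLIN, WHEAT\n"
                + "The patient is allergic to penicillin.\n"
                + "Takes aspirin daily.\n";
        for (String drug : new String[] {"PENICILLIN", "WHEAT", "penicillin", "aspirin"}) {
            System.out.println(drug + " -> " + looksLikeAllergy(note, mentionOf(note, drug)));
        }
        // prints: PENICILLIN -> true, WHEAT -> true, penicillin -> true, aspirin -> false
    }
}
```

Fixed rules like these are crude: in "Denies allergies; on aspirin." the window rule flags aspirin, so a negation check or a sentence boundary would probably be needed in practice.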
RE: Allergy Annotator

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Shyam,

I'm not sure what the [4] is doing in your nextLine String processing.

That aside, are you seeing the pipeline being initiated multiple times?  This could be the problem.

Your file reader looks nice, but as I advised in my last email, give LinesFromFileCollectionReader a try.  Instead of creating a new cas object and initializing the pipeline once per line, this will allow ctakes to reuse a single cas object and initialize the pipeline only once.

Sean
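At about 40 seconds per line, the 100,000 lines reported in the message below would take roughly 4,000,000 seconds, or about 46 days, so whatever is repeated per line matters most. Calling `getUMLPipeline()` inside the loop likely rebuilds the aggregate engine, including the dictionary, for every record. The Java below is a minimal sketch of the build-once, process-many pattern using JDK classes only: `ExpensivePipeline` is a hypothetical stand-in whose constructor plays the role of engine initialization, and its toy matcher is not cTAKES dictionary lookup. With uimaFIT, the equivalent would be to call `AnalysisEngineFactory.createEngine(getUMLPipeline())` and `JCasFactory.createJCas()` once before the loop, then for each line call `jcas.reset()`, `jcas.setDocumentText(...)` and `SimplePipeline.runPipeline(jcas, engine)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Sketch of "initialize once, process many"; not cTAKES code. */
class ReusePipelineSketch {

    /** Hypothetical stand-in for an aggregate engine; construction is the slow part. */
    static final class ExpensivePipeline {
        static int instancesCreated = 0;

        ExpensivePipeline() {
            instancesCreated++; // stands in for loading models and the dictionary
        }

        /** Toy per-document work: returns tokens found in a tiny fixed word list. */
        List<String> process(String text) {
            List<String> found = new ArrayList<>();
            for (String token : text.split("\\W+")) {
                if (token.equalsIgnoreCase("penicillin") || token.equalsIgnoreCase("wheat")) {
                    found.add(token);
                }
            }
            return found;
        }
    }

    /** Builds the pipeline once, outside the loop, and reuses it for every line. */
    static List<List<String>> processAll(List<String> lines, Supplier<ExpensivePipeline> factory) {
        ExpensivePipeline pipeline = factory.get();
        List<List<String>> results = new ArrayList<>();
        for (String line : lines) {
            results.add(pipeline.process(line)); // per line: only the cheap work
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> notes = List.of(
                "Allergic to penicillin.", "ALLERGIES: PENICILLIN, WHEAT", "No known allergies.");
        System.out.println(processAll(notes, ExpensivePipeline::new));
        System.out.println("pipelines created: " + ExpensivePipeline.instancesCreated);
        // prints:
        // [[penicillin], [PENICILLIN, WHEAT], []]
        // pipelines created: 1
    }
}
```

The same idea is what Sean's LinesFromFileCollectionReader suggestion gives you for free: the reader supplies one document at a time while the engines are created only once.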

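As Sean's Jan 12 message notes, LinesFromFileCollectionReader does not read CSV files; it treats every newline character as a separator. One way to bridge that gap is to extract the note column once into a plain text file with one note per line. The Java below is a minimal JDK-only sketch under stated assumptions: a small RFC 4180-style parser (quoted fields, doubled quotes, line breaks inside quotes), column index 4 as in the posted loop, an optional header row, and whitespace runs (including embedded line breaks) collapsed to single spaces so each note stays on one line. `CsvColumnToLines` is a hypothetical helper, not part of cTAKES or of the CSVReader library used in the posted code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of cTAKES): one CSV column -> one note per line. */
class CsvColumnToLines {

    /** Minimal RFC 4180-style parser: quoted fields, "" escapes, line breaks inside quotes. */
    static List<List<String>> parseCsv(String csv) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < csv.length(); i++) {
            char c = csv.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    field.append(c);
                } else if (i + 1 < csv.length() && csv.charAt(i + 1) == '"') {
                    field.append('"'); // doubled quote inside a quoted field
                    i++;
                } else {
                    inQuotes = false; // closing quote
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                record.add(field.toString());
                field.setLength(0);
            } else if (c == '\n') {
                record.add(field.toString());
                field.setLength(0);
                records.add(record);
                record = new ArrayList<>();
            } else if (c != '\r') { // drop the CR of CRLF line endings
                field.append(c);
            }
        }
        if (field.length() > 0 || !record.isEmpty()) { // last record without a trailing newline
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }

    /** Returns the given column of every record, flattened to a single line of text. */
    static List<String> columnAsLines(String csv, int column, boolean skipHeader) {
        List<String> lines = new ArrayList<>();
        List<List<String>> records = parseCsv(csv);
        for (int r = skipHeader ? 1 : 0; r < records.size(); r++) {
            List<String> record = records.get(r);
            if (column >= record.size()) {
                continue; // short row: nothing in that column
            }
            String note = record.get(column).replaceAll("\\s+", " ").strip();
            if (note.length() > 1) { // skip near-empty cells, similar to the length() > 1 check in the posted loop
                lines.add(note);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String csv = "id,date,a,b,note\n"
                + "1,2017-01-13,x,y,\"Allergic to penicillin,\nno rash\"\n"
                + "2,2017-01-13,x,y,\"ALLERGIES: \"\"WHEAT\"\"\"\n";
        // In practice, write the lines with java.nio.file.Files.write(path, lines)
        // and point the line-based reader at that file.
        columnAsLines(csv, 4, true).forEach(System.out::println);
        // prints:
        // Allergic to penicillin, no rash
        // ALLERGIES: "WHEAT"
    }
}
```

Flattening line breaks means annotation offsets refer to the one-line version of each note, not to the original CSV cell.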
-----Original Message-----
From: Ks Sunder [mailto:shyam769@gmail.com] 
Sent: Friday, January 13, 2017 1:11 AM
To: dev@ctakes.apache.org
Subject: Re: Allergy Annotator

Thank you, Sean.

I have written Java code to read the CSV file, and for the cTAKES UMLS dictionary lookup I am using the function below.


 public  AnalysisEngineDescription getUMLPipeline() throws ResourceInitializationException, URISyntaxException{
   AggregateBuilder builder = new AggregateBuilder();
   builder.add(SimpleSegmentAnnotator.createAnnotatorDescription());
   builder.add(SentenceDetector.createAnnotatorDescription());
   builder.add(TokenizerAnnotatorPTB.createAnnotatorDescription());
   builder.add(POSTagger.createAnnotatorDescription());
   builder.add(ClinicalPipelineFactory.getNpChunkerPipeline());
   builder.add(LvgAnnotator.createAnnotatorDescription());

     try {
         builder.add( AnalysisEngineFactory.createEngineDescription(
              DefaultJCasTermAnnotator.class,
              AbstractJCasTermAnnotator.PARAM_WINDOW_ANNOT_PRP,
              "org.apache.ctakes.typesystem.type.textspan.Sentence",
              JCasTermAnnotator.DICTIONARY_DESCRIPTOR_KEY,
              ExternalResourceFactory.createExternalResourceDescription(
                    FileResourceImpl.class,
                    FileLocator.locateFile( "org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml" )
                    )
        ) );
     } catch ( FileNotFoundException e ) {
        e.printStackTrace();
        throw new ResourceInitializationException( e );
     }

   return builder.createAggregateDescription();
 }


Next, I call this function from here:

 reader = new CSVReader(new FileReader(ExelReadJava.NarrativeFile));
 String [] nextLine;
 int lineNumber = 0;


 while ((nextLine = reader.readNext()) != null) {
   lineNumber++;
   System.out.println("Line # " + lineNumber);

    //UML code start
      try {
if(nextLine[4].length()>1 ){

final JCas jcas = JCasFactory.createJCas();
jcas.setDocumentText( nextLine[4] );
SimplePipeline.runPipeline(jcas, pipelineTesting.getUMLPipeline());

for ( IdentifiedAnnotation entity : JCasUtil.select( jcas, IdentifiedAnnotation.class ) ) {
     if(entity.getOntologyConceptArr() != null){

    add.append(entity.getCoveredText()+ ",");
     }
}


This function works properly, but processing takes 40 seconds per line. How can I decrease the processing time?

I have 1 lakh (100,000) records (lines) in a CSV file.

Please give me a solution and an example.

regards,
shyam k.

On Thu, Jan 12, 2017 at 8:48 PM, Finan, Sean < Sean.Finan@childrens.harvard.edu> wrote:

> Hi Shyam,
>
> Have a look at the LinesFromFileCollectionReader class in ctakes-core.  
> It doesn't use csv files, but instead treats every newline character 
> as a separator.
>
> Sean
>
> -----Original Message-----
> From: Ks Sunder [mailto:shyam769@gmail.com]
> Sent: Wednesday, January 11, 2017 1:29 AM
> To: dev@ctakes.apache.org
> Subject: Re: Allergy Annotator
>
> Hi All,
>
> My scenario is: read the string content from a CSV file and find the
> medical terms in that content using the cTAKES UMLS dictionary.
>
> As per your suggestion, I tried to find a CollectionReader in ctakes-core,
> but I didn't find a clear solution. Please give me a solution and an example.
>
>
> regards,
> shyam k.
>
> On Thu, Dec 22, 2016 at 9:16 PM, Finan, Sean < 
> Sean.Finan@childrens.harvard.edu> wrote:
>
> > Hi Shyam,
> >
> > I think that the key to your first question,
> > >   how can execute the single function to run all this jobs in short time...
> > is in your code here:
> >
> > 1       final JCas jcas = JCasFactory.createJCas();
> > 2       jcas.setDocumentText( nextLine[0] );
> > 3       SimplePipeline.runPipeline(jcas, getUMLPipeline());
> >
> > What you probably want to do is replace lines #1 and #2 with a
> > CollectionReader, and then in #3 use a different SimplePipeline call
> > that runs the pipeline using the CollectionReader instead of a static
> > cas.
> >
> > There are commonly used CollectionReaders in ctakes-core.  The most 
> > widely applicable is probably the FileTreeReader*, which reads a 
> > tree of ascii files.  If you have some other source of text data 
> > then look around the code for something that might fit and let the 
> > devlist know if you can't find anything that fits your needs.
> >
> > I don't understand your second question:
> > > how can i find sentence vised Dictionary words from string, give me a solution for this..
> > Can you rephrase it and post to the devlist again?
> >
> > * one advantage that the FileTreeReader has is that it stores 
> > metadata on the input file tree placement, which can then be 
> > reproduced by output file writers like the html writer.
> >
> > Sean
> >
> >
> > -----Original Message-----
> > From: Ks Sunder [mailto:shyam769@gmail.com]
> > Sent: Thursday, December 22, 2016 2:33 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: Allergy Annotator
> >
> > Hi All,
> >
> > I have written the code below for finding medical terms in String
> > content.
> >
> > step 1 :
> > public static AnalysisEngineDescription getUMLPipeline() throws 
> > ResourceInitializationException, URISyntaxException{
> >    AggregateBuilder builder = new AggregateBuilder();
> >    builder.add(SimpleSegmentAnnotator.createAnnotatorDescription());
> >    builder.add(SentenceDetector.createAnnotatorDescription());
> >    builder.add(TokenizerAnnotatorPTB.createAnnotatorDescription());
> >    builder.add(POSTagger.createAnnotatorDescription());
> >    builder.add(ClinicalPipelineFactory.getNpChunkerPipeline());
> >    builder.add(LvgAnnotator.createAnnotatorDescription());
> >
> >      try {
> >          builder.add( AnalysisEngineFactory.createEngineDescription(
> >               DefaultJCasTermAnnotator.class,
> >               AbstractJCasTermAnnotator.PARAM_WINDOW_ANNOT_PRP,
> >               "org.apache.ctakes.typesystem.type.textspan.Sentence",
> >               JCasTermAnnotator.DICTIONARY_DESCRIPTOR_KEY,
> >               ExternalResourceFactory.createExternalResourceDescription(
> >                     FileResourceImpl.class,
> >                     FileLocator.locateFile( "org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml" )
> >               )
> >         ) );
> >      } catch ( FileNotFoundException e ) {
> >         e.printStackTrace();
> >         throw new ResourceInitializationException( e );
> >      }
> >
> >    return builder.createAggregateDescription();
> >  }
> > step 2:
> >
> > final JCas jcas = JCasFactory.createJCas();
> > jcas.setDocumentText( nextLine[0] );
> > SimplePipeline.runPipeline(jcas, getUMLPipeline());
> >
> > for ( IdentifiedAnnotation entity : JCasUtil.select( jcas, IdentifiedAnnotation.class ) ) {
> >     if ( entity.getOntologyConceptArr() != null ) {
> >         add.append( entity.getCoveredText() + "," );
> >     }
> > }
> >
> > It's working fine.
> >
> > But I have two queries:
> >
> > 1. In step 1, I add the annotators one by one, and it takes a long time
> >    to load all the functions. How can I execute a single function that
> >    runs all these jobs in a short time?
> >
> > 2. How can I find sentence-wise dictionary words from a string? Please
> >    give me a solution for this.
> >
> > Please give me solutions for these issues.
> >
> > regards,
> > shyam k.
> >
>

Re: Allergy Annotator

Posted by Ks Sunder <sh...@gmail.com>.
Thank you, Sean.

I have written Java code to read the CSV file, and for the cTAKES UMLS
dictionary lookup I am using the function below.


 public  AnalysisEngineDescription getUMLPipeline() throws
ResourceInitializationException, URISyntaxException{
   AggregateBuilder builder = new AggregateBuilder();
   builder.add(SimpleSegmentAnnotator.createAnnotatorDescription());
   builder.add(SentenceDetector.createAnnotatorDescription());
   builder.add(TokenizerAnnotatorPTB.createAnnotatorDescription());
   builder.add(POSTagger.createAnnotatorDescription());
   builder.add(ClinicalPipelineFactory.getNpChunkerPipeline());
   builder.add(LvgAnnotator.createAnnotatorDescription());

     try {
         builder.add( AnalysisEngineFactory.createEngineDescription(
              DefaultJCasTermAnnotator.class,
              AbstractJCasTermAnnotator.PARAM_WINDOW_ANNOT_PRP,
              "org.apache.ctakes.typesystem.type.textspan.Sentence",
              JCasTermAnnotator.DICTIONARY_DESCRIPTOR_KEY,
              ExternalResourceFactory.createExternalResourceDescription(
                    FileResourceImpl.class,
                    FileLocator.locateFile( "org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml" )
                    )
        ) );
     } catch ( FileNotFoundException e ) {
        e.printStackTrace();
        throw new ResourceInitializationException( e );
     }

   return builder.createAggregateDescription();
 }


Next, I call this function from here:

 reader = new CSVReader(new FileReader(ExelReadJava.NarrativeFile));
 String [] nextLine;
 int lineNumber = 0;


 while ((nextLine = reader.readNext()) != null) {
   lineNumber++;
   System.out.println("Line # " + lineNumber);

    //UML code start
      try {
if(nextLine[4].length()>1 ){

final JCas jcas = JCasFactory.createJCas();
jcas.setDocumentText( nextLine[4] );
SimplePipeline.runPipeline(jcas, pipelineTesting.getUMLPipeline());

for ( IdentifiedAnnotation entity : JCasUtil.select( jcas, IdentifiedAnnotation.class ) ) {
     if(entity.getOntologyConceptArr() != null){

    add.append(entity.getCoveredText()+ ",");
     }
}


This function works properly, but processing takes 40 seconds per line.
How can I decrease the processing time?

I have 1 lakh (100,000) records (lines) in a CSV file.

Please give me a solution and an example.

regards,
shyam k.

On Thu, Jan 12, 2017 at 8:48 PM, Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Shyam,
>
> Have a look at the LinesFromFileCollectionReader class in ctakes-core.  It
> doesn't use csv files, but instead treats every newline character as a
> separator.
>
> Sean
>
> -----Original Message-----
> From: Ks Sunder [mailto:shyam769@gmail.com]
> Sent: Wednesday, January 11, 2017 1:29 AM
> To: dev@ctakes.apache.org
> Subject: Re: Allergy Annotator
>
> Hi All,
>
> my scenario is, read the string content from csv file, and find out
> medical terms from that content using cTakes UML.
>
> as per your suggestion i try to find CollectionReader in ctakes-core, but
> i didnt get clear solution, please give valuable solution, and one example.
>
>
> regards,
> shyam k.
>
> On Thu, Dec 22, 2016 at 9:16 PM, Finan, Sean <
> Sean.Finan@childrens.harvard.edu> wrote:
>
> > Hi Shyam,
> >
> > I think that the key to your first question
> > >   how can execute the single function to run all this jobs in short
> > time...
> > Is in your code here:
> >
> > 1       final JCas jcas = JCasFactory.createJCas();
> > 2       jcas.setDocumentText( nextLine[0] );
> > 3       SimplePipeline.runPipeline(jcas, getUMLPipeline());
> >
> > What you probably want to do is replace lines #1 and #2 with a
> > CollectionReader, and then in #3 use a different SimplePipeline call
> > that runs the pipeline using the CollectionReader instead of a static
> cas.
> >
> > There are commonly used CollectionReaders in ctakes-core.  The most
> > widely applicable is probably the FileTreeReader*, which reads a tree
> > of ascii files.  If you have some other source of text data then look
> > around the code for something that might fit and let the devlist know
> > if you can't find anything that fits your needs.
> >
> > I don't understand your second question:
> > > how can i find sentence vised Dictionary words from string, give me
> > > a
> > solution for this..
> > Can you rephrase it and post to the devlist again?
> >
> > * one advantage that the FileTreeReader has is that it stores metadata
> > on the input file tree placement, which can then be reproduced by
> > output file writers like the html writer.
> >
> > Sean
> >
> >
> > -----Original Message-----
> > From: Ks Sunder [mailto:shyam769@gmail.com]
> > Sent: Thursday, December 22, 2016 2:33 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: Allergy Annotator
> >
> > Hi All,
> >
> > I have done the below code for finding medical terms from String
> > information.
> >
> > step 1 :
> > public static AnalysisEngineDescription getUMLPipeline() throws
> > ResourceInitializationException, URISyntaxException{
> >    AggregateBuilder builder = new AggregateBuilder();
> >    builder.add(SimpleSegmentAnnotator.createAnnotatorDescription());
> >    builder.add(SentenceDetector.createAnnotatorDescription());
> >    builder.add(TokenizerAnnotatorPTB.createAnnotatorDescription());
> >    builder.add(POSTagger.createAnnotatorDescription());
> >    builder.add(ClinicalPipelineFactory.getNpChunkerPipeline());
> >    builder.add(LvgAnnotator.createAnnotatorDescription());
> >
> >      try {
> >          builder.add( AnalysisEngineFactory.createEngineDescription(
> > DefaultJCasTermAnnotator.class,
> >               AbstractJCasTermAnnotator.PARAM_WINDOW_ANNOT_PRP,
> >               "org.apache.ctakes.typesystem.type.textspan.Sentence",
> >               JCasTermAnnotator.DICTIONARY_DESCRIPTOR_KEY,
> >               ExternalResourceFactory.createExternalResourceDescription(
> >                     FileResourceImpl.class,
> >                     FileLocator.locateFile(
> "org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml"
> > ) )
> >         ) );
> >      } catch ( FileNotFoundException e ) {
> >         e.printStackTrace();
> >         throw new ResourceInitializationException( e );
> >      }
> >
> >    return builder.createAggregateDescription();
> >  }
> > step 2:
> >
> > final JCas jcas = JCasFactory.createJCas(); jcas.setDocumentText(
> > nextLine[0] ); SimplePipeline.runPipeline(jcas, getUMLPipeline());
> >
> > for ( IdentifiedAnnotation entity : JCasUtil.select( jcas,
> > IdentifiedAnnotation.class ) ) {
> >
> >          if(entity.getOntologyConceptArr() != null){
> >
> >         add.append(entity.getCoveredText()+ ",");
> >
> >          }
> > }
> >
> >
> >
> >
> >
> > its working Fine..
> >
> > But i have two quires..
> >
> > 1. step1 , i am using Annotator step by step ... that time its taking
> > more time load the all fuctions
> >    how can execute the single function to run all this jobs in short
> > time...
> >
> > 2. how can i find sentence vised Dictionary words from string, give me
> > a solution for this..
> >
> >
> > ...please give me a solutions for this issues....
> >
> >
> >
> > regards,
> > shyam k.
> >
> > On Thu, Dec 8, 2016 at 1:59 AM, Mullane, Sean *HS <
> > SPM9R@hscmail.mcc.virginia.edu> wrote:
> >
> > > I'm reviving this thread with reference to negation detection. I
> > > previously posted about this to the User list but this is probably a
> > > more appropriate venue.
> > >
> > > The way the sentences are split on ":" makes the negation annotator
> > > miss negation in lists of this form:
> > >
> > > Hyperlipidemia:  Yes
> > > Hypercholesterolemia:  No
> > > Chronic Renal Insufficiency:  N/A
> > >
> > > I tried reversing order and removing ":"s and found that the
> > > negation for Hypercholesterolemia is detected when in this form:
> > >
> > > Yes Hyperlipidemia
> > > No Hypercholesterolemia
> > > N/A Chronic Renal Insufficiency
> > >
> > > Our notes have quite a few places with this sort of list where good
> > > negation detection is important, but I haven't had very good results. The
> > > sentence segmenter sees this as 12 separate sentences, but I would
> > > think proper behavior would be to consider this as 6 sentences
> > > (breaking sentences on line breaks but not on colons). I see previous
> > > discussion on the list about the sentence segmenter breaking on
> > > newlines but little regarding colons. I would think in most cases it
> > > would be more useful not to break on ":". Or is there an overriding
> > > reason for the current behavior?
> > > If changing the sentence segmenter isn't an option, is there a
> > > different way to configure the negation detection annotator that
> > > would avoid this issue?
> > >
> > > Thanks,
> > > Sean
> > >
> > >
> > >
> > > Hi,
> > >
> > > I am interested in the design decision of the sentence detector.
> > >
> > > Why does it split a sentence of the form "WORD1: WORD2 WORD3." into
> > > two sentences "WORD1:" and "WORD2 WORD3."? Do other components of
> > > cTAKES require such a sentence splitting?
> > >
> > > It would seem to me that it should remain one sentence. For example,
> > > the smoking status detector has its own SentenceAdjuster that merges
> > > some of such sentences back into one, because of this design.
> > >
> > > Thanks, Tomasz
> > >
> > > ________________________________________ From: Finan, Sean [
> > > Sean...@childrens.harvard.edu] Sent: Friday, July 10, 2015 3:20 PM To:
> > > de...@ctakes.apache.org Subject: RE: Allergy Annotator
> > >
> > > Hi Tom,
> > >
> > > It is exactly because the sentence detector splits "KEY:" from "VALUE"
> > > that I didn't suggest using sentences. Instead, I would just iterate over
> > > the whole cas collection of medication events and attempt to match
> > > allergy phrases ("allergic to medication") with the text of the note
> > > spanning from event.begin-15 to event.end+15, or whatever window size
> > > you prefer.
> > >
> > > Sean
> > >
> > > -----Original Message----- From: Tom Devel
> > > [mailto:deve...@gmail.com]
> > > Sent: Friday, July 10, 2015 4:12 PM To: de...@ctakes.apache.org
> > > Subject: Re: Allergy Annotator
> > >
> > > Sean and Dima, these are great suggestions, thanks so far.
> > >
> > > Sean, when looping over medication events as you say, I can see how
> > > it is possible to take the textspan.Sentence of this
> > > MedicationMention, and then do a regex check for the phrase structure
> > > as Dima said.
> > >
> > > But instead of textspan.Sentence, you mention "see any is included
> > > in a phrase".
> > > What cTAKES/UIMA class is related to this?
> > >
> > > Because if I used textspan.Sentence, it would work for "The
> > > patient is allergic to penicillin.", but cTAKES splits
> > > "ALLERGIES: PENICILLIN, WHEAT" into two sentences, so that the
> > > MedicationMentions here would not be in the same sentence as the
> > > word "ALLERGIES".
> > >
> > > Thanks again, Tom
> > >
> > > On Fri, Jul 10, 2015 at 2:12 PM, Finan, Sean <
> > > Sean...@childrens.harvard.edu>
> > > wrote:
> > >
> > > Hi Dima, Tom,
> > >
> > > I was thinking the same as Dima's first solution. Iterate through
> > > the medication events and see any is included in a phrase as
> > > mentioned in Tom's original email. Each phrase structure would have
> > > to be specified beforehand. However, assigning appropriate CUIs
> > > would require having a lookup table for each medication allergy. I
> > > think that would be the simplest solution.
> > >
> > > Sean
> > >
> > > -----Original Message----- From: Dligach, Dmitriy [mailto:
> > > Dmit...@childrens.harvard.edu] Sent: Friday, July 10, 2015 2:50 PM To:
> > > cTAKES Developer list Subject: Re: Allergy Annotator
> > >
> > > Hi Tom,
> > >
> > > If the patterns are pretty simple, you could just add a few rules on
> > > top of the cTAKES dictionary lookup output. Something of the kind
> > > "allergic to <medication>" or "allergies: <medication1>,
> > > <medication2>, <substance1>, ...".
> > >
> > > If these patterns are hard to express as rules, you should consider
> > > a machine learning based sequence labeling route (e.g. something
> > > similar to the cTAKES chunker).
> > >
> > > Dima
> > >
> > > -- Dmitriy (Dima) Dligach, Ph.D. Boston Children's Hospital and
> > > Harvard Medical School (617) 651-0397
> > >
> > > On Jul 10, 2015, at 13:40, Tom Devel <deve...@gmail.com> wrote:
> > >
> > > Sean,
> > >
> > > It would be a wider net, such that if an allergy is mentioned in the
> > > clinical note, this is captured in the corresponding
> > > IdentifiedAnnotation (or alternatively, if the IdentifiedAnnotation
> > > class should not be changed with a new attribute, in a separate
> > > allergy annotation).
> > >
> > > This annotator would then, of course, have to run after the clinical
> > > pipeline has run and discovered all IdentifiedAnnotations.
> > >
> > > I am familiar with writing UIMA/cTAKES annotators, but not sure how
> > > a new ML method could be integrated here for detecting allergies. Do
> > > you have any thoughts about how to approach this in general?
> > >
> > > Thanks, Tom
> > >
> > > On Fri, Jul 10, 2015 at 11:54 AM, Finan, Sean <
> > > Sean...@childrens.harvard.edu>
> > > wrote:
> > >
> > > Hi Tom,
> > >
> > > Are you interested in catching all allergies or just a few specific
> > > allergies for a study? If you are only concerned with a few, then
> > > there is a (possibly) simple solution. If you are interested in throwing a
> > > wider net then I think that a new module would need to be created;
> > > does anybody reading this have an ML or regex style module?
> > >
> > > Sean
> > >
> > > -----Original Message----- From: Tom Devel
> > > [mailto:deve...@gmail.com]
> > > Sent: Friday, July 10, 2015 12:42 PM To: de...@ctakes.apache.org
> > > Subject: Allergy Annotator
> > >
> > > Hi,
> > >
> > > I would like to use/extend cTAKES to detect allergies.
> > >
> > > In the cTAKES publication (2010),
> > > http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2995668/, there is the
> > > mention that: "Allergies to a given medication are handled by setting
> > > the negation attribute of that medication to 'is negated'."
> > >
> > > However, in a post here in 2014 (RE: Allergy Indication) it is said
> > > that cTAKES does not have a module for allergy discovery.
> > >
> > > 1. What is the current status of allergy detection in cTAKES?
> > >
> > > 2. I did some testing: while cTAKES discovers concepts about
> > > allergies ("wheat allergy" is found as C0949570), using "ALLERGIES:
> > > PENICILLIN, WHEAT" or "The patient is allergic to penicillin." does
> > > not give the penicillin or wheat annotations an allergy status.
> > >
> > > How would I go about detecting these allergy mentions?
> > >
> > > Thanks, Tom
> > >
> > >
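A minimal sketch of the rule-based idea in this thread: Dima's "allergic to <medication>" and "allergies: <medication1>, <medication2>" patterns, checked in a character window around each medication mention as Sean suggests (begin - 15 to end + 15). It is plain Java over the document text and one mention's offsets; in a cTAKES annotator those offsets would presumably come from each MedicationMention returned by JCasUtil.select( jcas, MedicationMention.class ). The class name AllergyWindowMatcher and its three regular expressions are hypothetical examples, not part of cTAKES.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical rule-based allergy check around one medication mention.
 * begin/end are the mention's character offsets in the document text.
 */
final class AllergyWindowMatcher {

    // "... allergic to <medication>": the cue ends right before the mention.
    private static final Pattern ALLERGIC_TO =
            Pattern.compile("\\ballergic\\s+to\\s+\\z", Pattern.CASE_INSENSITIVE);

    // "ALLERGIES: <med1>, <med2>, ...": header earlier on the same line, with no
    // newline, sentence punctuation or further colon between it and the mention.
    private static final Pattern ALLERGY_HEADER =
            Pattern.compile("\\ballerg(?:y|ies)\\s*:[^\\n:.;]*\\z", Pattern.CASE_INSENSITIVE);

    // "<medication> allergy": the cue starts right after the mention.
    private static final Pattern ALLERGY_AFTER =
            Pattern.compile("^\\s*allerg(?:y|ies)\\b", Pattern.CASE_INSENSITIVE);

    private AllergyWindowMatcher() {}

    /** Checks the text from begin - window to end + window, clamped to the document. */
    static boolean isAllergyContext(String text, int begin, int end, int window) {
        String before = text.substring(Math.max(0, begin - window), begin);
        String after = text.substring(end, Math.min(text.length(), end + window));
        return ALLERGIC_TO.matcher(before).find()
                || ALLERGY_HEADER.matcher(before).find()
                || ALLERGY_AFTER.matcher(after).find();
    }
}
```

With a 15-character window, WHEAT in "ALLERGIES: PENICILLIN, WHEAT" is too far from the header to match, so the window size (or a same-line rule instead of a fixed window) needs tuning for list-style notes.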
> >
>
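A sketch of the SentenceAdjuster-style workaround that Tomasz and Sean Mullane describe: merge a sentence that ends with ":" into the sentence that follows it when no line break separates them, so "Hypercholesterolemia:  No" becomes one span while "ALLERGIES:" alone on its own line stays separate. This is plain Java over [begin, end) offsets and is an assumption, not the actual SentenceAdjuster code; in a pipeline it would presumably run after the SentenceDetector and before negation detection, replacing the affected Sentence annotations. Whether a trailing "No" is then read as negation still depends on the negation annotator's own cues.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sentence adjuster: joins "KEY:" sentences with their same-line "VALUE". */
final class ColonSentenceMerger {

    /** A half-open [begin, end) character span, like a UIMA annotation's offsets. */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + begin + "," + end + ")";
        }
    }

    private ColonSentenceMerger() {}

    /**
     * Returns new spans in which each span whose text ends with ':' is extended over
     * the following span, unless a line break lies between them.
     * Input spans must be sorted by begin offset and must not overlap.
     */
    static List<Span> merge(String text, List<Span> sentences) {
        List<Span> merged = new ArrayList<>();
        Span current = null;
        for (Span next : sentences) {
            if (current != null && endsWithColon(text, current)
                    && !containsLineBreak(text, current.end, next.begin)) {
                current = new Span(current.begin, next.end);   // "KEY:" absorbs "VALUE"
            } else {
                if (current != null) {
                    merged.add(current);
                }
                current = next;
            }
        }
        if (current != null) {
            merged.add(current);
        }
        return merged;
    }

    private static boolean endsWithColon(String text, Span s) {
        return text.substring(s.begin, s.end).trim().endsWith(":");
    }

    private static boolean containsLineBreak(String text, int from, int to) {
        for (int i = from; i < to; i++) {
            char c = text.charAt(i);
            if (c == '\n' || c == '\r') {
                return true;
            }
        }
        return false;
    }
}
```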

RE: Allergy Annotator

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Shyam,

Have a look at the LinesFromFileCollectionReader class in ctakes-core.  It doesn't use csv files, but instead treats every newline character as a separator.

Sean
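A minimal sketch of the line-per-document idea applied to a CSV file: every non-blank line becomes one document, using only its first column (what the earlier code reads as nextLine[0]). It handles double-quoted fields and "" escapes within a line, but not quoted fields that span several lines. The class name FirstColumnDocuments is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: one document per non-blank line, taken from the first CSV column. */
final class FirstColumnDocuments {

    private FirstColumnDocuments() {}

    static List<String> read(Reader source) {
        List<String> docs = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue;                       // skip blank lines
                }
                docs.add(firstField(line));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return docs;
    }

    /** First comma-separated field; supports "quoted, fields" and "" escapes within one line. */
    static String firstField(String line) {
        if (!line.startsWith("\"")) {
            int comma = line.indexOf(',');
            return comma < 0 ? line : line.substring(0, comma);
        }
        StringBuilder field = new StringBuilder();
        for (int i = 1; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c != '"') {
                field.append(c);
            } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                field.append('"');                  // "" is an escaped quote
                i++;
            } else {
                break;                              // closing quote ends the field
            }
        }
        return field.toString();
    }
}
```

Each returned string could then go through one engine built once, for example with uimaFIT's AnalysisEngineFactory.createEngine( getUMLPipeline() ), reusing a single JCas via jcas.reset(), jcas.setDocumentText( doc ) and engine.process( jcas ), rather than rebuilding the aggregate for every row.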

-----Original Message-----
From: Ks Sunder [mailto:shyam769@gmail.com] 
Sent: Wednesday, January 11, 2017 1:29 AM
To: dev@ctakes.apache.org
Subject: Re: Allergy Annotator

Hi All,

My scenario is: read the string content from a CSV file, and find the medical terms in that content using cTAKES UMLS lookup.

As per your suggestion I tried to find a CollectionReader in ctakes-core, but I didn't find a clear solution. Please give a valuable solution and one example.


regards,
shyam k.

On Thu, Dec 22, 2016 at 9:16 PM, Finan, Sean < Sean.Finan@childrens.harvard.edu> wrote:

> Hi Shyam,
>
> I think that the key to your first question
> >   how can execute the single function to run all this jobs in short time...
> is in your code here:
>
> 1       final JCas jcas = JCasFactory.createJCas();
> 2       jcas.setDocumentText( nextLine[0] );
> 3       SimplePipeline.runPipeline(jcas, getUMLPipeline());
>
> What you probably want to do is replace lines #1 and #2 with a 
> CollectionReader, and then in #3 use a different SimplePipeline call 
> that runs the pipeline using the CollectionReader instead of a static cas.
>
> There are commonly used CollectionReaders in ctakes-core.  The most 
> widely applicable is probably the FileTreeReader*, which reads a tree 
> of ascii files.  If you have some other source of text data then look 
> around the code for something that might fit and let the devlist know 
> if you can't find anything that fits your needs.
>
> I don't understand your second question:
> > how can i find sentence vised Dictionary words from string, give me a solution for this..
> Can you rephrase it and post to the devlist again?
>
> * one advantage that the FileTreeReader has is that it stores metadata 
> on the input file tree placement, which can then be reproduced by 
> output file writers like the html writer.
>
> Sean
>
>
> -----Original Message-----
> From: Ks Sunder [mailto:shyam769@gmail.com]
> Sent: Thursday, December 22, 2016 2:33 AM
> To: dev@ctakes.apache.org
> Subject: Re: Allergy Annotator
>
> Hi All,
>
> I have done the below code for finding medical terms from String 
> information.
>
> step 1 :
> public static AnalysisEngineDescription getUMLPipeline() throws 
> ResourceInitializationException, URISyntaxException{
>    AggregateBuilder builder = new AggregateBuilder();
>    builder.add(SimpleSegmentAnnotator.createAnnotatorDescription());
>    builder.add(SentenceDetector.createAnnotatorDescription());
>    builder.add(TokenizerAnnotatorPTB.createAnnotatorDescription());
>    builder.add(POSTagger.createAnnotatorDescription());
>    builder.add(ClinicalPipelineFactory.getNpChunkerPipeline());
>    builder.add(LvgAnnotator.createAnnotatorDescription());
>
>      try {
>          builder.add( AnalysisEngineFactory.createEngineDescription(
>               DefaultJCasTermAnnotator.class,
>               AbstractJCasTermAnnotator.PARAM_WINDOW_ANNOT_PRP,
>               "org.apache.ctakes.typesystem.type.textspan.Sentence",
>               JCasTermAnnotator.DICTIONARY_DESCRIPTOR_KEY,
>               ExternalResourceFactory.createExternalResourceDescription(
>                     FileResourceImpl.class,
>                     FileLocator.locateFile(
>                           "org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml" ) )
>         ) );
>      } catch ( FileNotFoundException e ) {
>         e.printStackTrace();
>         throw new ResourceInitializationException( e );
>      }
>
>    return builder.createAggregateDescription();
>  }
> step 2:
>
> final JCas jcas = JCasFactory.createJCas();
> jcas.setDocumentText( nextLine[0] );
> SimplePipeline.runPipeline(jcas, getUMLPipeline());
>
> for ( IdentifiedAnnotation entity : JCasUtil.select( jcas, IdentifiedAnnotation.class ) ) {
>    if ( entity.getOntologyConceptArr() != null ) {
>       add.append( entity.getCoveredText() + "," );
>    }
> }
>
>
>
>
>
> It's working fine.
>
> But I have two queries:
>
> 1. In step 1 I am adding the annotators one by one, and it takes a long
> time to load all the functions.
>    how can execute the single function to run all this jobs in short 
> time...
>
> 2. how can i find sentence vised Dictionary words from string, give me 
> a solution for this..
>
>
> ...please give me solutions for these issues....
>
>
>
> regards,
> shyam k.
>
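If "sentence vised" means "sentence-wise" (an assumption, since the question was not rephrased on the list), the dictionary hits could be grouped by the Sentence that covers them. With uimaFIT that would be JCasUtil.selectCovered( jcas, IdentifiedAnnotation.class, sentence ) for each sentence from JCasUtil.select( jcas, Sentence.class ). The plain-Java sketch below applies the same containment rule to {begin, end} offset pairs; the class name SentenceGrouping is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: dictionary hits grouped by the sentence that covers them. */
final class SentenceGrouping {

    private SentenceGrouping() {}

    /**
     * For each sentence span {begin, end}, in order, returns the covered text of every
     * mention span lying entirely inside it (the containment rule selectCovered uses).
     * Mentions that straddle a sentence boundary are not assigned to any sentence.
     */
    static List<List<String>> bySentence(String text, List<int[]> sentences, List<int[]> mentions) {
        List<List<String>> grouped = new ArrayList<>();
        for (int[] sentence : sentences) {
            List<String> hits = new ArrayList<>();
            for (int[] mention : mentions) {
                if (mention[0] >= sentence[0] && mention[1] <= sentence[1]) {
                    hits.add(text.substring(mention[0], mention[1]));
                }
            }
            grouped.add(hits);
        }
        return grouped;
    }
}
```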

Re: Allergy Annotator

Posted by Ks Sunder <sh...@gmail.com>.
Hi All,

My scenario is: read the string content from a CSV file, and find the medical
terms in that content using cTAKES UMLS lookup.

As per your suggestion I tried to find a CollectionReader in ctakes-core, but I
didn't find a clear solution. Please give a valuable solution and one example.


regards,
shyam k.


RE: Allergy Annotator

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Shyam,

I think that the key to your first question
>   how can execute the single function to run all this jobs in short time...
is in your code here:

1	final JCas jcas = JCasFactory.createJCas();
2	jcas.setDocumentText( nextLine[0] );
3	SimplePipeline.runPipeline(jcas, getUMLPipeline());

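What you probably want to do is replace lines #1 and #2 with a CollectionReader, and then in #3 use a different SimplePipeline call (for example, the runPipeline overload that takes a CollectionReaderDescription) that runs the pipeline over the documents from the CollectionReader instead of over a single, statically created CAS.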

There are commonly used CollectionReaders in ctakes-core. The most widely applicable is probably the FileTreeReader*, which reads a tree of ASCII text files. If you have some other source of text data, look around the code for something that might fit, and let the devlist know if you can't find anything that meets your needs.

I don't understand your second question:
> how can i find sentence vised Dictionary words from string, give me a solution for this..
Can you rephrase it and post to the devlist again?

* One advantage of the FileTreeReader is that it stores metadata about each input file's place in the directory tree, which output writers such as the HTML writer can then reproduce.

Sean


-----Original Message-----
From: Ks Sunder [mailto:shyam769@gmail.com] 
Sent: Thursday, December 22, 2016 2:33 AM
To: dev@ctakes.apache.org
Subject: Re: Allergy Annotator

Hi All,

I have written the code below to find medical terms in String input.

step 1:
public static AnalysisEngineDescription getUMLPipeline() throws ResourceInitializationException, URISyntaxException {
   AggregateBuilder builder = new AggregateBuilder();
   // Preprocessing, in order: segment, sentences, tokens, POS tags, noun-phrase chunks, LVG normalization
   builder.add(SimpleSegmentAnnotator.createAnnotatorDescription());
   builder.add(SentenceDetector.createAnnotatorDescription());
   builder.add(TokenizerAnnotatorPTB.createAnnotatorDescription());
   builder.add(POSTagger.createAnnotatorDescription());
   builder.add(ClinicalPipelineFactory.getNpChunkerPipeline());
   builder.add(LvgAnnotator.createAnnotatorDescription());

   try {
      // Fast dictionary lookup, using Sentence annotations as the lookup window
      builder.add( AnalysisEngineFactory.createEngineDescription(
            DefaultJCasTermAnnotator.class,
            AbstractJCasTermAnnotator.PARAM_WINDOW_ANNOT_PRP,
            "org.apache.ctakes.typesystem.type.textspan.Sentence",
            JCasTermAnnotator.DICTIONARY_DESCRIPTOR_KEY,
            ExternalResourceFactory.createExternalResourceDescription(
                  FileResourceImpl.class,
                  FileLocator.locateFile( "org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml" ) )
      ) );
   } catch ( FileNotFoundException e ) {
      e.printStackTrace();
      throw new ResourceInitializationException( e );
   }

   return builder.createAggregateDescription();
}
step 2:

final JCas jcas = JCasFactory.createJCas();
jcas.setDocumentText( nextLine[0] );
SimplePipeline.runPipeline(jcas, getUMLPipeline());

// Append the covered text of every annotation that has ontology concepts attached.
// ("add" is presumably a StringBuilder declared elsewhere; its declaration is not shown.)
for ( IdentifiedAnnotation entity : JCasUtil.select( jcas, IdentifiedAnnotation.class ) ) {
   if ( entity.getOntologyConceptArr() != null ) {
      add.append( entity.getCoveredText() + "," );
   }
}

It's working fine.

But I have two queries:

1. In step 1, I add the annotators one by one, and loading all of them takes a long time.
   how can execute the single function to run all this jobs in short time...

2. how can i find sentence vised Dictionary words from string, give me a solution for this..


Please give me a solution for these issues.



regards,
shyam k.

On Thu, Dec 8, 2016 at 1:59 AM, Mullane, Sean *HS < SPM9R@hscmail.mcc.virginia.edu> wrote:

> I'm reviving this thread with reference to negation detection. I 
> previously posted about this to the User list but this is probably a 
> more appropriate venue.
>
> The way the sentences are split on ":" makes the negation annotator 
> miss negation in lists of this form:
>
> Hyperlipidemia:  Yes
> Hypercholesterolemia:  No
> Chronic Renal Insufficiency:  N/A
>
> I tried reversing the order and removing the colons, and found that the
> negation for Hypercholesterolemia is detected in this form:
>
> Yes Hyperlipidemia
> No Hypercholesterolemia
> N/A Chronic Renal Insufficiency
>
> Our notes have quite a few places with this sort of list where good
> negation detection is important, but I haven't had very good results. The
> sentence segmenter sees this as 12 separate sentences, but I would
> think the proper behavior would be to treat this as 6 sentences
> (breaking sentences on line breaks but not on colons). I see previous
> discussion on the list about the sentence segmenter breaking on
> newlines but little regarding colons. I would think in most cases it
> would be more useful not to break on ":". Or is there an overriding reason for the current behavior?
> If changing the sentence segmenter isn't an option, is there a
> different way to configure the negation detection annotator that would
> avoid this issue?
>
> Thanks,
> Sean
>
>
>
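A possible workaround for the list format above, while the sentence splitting stays as it is, is to handle "Term: Yes/No/N/A" lines with a small rule of your own instead of relying on the negation annotator. The following is a minimal, stdlib-only Java sketch, not part of cTAKES: KeyValueNegation, Entry, and Polarity are hypothetical names, only the exact values "Yes" and "No" are interpreted, and copying the resulting polarity onto the IdentifiedAnnotation that covers the key's span is left to whatever annotator calls it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not a cTAKES class): reads "Term:  Value" lines and derives a polarity per term. */
public class KeyValueNegation {

    /** "Yes" -> AFFIRMED, "No" -> NEGATED, anything else (e.g. "N/A") -> UNKNOWN. */
    public enum Polarity { AFFIRMED, NEGATED, UNKNOWN }

    /** One parsed list line, keeping the character span of the key in the original text. */
    public static final class Entry {
        public final String key;
        public final int keyBegin;
        public final int keyEnd;
        public final Polarity polarity;

        Entry(String key, int keyBegin, int keyEnd, Polarity polarity) {
            this.key = key;
            this.keyBegin = keyBegin;
            this.keyEnd = keyEnd;
            this.polarity = polarity;
        }
    }

    // Group 1: text before the first colon on a line; group 2: the rest of that line, trimmed.
    private static final Pattern KEY_VALUE_LINE = Pattern.compile(
            "^[ \\t]*([^:\\r\\n]+?)[ \\t]*:[ \\t]*([^\\r\\n]*?)[ \\t]*$", Pattern.MULTILINE);

    public static List<Entry> parse(String text) {
        List<Entry> entries = new ArrayList<>();
        Matcher m = KEY_VALUE_LINE.matcher(text);
        while (m.find()) {
            String value = m.group(2).toLowerCase(Locale.ROOT);
            Polarity polarity = value.equals("no") ? Polarity.NEGATED
                    : value.equals("yes") ? Polarity.AFFIRMED
                    : Polarity.UNKNOWN;
            entries.add(new Entry(m.group(1), m.start(1), m.end(1), polarity));
        }
        return entries;
    }
}
```

Lines without a colon are skipped, and values such as "N/A" stay UNKNOWN rather than being guessed; extending the yes/no vocabulary is a judgment call for your own notes.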
> Hi,
>
> I am interested in the design decision of the sentence detector.
>
> Why does it split a sentence of the form "WORD1: WORD2 WORD3." into 
> two sentences "WORD1:" and "WORD2 WORD3."? Do other components of 
> cTAKES require such a sentence splitting?
>
> It would seem to me that it should remain one sentence. For example,
> the smoking status detector has its own SentenceAdjuster that merges
> some such sentences back into one because of this design.
>
> Thanks, Tomasz
>
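The merge Tomasz describes can be sketched as a post-processing pass over sentence offsets. This is a minimal Java illustration under stated assumptions, not the smoking status SentenceAdjuster: ColonSentenceMerger and Span are hypothetical, sentences are assumed to be sorted and non-overlapping with the colon kept in the first piece (as in "WORD1:" above), and a "KEY:" piece is joined to the next sentence only when no line break lies between them, which matches the line-based splitting Sean asks for.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical post-processing step: re-joins "KEY:" sentences with the value that follows on the same line. */
public class ColonSentenceMerger {

    /** A sentence as a half-open character span [begin, end) into the document text. */
    public static final class Span {
        public final int begin;
        public final int end;

        public Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + begin + "," + end + ")";
        }
    }

    /** Sentences must be sorted by begin offset and non-overlapping, as a sentence detector emits them. */
    public static List<Span> merge(String text, List<Span> sentences) {
        List<Span> out = new ArrayList<>();
        Span pending = null;
        for (Span s : sentences) {
            boolean endsWithColon = pending != null
                    && text.substring(pending.begin, pending.end).trim().endsWith(":");
            boolean sameLine = pending != null
                    && text.substring(pending.end, s.begin).indexOf('\n') < 0;
            if (endsWithColon && sameLine) {
                // "KEY:" followed by its value on the same line: treat both as one sentence.
                pending = new Span(pending.begin, s.end);
            } else {
                if (pending != null) {
                    out.add(pending);
                }
                pending = s;
            }
        }
        if (pending != null) {
            out.add(pending);
        }
        return out;
    }
}
```

Because the check is purely textual, a heading such as "ALLERGIES:" followed by a list on the next line is left split; that trade-off follows the "break on line breaks, not colons" behavior proposed above.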
> ________________________________________ From: Finan, Sean [ 
> Sean...@childrens.harvard.edu] Sent: Friday, July 10, 2015 3:20 PM To:
> de...@ctakes.apache.org Subject: RE: Allergy Annotator
>
> Hi Tom,
>
> It is exactly because the sentence detector splits "KEY:" from "VALUE"
> that I didn't suggest using sentences. Instead, I would just iterate over
> the whole CAS collection of medication events and attempt to match allergy
> phrases ("allergic to medication") against the text of the note spanning
> from event.begin-15 to event.end+15, or whatever window size you prefer.
>
> Sean
>
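Sean's suggestion can be sketched without sentences at all: for each medication mention, take the text from begin-15 to end+15 (clamped to the document) and look for an allergy phrase in it. The Java below is a stdlib-only sketch that uses plain character offsets in place of cTAKES MedicationMention annotations; AllergyWindowMatcher, Mention, and the phrase list passed in are hypothetical, and the window size is a parameter because the message leaves it open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: flags medication mentions that have an allergy phrase within a fixed character window. */
public class AllergyWindowMatcher {

    /** A medication mention as a half-open character span [begin, end). */
    public static final class Mention {
        public final int begin;
        public final int end;

        public Mention(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /** Text from begin - size to end + size, clamped to the bounds of the document. */
    public static String window(String text, Mention mention, int size) {
        int from = Math.max(0, mention.begin - size);
        int to = Math.min(text.length(), mention.end + size);
        return text.substring(from, to);
    }

    /** Returns the mentions whose window contains any of the phrases (case-insensitive). */
    public static List<Mention> findAllergyMentions(String text, List<Mention> medications,
                                                    List<String> phrases, int size) {
        List<Mention> hits = new ArrayList<>();
        for (Mention mention : medications) {
            String context = window(text, mention, size).toLowerCase(Locale.ROOT);
            for (String phrase : phrases) {
                if (context.contains(phrase.toLowerCase(Locale.ROOT))) {
                    hits.add(mention);
                    break; // one matching phrase is enough
                }
            }
        }
        return hits;
    }
}
```

A fixed window is easy to tune but errs in both directions: an unrelated drug within the window of "allergic to" is also flagged, and items far down a long "ALLERGIES:" list fall outside the window of the header.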
> -----Original Message----- From: Tom Devel [mailto:deve...@gmail.com]
> Sent: Friday, July 10, 2015 4:12 PM To: de...@ctakes.apache.org Subject:
> Re: Allergy Annotator
>
> Sean and Dima, these are great suggestions, thanks so far.
>
> Sean, when looping over medication events as you say, I can see how it 
> is possible to take the textspan.Sentence of this MedicationMention, 
> and then do a regex check for the phrase structure as Dima said.
>
> But instead of textspan.Sentence, you mention "see if any is included in
> a phrase".
> What cTAKES/UIMA class is related to this?
>
> If I used textspan.Sentence, it would work for "The patient is allergic
> to penicillin.", but cTAKES splits "ALLERGIES: PENICILLIN, WHEAT" into two
> sentences, so the MedicationMentions there would not be in the same
> sentence as the word "ALLERGIES".
>
> Thanks again, Tom
>
> On Fri, Jul 10, 2015 at 2:12 PM, Finan, Sean <
> Sean...@childrens.harvard.edu>
> wrote:
>
> Hi Dima, Tom,
>
> I was thinking the same as Dima's first solution: iterate through the
> medication events and see if any is included in a phrase as mentioned in
> Tom's original email. Each phrase structure would have to be specified
> beforehand. However, assigning appropriate CUIs would require having a
> lookup table for each medication allergy. I think that would be the
> simplest solution.
>
> Sean
>
> -----Original Message----- From: Dligach, Dmitriy [mailto:
> Dmit...@childrens.harvard.edu] Sent: Friday, July 10, 2015 2:50 PM To:
> cTAKES Developer list Subject: Re: Allergy Annotator
>
> Hi Tom,
>
> If the patterns are pretty simple, you could just add a few rules on
> top of the cTAKES dictionary lookup output, something like
> "allergic to <medication>" or "allergies: <medication1>,
> <medication2>, <substance1>, ...".
>
> If these patterns are hard to express as rules, you should consider a 
> machine learning based sequence labeling route (e.g. something similar 
> to the cTAKES chunker).
>
> Dima
>
> -- Dmitriy (Dima) Dligach, Ph.D. Boston Children's Hospital and 
> Harvard Medical School (617) 651-0397
>
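The two rule shapes Dima mentions could be layered on dictionary lookup output roughly as follows. This is a stdlib-only Java sketch under loud assumptions: mentions are plain character offsets (in cTAKES they would come from the lookup annotations), AllergyRules and its two patterns are hypothetical, an "ALLERGIES:" list is assumed to end at the end of its line, and "allergic to" must immediately precede the mention. Assigning CUIs, which Sean notes would need a lookup table, is not attempted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical rule layer over dictionary lookup output; not part of cTAKES. */
public class AllergyRules {

    /** A medication or substance mention as a half-open character span [begin, end). */
    public static final class Mention {
        public final int begin;
        public final int end;

        public Mention(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // Rule 1: "allergic to <medication>", with only whitespace between the phrase and the mention.
    private static final Pattern ALLERGIC_TO = Pattern.compile("(?i)allergic\\s+to\\s*$");

    // Rule 2: "allergies: <medication1>, <medication2>, ..." on one line; group 1 is the list part.
    private static final Pattern ALLERGY_LIST =
            Pattern.compile("(?im)^[ \\t]*allerg(?:y|ies)[ \\t]*:([^\\r\\n]*)");

    public static List<Mention> apply(String text, List<Mention> mentions) {
        // Character ranges that follow an "ALLERGIES:" header on the same line.
        List<int[]> listRegions = new ArrayList<>();
        Matcher list = ALLERGY_LIST.matcher(text);
        while (list.find()) {
            listRegions.add(new int[] {list.start(1), list.end(1)});
        }

        List<Mention> allergies = new ArrayList<>();
        for (Mention mention : mentions) {
            boolean inList = false;
            for (int[] region : listRegions) {
                if (mention.begin >= region[0] && mention.end <= region[1]) {
                    inList = true;
                    break;
                }
            }
            // Search only the text before the mention; "$" then anchors at the mention's start.
            boolean afterPhrase = ALLERGIC_TO.matcher(text).region(0, mention.begin).find();
            if (inList || afterPhrase) {
                allergies.add(mention);
            }
        }
        return allergies;
    }
}
```

Only the first item after "allergic to" is caught (in "allergic to penicillin and wheat", wheat is missed), which is the kind of gap that might push towards the sequence-labeling route Dima describes.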
> On Jul 10, 2015, at 13:40, Tom Devel <deve...@gmail.com<mailto:
> deve...@gmail.com>> wrote:
>
> Sean,
>
> It would be a wider net, such that if an allergy is mentioned in the
> clinical note, this is captured in the corresponding
> IdentifiedAnnotation (or alternatively, if the IdentifiedAnnotation
> class should not be extended with a new attribute, in a separate
> allergy annotation).
>
> This annotator would, of course, have to run after the clinical
> pipeline has run and discovered all IdentifiedAnnotations.
>
> I am familiar with writing UIMA/cTAKES annotators, but I am not sure how a
> new ML method could be integrated here for detecting allergies. Do you
> have any thoughts about how to approach this in general?
>
> Thanks, Tom
>
> On Fri, Jul 10, 2015 at 11:54 AM, Finan, Sean <
> Sean...@childrens.harvard.edu<mailto:Sean.Finan@childrens.harvard.edu>>
> wrote:
>
> Hi Tom,
>
> Are you interested in catching all allergies or just a few specific 
> allergies for a study? If you are only concerned with a few then there 
> is a
> (possibly) simple solution. If you are interested in throwing a wider 
> net then I think that a new module would need to be created; does 
> anybody reading this have an ML or regex style module?
>
> Sean
>
> -----Original Message----- From: Tom Devel [mailto:deve...@gmail.com]
> Sent: Friday, July 10, 2015 12:42 PM To: de...@ctakes.apache.org<mailto:
> de...@ctakes.apache.org> Subject: Allergy Annotator
>
> Hi,
>
> I would like to use/extend cTAKES to detect allergies.
>
> In the cTAKES publication (2010)
>
> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.ncbi.nlm.nih.gov_pmc_articles_PMC2995668_&d=BQIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=ZApJmGKjzvFfNco5rRFVwSIyxmg4MRsxakfuXHbMZME&s=mGWu0XBCJqG2MI5qPlwIpGbQL5IYe7t5EWcvhPYW7Lo&e=
> it is stated that "Allergies to a given medication are handled by
> setting the negation attribute of that medication to 'is negated'."
>
> However, in a post here in 2014 (RE: Allergy Indication) it is said 
> that cTAKES does not have a module for allergy discovery.
>
> 1. What is the current status of allergy detection in cTAKES?
>
> 2. I did some testing. While cTAKES discovers concepts about allergies
> ("wheat allergy" is found as C0949570), using "ALLERGIES: PENICILLIN,
> WHEAT" or "The patient is allergic to penicillin." does not give the
> penicillin or wheat annotations an allergy status.
>
> How would I go about detecting these allergy mentions?
>
> Thanks, Tom
>
>

Re: Allergy Annotator

Posted by Ks Sunder <sh...@gmail.com>.
Hi All,

I have written the code below to find medical terms in String input.

step 1:
public static AnalysisEngineDescription getUMLPipeline() throws ResourceInitializationException, URISyntaxException {
   AggregateBuilder builder = new AggregateBuilder();
   // Preprocessing, in order: segment, sentences, tokens, POS tags, noun-phrase chunks, LVG normalization
   builder.add(SimpleSegmentAnnotator.createAnnotatorDescription());
   builder.add(SentenceDetector.createAnnotatorDescription());
   builder.add(TokenizerAnnotatorPTB.createAnnotatorDescription());
   builder.add(POSTagger.createAnnotatorDescription());
   builder.add(ClinicalPipelineFactory.getNpChunkerPipeline());
   builder.add(LvgAnnotator.createAnnotatorDescription());

   try {
      // Fast dictionary lookup, using Sentence annotations as the lookup window
      builder.add( AnalysisEngineFactory.createEngineDescription(
            DefaultJCasTermAnnotator.class,
            AbstractJCasTermAnnotator.PARAM_WINDOW_ANNOT_PRP,
            "org.apache.ctakes.typesystem.type.textspan.Sentence",
            JCasTermAnnotator.DICTIONARY_DESCRIPTOR_KEY,
            ExternalResourceFactory.createExternalResourceDescription(
                  FileResourceImpl.class,
                  FileLocator.locateFile( "org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml" ) )
      ) );
   } catch ( FileNotFoundException e ) {
      e.printStackTrace();
      throw new ResourceInitializationException( e );
   }

   return builder.createAggregateDescription();
}
step 2:

final JCas jcas = JCasFactory.createJCas();
jcas.setDocumentText( nextLine[0] );
SimplePipeline.runPipeline(jcas, getUMLPipeline());

// Append the covered text of every annotation that has ontology concepts attached.
// ("add" is presumably a StringBuilder declared elsewhere; its declaration is not shown.)
for ( IdentifiedAnnotation entity : JCasUtil.select( jcas, IdentifiedAnnotation.class ) ) {
   if ( entity.getOntologyConceptArr() != null ) {
      add.append( entity.getCoveredText() + "," );
   }
}

It's working fine.

But I have two queries:

1. In step 1, I add the annotators one by one, and loading all of them takes a long time.
   how can execute the single function to run all this jobs in short time...

2. how can i find sentence vised Dictionary words from string, give me a
solution for this..


Please give me a solution for these issues.



regards,
shyam k.
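If "sentence vised" in question 2 means sentence-wise, that is, listing the dictionary terms found in each sentence (this reading is an assumption), one approach is to iterate over the Sentence annotations (JCasUtil.select( jcas, Sentence.class )) and, for each one, collect the annotations it covers with uimaFIT's JCasUtil.selectCovered( jcas, IdentifiedAnnotation.class, sentence ). The stdlib-only Java sketch below shows the same containment test on plain offsets so it runs without a pipeline; SentenceWiseTerms and Span are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: groups dictionary hits by the sentence that contains them. */
public class SentenceWiseTerms {

    /** A half-open character span [begin, end) into the document text. */
    public static final class Span {
        public final int begin;
        public final int end;

        public Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        /** True when the other span lies entirely inside this one. */
        boolean covers(Span other) {
            return begin <= other.begin && other.end <= end;
        }
    }

    /** Element i holds the covered text of every term lying entirely inside sentence i, in input order. */
    public static List<List<String>> group(String text, List<Span> sentences, List<Span> terms) {
        List<List<String>> bySentence = new ArrayList<>();
        for (Span sentence : sentences) {
            List<String> covered = new ArrayList<>();
            for (Span term : terms) {
                if (sentence.covers(term)) {
                    covered.add(text.substring(term.begin, term.end));
                }
            }
            bySentence.add(covered);
        }
        return bySentence;
    }
}
```

In a real pipeline, the getOntologyConceptArr() != null check from step 2 would still be the filter applied to the covered annotations.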
