Posted to dev@ctakes.apache.org by Tom Devel <de...@gmail.com> on 2015/04/17 01:29:38 UTC

Include the smoking status detection in AggregatePlaintextFastUMLSProcessor.xml

Hi,

I am using the smoking status AE from SimulatedProdSmokingTAE.xml. It works
fine, and I can see the smoking status annotation in the CVD (CAS Visual Debugger).

Now I would like to include the smoking status detection in the clinical
pipeline of AggregatePlaintextFastUMLSProcessor.xml, so that when I run the
clinical pipeline, the smoking status is also determined.

How can I do this?

I am thinking of simply copying the nodes from the fixed flow of
SimulatedProdSmokingTAE.xml into the fixed flow of
AggregatePlaintextFastUMLSProcessor.xml. Is this the right approach?

If so, where exactly in the clinical pipeline's fixed flow should
these nodes be added?

Is there a preferred place (for example, appending them after the last node
or putting them before the first node)?

Can a wrong position or ordering of the smoking status nodes damage or corrupt
the rest of the annotations?

SimulatedProdSmokingTAE.xml contains this fixed flow:

<fixedFlow>
<node>ExternalBaseAggregateTAE</node>
<node>SentenceAdjuster</node>
<node>ClassifiableEntriesAnnotator</node>
</fixedFlow>

AggregatePlaintextFastUMLSProcessor.xml (3.2.2 from SVN) contains this
fixed flow:

<fixedFlow>
<node>SimpleSegmentAnnotator</node>
<node>SentenceDetectorAnnotator</node>
<node>TokenizerAnnotator</node>
<node>LvgAnnotator</node>
<node>ContextDependentTokenizerAnnotator</node>
<node>POSTagger</node>
<!-- <node>ClearPOSTagger</node>  -->
<node>Chunker</node>
<node>AdjustNounPhraseToIncludeFollowingNP</node>
<node>AdjustNounPhraseToIncludeFollowingPPNP</node>
<!--<node>LookupWindowAnnotator</node>-->
<node>DictionaryLookupAnnotatorDB</node>
<node>DrugNER</node>
<node>DependencyParser</node>
<node>SemanticRoleLabeler</node>
<node>ConstituencyParser</node>
<!-- <node>AssertionAnnotator</node> -->
<!-- <node>StatusAnnotator</node> -->
<!-- <node>NegationAnnotator</node> -->
<node>GenericCleartkAnalysisEngine</node>
<node>HistoryCleartkAnalysisEngine</node>
<node>PolarityCleartkAnalysisEngine</node>
<node>SubjectCleartkAnalysisEngine</node>
<node>UncertaintyCleartkAnalysisEngine</node>

<node>ExtractionPrepAnnotator</node>
</fixedFlow>

Thanks for any help or pointers,

Tom

Re: Include the smoking status detection in AggregatePlaintextFastUMLSProcessor.xml

Posted by Pei Chen <ch...@apache.org>.
If it works for you, I would keep it in there, then. Leave the info in the
Jira item, and we should double-check in the code that this negation step
only affects the smoking status types.
--Pei

On Tue, Apr 21, 2015 at 1:04 PM, Tom Devel <de...@gmail.com> wrote:

> After further testing: when I remove the <node>NegationAnnotator</node> step
> from ProductionPostSentenceAggregate_step2_libsvm.xml (which I assume is the
> smoking status sub-descriptor you meant), the smoking status is no longer
> classified correctly when the text contains negations, so this step does not
> look redundant to me.
>
> For example, "He denied use of tobacco" is then classified as
> CURRENT_SMOKER. If I leave this negation step in, it is correctly classified
> as NON_SMOKER.
>
> I tried changing where the smoking status nodes
> <node>SentenceAdjuster</node> and <node>ClassifiableEntriesAnnotator</node>
> run in the clinical pipeline: putting them directly after LVG or at the end
> of the flow does not change the observation above.
>
> However, you said that leaving the NegationAnnotator in could overwrite
> assertion values. How can this be prevented while keeping the smoking
> status classifications correct?

Re: Include the smoking status detection in AggregatePlaintextFastUMLSProcessor.xml

Posted by Tom Devel <de...@gmail.com>.
After further testing: when I remove the <node>NegationAnnotator</node> step
from ProductionPostSentenceAggregate_step2_libsvm.xml (which I assume is the
smoking status sub-descriptor you meant), the smoking status is no longer
classified correctly when the text contains negations, so this step does not
look redundant to me.

For example, "He denied use of tobacco" is then classified as
CURRENT_SMOKER. If I leave this negation step in, it is correctly classified
as NON_SMOKER.

I tried changing where the smoking status nodes
<node>SentenceAdjuster</node> and <node>ClassifiableEntriesAnnotator</node>
run in the clinical pipeline: putting them directly after LVG or at the
end of the flow does not change the observation above.
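
A sketch of the first placement described above, with the two smoking status nodes directly after LvgAnnotator in the fixedFlow of AggregatePlaintextFastUMLSProcessor.xml (3.2.2). It assumes, as Pei suggested, that ExternalBaseAggregateTAE is dropped and that both keys are declared as delegates in the same descriptor; the commented-out nodes of the original flow are left out for brevity. The CAS diff reported earlier in this thread was done with the end-of-flow variant Pei recommended, so this earlier placement may deserve the same diff check before relying on it.

```xml
<fixedFlow>
  <node>SimpleSegmentAnnotator</node>
  <node>SentenceDetectorAnnotator</node>
  <node>TokenizerAnnotator</node>
  <node>LvgAnnotator</node>
  <!-- Smoking status nodes at the earliest point suggested in this thread:
       after segmentation, sentence detection, tokenization and LVG. -->
  <node>SentenceAdjuster</node>
  <node>ClassifiableEntriesAnnotator</node>
  <node>ContextDependentTokenizerAnnotator</node>
  <node>POSTagger</node>
  <node>Chunker</node>
  <node>AdjustNounPhraseToIncludeFollowingNP</node>
  <node>AdjustNounPhraseToIncludeFollowingPPNP</node>
  <node>DictionaryLookupAnnotatorDB</node>
  <node>DrugNER</node>
  <node>DependencyParser</node>
  <node>SemanticRoleLabeler</node>
  <node>ConstituencyParser</node>
  <node>GenericCleartkAnalysisEngine</node>
  <node>HistoryCleartkAnalysisEngine</node>
  <node>PolarityCleartkAnalysisEngine</node>
  <node>SubjectCleartkAnalysisEngine</node>
  <node>UncertaintyCleartkAnalysisEngine</node>
  <node>ExtractionPrepAnnotator</node>
</fixedFlow>
```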

However, you said that leaving the NegationAnnotator in could overwrite
assertion values. How can this be prevented while keeping the smoking
status classifications correct?

On Mon, Apr 20, 2015 at 2:02 PM, Chen, Pei <Pe...@childrens.harvard.edu>
wrote:

> Great. There is a redundant negation step in one of the final smoking status
> sub-descriptors.
> Leave the Jira item as a placeholder to clean up the smoking status descriptors.

Re: Include the smoking status detection in AggregatePlaintextFastUMLSProcessor.xml

Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.
Great. There is a redundant negation step in one of the final smoking status sub-descriptors.
Leave the Jira item as a placeholder to clean up the smoking status descriptors.

> On Apr 20, 2015, at 1:11 PM, Tom Devel <de...@gmail.com> wrote:
> 
> Pei,
> 
> I did what you recommended: I ran a test input through this new pipeline and
> through the clinical pipeline without smoking status, and diffed the two
> resulting CAS files. It seems to do the trick: the UMLS concept tags are still
> the same, and there is now a new tag for the smoking status annotation. Great!
>
> Before I create the Jira item: what do you mean by removing the last
> NegEx?
> 
> In AggregatePlaintextFastUMLSProcessor.xml, the NegationAnnotator node
> is commented out:
> <!-- <node>NegationAnnotator</node> -->
> 
> Did you mean this node?
> 
> At the top of the file, there is an import for the NegationAnnotator
> (<delegateAnalysisEngine key="NegationAnnotator">); it is not commented
> out, but it is never run in the fixed flow.
> 
> Am I correct that the negation detection in the clinical pipeline is now
> performed by PolarityCleartkAnalysisEngine?
> 
> Thanks,
> Tom

Re: Include the smoking status detection in AggregatePlaintextFastUMLSProcessor.xml

Posted by Tom Devel <de...@gmail.com>.
Pei,

I did what you recommended: I ran a test input through this new pipeline and
through the clinical pipeline without smoking status, and diffed the two
resulting CAS files. It seems to do the trick: the UMLS concept tags are still
the same, and there is now a new tag for the smoking status annotation. Great!

Before I create the Jira item: what do you mean by removing the last
NegEx?

In AggregatePlaintextFastUMLSProcessor.xml, the NegationAnnotator node
is commented out:
<!-- <node>NegationAnnotator</node> -->

Did you mean this node?

At the top of the file, there is an import for the NegationAnnotator
(<delegateAnalysisEngine key="NegationAnnotator">); it is not commented
out, but it is never run in the fixed flow.
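
This matches how a UIMA aggregate descriptor is wired: each <node> in the fixedFlow names a key declared under <delegateAnalysisEngineSpecifiers>, and a declared delegate that no <node> refers to is never called by the fixed flow. Adding the smoking status nodes to AggregatePlaintextFastUMLSProcessor.xml therefore also needs two delegate declarations. A sketch, in which the import paths are hypothetical placeholders to be replaced with the actual <import> elements from the corresponding entries in SimulatedProdSmokingTAE.xml:

```xml
<delegateAnalysisEngineSpecifiers>
  <!-- Existing entries (including the NegationAnnotator one) stay unchanged. -->

  <!-- Keys must match the <node> names used in the fixedFlow exactly. -->
  <delegateAnalysisEngine key="SentenceAdjuster">
    <!-- Hypothetical path: copy the real import from SimulatedProdSmokingTAE.xml. -->
    <import location="SentenceAdjuster.xml"/>
  </delegateAnalysisEngine>
  <delegateAnalysisEngine key="ClassifiableEntriesAnnotator">
    <!-- Hypothetical path: copy the real import from SimulatedProdSmokingTAE.xml. -->
    <import location="ClassifiableEntriesAnnotator.xml"/>
  </delegateAnalysisEngine>
</delegateAnalysisEngineSpecifiers>
```

A relative location import is resolved against the descriptor that contains it, so a path copied from SimulatedProdSmokingTAE.xml may need adjusting if the two descriptors live in different directories; an import by name (<import name="..."/>) is looked up on the classpath and datapath instead.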

Am I correct that the negation detection in the clinical pipeline is now
performed by PolarityCleartkAnalysisEngine?

Thanks,
Tom

On Sat, Apr 18, 2015 at 12:53 AM, Pei Chen <ch...@apache.org> wrote:

> Tom,
> I would put it at the end of the pipeline (at a minimum, it should come after
> the sectionizer, sentence detector, tokenizer, and LVG). I would remove
> ExternalBaseAggregateTAE: it simulates the sectionizer, sentence detector,
> tokenizer, and LVG, so it would be redundant. I would also probably remove
> the last NegEx, which could override the assertion values.
>
> Disclaimer: I have not tested this yet. Feel free to open a Jira item if it
> works for you so it can be tracked. It seems kind of strange to have a
> descriptor XML that defines another XML descriptor to be loaded again via
> code; I think this could be simplified.
> --Pei

Re: Include the smoking status detection in AggregatePlaintextFastUMLSProcessor.xml

Posted by Pei Chen <ch...@apache.org>.
Tom,
I would put it at the end of the pipeline (at a minimum, it should come after
the sectionizer, sentence detector, tokenizer, and LVG). I would remove
ExternalBaseAggregateTAE: it simulates the sectionizer, sentence detector,
tokenizer, and LVG, so it would be redundant. I would also probably remove
the last NegEx, which could override the assertion values.
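
A minimal, untested sketch of that suggestion applied to the fixedFlow of AggregatePlaintextFastUMLSProcessor.xml (3.2.2): ExternalBaseAggregateTAE is left out, and only SentenceAdjuster and ClassifiableEntriesAnnotator are appended after the existing nodes. Node names are copied from the two fixed flows quoted in this thread, the commented-out nodes are omitted for brevity, and both new keys still have to be declared as delegates in the same descriptor.

```xml
<fixedFlow>
  <node>SimpleSegmentAnnotator</node>
  <node>SentenceDetectorAnnotator</node>
  <node>TokenizerAnnotator</node>
  <node>LvgAnnotator</node>
  <node>ContextDependentTokenizerAnnotator</node>
  <node>POSTagger</node>
  <node>Chunker</node>
  <node>AdjustNounPhraseToIncludeFollowingNP</node>
  <node>AdjustNounPhraseToIncludeFollowingPPNP</node>
  <node>DictionaryLookupAnnotatorDB</node>
  <node>DrugNER</node>
  <node>DependencyParser</node>
  <node>SemanticRoleLabeler</node>
  <node>ConstituencyParser</node>
  <node>GenericCleartkAnalysisEngine</node>
  <node>HistoryCleartkAnalysisEngine</node>
  <node>PolarityCleartkAnalysisEngine</node>
  <node>SubjectCleartkAnalysisEngine</node>
  <node>UncertaintyCleartkAnalysisEngine</node>
  <node>ExtractionPrepAnnotator</node>
  <!-- Smoking status nodes from SimulatedProdSmokingTAE.xml.
       ExternalBaseAggregateTAE is intentionally not included: the nodes
       above already provide segments, sentences, tokens and LVG output. -->
  <node>SentenceAdjuster</node>
  <node>ClassifiableEntriesAnnotator</node>
</fixedFlow>
```

With the smoking status nodes last, no other clinical pipeline component consumes anything they produce; the remaining concern raised here is the NegEx step inside the smoking status sub-descriptors, which could override assertion values.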

Disclaimer: I have not tested this yet. Feel free to open a Jira item if it
works for you so it can be tracked. It seems kind of strange to have a
descriptor XML that defines another XML descriptor to be loaded again via
code; I think this could be simplified.
--Pei

On Thu, Apr 16, 2015 at 7:29 PM, Tom Devel <de...@gmail.com> wrote:

> Hi,
>
> I am using the smoking status AE from SimulatedProdSmokingTAE.xml. It works
> fine, and I can see the smoking status annotation in the CVD (CAS Visual Debugger).
>
> Now I would like to include the smoking status detection in the clinical
> pipeline of AggregatePlaintextFastUMLSProcessor.xml, so that when I run the
> clinical pipeline, the smoking status is also determined.
>
> How can I do this?
>
> I am thinking of simply copying the nodes from the fixed flow of
> SimulatedProdSmokingTAE.xml into the fixed flow of
> AggregatePlaintextFastUMLSProcessor.xml. Is this the right approach?
>
> If so, where exactly in the clinical pipeline's fixed flow should
> these nodes be added?
>
> Is there a preferred place (for example, appending them after the last node
> or putting them before the first node)?
>
> Can a wrong position or ordering of the smoking status nodes damage or corrupt
> the rest of the annotations?
>
> SimulatedProdSmokingTAE.xml contains this fixed flow:
>
> <fixedFlow>
> <node>ExternalBaseAggregateTAE</node>
> <node>SentenceAdjuster</node>
> <node>ClassifiableEntriesAnnotator</node>
> </fixedFlow>
>
> AggregatePlaintextFastUMLSProcessor.xml (3.2.2 from SVN) contains this
> fixed flow:
>
> <fixedFlow>
> <node>SimpleSegmentAnnotator</node>
> <node>SentenceDetectorAnnotator</node>
> <node>TokenizerAnnotator</node>
> <node>LvgAnnotator</node>
> <node>ContextDependentTokenizerAnnotator</node>
> <node>POSTagger</node>
> <!-- <node>ClearPOSTagger</node>  -->
> <node>Chunker</node>
> <node>AdjustNounPhraseToIncludeFollowingNP</node>
> <node>AdjustNounPhraseToIncludeFollowingPPNP</node>
> <!--<node>LookupWindowAnnotator</node>-->
> <node>DictionaryLookupAnnotatorDB</node>
> <node>DrugNER</node>
> <node>DependencyParser</node>
> <node>SemanticRoleLabeler</node>
> <node>ConstituencyParser</node>
> <!-- <node>AssertionAnnotator</node> -->
> <!-- <node>StatusAnnotator</node> -->
> <!-- <node>NegationAnnotator</node> -->
> <node>GenericCleartkAnalysisEngine</node>
> <node>HistoryCleartkAnalysisEngine</node>
> <node>PolarityCleartkAnalysisEngine</node>
> <node>SubjectCleartkAnalysisEngine</node>
> <node>UncertaintyCleartkAnalysisEngine</node>
>
> <node>ExtractionPrepAnnotator</node>
> </fixedFlow>
>
> Thanks for any help or pointers,
>
> Tom
>