Posted to dev@ctakes.apache.org by Muhammad Ali Syed <ms...@ncsu.edu.INVALID> on 2022/06/02 18:30:48 UTC

SmokingStatus & Side effects - piper file

Hi there,

I am exploring the cTAKES Smoking Status and Side Effects components and have
not come across any piper file version of their implementation. When trying to
incrementally add AEs from these 2 components to FullTokenizerPipeline.piper,
I am running into issues such as:
- ResourceInitializationExceptions when adding ClassifiableEntries (I did set
  UimaDescriptorStep1Key and UimaDescriptorStep2Key). I also tried adding
  individual AEs from step1 and step2 and ran into other issues.
- Exceptions such as in PcsClassifierAnnotator_libsvm (added to the pipeline
  after adding KuRuleBasedClassifierAnnotator): java.lang.NullPointerException
  at libsvm.svm.svm_predict(svm.java:2343)

My question is: can these 2 components (containing multiple AEs) be
implemented with piper files as of now? In other words, can any pipeline
that can be created using XML descriptor files also be created with piper
files?

Is there any sample piper code for pipelines that include either of these
components?

Regards,

Re: SmokingStatus & Side effects - piper file [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu.INVALID>.
There are a few things to talk about, including some bad news.

The bad news:
All 3 of those annotators predate the use of UimaFit: https://uima.apache.org/uimafit.html
Pipers use UimaFit to simplify the specification of parameters and to configure advanced pipelines.
Piper files will not work with older annotators such as those you wish to use.

Some good news is that the problems are in the initialization of those annotators and not processing.
Some refactoring of those annotators to bring them up to date shouldn't be too difficult.

You have exemplified one of the reasons for creating the piper paradigm, which is the simplification of parameter specifications.  There isn't (shouldn't be) any need to specify URLs, then resources that point to the URLs, then parameters that point to the resources.
For instance, a piper would just have:
set StopWordsFile=org/apache/ctakes/smokingstatus/data/PCS/stopwords_PCS.txt
set PCSKeyWordFile=org/apache/ctakes/smokingstatus/data/PCS/keywords_PCS.txt
set PathOfModel=org/apache/ctakes/smokingstatus/data/PCS/pcs_libsvm-2.91.model
add PcsClassifierAnnotator_libsvm
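
As a rough sketch of the same idea, the values could also be passed inline on the add line instead of through set. This assumes the annotator has been refactored to take these three parameters directly, which it has not yet, so this will not run against the current code. The parameter names and paths are the ones used just above. Note that each value is the resource path itself, not the name of another variable holding a file: URL.

```
// Hypothetical: only valid once PcsClassifierAnnotator_libsvm is refactored to use UimaFit parameters.
add PcsClassifierAnnotator_libsvm StopWordsFile=org/apache/ctakes/smokingstatus/data/PCS/stopwords_PCS.txt PCSKeyWordFile=org/apache/ctakes/smokingstatus/data/PCS/keywords_PCS.txt PathOfModel=org/apache/ctakes/smokingstatus/data/PCS/pcs_libsvm-2.91.model
```

As far as I understand the piper syntax, a value given with set is offered to every annotator added afterwards that declares a parameter of that name, while an inline value applies only to the annotator on that add line.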

Some information on using piper files can be found here: https://cwiki.apache.org/confluence/display/CTAKES/Piper+Files

Sean


Re: SmokingStatus & Side effects - piper file [EXTERNAL]

Posted by Muhammad Ali Syed <ms...@ncsu.edu.INVALID>.
Below is my piper file and the sample text I am using is
ctakes-smoking-status/data/test/doc2_07543210_sample_current.txt:
load FullTokenizerPipeline

// Add non-core annotators
add ContextDependentTokenizerAnnotator
addDescription POSTagger

// Add Chunkers
load ChunkerSubPipe

// Default fast dictionary lookup
load DictionarySubPipe

// Add Cleartk Entity Attribute annotators
load AttributeCleartkSubPipe

// ClassifiableEntries - errors out at org.apache.ctakes.smokingstatus.ae.ClassifiableEntries.initialize(ClassifiableEntries.java:134)
set SectionsToIgnore=20109,20138
set AllowedClassifications=SMOKER,CURRENT_SMOKER,NON_SMOKER,PAST_SMOKER,UNKNOWN
set UimaDescriptorStep1=file: org/apache/ctakes/smokingstatus/analysis_engine/ProductionPostSentenceAggregate_step1.xml
set UimaDescriptorStep2=file: org/apache/ctakes/smokingstatus/analysis_engine/ProductionPostSentenceAggregate_step2_libsvm.xml
add ClassifiableEntries UimaDescriptorStep1Key=UimaDescriptorStep1 UimaDescriptorStep2Key=UimaDescriptorStep2

// KuRuleBasedClassifierAnnotator - works but commented out for now
//add KuRuleBasedClassifierAnnotator SmokingWordsFile=/org/apache/ctakes/smokingstatus/data/KU/keywords.txt UnknownWordsFile=/org/apache/ctakes/smokingstatus/data/KU/unknown_words.txt

// PcsClassifierAnnotator_libsvm - errors out at libsvm.svm.svm_predict(svm.java:2343)
set StopWordsFileRes=file: org/apache/ctakes/smokingstatus/data/PCS/stopwords_PCS.txt
set PathOfModelRes=file: org/apache/ctakes/smokingstatus/data/PCS/pcs_libsvm-2.91.model
set PCSKeyWordFileResc=file: org/apache/ctakes/smokingstatus/data/PCS/keywords_PCS.txt
//add PcsClassifierAnnotator_libsvm PathOfModel=PathOfModelResc StopWordsFile=StopWordsFileRes PCSKeyWordFile=PCSKeyWordFileResc

// SideEffectAnnotator - errors out
set sideEffectDic=file: org/apache/ctakes/sideeffect/lookup/sideEffect_dictionary.txt
//add SideEffectAnnotator sideEffectTable=sideEffectDic

addLast util.log.FinishedLogger


Re: SmokingStatus & Side effects - piper file [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu.INVALID>.
Hi Muhammad,

Can you please copy & paste the contents of your piper file?

Thanks,
Sean