Posted to dev@ctakes.apache.org by Maite Meseure Hugues <me...@gmail.com> on 2015/07/24 16:50:09 UTC

Annotator POSTagger.xml

Hi everyone,

I went through the POS tagger component guide and the readme file, both of
which describe the annotator descriptor POSTagger.xml. According to them, it
should have three parameters:
PosModelFile, TagDictionary and CaseSensitive.

This description matches POSTagger.xml under ctakes-chunker/desc, but
POSTagger.xml under ctakes-pos-tagger/desc has only the first parameter.
(The descriptor in this latter directory is the one referenced by
AggregatePlaintextUmlsProcessor.xml and AggregatePlaintextFastUmlsProcessor.xml.)

Does this make a difference when running the pipeline?
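
To make the comparison between the two descriptors concrete, the parameters each one declares can be listed programmatically. The following is only a minimal sketch: it assumes both files are standard UIMA analysis engine descriptors that declare parameters in <configurationParameter> elements, and the class and method names (DescriptorParams, declaredParameters, onlyInFirst) are made up for illustration and are not part of cTAKES.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of cTAKES: lists the parameters a descriptor declares.
class DescriptorParams {

    // Returns the <name> of every <configurationParameter> in the descriptor text.
    static List<String> declaredParameters(String descriptorXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            descriptorXml.getBytes(StandardCharsets.UTF_8)));
            List<String> names = new ArrayList<>();
            NodeList params = doc.getElementsByTagName("configurationParameter");
            for (int i = 0; i < params.getLength(); i++) {
                Element param = (Element) params.item(i);
                Node name = param.getElementsByTagName("name").item(0);
                if (name != null) {
                    names.add(name.getTextContent().trim());
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse descriptor", e);
        }
    }

    // Returns the names that appear in 'first' but not in 'second'.
    static List<String> onlyInFirst(List<String> first, List<String> second) {
        List<String> result = new ArrayList<>(first);
        result.removeAll(second);
        return result;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: DescriptorParams <descriptorA.xml> <descriptorB.xml>");
            return;
        }
        List<String> a = declaredParameters(Files.readString(Path.of(args[0])));
        List<String> b = declaredParameters(Files.readString(Path.of(args[1])));
        System.out.println("Only in " + args[0] + ": " + onlyInFirst(a, b));
        System.out.println("Only in " + args[1] + ": " + onlyInFirst(b, a));
    }
}
```

Run against the two POSTagger.xml files, this would print the parameter names that one descriptor declares and the other does not. It only compares declarations; whether the annotator class actually reads a given parameter has to be checked in the Java source.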

Thank you for your time,

Maite

Re: Annotator POSTagger.xml

Posted by Maite Meseure Hugues <me...@gmail.com>.
Pei, thank you for your reply.

Indeed, the POSTagger class refers only to PosModelFile,
whose default value is:

"org/apache/ctakes/postagger/models/mayo-pos.zip".




On Fri, Jul 24, 2015 at 9:57 AM, Chen, Pei <Pe...@childrens.harvard.edu>
wrote:

> Maite,
> That looks to be a discrepancy.
> My suggestion would be to remove POSTagger.xml from the Chunker project
> and anywhere else, as it is confusing.  (I think these 'mini' pipelines were
> there when we supported those PEAR file deployments.)
> Would you mind double-checking what the defaults are for those
> parameters?  If memory serves me correctly, I don't think TagDictionary has
> been used since we upgraded to the latest version of OpenNLP, and most
> likely some old descriptors were not updated.
> Feel free to create a Jira to track it.

RE: Annotator POSTagger.xml

Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.
Maite,
That looks to be a discrepancy.
My suggestion would be to remove POSTagger.xml from the Chunker project and anywhere else, as it is confusing.  (I think these 'mini' pipelines were there when we supported those PEAR file deployments.)
Would you mind double-checking what the defaults are for those parameters?  If memory serves me correctly, I don't think TagDictionary has been used since we upgraded to the latest version of OpenNLP, and most likely some old descriptors were not updated.
Feel free to create a Jira to track it.
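
One possible way to check the configured defaults is to read the values a descriptor sets in its <nameValuePair> elements. This is a rough sketch under the assumption that the descriptor follows the usual UIMA layout, with each setting written as a <nameValuePair> containing a <name> and a typed <value>; DescriptorDefaults and configuredValues are hypothetical names, not cTAKES or UIMA APIs.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of cTAKES: reads the values a descriptor assigns.
class DescriptorDefaults {

    // Maps each parameter name to the text of its <value>, in document order.
    static Map<String, String> configuredValues(String descriptorXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            descriptorXml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> values = new LinkedHashMap<>();
            NodeList pairs = doc.getElementsByTagName("nameValuePair");
            for (int i = 0; i < pairs.getLength(); i++) {
                Element pair = (Element) pairs.item(i);
                Node name = pair.getElementsByTagName("name").item(0);
                Node value = pair.getElementsByTagName("value").item(0);
                if (name != null && value != null) {
                    // The value is wrapped in a type element such as <string> or
                    // <boolean>; getTextContent() returns the text nested inside it.
                    values.put(name.getTextContent().trim(),
                               value.getTextContent().trim());
                }
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse descriptor", e);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.out.println("usage: DescriptorDefaults <descriptor.xml>");
            return;
        }
        configuredValues(Files.readString(Path.of(args[0])))
                .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

A parameter that is declared but has no entry here has no value set in the descriptor, so any default for it would come from the annotator code itself.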
