Posted to dev@uima.apache.org by "Tommaso Teofili (JIRA)" <de...@uima.apache.org> on 2011/07/05 11:35:16 UTC
[jira] [Commented] (UIMA-2110) Turn the HMMTagger class into a more
generic class for tagging tasks
[ https://issues.apache.org/jira/browse/UIMA-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059783#comment-13059783 ]
Tommaso Teofili commented on UIMA-2110:
---------------------------------------
I tested it and verified that there is no regression when the new version is used the "old way" as well.
I'm going to commit it shortly.
> Turn the HMMTagger class into a more generic class for tagging tasks
> ----------------------------------------------------------------------
>
> Key: UIMA-2110
> URL: https://issues.apache.org/jira/browse/UIMA-2110
> Project: UIMA
> Issue Type: Improvement
> Components: Sandbox-Tagger
> Affects Versions: 2.3
> Environment: OS
> Linux version 2.6.32-30-generic (buildd@vernadsky) (gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5) ) #59-Ubuntu SMP Tue Mar 1 21:30:21 UTC 2011
> JVM
> java version "1.6.0_17"
> Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
> Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode)
> Reporter: Nicolas Hernandez
> Priority: Minor
> Attachments: AMoreGenericHMMTaggerDesc.patch, AMoreGenericHMMTaggerSrcClass.patch, UIMA2110updated.patch
>
> Original Estimate: 1.5h
> Remaining Estimate: 1.5h
>
> Despite its name, the code of the org.apache.uima.examples.tagger.HMMTagger
> class is not fully independent of the POS tagging task.
> In addition, it assumes that the feature path to update with the result of the
> tagging is org.apache.uima.TokenAnnotation:posTag.
> We propose to let users specify, via a parameter, the feature
> path to set. This parameter is optional. If it is left unset, the tagger will
> work as usual, using org.apache.uima.TokenAnnotation:posTag as the default value.
>
> In addition, we propose to add three optional parameters: InputView, SentenceType and ModelFile.
> Since the HMM Learner already lets users specify the view used to
> train a model, we decided to offer the same option for the
> tagger. By default, it works on the _InitialView. This is quite useful in practice!
> The org.apache.uima.TokenAnnotation type is not the only annotation type that is assumed
> to be present in the CAS. The HMMTagger actually processes tokens sentence by sentence, and it uses
> org.apache.uima.SentenceAnnotation to select the tokens. The SentenceType parameter
> lets users specify their own sentence annotation type. The default value is
> org.apache.uima.SentenceAnnotation.
> The ModelFile parameter is an alternative to the resource declaration for specifying a model.
> If left empty, it is ignored. Otherwise it takes precedence over the resource declaration.
> When it is specified, multiple deployment of the tagger is not possible, but in practice it may be easier for the user to configure a parameter through Eclipse.
> Two distinct patches will be provided, one for the class and the other for the descriptor.
> A future improvement of the class might make it possible to create new annotations rather than only updating existing ones.
> A future improvement of the descriptor may separate what belongs to the tagger in general from what is specific to the POS tagger...
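
The fallback behaviour described in the issue above can be sketched roughly as follows. This is not the code from the attached patches and it uses no UIMA API: the class TaggerSettings, its fields, and the parameter name "FeaturePath" are hypothetical and chosen only for illustration (the issue does not name the feature path parameter). InputView, SentenceType, ModelFile and the default values are taken from the issue text.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder for the optional tagger parameters described in the issue. */
final class TaggerSettings {
    // Default values as stated in the issue description.
    static final String DEFAULT_FEATURE_PATH = "org.apache.uima.TokenAnnotation:posTag";
    static final String DEFAULT_INPUT_VIEW = "_InitialView";
    static final String DEFAULT_SENTENCE_TYPE = "org.apache.uima.SentenceAnnotation";

    final String typeName;     // part of the feature path before ':'
    final String featureName;  // part of the feature path after ':'
    final String inputView;
    final String sentenceType;
    final String modelFile;    // null means: fall back to the declared resource

    TaggerSettings(Map<String, String> params) {
        // "FeaturePath" is an assumed parameter name, not taken from the patch.
        String featurePath = orDefault(params.get("FeaturePath"), DEFAULT_FEATURE_PATH);
        int sep = featurePath.lastIndexOf(':');
        if (sep <= 0 || sep == featurePath.length() - 1) {
            throw new IllegalArgumentException("Expected <type>:<feature>, got " + featurePath);
        }
        typeName = featurePath.substring(0, sep);
        featureName = featurePath.substring(sep + 1);
        inputView = orDefault(params.get("InputView"), DEFAULT_INPUT_VIEW);
        sentenceType = orDefault(params.get("SentenceType"), DEFAULT_SENTENCE_TYPE);
        modelFile = orDefault(params.get("ModelFile"), null);
    }

    /** A non-empty ModelFile parameter takes precedence over the declared resource. */
    String modelSource(String declaredResource) {
        return modelFile != null ? modelFile : declaredResource;
    }

    /** Treats a missing or blank value as "not set" and returns the fallback. */
    private static String orDefault(String value, String fallback) {
        return (value == null || value.trim().isEmpty()) ? fallback : value.trim();
    }
}
```

With an empty parameter map, the settings resolve to the defaults listed in the issue and modelSource returns whatever resource was declared. Setting ModelFile makes modelSource return that path instead, which mirrors the precedence rule described above.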
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira