Posted to issues@opennlp.apache.org by "Koji Sekiguchi (JIRA)" <ji...@apache.org> on 2017/11/10 11:59:00 UTC

[jira] [Comment Edited] (OPENNLP-1154) change the XML format for feature generator config in NameFinder

    [ https://issues.apache.org/jira/browse/OPENNLP-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16247381#comment-16247381 ] 

Koji Sekiguchi edited comment on OPENNLP-1154 at 11/10/17 11:58 AM:
--------------------------------------------------------------------

I'll post the first patch soon. It still fails a few tests because I haven't yet handled serialization/deserialization for the new format, along with some other details. I'm posting the first patch before implementing the rest (serialization/deserialization, test cases, etc.) because I'd like to hear the committers' thoughts on the new format. I also think we can keep supporting the "classic" format for back-compat reasons, if needed. The first patch does so, but that results in many @Deprecated annotations. I'd like to hear your thoughts on back-compat support as well.

I still don't understand the versioning system in OpenNLP. If this new format goes into 1.9, do I still need to consider the "classic" format?


was (Author: koji):
I'll post the first patch soon. It still fails one test because I haven't yet handled serialization/deserialization for the new format. I'm posting the first patch before implementing the rest (serialization/deserialization, test cases, etc.) because I'd like to hear the committers' thoughts on the new format. I also think we can keep supporting the "classic" format for back-compat reasons, if needed. The first patch does so, but that results in many @Deprecated annotations. I'd like to hear your thoughts on back-compat support as well.

I still don't understand the versioning system in OpenNLP. If this new format goes into 1.9, do I still need to consider the "classic" format?

> change the XML format for feature generator config in NameFinder
> ----------------------------------------------------------------
>
>                 Key: OPENNLP-1154
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1154
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Name Finder
>    Affects Versions: 1.8.3
>            Reporter: Koji Sekiguchi
>            Assignee: Koji Sekiguchi
>
> NameFinder provides many kinds of feature generators (factories). Users can define their configuration via XML, which looks like:
> {code:xml}
> <generators>
>   <cache>
>     <generators>
>       <window prevLength="2" nextLength="2">
>         <tokenclass/>
>       </window>
>       <window prevLength="2" nextLength="2">
>         <token/>
>       </window>
>       <definition/>
>       <prevmap/>
>       <bigram/>
>       <sentence begin="true" end="false"/>
>     </generators>
>   </cache>
> </generators>
> {code}
> If users want to implement their own feature generator, they can use <custom .../>. If they want two or more custom feature generators at once, however, they may be able to do so only by providing a wrapper feature generator that wraps the generators they actually want, which is not a good solution.
> I'd like to suggest that we make the config format more flexible, as shown below:
> {code:xml}
> <generator class="opennlp.tools.util.featuregen.AggregatedFeatureGeneratorFactory">
>   <args>
>     <generator class="opennlp.tools.util.featuregen.CachedFeatureGeneratorFactory">
>       <args>
>         <generator class="opennlp.tools.util.featuregen.AggregatedFeatureGeneratorFactory">
>           <args>
>             <generator class="opennlp.tools.util.featuregen.WindowFeatureGeneratorFactory">
>               <args>
>                 <int name="prevLength">2</int>
>                 <int name="nextLength">2</int>
>                 <generator class="opennlp.tools.util.featuregen.TokenClassFeatureGeneratorFactory"/>
>               </args>
>             </generator>
>             <generator class="opennlp.tools.util.featuregen.WindowFeatureGeneratorFactory">
>               <args>
>                 <int name="prevLength">2</int>
>                 <int name="nextLength">2</int>
>                 <generator class="opennlp.tools.util.featuregen.TokenFeatureGeneratorFactory"/>
>               </args>
>             </generator>
>           </args>
>         </generator>
>       </args>
>     </generator>
>   </args>
> </generator>
> {code}
> If <args>...</args> is too noisy, I'm also considering another format:
> {code:xml}
> <generator class="opennlp.tools.util.featuregen.AggregatedFeatureGeneratorFactory">
>   <generator class="opennlp.tools.util.featuregen.CachedFeatureGeneratorFactory">
>     <generator class="opennlp.tools.util.featuregen.AggregatedFeatureGeneratorFactory">
>       <generator class="opennlp.tools.util.featuregen.WindowFeatureGeneratorFactory">
>         <int name="prevLength">2</int>
>         <int name="nextLength">2</int>
>         <generator class="opennlp.tools.util.featuregen.TokenClassFeatureGeneratorFactory"/>
>       </generator>
>       <generator class="opennlp.tools.util.featuregen.WindowFeatureGeneratorFactory">
>         <int name="prevLength">2</int>
>         <int name="nextLength">2</int>
>         <generator class="opennlp.tools.util.featuregen.TokenFeatureGeneratorFactory"/>
>       </generator>
>     </generator>
>   </generator>
> </generator>
> {code}
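
To make the proposal concrete, here is a minimal sketch of how a nested <generator class="..."> config could be read recursively. It is not the code from the patch: GeneratorConfigSketch, GeneratorNode, parse and readArgs are hypothetical names, only the <int> argument type shown above is handled, and <args> is assumed to be a transparent wrapper so that both proposed variants produce the same tree.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch: none of these names come from OpenNLP or from the patch.
class GeneratorConfigSketch {

  /** One <generator class="..."> element: factory class name, int args, nested generators. */
  static final class GeneratorNode {
    final String className;
    final Map<String, Integer> intArgs = new LinkedHashMap<>();
    final List<GeneratorNode> children = new ArrayList<>();

    GeneratorNode(String className) {
      this.className = className;
    }
  }

  /** Parses a config string; any problem is reported as IllegalArgumentException. */
  static GeneratorNode parse(String xml) {
    try {
      Element root = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)))
          .getDocumentElement();
      return toNode(root);
    } catch (Exception e) {
      throw new IllegalArgumentException("Invalid generator config", e);
    }
  }

  private static GeneratorNode toNode(Element element) {
    String className = element.getAttribute("class"); // "" when the attribute is absent
    if (!"generator".equals(element.getTagName()) || className.isEmpty()) {
      throw new IllegalArgumentException("Expected <generator class=\"...\">");
    }
    GeneratorNode node = new GeneratorNode(className);
    readArgs(element, node);
    return node;
  }

  /** Reads the child elements of a <generator> or <args> element into the given node. */
  private static void readArgs(Element container, GeneratorNode node) {
    for (Node child = container.getFirstChild(); child != null; child = child.getNextSibling()) {
      if (child.getNodeType() != Node.ELEMENT_NODE) {
        continue; // skip whitespace text and comments
      }
      Element e = (Element) child;
      switch (e.getTagName()) {
        case "args": // assumption: <args> only groups arguments, so recurse into it
          readArgs(e, node);
          break;
        case "generator":
          node.children.add(toNode(e));
          break;
        case "int":
          node.intArgs.put(e.getAttribute("name"), Integer.parseInt(e.getTextContent().trim()));
          break;
        default:
          throw new IllegalArgumentException("Unexpected element: " + e.getTagName());
      }
    }
  }

  public static void main(String[] args) {
    String xml = "<generator class=\"opennlp.tools.util.featuregen.WindowFeatureGeneratorFactory\">"
        + "<int name=\"prevLength\">2</int><int name=\"nextLength\">2</int>"
        + "<generator class=\"opennlp.tools.util.featuregen.TokenFeatureGeneratorFactory\"/>"
        + "</generator>";
    GeneratorNode window = parse(xml);
    System.out.println(window.className + " " + window.intArgs
        + " nested=" + window.children.size());
  }
}
```

A real loader would presumably go one step further and instantiate the factory named by the class attribute (for example with Class.forName), passing it the collected arguments and nested generators. That step is left out here because it depends on the factory API, which is still under discussion in this issue.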


