Posted to issues@opennlp.apache.org by "William Colen (Created) (JIRA)" <ji...@apache.org> on 2012/02/09 14:15:59 UTC

[jira] [Created] (OPENNLP-429) Create a Factory to customize the POS Tagger

Create a Factory to customize the POS Tagger
--------------------------------------------

                 Key: OPENNLP-429
                 URL: https://issues.apache.org/jira/browse/OPENNLP-429
             Project: OpenNLP
          Issue Type: New Feature
          Components: Command Line Interface, POS Tagger
    Affects Versions: tools-1.5.3-incubating
            Reporter: William Colen
            Assignee: William Colen
            Priority: Minor
             Fix For: tools-1.5.3-incubating


OpenNLP should provide a mechanism to customize the POS Tagger through a factory. The component should obtain the following objects from the factory:

- Context Generator
- Sequence Validator
- POS Dictionary implementation

One issue to solve is how to initialize the objects. For example, the Sequence Validator might be initialized using a POS Dictionary.
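
A minimal sketch of what such a factory could look like, including a Sequence Validator that is initialized from the POS Dictionary. All type and method names here (TagDictionary, ContextGenerator, SequenceValidator, SketchTaggerFactory) are hypothetical stand-ins written for illustration, not the OpenNLP API, and the lambdas assume Java 8 or later.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// All types below are hypothetical stand-ins, not the OpenNLP API.
interface TagDictionary {
    List<String> getTags(String word); // null means the word is unknown
}

interface ContextGenerator {
    String[] getContext(int index, String[] tokens);
}

interface SequenceValidator {
    boolean validSequence(int index, String[] tokens, String candidateTag);
}

/** Default factory; a customization would sub-class it and override a getter. */
class SketchTaggerFactory {
    private final TagDictionary dictionary;

    SketchTaggerFactory(TagDictionary dictionary) {
        this.dictionary = dictionary;
    }

    TagDictionary getTagDictionary() {
        return dictionary;
    }

    ContextGenerator getContextGenerator() {
        // Trivial features: the current token and the previous one.
        return (i, tokens) -> new String[] {
            "w=" + tokens[i],
            "p=" + (i > 0 ? tokens[i - 1] : "<s>")
        };
    }

    SequenceValidator getSequenceValidator() {
        // The validator is initialized with the dictionary owned by the factory.
        final TagDictionary dict = getTagDictionary();
        return (i, tokens, tag) -> {
            if (dict == null) {
                return true;
            }
            List<String> allowed = dict.getTags(tokens[i]);
            return allowed == null || allowed.contains(tag);
        };
    }
}

class SketchTaggerFactoryDemo {
    public static void main(String[] args) {
        Map<String, List<String>> entries = new HashMap<>();
        entries.put("the", Arrays.asList("DT"));
        SketchTaggerFactory factory = new SketchTaggerFactory(entries::get);
        SequenceValidator validator = factory.getSequenceValidator();
        String[] tokens = {"the", "cat"};
        System.out.println(validator.validSequence(0, tokens, "DT")); // tag listed in the dictionary
        System.out.println(validator.validSequence(0, tokens, "NN")); // tag not listed for "the"
        System.out.println(validator.validSequence(1, tokens, "NN")); // unknown word, any tag passes
    }
}
```

Because the factory owns the dictionary, it can hand the same instance to the validator, which is one possible answer to the initialization question above.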

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OPENNLP-429) Create a Factory to customize the POS Tagger

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208272#comment-13208272 ] 

Joern Kottmann commented on OPENNLP-429:
----------------------------------------

If the factory class does not exist, the base model throws this exception:
throw new InvalidFormatException(
            "Could not find the POS factory class: " + factoryName);

The message should be more generic; it does not need to mention the POS factory. We should also extend it a bit and say that the model cannot load a user extension because the class is missing from the classpath.
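
A sketch of what a more generic message could look like. The helper class ExtensionLoaderSketch and the wording of the message are assumptions made for illustration, and java.io.IOException stands in for InvalidFormatException so that the example compiles without OpenNLP on the classpath.

```java
import java.io.IOException;

// Hypothetical helper; the class and method names are illustrative only.
class ExtensionLoaderSketch {

    static Class<?> loadExtensionClass(String className) throws IOException {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            // Generic wording: no mention of a specific tool, plus a hint about the likely cause.
            throw new IOException("Could not load the user extension class '" + className
                + "' required by the model. It appears to be missing from the classpath.", e);
        }
    }

    public static void main(String[] args) {
        try {
            loadExtensionClass("com.example.MissingFactory");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```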
                

        

[jira] [Commented] (OPENNLP-429) Create a Factory to customize the POS Tagger

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208270#comment-13208270 ] 

Joern Kottmann commented on OPENNLP-429:
----------------------------------------

The unknown artifacts might be loaded first from the zip package. I am not sure if it is possible to influence the order here.
But it looks like your code will load these into memory and then load them with the proper serializer later. Is that correct?

Why is BaseModel.getFactory public? The BaseModel class is designed for extension. A sub-class will always override it anyway if it supports a custom factory. On the other hand, you might not want to have this method if a sub-class does not offer support for a custom factory. Should we remove it?
                

        

[jira] [Commented] (OPENNLP-429) Create a Factory to customize the POS Tagger

Posted by "Katrin Tomanek (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208385#comment-13208385 ] 

Katrin Tomanek commented on OPENNLP-429:
----------------------------------------

Works for me now!
                

        

[jira] [Commented] (OPENNLP-429) Create a Factory to customize the POS Tagger

Posted by "William Colen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206874#comment-13206874 ] 

William Colen commented on OPENNLP-429:
---------------------------------------

Now I plan to:

- add a getNGramDictionary method to the POS tagger factory
- deprecate the POSTaggerME train methods that require objects that should be obtained from the factory
- check whether old POSTagger models still work
- change the CLI to allow passing a factory class name
                

        

[jira] [Commented] (OPENNLP-429) Create a Factory to customize the POS Tagger

Posted by "William Colen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206326#comment-13206326 ] 

William Colen commented on OPENNLP-429:
---------------------------------------

I changed the BaseModel to allow loading artifacts and serializers in two steps.
The first step loads the basic artifacts and serializers, so we can load the manifest and get the factory name.
With the factory we can load more serializers, and finally load the remaining artifacts.
To do that I had to change the BaseModel constructor, moving some of its code to methods that can be called by the sub-class at the right time.

All Model constructors had to be modified to call the post-constructor methods.
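
The two-step loading described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code that was committed: the names Deserializer, TwoStepLoaderSketch, loadKnown and finishLoading are hypothetical, serializers are looked up by file extension, and entries without a known serializer are buffered in memory as raw bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch; none of these names come from OpenNLP.
interface Deserializer {
    Object read(InputStream in) throws IOException;
}

class TwoStepLoaderSketch {
    final Map<String, Deserializer> serializers = new HashMap<>();
    final Map<String, Object> artifacts = new HashMap<>();
    private final Map<String, byte[]> pending = new HashMap<>();

    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    static String extension(String name) {
        return name.substring(name.lastIndexOf('.') + 1);
    }

    /** Step 1: one pass over the zip; entries without a serializer are buffered. */
    void loadKnown(InputStream modelStream) throws IOException {
        ZipInputStream zip = new ZipInputStream(modelStream);
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            Deserializer s = serializers.get(extension(entry.getName()));
            if (s != null) {
                artifacts.put(entry.getName(), s.read(zip));
            } else {
                pending.put(entry.getName(), readAll(zip));
            }
            zip.closeEntry();
        }
    }

    /** Step 2: called once the extra serializers have been registered. */
    void finishLoading() throws IOException {
        for (Map.Entry<String, byte[]> e : pending.entrySet()) {
            Deserializer s = serializers.get(extension(e.getKey()));
            if (s == null) {
                throw new IOException("Unknown artifact format: " + e.getKey());
            }
            artifacts.put(e.getKey(), s.read(new ByteArrayInputStream(e.getValue())));
        }
        pending.clear();
    }
}

class TwoStepLoaderDemo {
    /** Builds a tiny zip in memory with the custom artifact stored before the manifest. */
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("custom.tagdict"));
            zip.write("the=DT".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("manifest.properties"));
            zip.write("factory=com.example.MyFactory".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Deserializer text = in -> new String(TwoStepLoaderSketch.readAll(in), StandardCharsets.UTF_8);
        TwoStepLoaderSketch loader = new TwoStepLoaderSketch();
        loader.serializers.put("properties", text); // basic serializer, known up front
        loader.loadKnown(new ByteArrayInputStream(sampleZip()));
        System.out.println(loader.artifacts.keySet()); // only the manifest so far
        // A real implementation would now read the factory name from the manifest
        // and ask that factory for its serializers.
        loader.serializers.put("tagdict", text);
        loader.finishLoading();
        System.out.println(loader.artifacts.get("custom.tagdict"));
    }
}
```

Buffering the unknown entries means the order of entries inside the zip does not matter, at the cost of holding their bytes in memory until the second step runs.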
                

        

[jira] [Commented] (OPENNLP-429) Create a Factory to customize the POS Tagger

Posted by "William Colen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206116#comment-13206116 ] 

William Colen commented on OPENNLP-429:
---------------------------------------

I investigated it a little further. I see two options without making the artifact serializers map static:

1) let the subclass of BaseModel fire an action that calls createArtifactSerializers(artifactSerializers) after everything has been initialized properly
2) use the Factory pattern

I prefer the first option because we will not need a lot of refactoring.

Do you have any other suggestion?
                

        

[jira] [Commented] (OPENNLP-429) Create a Factory to customize the POS Tagger

Posted by "William Colen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208181#comment-13208181 ] 

William Colen commented on OPENNLP-429:
---------------------------------------

Work should be done now. I think it would be easy to create factories for the other tools.
I am waiting for your comments on how to improve it.
                

        

[jira] [Commented] (OPENNLP-429) Create a Factory to customize the POS Tagger

Posted by "William Colen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205198#comment-13205198 ] 

William Colen commented on OPENNLP-429:
---------------------------------------

I committed a proposal for the factory. People who would like to implement another factory should sub-class this one.

Issues: to instantiate the factory we have to assume that it has a known constructor. I don't know if this is good enough, but I can't see another way without creating a sophisticated descriptor or using dependency injection.
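
A sketch of instantiating a factory by class name under the known-constructor assumption. The classes BaseFactorySketch and CustomFactorySketch, and the single String constructor argument, are hypothetical and chosen only for illustration; the actual constructor signature in the proposal may differ.

```java
import java.lang.reflect.Constructor;

// Hypothetical base class and sub-class; names are illustrative only.
class BaseFactorySketch {
    protected final String dictionaryName;

    public BaseFactorySketch(String dictionaryName) {
        this.dictionaryName = dictionaryName;
    }

    public String describe() {
        return "default factory using " + dictionaryName;
    }
}

class CustomFactorySketch extends BaseFactorySketch {
    public CustomFactorySketch(String dictionaryName) {
        super(dictionaryName);
    }

    @Override
    public String describe() {
        return "custom factory using " + dictionaryName;
    }
}

class FactoryInstantiationSketch {
    static BaseFactorySketch create(String className, String dictionaryName)
            throws ReflectiveOperationException {
        Class<? extends BaseFactorySketch> type =
            Class.forName(className).asSubclass(BaseFactorySketch.class);
        // Assumed convention: every factory exposes a public (String) constructor.
        Constructor<? extends BaseFactorySketch> ctor = type.getConstructor(String.class);
        return ctor.newInstance(dictionaryName);
    }

    public static void main(String[] args) throws Exception {
        String name = CustomFactorySketch.class.getName();
        System.out.println(create(name, "tags.xml").describe());
    }
}
```

A sub-class that does not declare the expected constructor fails at getConstructor with NoSuchMethodException, which is the weakness of relying on a convention instead of a descriptor or dependency injection.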

Next steps: remove direct references to the default context generator and sequence validator; modify the CLI to accept the factory class name.

                

        

[jira] [Commented] (OPENNLP-429) Create a Factory to customize the POS Tagger

Posted by "William Colen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208348#comment-13208348 ] 

William Colen commented on OPENNLP-429:
---------------------------------------

Jörn,

Yes, it loads the artifact into memory. I could not find a way to get specific entries from ZipInputStream; it looks like we can only iterate through the items. I tried to reset the stream, but depending on how the model's stream was created, reset was not supported.
I suspect that loading the artifact bytes into memory is time-consuming, but I will have to check whether that is true.

I removed BaseModel.getFactory() and changed the exception message.

Thank you
                

       

[jira] [Commented] (OPENNLP-429) Create a Factory to customize the POS Tagger

Posted by "William Colen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207378#comment-13207378 ] 

William Colen commented on OPENNLP-429:
---------------------------------------

No. The manifest should be loaded first because we need the factory class to load the serializers and finally load the custom artifacts.
The code loads all known artifacts first, in the BaseModel constructor, and leaves the custom ones for later. The POSModel constructor calls a method that finishes loading the artifacts after the custom serializers have been loaded.

                

        

[jira] [Commented] (OPENNLP-429) Create a Factory to customize the POS Tagger

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208361#comment-13208361 ] 

Joern Kottmann commented on OPENNLP-429:
----------------------------------------

I doubt that there is a really good solution available. Let's see how it works for our users.

If loading into memory turns out to be a problem, we could do something like this:
- Compress the stream while writing to memory
- Write to a temp directory
- Reset the stream if supported (maybe that is easy to implement)
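
As a rough illustration of the third option, the standard mark/reset API on java.io.InputStream can be used to rewind a stream for a second pass. The helper name ensureMarkSupported is hypothetical. Note that wrapping a stream in BufferedInputStream only moves the buffering into the wrapper, so this helps mainly when the original stream already supports reset.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ResettableStreamSketch {
    /** Returns a stream that supports mark/reset, wrapping the input only when needed. */
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    public static void main(String[] args) throws IOException {
        InputStream in = ensureMarkSupported(new ByteArrayInputStream(new byte[] {1, 2, 3}));
        in.mark(Integer.MAX_VALUE); // remember the start of the stream
        int firstPass = in.read();
        in.reset();                 // rewind for a second pass
        int secondPass = in.read();
        System.out.println(firstPass == secondPass);
    }
}
```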

It's a nice feature and it will be very useful for me the way it is implemented right now.
Thanks for doing all the work.
                

        

[jira] [Commented] (OPENNLP-429) Create a Factory to customize the POS Tagger

Posted by "William Colen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205542#comment-13205542 ] 

William Colen commented on OPENNLP-429:
---------------------------------------

Jörn, I tried it and now I see what you mean.
The BaseModel constructor needs the serializers loaded.
We need another way to do it.

What if we make the artifact serializers map static?
                

       

[jira] [Commented] (OPENNLP-429) Create a Factory to customize the POS Tagger

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207369#comment-13207369 ] 

Joern Kottmann commented on OPENNLP-429:
----------------------------------------

Sounds good. I will review the code soon. Sorry I am still a bit behind.

Anyway, can it handle this case:
A custom artifact is read from the package before the manifest is loaded?
                

        

[jira] [Commented] (OPENNLP-429) Create a Factory to customize the POS Tagger

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205292#comment-13205292 ] 

Joern Kottmann commented on OPENNLP-429:
----------------------------------------

Shouldn't we give it a longer name, e.g. POSTaggerFactory?

One of the challenges I see here is that we need to figure out how we can load resources out of a model with a custom serializer class. The problem is that we might need to load the resource into memory before the serializer is loaded. 
                

        

[jira] [Commented] (OPENNLP-429) Create a Factory to customize the POS Tagger

Posted by "William Colen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208363#comment-13208363 ] 

William Colen commented on OPENNLP-429:
---------------------------------------

Thank you for testing, Katrin. It should work now.
                

        

[jira] [Issue Comment Edited] (OPENNLP-429) Create a Factory to customize the POS Tagger

Posted by "William Colen (Issue Comment Edited) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208181#comment-13208181 ] 

William Colen edited comment on OPENNLP-429 at 2/15/12 1:46 AM:
----------------------------------------------------------------

Work should be finished now. I think it would be easy to create factories for the other tools.
I am waiting for your comments on how to improve it.
                
      was (Author: colen):
    Work should be done now. I think it would be easy to create factories for the other tools.
I am waiting for your comments on how to improve it.
                  

        

[jira] [Closed] (OPENNLP-429) Create a Factory to customize the POS Tagger

Posted by "William Colen (Closed) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

William Colen closed OPENNLP-429.
---------------------------------

    Resolution: Fixed
    

        

[jira] [Commented] (OPENNLP-429) Create a Factory to customize the POS Tagger

Posted by "William Colen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207500#comment-13207500 ] 

William Colen commented on OPENNLP-429:
---------------------------------------

With my last commit, the Factory subclasses no longer need to call any special method from their constructor, only the one that checks the artifact map, as before.
                

        

[jira] [Commented] (OPENNLP-429) Create a Factory to customize the POS Tagger

Posted by "William Colen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205384#comment-13205384 ] 

William Colen commented on OPENNLP-429:
---------------------------------------

I was thinking about it and could not find a simple solution.

I would create the method populateArtifactSerializers(Map<String, ArtifactSerializer>) in the factory and call it from POSModel.createArtifactSerializers(..).
Another new method, populateArtifactMap(Map<String, Object>), would be called from the POSModel constructor.
The factory would also get a new constructor that takes an artifactMap as an argument, so that the factory can retrieve the resources.

What do you think?
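
To make the proposal above easier to discuss, here is a minimal, self-contained sketch of how the pieces could fit together. It is not the committed OpenNLP code: SketchSerializer, SketchTaggerFactory, SketchModel and the entry names "tagdict" and "tags.tagdict" are hypothetical stand-ins chosen for illustration. Only the method names populateArtifactSerializers and populateArtifactMap come from the comment.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an artifact serializer; simplified so the sketch compiles on its own.
interface SketchSerializer {
    String describe();
}

// Hypothetical factory that follows the shape proposed in the comment.
class SketchTaggerFactory {
    private final Map<String, Object> artifactMap;

    // Training time: no model exists yet, so the factory starts without resources.
    SketchTaggerFactory() {
        this(new HashMap<>());
    }

    // Load time: the factory receives the model's artifact map and reads its resources from it.
    SketchTaggerFactory(Map<String, Object> artifactMap) {
        this.artifactMap = artifactMap;
    }

    // The factory registers the serializers for the artifacts it owns.
    void populateArtifactSerializers(Map<String, SketchSerializer> serializers) {
        serializers.put("tagdict", () -> "tag dictionary serializer");
    }

    // The factory adds its own artifacts to the model.
    void populateArtifactMap(Map<String, Object> artifacts) {
        artifacts.put("tags.tagdict", "placeholder tag dictionary");
    }

    Object getTagDictionary() {
        return artifactMap.get("tags.tagdict");
    }
}

// Hypothetical model that delegates both steps to the factory.
class SketchModel {
    final Map<String, Object> artifacts = new HashMap<>();
    private final SketchTaggerFactory factory;

    SketchModel(SketchTaggerFactory factory) {
        this.factory = factory;
        factory.populateArtifactMap(artifacts);
    }

    Map<String, SketchSerializer> createArtifactSerializers() {
        Map<String, SketchSerializer> serializers = new HashMap<>();
        factory.populateArtifactSerializers(serializers);
        return serializers;
    }
}

class FactorySketchDemo {
    public static void main(String[] args) {
        SketchModel model = new SketchModel(new SketchTaggerFactory());
        // A factory rebuilt from the model's artifact map can retrieve the resources.
        SketchTaggerFactory loaded = new SketchTaggerFactory(model.artifacts);
        System.out.println(loaded.getTagDictionary());
        System.out.println(model.createArtifactSerializers().keySet());
    }
}
```

The point of the second constructor is that a factory created while loading a model sees exactly the artifacts that the training-time factory stored.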

                

        

[jira] [Commented] (OPENNLP-429) Create a Factory to customize the POS Tagger

Posted by "Katrin Tomanek (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208285#comment-13208285 ] 

Katrin Tomanek commented on OPENNLP-429:
----------------------------------------

When I use the trainer tool with this code:

POSTaggerTrainerTool trainerTool = new POSTaggerTrainerTool();
trainerTool.run("opennlp", args);


I get the following NullPointerException:

Exception in thread "main" java.lang.NullPointerException
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:169)
	at opennlp.tools.util.BaseToolFactory.loadSubclass(BaseToolFactory.java:153)
	at opennlp.tools.postag.POSTaggerFactory.create(POSTaggerFactory.java:203)
	at opennlp.tools.cmdline.postag.POSTaggerTrainerTool.run(POSTaggerTrainerTool.java:100)
	at de.averbis.extraction.ae.pos_tagger_maxent.utils.ModelTrainer.main(ModelTrainer.java:22)


This is presumably because I didn't explicitly set a factory. Either the trainer tool needs to make the factory a mandatory parameter, or it should fall back to a default factory when none is given.
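
As a rough illustration of the second option, the sketch below guards the reflective lookup against a missing factory name. The stack trace shows the NullPointerException coming from Class.forName, which is what happens when it is called with a null name. FactoryLoader and DefaultSketchFactory are hypothetical names and not the actual OpenNLP classes, and IllegalArgumentException stands in for whatever exception the real code throws for an unknown class.

```java
// Hypothetical default used when no factory name is supplied.
class DefaultSketchFactory {
}

// Hypothetical loader; the real lookup lives elsewhere in the tool.
class FactoryLoader {

    static Object create(String factoryName) {
        // Guard: Class.forName(null) throws NullPointerException, so fall back first.
        if (factoryName == null) {
            return new DefaultSketchFactory();
        }
        try {
            return Class.forName(factoryName).getDeclaredConstructor().newInstance();
        } catch (ClassNotFoundException e) {
            // Report an unknown class name with a readable message.
            throw new IllegalArgumentException(
                "Could not find the POS factory class: " + factoryName, e);
        } catch (ReflectiveOperationException e) {
            // The class exists but could not be instantiated.
            throw new IllegalStateException(
                "Could not instantiate the POS factory class: " + factoryName, e);
        }
    }
}

class FactoryLoaderDemo {
    public static void main(String[] args) {
        // No factory given: the default is used instead of failing.
        System.out.println(FactoryLoader.create(null).getClass().getSimpleName());
    }
}
```

With a guard like this, callers that never set a factory keep working, while a misspelled class name still fails with a clear message.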
                

        

[jira] [Commented] (OPENNLP-429) Create a Factory to customize the POS Tagger

Posted by "William Colen (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206930#comment-13206930 ] 

William Colen commented on OPENNLP-429:
---------------------------------------

Also, I am thinking of letting the POSTaggerFactory handle the ngram and POS tag dictionary artifacts and their serializers.
                