You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@opennlp.apache.org by "Aliaksandr Autayeu (Created) (JIRA)" <ji...@apache.org> on 2011/11/23 00:52:40 UTC

[jira] [Created] (OPENNLP-402) CLI tools and formats refactored

CLI tools and formats refactored
--------------------------------

                 Key: OPENNLP-402
                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
             Project: OpenNLP
          Issue Type: Improvement
          Components: Command Line Interface, Formats
    Affects Versions: tools-1.5.3-incubating
            Reporter: Aliaksandr Autayeu
         Attachments: 0016-CLI-tools-and-formats-refactored.patch

Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Reopened] (OPENNLP-402) CLI tools and formats refactored

Posted by "Joern Kottmann (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joern Kottmann reopened OPENNLP-402:
------------------------------------

    
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3
>            Reporter: Aliaksandr Autayeu
>            Assignee: Aliaksandr Autayeu
>              Labels: patch
>             Fix For: tools-1.5.3
>
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch, 0017-added-direct-format-support-to-CLI-tools-and-formats.patch, open-nlp-cli-package.png, opennlp-cmdline-package-class-structure.png
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (OPENNLP-402) CLI tools and formats refactored

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159959#comment-13159959 ] 

Joern Kottmann commented on OPENNLP-402:
----------------------------------------

It never good to mix jiras, it is much easier for everyone to participate if changes are done per jira only. Changes and refactoring which don't fit into the current jira should be done under a new jira (e.g. extending the intro section of the documentation, doing typo fixing and other changes).

I did split up the documentation changes and committed them under the actual jiras as good as possible.
                
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3-incubating
>            Reporter: Aliaksandr Autayeu
>              Labels: patch
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch, 0017-added-direct-format-support-to-CLI-tools-and-formats.patch
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OPENNLP-402) CLI tools and formats refactored

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155893#comment-13155893 ] 

Joern Kottmann commented on OPENNLP-402:
----------------------------------------

No it does not matter and that is fine. I just though that having the ability to easily make a branch which is publicly available could be very interesting for some things. As a committer I can always do that with subversion, but when I got it right this is possible for everyone with git. And that is why I asked how you are doing it.
                
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3-incubating
>            Reporter: Aliaksandr Autayeu
>              Labels: patch
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OPENNLP-402) CLI tools and formats refactored

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161100#comment-13161100 ] 

Joern Kottmann commented on OPENNLP-402:
----------------------------------------

Patch is now applied completely.
                
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3-incubating
>            Reporter: Aliaksandr Autayeu
>              Labels: patch
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch, 0017-added-direct-format-support-to-CLI-tools-and-formats.patch, open-nlp-cli-package.png
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Re: [jira] [Commented] (OPENNLP-402) CLI tools and formats refactored

Posted by Aliaksandr Autayeu <al...@autayeu.com>.
The refactoring moved language parameters into factories - it can be better
used there, for example for filtering some multilingual datasets, or to
allow language-specific things. So it became a part of data format
parameters in stream factories. In some cases it is necessary, it others it
might not be. It is always needed for training. It might be needed for some
datasets for evaluation as well. Since the formats now work uniformly, in
some cases it became necessary, because the same factory produces a stream
for testing and for training and thus requires the language
parameter. Factories are not aware of the way the stream will be used.
Please, advise if you know a better solution.

Aliaksandr

On Mon, Dec 5, 2011 at 4:48 AM, James Kosin <ja...@gmail.com> wrote:

> Hmm....
> Any reason why the evaluator needs a -lang parameter as being mandatory
> now?
>
> James
>
> On 12/1/2011 3:30 PM, Aliaksandr Autayeu (Commented) (JIRA) wrote:
> >     [
> https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161118#comment-13161118]
> >
> > Aliaksandr Autayeu commented on OPENNLP-402:
> > --------------------------------------------
> >
> > OK, I'll make another proposal for the class hierarchy in CLI package,
> taking into account current discussion. I'll see other tickets as well.
> >
> >> CLI tools and formats refactored
> >> --------------------------------
> >>
> >>                 Key: OPENNLP-402
> >>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
> >>             Project: OpenNLP
> >>          Issue Type: Improvement
> >>          Components: Command Line Interface, Formats
> >>    Affects Versions: tools-1.5.3-incubating
> >>            Reporter: Aliaksandr Autayeu
> >>            Assignee: Aliaksandr Autayeu
> >>              Labels: patch
> >>         Attachments: 0016-CLI-tools-and-formats-refactored.patch,
> 0017-added-direct-format-support-to-CLI-tools-and-formats.patch,
> open-nlp-cli-package.png
> >>
> >>
> >> Proposed patch refactors CLI tools and simplifies the code by
> introducing hierarchy and removing a lot of code duplication. It also
> introduces better error and help messages, including help for formats and
> listing available formats in various tools, which are now able to work with
> formats directly. This, in turn, eliminates the need to keep converted
> files on disk.
> > --
> > This message is automatically generated by JIRA.
> > If you think it was sent incorrectly, please contact your JIRA
> administrators:
> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> > For more information on JIRA, see:
> http://www.atlassian.com/software/jira
> >
> >
>
>

Re: [jira] [Commented] (OPENNLP-402) CLI tools and formats refactored

Posted by Aliaksandr Autayeu <al...@autayeu.com>.
The refactoring moved language parameters into factories - it can be better
used there, for example for filtering some multilingual datasets, or to
allow language-specific things. So it became a part of data format
parameters in stream factories. In some cases it is necessary, it others it
might not be. It is always needed for training. It might be needed for some
datasets for evaluation as well. Since the formats now work uniformly, in
some cases it became necessary, because the same factory produces a stream
for testing and for training and thus requires the language
parameter. Factories are not aware of the way the stream will be used.
Please, advise if you know a better solution.

Aliaksandr

On Mon, Dec 5, 2011 at 4:48 AM, James Kosin <ja...@gmail.com> wrote:

> Hmm....
> Any reason why the evaluator needs a -lang parameter as being mandatory
> now?
>
> James
>
> On 12/1/2011 3:30 PM, Aliaksandr Autayeu (Commented) (JIRA) wrote:
> >     [
> https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161118#comment-13161118]
> >
> > Aliaksandr Autayeu commented on OPENNLP-402:
> > --------------------------------------------
> >
> > OK, I'll make another proposal for the class hierarchy in CLI package,
> taking into account current discussion. I'll see other tickets as well.
> >
> >> CLI tools and formats refactored
> >> --------------------------------
> >>
> >>                 Key: OPENNLP-402
> >>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
> >>             Project: OpenNLP
> >>          Issue Type: Improvement
> >>          Components: Command Line Interface, Formats
> >>    Affects Versions: tools-1.5.3-incubating
> >>            Reporter: Aliaksandr Autayeu
> >>            Assignee: Aliaksandr Autayeu
> >>              Labels: patch
> >>         Attachments: 0016-CLI-tools-and-formats-refactored.patch,
> 0017-added-direct-format-support-to-CLI-tools-and-formats.patch,
> open-nlp-cli-package.png
> >>
> >>
> >> Proposed patch refactors CLI tools and simplifies the code by
> introducing hierarchy and removing a lot of code duplication. It also
> introduces better error and help messages, including help for formats and
> listing available formats in various tools, which are now able to work with
> formats directly. This, in turn, eliminates the need to keep converted
> files on disk.
> > --
> > This message is automatically generated by JIRA.
> > If you think it was sent incorrectly, please contact your JIRA
> administrators:
> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> > For more information on JIRA, see:
> http://www.atlassian.com/software/jira
> >
> >
>
>

Re: [jira] [Commented] (OPENNLP-402) CLI tools and formats refactored

Posted by Jörn Kottmann <ko...@gmail.com>.
Looks like ObjectStreamFactory now has a getLang method, but shouldn't
the language be passed to the factory via the args parameter of the 
create method?

Jörn

On 12/5/11 4:48 AM, James Kosin wrote:
> Hmm....
> Any reason why the evaluator needs a -lang parameter as being mandatory now?
>
> James
>
> On 12/1/2011 3:30 PM, Aliaksandr Autayeu (Commented) (JIRA) wrote:
>>      [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161118#comment-13161118 ]
>>
>> Aliaksandr Autayeu commented on OPENNLP-402:
>> --------------------------------------------
>>
>> OK, I'll make another proposal for the class hierarchy in CLI package, taking into account current discussion. I'll see other tickets as well.
>>
>>> CLI tools and formats refactored
>>> --------------------------------
>>>
>>>                  Key: OPENNLP-402
>>>                  URL: https://issues.apache.org/jira/browse/OPENNLP-402
>>>              Project: OpenNLP
>>>           Issue Type: Improvement
>>>           Components: Command Line Interface, Formats
>>>     Affects Versions: tools-1.5.3-incubating
>>>             Reporter: Aliaksandr Autayeu
>>>             Assignee: Aliaksandr Autayeu
>>>               Labels: patch
>>>          Attachments: 0016-CLI-tools-and-formats-refactored.patch, 0017-added-direct-format-support-to-CLI-tools-and-formats.patch, open-nlp-cli-package.png
>>>
>>>
>>> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.
>> --
>> This message is automatically generated by JIRA.
>> If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
>> For more information on JIRA, see: http://www.atlassian.com/software/jira
>>
>>


Re: [jira] [Commented] (OPENNLP-402) CLI tools and formats refactored

Posted by James Kosin <ja...@gmail.com>.
Hmm....
Any reason why the evaluator needs a -lang parameter as being mandatory now?

James

On 12/1/2011 3:30 PM, Aliaksandr Autayeu (Commented) (JIRA) wrote:
>     [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161118#comment-13161118 ] 
>
> Aliaksandr Autayeu commented on OPENNLP-402:
> --------------------------------------------
>
> OK, I'll make another proposal for the class hierarchy in CLI package, taking into account current discussion. I'll see other tickets as well.
>                 
>> CLI tools and formats refactored
>> --------------------------------
>>
>>                 Key: OPENNLP-402
>>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>>             Project: OpenNLP
>>          Issue Type: Improvement
>>          Components: Command Line Interface, Formats
>>    Affects Versions: tools-1.5.3-incubating
>>            Reporter: Aliaksandr Autayeu
>>            Assignee: Aliaksandr Autayeu
>>              Labels: patch
>>         Attachments: 0016-CLI-tools-and-formats-refactored.patch, 0017-added-direct-format-support-to-CLI-tools-and-formats.patch, open-nlp-cli-package.png
>>
>>
>> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
>         


Re: [jira] [Commented] (OPENNLP-402) CLI tools and formats refactored

Posted by James Kosin <ja...@gmail.com>.
Hmm....
Any reason why the evaluator needs a -lang parameter as being mandatory now?

James

On 12/1/2011 3:30 PM, Aliaksandr Autayeu (Commented) (JIRA) wrote:
>     [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161118#comment-13161118 ] 
>
> Aliaksandr Autayeu commented on OPENNLP-402:
> --------------------------------------------
>
> OK, I'll make another proposal for the class hierarchy in CLI package, taking into account current discussion. I'll see other tickets as well.
>                 
>> CLI tools and formats refactored
>> --------------------------------
>>
>>                 Key: OPENNLP-402
>>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>>             Project: OpenNLP
>>          Issue Type: Improvement
>>          Components: Command Line Interface, Formats
>>    Affects Versions: tools-1.5.3-incubating
>>            Reporter: Aliaksandr Autayeu
>>            Assignee: Aliaksandr Autayeu
>>              Labels: patch
>>         Attachments: 0016-CLI-tools-and-formats-refactored.patch, 0017-added-direct-format-support-to-CLI-tools-and-formats.patch, open-nlp-cli-package.png
>>
>>
>> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
>         


[jira] [Commented] (OPENNLP-402) CLI tools and formats refactored

Posted by "Aliaksandr Autayeu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161118#comment-13161118 ] 

Aliaksandr Autayeu commented on OPENNLP-402:
--------------------------------------------

OK, I'll make another proposal for the class hierarchy in CLI package, taking into account current discussion. I'll see other tickets as well.
                
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3-incubating
>            Reporter: Aliaksandr Autayeu
>            Assignee: Aliaksandr Autayeu
>              Labels: patch
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch, 0017-added-direct-format-support-to-CLI-tools-and-formats.patch, open-nlp-cli-package.png
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OPENNLP-402) CLI tools and formats refactored

Posted by "Aliaksandr Autayeu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190767#comment-13190767 ] 

Aliaksandr Autayeu commented on OPENNLP-402:
--------------------------------------------

I have improved class hierarchy and naming here. Now, the name follow the convention, interface and class hierarchy are parallel (aligned) and there is more logic in the package class structure. As a confirmation that this is the right way, some <Class> tricks were eliminated, which is a bonus :)

So, now:

CmdLineTool is the root interface, with two children: BasicCmdLineTool (for simple tools) and TypedCmdLineTool (for tools with formats and extra params like evaluators and trainers). There is a bunch of "basement" implementations, they all start with Abstract. AbstractCmdLineTool is the root one, with two children, AbstractBasicCmdLineTool (base class for simple tools) and AbstractTypedTool (for those with formats). AbstractTypedTool further specializes into AbstractTypedParamTool, which provides extra parameter support for its children, trainers and evaluators: AbstractTrainerTool and AbstractEvaluatorTool. And then there is the rest of the forest. 

See attached opennlp-cmdline-package-class-structure.png for more details.

I believe that after feedback on the last commit and testing, the issue now can be finally closed :) Guys, please, go ahead and give me some feedback. Joern, please, give it a look, you reviewed original refactoring, so might still remember the code. Thank you!
                
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3-incubating
>            Reporter: Aliaksandr Autayeu
>            Assignee: Aliaksandr Autayeu
>              Labels: patch
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch, 0017-added-direct-format-support-to-CLI-tools-and-formats.patch, open-nlp-cli-package.png, opennlp-cmdline-package-class-structure.png
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OPENNLP-402) CLI tools and formats refactored

Posted by "Aliaksandr Autayeu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156655#comment-13156655 ] 

Aliaksandr Autayeu commented on OPENNLP-402:
--------------------------------------------

I have tested it with POS tagger and Tokenizer (and with my own formats as well, to avoid conversion, but I didn't include the support of third-party formats in this patch). Here is the command for conllx, which I also used for testing:

opennlp POSTaggerTrainer -model pt.bin -format conllx -data portuguese_bosque_train.conll -encoding UTF-8 -lang pt

The idea is that first go the parameters which belong to the training itself, such as model, iterations, etc. Second, you choose the format of your data and specify all that relates to the data: location (might be a database...), encoding, languages, etc... Of course, this may vary from format to format and therefore I list the formats available and their help separately.

I was thinking about opennlp as a default format, but didn't find an elegant solution, which applies well to all tools. As far as I know, the changes to the default case is 
1) -format opennlp
2) putting format related parameters after the format (so they are passed in the factory)

One of the reasons why I did not go for full backward compatibility (although it can be done) and making opennlp format the default one is that I saw a trend toward separating parameters anyway: those iterations and cutoffs being grouped, support for the parameter file... So there are already groups of parameters which are logically different and even validated in different places, therefore having an explicit separator between them will, for examepl, make users aware of these groups and might ease understanding of possible parameters values and error messages.

====
One disadvantage of this feature is, that it is easy to make mistakes which cannot be noticed, 
e.g. encoding issues, problems in the format, etc. 
===
Can you elaborate exactly how this is a disadvantage of having direct format support? I agree that conversion might reveal something (see below), but that's the advantage of the conversion, not the disadvantage of having direct format support.

====
That may sound a bit strange, but some bugs in the UIMA 
training integration came only to my attention after I added a feature to dump the training data to a file in the opennlp format. 
====
Here I fully understand and agree with you, it is aligned with my own experience. Format conversion itself often reveals some bugs in the data, or in the parser, whether on the source or the target side. When possible, I even did a round-trip conversion for the datasets I have just to discover some special cases in the data.


===
Do you know a cli tool which has a similar problem? 
===
I'm sorry, I didn't get you here. Which problem do you refer here to?

===
We need to adjust the possible arguments based on the format the user is choosing.
===
Yes, that's true and the factory for the format checks the parameters, printing an error message. I incorporated here your propagation of validate....Loudly. Did I misunderstand something?

===
Just some idea, but couldn't we do it like this: 
bin/opennlp TokenNameFinderCrossValidator.bionlp2004 -data ... 
===
In general, implicit format choice removes the need of 1 extra argument. And although I like it and thought of it, the problem is that sometimes it is not possible to rename the files - they are on the CD-ROM, or read-only network share, or in the database which has a JDBC URL.

Oops, now I get you on the last point. I left the paragraph above, it might illustrate some of my thoughts.
===
bin/opennlp TokenNameFinderCrossValidator.bionlp2004 -data ...
===
Yes, adding the format to the tool name is an option. It'll change the parser a bit and will introduce two different types of parameters. But if the goal is to maximize the backwards compatibility (which was broken anyway with the renaming of -model-type to -type) then it might be considered.

Let me know if the patch needs elaboration.

Also, if you have ideas on how to ease the integration of the formats (checking, validation), to make it easier to work with them - I'm interested. But yes, in general there are converters for debugging or data screening.
                
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3-incubating
>            Reporter: Aliaksandr Autayeu
>              Labels: patch
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (OPENNLP-402) CLI tools and formats refactored

Posted by "Aliaksandr Autayeu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aliaksandr Autayeu updated OPENNLP-402:
---------------------------------------

    Attachment: opennlp-cmdline-package-class-structure.png
    
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3-incubating
>            Reporter: Aliaksandr Autayeu
>            Assignee: Aliaksandr Autayeu
>              Labels: patch
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch, 0017-added-direct-format-support-to-CLI-tools-and-formats.patch, open-nlp-cli-package.png, opennlp-cmdline-package-class-structure.png
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (OPENNLP-402) CLI tools and formats refactored

Posted by "Aliaksandr Autayeu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aliaksandr Autayeu updated OPENNLP-402:
---------------------------------------

    Attachment: 0016-CLI-tools-and-formats-refactored.patch
    
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3-incubating
>            Reporter: Aliaksandr Autayeu
>              Labels: patch
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OPENNLP-402) CLI tools and formats refactored

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159946#comment-13159946 ] 

Joern Kottmann commented on OPENNLP-402:
----------------------------------------

I am proposing the following change to simplify these Abstract classes and interfaces.
We make one abstract CmdLineTool class which can get one parameter. The parameter could be used for anything, and must not always be a format.

abstract class CmdLineTool {
  void getName() {..} //default implementation which uses the class name to get the name
  String getShortDescription(){return "";}
  String[] supportedFormats(); // returns an empty array when only default format is supported
  String getHelp(String format){...} // returns default string with tool name, but zero parameters
  abstract void run(String format, String args[]); // format is null when not specified
}

Util methods from AbstractCLITool could be moved into a util class and imported via static import.

What do you think?

I committed a few things, in case you make changes, please update your branch.
                
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3-incubating
>            Reporter: Aliaksandr Autayeu
>              Labels: patch
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch, 0017-added-direct-format-support-to-CLI-tools-and-formats.patch
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OPENNLP-402) CLI tools and formats refactored

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156419#comment-13156419 ] 

Joern Kottmann commented on OPENNLP-402:
----------------------------------------

How does this new format support work?

Lets say I am training the name finder, thats how I do it:
bin/opennlp TokenNameFinderCrossValidator -lang x -data x.train -featuregen xyz.xml -params xyz.txt

After applying your patch I get this error and help message:
Error: Missing mandatory parameter: -format
Usage: opennlp TokenNameFinderCrossValidator [-resources resourcesDir] [-type modelType] [-featuregen featuregenFile] [-params paramsFile] [-iterations num] [-cutoff num] [-misclassified true|false] [-folds num] [-detailedF true|false] -format formatName

Arguments description:
    -resources resourcesDir
        The resources directory
    -type modelType
        The type of the token name finder model
    -featuregen featuregenFile
        The feature generator descriptor file
    -params paramsFile
        Training parameters file.
    -iterations num
        specifies the number of training iterations. It is ignored if a parameters file is passed.
    -cutoff num
        specifies the min number of times a feature must be seen. It is ignored if a parameters file is passed.
    -misclassified true|false
        if true will print false negatives and false positives
    -folds num
        The number of folds. Default is 10
    -detailedF true|false
        if true will print detailed FMeasure results
    -format formatName
        the format of the data, for example conllx, defaults to opennlp. Format might have its own parameters.

opennlp format usage: -lang language -data sampleData [-encoding charsetName]

Arguments description:
    -lang language
        specifies the language which is being processed.
    -data sampleData
        the data to be used
    -encoding charsetName
        specifies the encoding which should be used for reading and writing text. If not specified the system default will be used.

bionlp2004 format usage: -types DNA,protein,cell_type,cell_line,RNA -lang language -data sampleData [-encoding charsetName]

Arguments description:
    -types DNA,protein,cell_type,cell_line,RNA
    -lang language
        specifies the language which is being processed.
    -data sampleData
        the data to be used
    -encoding charsetName
        specifies the encoding which should be used for reading and writing text. If not specified the system default will be used.

conll03 format usage: -lang en|de -types per,loc,org,misc -data sampleData [-encoding charsetName]

Arguments description:
    -lang en|de
    -types per,loc,org,misc
    -data sampleData
        the data to be used
    -encoding charsetName
        specifies the encoding which should be used for reading and writing text. If not specified the system default will be used.

conll02 format usage: -lang es|nl -types per,loc,org,misc -data sampleData [-encoding charsetName]

Arguments description:
    -lang es|nl
    -types per,loc,org,misc
    -data sampleData
        the data to be used
    -encoding charsetName
        specifies the encoding which should be used for reading and writing text. If not specified the system default will be used.

ad format usage: -lang language -data sampleData [-encoding charsetName]

-----------

Specifying -format didn't help, do I need to have certain order? And shouldn't it default to opennlp?

I don't mind having support for a format directly, but it should not make the default case difficult to use.

One disadvantage of this feature is, that it is easy to make mistakes which cannot be noticed,
e.g. encoding issues, problems in the format, etc. That may sound a bit strange, but some bugs in the UIMA
training integration came only to my attention after I added a feature to dump the training data to a file in the opennlp format.

Anyway, we still have the converters for debugging or the case that a user just wants to see the training data.
                
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3-incubating
>            Reporter: Aliaksandr Autayeu
>              Labels: patch
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OPENNLP-402) CLI tools and formats refactored

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209451#comment-13209451 ] 

Joern Kottmann commented on OPENNLP-402:
----------------------------------------

Sorry it took me a while. I think it is still kind of complicated, because there are so many abstract super classes and interfaces.

What do you think about replacing the CmdLineTool with AbstractCmdLineTool. We don't really need
the CmdLineTool as interface, it could be an abstract class.

Then we do the same with BasicCmdLineTool and TypedCmdLineTool.
And it would be good if we can make it more clear what the difference between the two is based on the name.
The javadoc comment for TypedCmdLineTool is clear, but the class name doesn't say anything about that.

What do you think?

                
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3-incubating
>            Reporter: Aliaksandr Autayeu
>            Assignee: Aliaksandr Autayeu
>              Labels: patch
>             Fix For: tools-1.5.3-incubating
>
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch, 0017-added-direct-format-support-to-CLI-tools-and-formats.patch, open-nlp-cli-package.png, opennlp-cmdline-package-class-structure.png
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (OPENNLP-402) CLI tools and formats refactored

Posted by "Aliaksandr Autayeu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aliaksandr Autayeu updated OPENNLP-402:
---------------------------------------

    Attachment: 0017-added-direct-format-support-to-CLI-tools-and-formats.patch
    
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3-incubating
>            Reporter: Aliaksandr Autayeu
>              Labels: patch
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch, 0017-added-direct-format-support-to-CLI-tools-and-formats.patch
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OPENNLP-402) CLI tools and formats refactored

Posted by "Aliaksandr Autayeu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157453#comment-13157453 ] 

Aliaksandr Autayeu commented on OPENNLP-402:
--------------------------------------------

===
I don't get your point here. Why do we need to rename the files? I just need to provide a path to a file (or a directory).
===
I thought the format was to be specified as file extension. Overlooked that it pertains to the tool name.

... format is added to the tool name?
These are good reasons. Thank you for advice. Done.

- Remove author tags, code is owned by the community and not individuals, we list contributors and committers on the team page. 
Done.

- Don't rename classes which are part of our public API, breaks backward compatibility (NameSampleDataStream, WordTagSampleStream) 
Done. They did not follow the pattern, that's why I did that. Having class names follow the pattern might be worth breaking the compatibility.

- Should AbstractCLITool.getHelp return this: "Usage: " + CLI.CMD + " " + getName() ? 
Done with basicHelp

- Why was InvalidFormatException removed from some ModelLoader sub-classes? 
IOException is a more general one, already in throws: public class InvalidFormatException extends IOException {

- Documentation should be updated as well 
Done + JIRAs 215 and 218.

Some extra notes:

I have updated documentation a bit to resolve JIRAs 215 and 218 and have updated examples which used deprecated methods to use new ones. I probably should have done this in separate patches, but I've got carried away. Some parameters descriptions have been updated to make the help uniform and coherent. Some inappropriately used <programlisting> have been changed to <screen>.

In some places tool return codes have been updated to return 0 (no mistakes, OK) when help is printed, and -1 on mistakes (parameters or else)

In some places I have removed Javadocs where they duplicated those already written on interfaces.

To a TerminateToolException was added a convention on error codes and an attempt was made to convert some direct print() calls to centralize error processing.
                
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3-incubating
>            Reporter: Aliaksandr Autayeu
>              Labels: patch
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch, 0017-added-direct-format-support-to-CLI-tools-and-formats.patch
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (OPENNLP-402) CLI tools and formats refactored

Posted by "Aliaksandr Autayeu (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aliaksandr Autayeu resolved OPENNLP-402.
----------------------------------------

       Resolution: Fixed
    Fix Version/s: tools-1.5.3-incubating
    
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3-incubating
>            Reporter: Aliaksandr Autayeu
>            Assignee: Aliaksandr Autayeu
>              Labels: patch
>             Fix For: tools-1.5.3-incubating
>
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch, 0017-added-direct-format-support-to-CLI-tools-and-formats.patch, open-nlp-cli-package.png, opennlp-cmdline-package-class-structure.png
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OPENNLP-402) CLI tools and formats refactored

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161105#comment-13161105 ] 

Joern Kottmann commented on OPENNLP-402:
----------------------------------------

Anyway, can you please have a look at the other jiras you fixed and close them if they are done? I assigned them all to you.
                
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3-incubating
>            Reporter: Aliaksandr Autayeu
>              Labels: patch
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch, 0017-added-direct-format-support-to-CLI-tools-and-formats.patch, open-nlp-cli-package.png
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (OPENNLP-402) CLI tools and formats refactored

Posted by "Aliaksandr Autayeu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aliaksandr Autayeu updated OPENNLP-402:
---------------------------------------

    Attachment: open-nlp-cli-package.png
    
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3-incubating
>            Reporter: Aliaksandr Autayeu
>              Labels: patch
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch, 0017-added-direct-format-support-to-CLI-tools-and-formats.patch, open-nlp-cli-package.png
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OPENNLP-402) CLI tools and formats refactored

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161077#comment-13161077 ] 

Joern Kottmann commented on OPENNLP-402:
----------------------------------------

To clarify the class renaming statement, we cannot rename classes which are part of our public API. All classes which are located inside the cmdline package are not public API.
So we can replaces interfaces with abstract classes, rename things, etc.

What do you think about my proposed CmdLineTool as the top class? AbstractCLITool could be a util class and everything could be imported via static imports.

I suggest that we commit now what is there and we then do additional changes to the AbstractXXX classes in a new patch.
                
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3-incubating
>            Reporter: Aliaksandr Autayeu
>              Labels: patch
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch, 0017-added-direct-format-support-to-CLI-tools-and-formats.patch, open-nlp-cli-package.png
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OPENNLP-402) CLI tools and formats refactored

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203380#comment-13203380 ] 

Joern Kottmann commented on OPENNLP-402:
----------------------------------------

Can this issue be closed?
                
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3-incubating
>            Reporter: Aliaksandr Autayeu
>            Assignee: Aliaksandr Autayeu
>              Labels: patch
>             Fix For: tools-1.5.3-incubating
>
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch, 0017-added-direct-format-support-to-CLI-tools-and-formats.patch, open-nlp-cli-package.png, opennlp-cmdline-package-class-structure.png
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OPENNLP-402) CLI tools and formats refactored

Posted by "Joern Kottmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452992#comment-13452992 ] 

Joern Kottmann commented on OPENNLP-402:
----------------------------------------

We should have a look at this issue again before we release 1.5.3.

My two main concerns are these:
- Complicated structure with all these abstract super classes and interfaces (could easily be changed after 1.5.3)
- The language parameter is mandatory in places where it was not used before, and does not have any meaning, e.g. while doing evaluation with sample date, there the language from the model is used.
                
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3
>            Reporter: Aliaksandr Autayeu
>            Assignee: Aliaksandr Autayeu
>              Labels: patch
>             Fix For: tools-1.5.3
>
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch, 0017-added-direct-format-support-to-CLI-tools-and-formats.patch, open-nlp-cli-package.png, opennlp-cmdline-package-class-structure.png
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (OPENNLP-402) CLI tools and formats refactored

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156408#comment-13156408 ] 

Joern Kottmann commented on OPENNLP-402:
----------------------------------------

I once tried 3 - 4 different strategies to add multi-threading support to the maxent package, doing this just in different branches on a platform like github would have been nice.

Next thing where I will use it for are hash features instead of our string features. I would like to make a branch which is public to get comments from others on that.
                
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3-incubating
>            Reporter: Aliaksandr Autayeu
>              Labels: patch
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OPENNLP-402) CLI tools and formats refactored

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13159599#comment-13159599 ] 

Joern Kottmann commented on OPENNLP-402:
----------------------------------------

Sorry, I am late with my review. Looks great. I got a bit confused with all these AbstractXXX classes and interfaces. We eventually should simplify that a bit. 

I will do a bit more testing before I commit it.
                
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3-incubating
>            Reporter: Aliaksandr Autayeu
>              Labels: patch
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch, 0017-added-direct-format-support-to-CLI-tools-and-formats.patch
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OPENNLP-402) CLI tools and formats refactored

Posted by "Aliaksandr Autayeu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155882#comment-13155882 ] 

Aliaksandr Autayeu commented on OPENNLP-402:
--------------------------------------------

I create branch for every patch I submit. I keep them in the cloned (from Apache Git mirror) OpenNLP repo on my machine. Why should I share them? Does it matter if code is accepted only here, only through patches. May be I miss something? Can you elaborate or explain?
                
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3-incubating
>            Reporter: Aliaksandr Autayeu
>              Labels: patch
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OPENNLP-402) CLI tools and formats refactored

Posted by "Aliaksandr Autayeu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155918#comment-13155918 ] 

Aliaksandr Autayeu commented on OPENNLP-402:
--------------------------------------------

For example, for which things? :) 

Anyway, if branch integration would make patch integration easier (oh, pull requests... :), I might do that :) Subversion branches are cheap to do, expensive to merge. Git merges better. Afterall, the only reason to have a branch is to integrate it later on (or discard). Anyway, I'm open to advice which will make your integration life easier.
                
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3-incubating
>            Reporter: Aliaksandr Autayeu
>              Labels: patch
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OPENNLP-402) CLI tools and formats refactored

Posted by "Aliaksandr Autayeu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161064#comment-13161064 ] 

Aliaksandr Autayeu commented on OPENNLP-402:
--------------------------------------------

About AbstractXXX classes. I have reviewed the hierarchy now, once again and see some space for improvement. I agree it should be refactored once more.

Current hierarchy of AbstractXXX classes is made to avoid code duplication. It is not very consistent in naming because I can't rename existing classes (I've just been asked to revert one such proposal). Now I have reviewed it again and see some possible improvements, but simplifying it to one class and interface might make it easier to understand, but will make other things messy.

For example, I like the idea making format support more explicit in the interface (supportedFormats, getHelp, etc above), but I don't like the idea of mixing everything into one place. Things like " The parameter could be used for anything, and must not always be a format. " is problematic, because it hides the semantics of the parameter. If format parameter is not a format anymore, it should be called differently, or made anonymous (e.g. args[]) otherwise it is misleading.

Current class hierarchy !open-nlp-cli-package.png! (attached diagram) separates the tools which does not need anything (CmdLineTool), the tools which work with samples of some kind and recognize formats (AbstactTypedTool) and tools which have parameters formalized (AbstractCLITool<P>). Then there are more specific tools: trainers, cross validators, evaluators, etc.

I thought a bit and spaces for improvement lie in adding some methods from your comment to interfaces, in improving this place: BaseCLITool extends AbstractCLITool<Class>, and hierarchy just a bit (AbstractCLITool<P> and CmdLineTool is kind of parallel).
                
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3-incubating
>            Reporter: Aliaksandr Autayeu
>              Labels: patch
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch, 0017-added-direct-format-support-to-CLI-tools-and-formats.patch
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OPENNLP-402) CLI tools and formats refactored

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156703#comment-13156703 ] 

Joern Kottmann commented on OPENNLP-402:
----------------------------------------

Here are my code review comments:
- Remove author tags, code is owned by the community and not individuals, we list contributors and committers on the team page.
- Don't rename classes which are part of our public API, breaks backward compatibility (NameSampleDataStream, WordTagSampleStream)
- Should AbstractCLITool.getHelp return this: "Usage: " + CLI.CMD + " " + getName() ?
- Why was InvalidFormatException removed from some ModelLoader sub-classes?
- Documentation should be updated as well

                
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3-incubating
>            Reporter: Aliaksandr Autayeu
>              Labels: patch
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OPENNLP-402) CLI tools and formats refactored

Posted by "Aliaksandr Autayeu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13157455#comment-13157455 ] 

Aliaksandr Autayeu commented on OPENNLP-402:
--------------------------------------------

I have attached an updated patch. Please, review and integrate.
                
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3-incubating
>            Reporter: Aliaksandr Autayeu
>              Labels: patch
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch, 0017-added-direct-format-support-to-CLI-tools-and-formats.patch
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OPENNLP-402) CLI tools and formats refactored

Posted by "Aliaksandr Autayeu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203454#comment-13203454 ] 

Aliaksandr Autayeu commented on OPENNLP-402:
--------------------------------------------

I was actually waiting for you to give it a final approving look :)
                
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3-incubating
>            Reporter: Aliaksandr Autayeu
>            Assignee: Aliaksandr Autayeu
>              Labels: patch
>             Fix For: tools-1.5.3-incubating
>
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch, 0017-added-direct-format-support-to-CLI-tools-and-formats.patch, open-nlp-cli-package.png, opennlp-cmdline-package-class-structure.png
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (OPENNLP-402) CLI tools and formats refactored

Posted by "Joern Kottmann (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joern Kottmann reassigned OPENNLP-402:
--------------------------------------

    Assignee: Joern Kottmann  (was: Aliaksandr Autayeu)
    
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3
>            Reporter: Aliaksandr Autayeu
>            Assignee: Joern Kottmann
>              Labels: patch
>             Fix For: tools-1.5.3
>
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch, 0017-added-direct-format-support-to-CLI-tools-and-formats.patch, open-nlp-cli-package.png, opennlp-cmdline-package-class-structure.png
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (OPENNLP-402) CLI tools and formats refactored

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156423#comment-13156423 ] 

Joern Kottmann commented on OPENNLP-402:
----------------------------------------

Do you know a cli tool which has a similar problem?
We need to adjust the possible arguments based on the format the user is choosing.

Just some idea, but couldn't we do it like this:
bin/opennlp TokenNameFinderCrossValidator.bionlp2004 -data ...

and the default is just like it is now:
bin/opennlp TokenNameFinderCrossValidator -data ...

Maybe the format separation with the dot looks a bit stupid,
I will continue to think about it.
                
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3-incubating
>            Reporter: Aliaksandr Autayeu
>              Labels: patch
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OPENNLP-402) CLI tools and formats refactored

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13156677#comment-13156677 ] 

Joern Kottmann commented on OPENNLP-402:
----------------------------------------

The iteration and cutoff parameter will be removed. They are only there for backward compatibility.

The notion of -param value does most often not rely on a certain parameter oder, and I believe having this is highly confusing to a user.

Should we implement the solution where to format is added to the tool name?
I think that has a couple of advantages
1. I only get help for parameters I need, and I can still see the help message while typing my command. That is not possible when the help message is a few screens long.
2. Parameter order does not matter
3. It works as it did before, default will be opennlp format

What do you think?

===
In general, implicit format choice removes the need of 1 extra argument. And although I like it and thought of it, the problem is that sometimes it is not possible to rename the files - they are on the CD-ROM, or read-only network share, or in the database which has a JDBC URL. 
===

I don't get your point here. Why do we need to rename the files? I just need to provide a path to a file (or a directory).
                
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3-incubating
>            Reporter: Aliaksandr Autayeu
>              Labels: patch
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Assigned] (OPENNLP-402) CLI tools and formats refactored

Posted by "Joern Kottmann (Assigned) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joern Kottmann reassigned OPENNLP-402:
--------------------------------------

    Assignee: Aliaksandr Autayeu
    
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3-incubating
>            Reporter: Aliaksandr Autayeu
>            Assignee: Aliaksandr Autayeu
>              Labels: patch
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch, 0017-added-direct-format-support-to-CLI-tools-and-formats.patch, open-nlp-cli-package.png
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OPENNLP-402) CLI tools and formats refactored

Posted by "Aliaksandr Autayeu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161049#comment-13161049 ] 

Aliaksandr Autayeu commented on OPENNLP-402:
--------------------------------------------

I agree about not mixing JIRAs, in fact, I wrote about it in the first place. Why didn't I follow what I preach? Because separating the changes into commits is can be difficult if you're in the middle of a refactoring. And separating them into 4-5 patches is tougher still, because of the submit-wait-update-integrate-format-new-patch loop.
                
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3-incubating
>            Reporter: Aliaksandr Autayeu
>              Labels: patch
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch, 0017-added-direct-format-support-to-CLI-tools-and-formats.patch
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OPENNLP-402) CLI tools and formats refactored

Posted by "Joern Kottmann (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155875#comment-13155875 ] 

Joern Kottmann commented on OPENNLP-402:
----------------------------------------

I am just curious. You said that you created a (git?) branch for this issue. Is it public, e.g on github or so?
                
> CLI tools and formats refactored
> --------------------------------
>
>                 Key: OPENNLP-402
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-402
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Command Line Interface, Formats
>    Affects Versions: tools-1.5.3-incubating
>            Reporter: Aliaksandr Autayeu
>              Labels: patch
>         Attachments: 0016-CLI-tools-and-formats-refactored.patch
>
>
> Proposed patch refactors CLI tools and simplifies the code by introducing hierarchy and removing a lot of code duplication. It also introduces better error and help messages, including help for formats and listing available formats in various tools, which are now able to work with formats directly. This, in turn, eliminates the need to keep converted files on disk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira