Posted to issues@opennlp.apache.org by "William Colen (JIRA)" <ji...@apache.org> on 2011/07/21 22:34:58 UTC

[jira] [Created] (OPENNLP-236) Create a command line tool to create dictionaries

Create a command line tool to create dictionaries
-------------------------------------------------

                 Key: OPENNLP-236
                 URL: https://issues.apache.org/jira/browse/OPENNLP-236
             Project: OpenNLP
          Issue Type: Improvement
          Components: Command Line Interface
    Affects Versions: tools-1.5.2-incubating
            Reporter: William Colen
            Assignee: William Colen
            Priority: Minor
             Fix For: tools-1.5.2-incubating


We should create a command line tool to create dictionaries. The input should be a plain text file and the output a serialized Dictionary.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OPENNLP-236) Create a command line tool to create dictionaries

Posted by "William Colen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069338#comment-13069338 ] 

William Colen commented on OPENNLP-236:
---------------------------------------

Hi James,

I created an initial version of the tool: DictionaryBuilderTool. It is simpler than the census tool because we can use Dictionary.parseOneEntryPerLine(in) directly on the input.
Maybe we should group the tools that create dictionaries together, as we do with the converters? In that case we would pass an argument indicating the format, such as "census" or "oneEntryPerLine", and the tool would route internally to the correct implementation.
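
The routing idea above could look roughly like the following. This is a minimal sketch, not the OpenNLP implementation: the class DictionaryFormatRouter, the LineParser interface, and the census column layout (name in the first whitespace-separated column) are assumptions made for illustration, and plain token lists stand in for the real Dictionary. Only the format names "census" and "oneEntryPerLine" come from the comment.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: routes a format name to a line parser.
// Entries are modeled as token lists; nothing here calls OpenNLP.
class DictionaryFormatRouter {

    /** Turns one input line into one dictionary entry (a list of tokens). */
    interface LineParser {
        List<String> parse(String line);
    }

    private static final Map<String, LineParser> PARSERS = new LinkedHashMap<>();

    static {
        // "oneEntryPerLine": every whitespace-separated token belongs to the entry.
        PARSERS.put("oneEntryPerLine",
                line -> Arrays.asList(line.trim().split("\\s+")));
        // "census" (assumed layout): only the first column is the name.
        PARSERS.put("census",
                line -> Collections.singletonList(line.trim().split("\\s+")[0]));
    }

    /** Reads all non-blank lines and parses them with the parser for the given format. */
    static List<List<String>> build(String format, Reader in) {
        LineParser parser = PARSERS.get(format);
        if (parser == null) {
            throw new IllegalArgumentException("Unknown dictionary format: " + format);
        }
        List<List<String>> entries = new ArrayList<>();
        try {
            BufferedReader reader = new BufferedReader(in);
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) { // skip blank lines
                    entries.add(parser.parse(line));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }
}
```

With a lookup table like this, adding a new dictionary format would only mean registering another parser, which is the main appeal of grouping the tools.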


        

[jira] [Commented] (OPENNLP-236) Create a command line tool to create dictionaries

Posted by "Jörn Kottmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069448#comment-13069448 ] 

Jörn Kottmann commented on OPENNLP-236:
---------------------------------------

Hi,

These converter tool groupings are still not ideal and might need a bit more tweaking. I think the approach William took is fine for now. We need to do more work on the dictionary code in one of the coming releases anyway, and we might also have to support new kinds of dictionaries.

Jörn 


       

[jira] [Commented] (OPENNLP-236) Create a command line tool to create dictionaries

Posted by "James Kosin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069349#comment-13069349 ] 

James Kosin commented on OPENNLP-236:
-------------------------------------

William,

Also, the census data may end up being passed as multiple input files to create the dictionary. Jörn said at the time that it wasn't a problem to have the createDictionary() function in my class, when I wanted to be able to pass the object stream directly to the Dictionary class. It also allows some finer control in the CensusDictionaryCreator.
I think for the time being we can leave them separate and refactor later if really needed. The object stream class is also supposed to validate the input before passing back the results, which is really nice. We have used it quite well for the converters, but it is also flexible enough to use on simple standard input from files, for things like sentences for training.
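
As a rough illustration of the two points above (several input files, and a stream that validates before handing results back), here is a minimal sketch. It does not use the OpenNLP object stream classes: MultiSourceNameStream is a hypothetical name, and the "letters only" validation rule is an assumption chosen just to make the example concrete.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: one stream over several inputs that validates each
// name before returning it. Not the OpenNLP ObjectStream API.
class MultiSourceNameStream {

    private final Deque<BufferedReader> sources = new ArrayDeque<>();

    MultiSourceNameStream(List<? extends Reader> inputs) {
        for (Reader input : inputs) {
            sources.add(new BufferedReader(input));
        }
    }

    /** Returns the next validated name, or null once every source is exhausted. */
    String read() {
        try {
            while (!sources.isEmpty()) {
                String line = sources.peek().readLine();
                if (line == null) {
                    // Current source is finished; close it and move to the next one.
                    sources.remove().close();
                    continue;
                }
                String name = line.trim();
                if (name.isEmpty()) {
                    continue; // ignore blank lines
                }
                // Assumed validation rule: a name consists of letters only.
                if (!name.chars().allMatch(Character::isLetter)) {
                    throw new IllegalArgumentException("Invalid name entry: " + name);
                }
                return name;
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller would simply loop on read() until it returns null, without needing to know how many files were behind the stream.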

James


        

[jira] [Closed] (OPENNLP-236) Create a command line tool to create dictionaries

Posted by "William Colen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

William Colen closed OPENNLP-236.
---------------------------------

    Resolution: Fixed


        

[jira] [Commented] (OPENNLP-236) Create a command line tool to create dictionaries

Posted by "James Kosin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069333#comment-13069333 ] 

James Kosin commented on OPENNLP-236:
-------------------------------------

William,

I already have something similar to this in the CensusDictionaryCreatorTool.  Look in the opennlp.tools.cmdline.namefind...
Currently, it just creates a name dictionary from the census data.

James


        

[jira] [Commented] (OPENNLP-236) Create a command line tool to create dictionaries

Posted by "James Kosin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069346#comment-13069346 ] 

James Kosin commented on OPENNLP-236:
-------------------------------------

Hi,

Actually, if we could get the Dictionary to take an object stream, as createDictionary() does in my class, we could just create an object stream that returns one entry per line (if one is not already available).
Look over the classes in the opennlp.tools.formats....
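
To make the suggestion concrete, here is a minimal sketch of a dictionary that is filled from a stream of entries. Every type in it is hypothetical: EntryStream is a stand-in for an object stream (it only has a read() method that returns null at the end), and SimpleDictionary is a stand-in for Dictionary. It shows the shape of the idea, not the OpenNLP API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these types are OpenNLP classes.
class StreamDictionarySketch {

    /** Minimal stand-in for an object stream: read() returns null at the end. */
    interface EntryStream<T> {
        T read();
    }

    /** Emits one entry per non-blank line, tokenized on whitespace. */
    static EntryStream<List<String>> onePerLine(Reader in) {
        BufferedReader reader = new BufferedReader(in);
        return () -> {
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    if (!line.trim().isEmpty()) {
                        return Arrays.asList(line.trim().split("\\s+"));
                    }
                }
                return null; // end of input
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        };
    }

    /** Stand-in dictionary that can be filled from any entry stream. */
    static final class SimpleDictionary {
        private final Set<List<String>> entries = new LinkedHashSet<>();

        SimpleDictionary(EntryStream<List<String>> stream) {
            List<String> entry;
            while ((entry = stream.read()) != null) {
                entries.add(entry); // duplicates collapse because entries is a set
            }
        }

        boolean contains(String... tokens) {
            return entries.contains(Arrays.asList(tokens));
        }

        int size() {
            return entries.size();
        }
    }
}
```

Because the dictionary only depends on the stream interface, a census-specific stream and a one-entry-per-line stream could both feed the same constructor.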

James
