Posted to dev@joshua.apache.org by "Matt Post (JIRA)" <ji...@apache.org> on 2016/09/07 18:09:20 UTC

[jira] [Updated] (JOSHUA-289) Fix output formatting

     [ https://issues.apache.org/jira/browse/JOSHUA-289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Post updated JOSHUA-289:
-----------------------------
    Fix Version/s:     (was: 6.1)
                   6.2

> Fix output formatting
> ---------------------
>
>                 Key: JOSHUA-289
>                 URL: https://issues.apache.org/jira/browse/JOSHUA-289
>             Project: Joshua
>          Issue Type: Improvement
>            Reporter: Matt Post
>            Assignee: Matt Post
>             Fix For: 6.2
>
>
> This is a sub ticket of JOSHUA-273.
> Joshua output formatting is a mess. The StructuredTranslation piece is a good step in the right direction, but many problems remain. Here is a list of problems and suggested corrections.
> - There are currently four variables that together define separate paths for formatting the output: the run mode (one of two server modes, or regular mode), whether use_structured_translations is set, whether topN == 0 (i.e., whether we are outputting k-best or just the quick Viterbi best), and whether we are projecting case or denormalizing the output.
> - In TCP mode, ServerThread.java.run() iterates over the Translation objects returned by Translations and calls Translation.toString() on each. %S formatting and recasing are applied.
> - In HTTP mode, ServerThread.java.handle() builds a JSONMessage, which in turn calls translation.getStructuredTranslations.get(0).getTranslationString(). Neither recasing nor %S formatting is applied.
> - In regular mode, we call Translation.toString(), which returns output that was formatted in a complicated way in the constructor, using different methods depending on whether (a) use_structured_translations is set and (b) topN == 0. This is a veritable mess of nested, redundant output formatting. Some of these methods in turn rely on separate formatting applied in KBestExtractor's constructor.
> Suggestions:
> - Get rid of topN==0. Viterbi extraction should be quicker than k-best and is used automatically if possible. The same output formatting should apply in either case.
> - We should always use structured outputs, even collapsing StructuredTranslation into Translation.
> - Move all output formatting out of KBestExtractor. This should just return k-best items.
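
The suggestions above amount to one formatting path shared by Viterbi and k-best output. A minimal sketch of that idea follows, under stated assumptions: the class OutputFormatSketch, the Hyp type, the format and formatAll methods, and the %s and %c placeholders are all hypothetical names invented for illustration, not Joshua's actual API or its real output-format codes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not Joshua code: one formatter used for every output path.
class OutputFormatSketch {

    /** Hypothetical stand-in for a Translation merged with StructuredTranslation. */
    static final class Hyp {
        final String text;
        final double score;

        Hyp(String text, double score) {
            this.text = text;
            this.score = score;
        }
    }

    /**
     * Expands a template in a single pass. Assumed placeholders:
     * %s = hypothesis text, %c = score with three decimals.
     * Any other character sequence is copied through unchanged.
     */
    static String format(Hyp hyp, String template) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < template.length(); i++) {
            char ch = template.charAt(i);
            if (ch == '%' && i + 1 < template.length()) {
                char code = template.charAt(i + 1);
                if (code == 's') {
                    out.append(hyp.text);
                    i++;
                    continue;
                }
                if (code == 'c') {
                    out.append(String.format(Locale.ROOT, "%.3f", hyp.score));
                    i++;
                    continue;
                }
            }
            out.append(ch);
        }
        return out.toString();
    }

    /**
     * Formats a list of hypotheses. A Viterbi result is simply a list of
     * size one, so there is no separate topN == 0 branch.
     */
    static List<String> formatAll(List<Hyp> hyps, String template) {
        List<String> lines = new ArrayList<>();
        for (Hyp hyp : hyps) {
            lines.add(format(hyp, template));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Hyp> kbest = new ArrayList<>();
        kbest.add(new Hyp("hello world", -1.5));
        kbest.add(new Hyp("hello word", -2.25));
        for (String line : formatAll(kbest, "%s ||| %c")) {
            System.out.println(line);
        }
    }
}
```

In this sketch the extractor would only hand back a list of hypotheses, and the TCP, HTTP, and regular code paths would all call the same formatAll method, so recasing or denormalization could be added in exactly one place.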



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)