Posted to dev@joshua.apache.org by "Matt Post (JIRA)" <ji...@apache.org> on 2016/05/25 07:04:13 UTC
[jira] [Created] (JOSHUA-272) Simplify the packing and usage of phrase-based grammars
Matt Post created JOSHUA-272:
--------------------------------
Summary: Simplify the packing and usage of phrase-based grammars
Key: JOSHUA-272
URL: https://issues.apache.org/jira/browse/JOSHUA-272
Project: Joshua
Issue Type: Improvement
Reporter: Matt Post
Assignee: Matt Post
Fix For: 6.1
For historical reasons, phrase-based grammars add some complexity to decoding. In a packed grammar, the complete subtree under each top-level trie node has to fit within a single packed grammar slice, and a slice is limited to 2 GB because of the maximum size of a Java byte[] array. We used to sort on just the first item in the trie, which was a problem for phrase-based decoding: phrase-based rules are implemented as left-branching hierarchical rules, so they all begin with the same [X,1] symbol and end up under the same top-level node. In order to pack large grammars, we packed them without the leading [X,1] and then added it back when loading the grammars, for both the packed and memory-based grammars. This was a real mess.
This was all fixed by a commit a while ago that packs and reads packed grammars based on the first two symbols on the source side. So we should remove all the complexity associated with phrases; they should just be regular rules. There is also a lot of redundancy across the codebase in parsing rules, converting them to different formats, and so on.
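
To illustrate why the choice of slice key matters, here is a minimal sketch that groups rule source sides by their first one or first two symbols. It is not Joshua's packing code: the class name SliceKeyDemo, the helpers sliceKey and sliceSizes, and the toy source sides are all hypothetical, and real rules carry more fields than the source side shown here.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SliceKeyDemo {
    // Hypothetical helper: the slice key is the first n whitespace-separated
    // symbols of a rule's source side (or all of them if there are fewer).
    static String sliceKey(String sourceSide, int n) {
        String[] symbols = sourceSide.trim().split("\\s+");
        int k = Math.min(n, symbols.length);
        return String.join(" ", Arrays.copyOfRange(symbols, 0, k));
    }

    // Counts how many rules would be grouped under each slice key.
    static Map<String, Integer> sliceSizes(List<String> sources, int n) {
        Map<String, Integer> sizes = new LinkedHashMap<>();
        for (String source : sources) {
            sizes.merge(sliceKey(source, n), 1, Integer::sum);
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Toy phrase-based source sides; each starts with the leading [X,1].
        List<String> sources = Arrays.asList(
            "[X,1] das haus",
            "[X,1] das",
            "[X,1] ein haus",
            "[X,1] ein");

        // Keyed on one symbol, every phrase rule shares a single group.
        System.out.println(sliceSizes(sources, 1)); // {[X,1]=4}

        // Keyed on two symbols, the rules spread across several groups.
        System.out.println(sliceSizes(sources, 2)); // {[X,1] das=2, [X,1] ein=2}
    }
}
```

With a one-symbol key, the whole phrase table would have to fit under one top-level node, which is where the 2 GB slice limit becomes a problem. With a two-symbol key the leading [X,1] no longer forces everything together, so there is no need to strip it at packing time and restore it at load time.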
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)