Posted to issues@opennlp.apache.org by "Nicolas Hernandez (JIRA)" <ji...@apache.org> on 2013/05/14 09:41:15 UTC

[jira] [Created] (OPENNLP-578) Error writing model file due to a java writeUTF method problem

Nicolas Hernandez created OPENNLP-578:
-----------------------------------------

             Summary: Error writing model file due to a java writeUTF method problem
                 Key: OPENNLP-578
                 URL: https://issues.apache.org/jira/browse/OPENNLP-578
             Project: OpenNLP
          Issue Type: Bug
          Components: Maxent
    Affects Versions: maxent-3.0.3
         Environment: uname -a
Linux hebus 3.2.0-42-generic #67-Ubuntu SMP Mon May 6 21:33:33 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

java -version
java version "1.7.0_21"
OpenJDK Runtime Environment (IcedTea 2.3.9) (7u21-2.3.9-0ubuntu0.12.04.1)
OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)

            Reporter: Nicolas Hernandez


Using the POSTaggerTrainer command line to build a model for predicting lemmas
(out of curiosity, since this approach may not be the best one for the task),
I got an error while writing the model file, caused by a limitation of Java's writeUTF method [1].

More specifically, java.io.DataOutputStream.writeUTF cannot serialize a string whose encoded form is longer
than 64KB (65535 bytes).
[2] describes the problem and gives some workarounds.
Fixing it may require modifying the binary model format.
The class that appears to be affected is
opennlp-maxent/src/main/java/opennlp/maxent/io/BinaryGISModelWriter.java
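
To illustrate the limit and one possible workaround, here is a minimal sketch. It is an assumption for discussion only, not the fix adopted by OpenNLP and not necessarily one of the options listed in [2]. The class LongUtfSketch and the helpers writeLongUTF, readLongUTF, writeUTFFails and roundTrip are hypothetical names. The idea is to write a 4-byte length followed by the plain UTF-8 bytes instead of calling writeUTF, which also shows why the binary format would have to change: a reader expecting writeUTF's 2-byte length prefix could not read such a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; none of these names exist in OpenNLP.
class LongUtfSketch {

    // Writes a 4-byte length prefix followed by standard UTF-8 bytes,
    // so the string is no longer capped at 65535 encoded bytes.
    static void writeLongUTF(DataOutputStream out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Reads back a string written by writeLongUTF.
    static String readLongUTF(DataInputStream in) throws IOException {
        int length = in.readInt();
        byte[] bytes = new byte[length];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Returns true if DataOutputStream.writeUTF rejects the string as too long.
    static boolean writeUTFFails(String s) throws IOException {
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        try {
            out.writeUTF(s);
            return false;
        } catch (UTFDataFormatException e) {
            return true;
        }
    }

    // Serializes and deserializes the string in memory with the helpers above.
    static String roundTrip(String s) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        writeLongUTF(out, s);
        out.flush();
        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray()));
        return readLongUTF(in);
    }

    public static void main(String[] args) throws IOException {
        // Same encoded size as in the reported error: 153687 ASCII characters.
        String big = new String(new char[153687]).replace('\0', 'a');
        System.out.println("writeUTF fails: " + writeUTFFails(big));
        System.out.println("round trip ok: " + roundTrip(big).equals(big));
    }
}
```

A variant that keeps writeUTF and splits long strings into several chunks would be another option, but it would also require a matching change on the reader side.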

[1] Writing pos tagger model ... failed
Error during writing model file '/tmp/train-lemma.model'
encoded string too long: 153687 bytes
java.io.UTFDataFormatException: encoded string too long: 153687 bytes
at java.io.DataOutputStream.writeUTF(DataOutputStream.java:364)
at java.io.DataOutputStream.writeUTF(DataOutputStream.java:323)
at opennlp.maxent.io.BinaryGISModelWriter.writeUTF(BinaryGISModelWriter.java:73)

[2] http://www.drillio.com/en/software-development/java/encoded-string-too-long-64kb-limit/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira