Posted to issues@opennlp.apache.org by "Nicolas Hernandez (JIRA)" <ji...@apache.org> on 2013/05/14 09:41:15 UTC
[jira] [Created] (OPENNLP-578) Error writing model file due to a
java writeUTF method problem
Nicolas Hernandez created OPENNLP-578:
-----------------------------------------
Summary: Error writing model file due to a java writeUTF method problem
Key: OPENNLP-578
URL: https://issues.apache.org/jira/browse/OPENNLP-578
Project: OpenNLP
Issue Type: Bug
Components: Maxent
Affects Versions: maxent-3.0.3
Environment: uname -a
Linux hebus 3.2.0-42-generic #67-Ubuntu SMP Mon May 6 21:33:33 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
java -version
java version "1.7.0_21"
OpenJDK Runtime Environment (IcedTea 2.3.9) (7u21-2.3.9-0ubuntu0.12.04.1)
OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)
Reporter: Nicolas Hernandez
Using the POSTaggerTrainer command line to build a model for predicting lemmas
(out of curiosity, since this approach may not be the best one for the task),
I got an error while writing the model file, caused by a limitation of Java's writeUTF method [1].
More specifically, java.io.DataOutputStream.writeUTF cannot serialize a string whose encoded form
exceeds 65535 bytes (64 KB), because the length is stored in a two-byte prefix.
[2] describes the problem and gives some workarounds.
Fixing the problem may require changing the binary model format.
The class that seems to be affected is
opennlp-maxent/src/main/java/opennlp/maxent/io/BinaryGISModelWriter.java
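
To illustrate the limit and one possible workaround, here is a minimal sketch. It is not taken from the OpenNLP code base: the class LongUtfSketch and the helpers writeLongUTF, readLongUTF, writeUTFFails and roundTrip are hypothetical names chosen for this example. It assumes a four-byte length prefix followed by standard UTF-8 bytes would be an acceptable encoding. That encoding is not readable by DataInputStream.readUTF, so adopting something like it would change the binary model format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LongUtfSketch {

    // Hypothetical helper: writes a 4-byte length followed by the UTF-8 bytes,
    // instead of the 2-byte length prefix used by DataOutputStream.writeUTF.
    public static void writeLongUTF(DataOutputStream out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Hypothetical counterpart that reads what writeLongUTF wrote.
    public static String readLongUTF(DataInputStream in) throws IOException {
        int length = in.readInt();
        byte[] bytes = new byte[length];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Returns true if the standard writeUTF rejects the string as too long.
    public static boolean writeUTFFails(String s) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(s);
            return false;
        } catch (UTFDataFormatException e) {
            return true;
        }
    }

    // Writes the string with writeLongUTF and reads it back from memory.
    public static String roundTrip(String s) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            writeLongUTF(out, s);
        }
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return readLongUTF(in);
        }
    }

    public static void main(String[] args) throws IOException {
        // Build an ASCII string with the same encoded size as in the report.
        char[] chars = new char[153687];
        Arrays.fill(chars, 'a');
        String big = new String(chars);

        System.out.println("writeUTF fails: " + writeUTFFails(big));
        System.out.println("round trip ok: " + roundTrip(big).equals(big));
    }
}
```

Splitting the string into chunks that each fit within the writeUTF limit would be another option. It also changes what is written to the file, so a reader for the existing format would need a matching update.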
[1] Writing pos tagger model ... failed
Error during writing model file '/tmp/train-lemma.model'
encoded string too long: 153687 bytes
java.io.UTFDataFormatException: encoded string too long: 153687 bytes
at java.io.DataOutputStream.writeUTF(DataOutputStream.java:364)
at java.io.DataOutputStream.writeUTF(DataOutputStream.java:323)
at opennlp.maxent.io.BinaryGISModelWriter.writeUTF(BinaryGISModelWriter.java:73)
[2] http://www.drillio.com/en/software-development/java/encoded-string-too-long-64kb-limit/
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira