Posted to issues@opennlp.apache.org by "Joern Kottmann (JIRA)" <ji...@apache.org> on 2016/05/20 22:02:13 UTC
[jira] [Commented] (OPENNLP-830) Huge runtime improvement on training (POS, Chunk, ...)
[ https://issues.apache.org/jira/browse/OPENNLP-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294340#comment-15294340 ]
Joern Kottmann commented on OPENNLP-830:
----------------------------------------
The change will not be backward compatible: existing classes that extend AbstractModel will no longer compile or run once this change is made. I think this is really an edge case; the vast majority of users have not implemented a custom model.
I suggest we make the change for 1.6.1 and mention it in our release notes. Existing users can stay on 1.6.0 until they migrate their code.
> Huge runtime improvement on training (POS, Chunk, ...)
> ------------------------------------------------------
>
> Key: OPENNLP-830
> URL: https://issues.apache.org/jira/browse/OPENNLP-830
> Project: OpenNLP
> Issue Type: Improvement
> Components: Machine Learning, POS Tagger
> Affects Versions: 1.6.0
> Environment: Any
> Reporter: Julien Subercaze
> Assignee: Joern Kottmann
> Labels: performance
> Fix For: 1.6.1
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> opennlp.tools.ml.model.IndexHashTable is a custom-made hash table used to store index mappings. It is used heavily in opennlp.tools.ml.* (i.e., by every model) and leads to very poor training performance.
> This hash table is probably legacy code and is highly inefficient. A simple drop-in replacement with a java.util.HashMap wrapper solves the issue, doesn't break compatibility, and adds no dependency.
> When training a POS tagger on a large dataset with custom tags, I see a fivefold speedup. It also seems to speed up the training pipeline of all ML models.
> See https://github.com/jsubercaze/opennlp/blob/trunk/opennlp-tools/src/main/java/opennlp/tools/ml/model/IndexHashTable.java for a quick fix.
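The proposed fix replaces the custom table with a thin wrapper around java.util.HashMap. Below is a minimal sketch of such a wrapper, assuming the table's job is to map each element of a mapping array to its position in that array and to return -1 for unknown keys. The class name HashMapIndexTable and its methods are hypothetical; they are not copied from OpenNLP or from the linked GitHub file.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical HashMap-backed index table (illustrative sketch only):
 * maps each element of a mapping array to its position in that array.
 */
public class HashMapIndexTable<T> {

    private final Map<T, Integer> keyToIndex;

    public HashMapIndexTable(T[] mapping) {
        // Pre-size the map so it does not rehash while being filled
        // (HashMap's default load factor is 0.75).
        keyToIndex = new HashMap<>(Math.max(16, (int) (mapping.length / 0.75f) + 1));
        for (int i = 0; i < mapping.length; i++) {
            // Assumption of this sketch: every key must have exactly one index,
            // so duplicates are rejected.
            if (keyToIndex.put(mapping[i], i) != null) {
                throw new IllegalArgumentException("Duplicate key: " + mapping[i]);
            }
        }
    }

    /** Returns the index of the key in the mapping array, or -1 if it is unknown. */
    public int get(T key) {
        Integer index = keyToIndex.get(key);
        return index == null ? -1 : index;
    }

    /** Returns the number of keys stored in the table. */
    public int size() {
        return keyToIndex.size();
    }

    public static void main(String[] args) {
        // Hypothetical feature strings, as a POS tagger might produce them.
        String[] predicates = {"w=the", "w=dog", "suf=og"};
        HashMapIndexTable<String> table = new HashMapIndexTable<>(predicates);
        System.out.println(table.get("w=dog")); // 1
        System.out.println(table.get("w=cat")); // -1
        System.out.println(table.size());       // 3
    }
}
```

Pre-sizing the map and rejecting duplicate keys are design choices of this sketch, not confirmed behavior of OpenNLP's IndexHashTable. HashMap gives expected constant-time lookups, which matters because these lookups sit on the hot path of feature-to-index mapping during training.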