
Posted to issues@hivemall.apache.org by helenahm <gi...@git.apache.org> on 2017/08/01 07:09:17 UTC

[GitHub] incubator-hivemall pull request #93: [WIP][HIVEMALL-126] Maximum Entropy Mod...

Github user helenahm commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/93#discussion_r130533347

--- Diff: core/pom.xml ---
@@ -103,6 +103,12 @@
<version>${guava.version}</version>
<scope>provided</scope>
</dependency>
+ <dependency>
+ <groupId>opennlp</groupId>
+ <artifactId>maxent</artifactId>
+ <version>3.0.0</version>
--- End diff --

In general, I totally agree. I think it would be good to move to another version of maxent in a few steps.

1. The code I have reused comes from GISTrainer. It essentially updates the weights in a matrix, where the matrix is Hivemall's matrix. Everything else simply follows your class structure. I have checked that the resulting models are the same, and I have also confirmed that the resulting model makes sense on my own data, so the resulting weights should be correct. Can we agree that training is correct and accept the current version as correct and functioning?

2. After that there are a few options:
We could try to rewrite the code so that it accepts the newest version of OpenNLP maxent and all subsequent versions. I expect that would also require changes in OpenNLP maxent, but it may be better than manually altering GISTrainer every time something is updated, and both projects would benefit from such a collaboration.

If not, we could consider, for Hivemall as a project, rewriting iterative scaling from scratch to make it efficient for Hivemall, perhaps borrowing the tricks OpenNLP uses to make its code more efficient and making sure that the resulting weights are comparable, but without aiming to plug in a new OpenNLP jar each time a new version appears.
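For context on what "updating the weights in a matrix" means in GIS (Generalized Iterative Scaling): each weight for a (predicate, outcome) pair is repeatedly adjusted by (1/C) * log(observed count / count expected under the current model), where C is the number of active features per event. The following is only a minimal, hypothetical Java sketch of that update rule, not the code from the pull request, Hivemall, or OpenNLP; the names GisSketch, train and predict are invented for illustration. It assumes binary context predicates, a dense predicate-by-outcome weight matrix, and the same number C of active predicates in every event, so no correction feature is needed. Pairs never seen in training keep a weight of zero, which roughly mirrors only creating parameters for observed pairs.

```java
import java.util.Arrays;

/** Hypothetical sketch of GIS for a conditional maxent model (not Hivemall's or OpenNLP's API). */
final class GisSketch {

    /** Returns weights[predicate][outcome] after the given number of GIS iterations. */
    static double[][] train(int[][] contexts, int[] outcomes,
                            int numPredicates, int numOutcomes, int iterations) {
        if (contexts.length == 0) {
            throw new IllegalArgumentException("no training events");
        }
        int c = contexts[0].length; // GIS constant C (assumed equal for every event)
        for (int[] ctx : contexts) {
            if (ctx.length != c) {
                throw new IllegalArgumentException("sketch assumes a constant number of active predicates");
            }
        }
        // Empirical (observed) count of each (predicate, outcome) pair.
        double[][] observed = new double[numPredicates][numOutcomes];
        for (int i = 0; i < contexts.length; i++) {
            for (int p : contexts[i]) {
                observed[p][outcomes[i]] += 1.0;
            }
        }
        double[][] weights = new double[numPredicates][numOutcomes];
        for (int it = 0; it < iterations; it++) {
            // Expected count of each pair under the current model.
            double[][] expected = new double[numPredicates][numOutcomes];
            for (int[] ctx : contexts) {
                double[] probs = predict(weights, ctx, numOutcomes);
                for (int p : ctx) {
                    for (int o = 0; o < numOutcomes; o++) {
                        expected[p][o] += probs[o];
                    }
                }
            }
            // GIS update; unobserved pairs stay at weight 0.
            for (int p = 0; p < numPredicates; p++) {
                for (int o = 0; o < numOutcomes; o++) {
                    if (observed[p][o] > 0.0) {
                        weights[p][o] += Math.log(observed[p][o] / expected[p][o]) / c;
                    }
                }
            }
        }
        return weights;
    }

    /** p(outcome | context) proportional to exp(sum of the active predicates' weights). */
    static double[] predict(double[][] weights, int[] context, int numOutcomes) {
        double[] scores = new double[numOutcomes];
        for (int p : context) {
            for (int o = 0; o < numOutcomes; o++) {
                scores[o] += weights[p][o];
            }
        }
        double max = Arrays.stream(scores).max().getAsDouble();
        double sum = 0.0;
        for (int o = 0; o < numOutcomes; o++) {
            scores[o] = Math.exp(scores[o] - max); // subtract max for numerical stability
            sum += scores[o];
        }
        for (int o = 0; o < numOutcomes; o++) {
            scores[o] /= sum;
        }
        return scores;
    }
}
```

For example, with predicate 2 acting as an always-on bias, training on the events {0, 2} -> outcome 0 and {1, 2} -> outcome 1 pushes predict(w, new int[] {0, 2}, 2) towards outcome 0. A dense matrix keeps the sketch short; a Hivemall-efficient rewrite would presumably use sparse storage and avoid recomputing predictions for duplicate contexts.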

What do you think?

Regards,
Elena.
