Posted to issues@hivemall.apache.org by myui <gi...@git.apache.org> on 2017/07/05 02:37:08 UTC

[GitHub] incubator-hivemall issue #93: [WIP][HIVEMALL-126] Maximum Entropy Model

Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/93

• Shall I commit all the changes? Or would it be better to pinpoint my own files only?

Just stage only your own files with `git add <select only your files>` and push them.

• Could you rename the title of this PR to `[WIP][HIVEMALL-126] Maximum Entropy Model using OpenNLP` to make its scope clearer?

• As commented in the review, why are you using multiple threads for training and OOB tests? Single-threaded execution is enough for a MaxEntropy classifier, and your OOB test scheme is incorrect.

• I have a little concern about the memory usage of the OpenNLP MaxEnt implementation. Unlike other online classifier implementations, it holds the entire training data in memory. Hivemall's RandomForest implementation also holds the entire dataset in memory, but it tries to consume less memory by using CSRMatrix.
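
To illustrate why a CSR (compressed sparse row) layout helps with memory, here is a minimal sketch. The class `CsrSketch` and its methods are hypothetical names for illustration only and are not Hivemall's actual CSRMatrix API. The idea is that only non-zero values are stored, together with their column indices and one offset per row.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of CSR storage; NOT Hivemall's CSRMatrix class.
final class CsrSketch {
    // Entries of row i live at positions rowPointers[i] .. rowPointers[i + 1] - 1.
    private final int[] rowPointers;
    // Column index of each stored (non-zero) value.
    private final int[] columnIndices;
    // Non-zero values only, in row-major order.
    private final double[] values;

    private CsrSketch(int[] rowPointers, int[] columnIndices, double[] values) {
        this.rowPointers = rowPointers;
        this.columnIndices = columnIndices;
        this.values = values;
    }

    // Builds the CSR arrays from a dense matrix, skipping zero entries.
    static CsrSketch fromDense(double[][] dense) {
        List<Integer> cols = new ArrayList<>();
        List<Double> vals = new ArrayList<>();
        int[] ptr = new int[dense.length + 1];
        for (int i = 0; i < dense.length; i++) {
            for (int j = 0; j < dense[i].length; j++) {
                if (dense[i][j] != 0.0) {
                    cols.add(j);
                    vals.add(dense[i][j]);
                }
            }
            ptr[i + 1] = cols.size();
        }
        int[] c = new int[cols.size()];
        double[] v = new double[vals.size()];
        for (int k = 0; k < c.length; k++) {
            c[k] = cols.get(k);
            v[k] = vals.get(k);
        }
        return new CsrSketch(ptr, c, v);
    }

    // Returns the value at (row, col), or 0.0 if it was not stored.
    double get(int row, int col) {
        for (int k = rowPointers[row]; k < rowPointers[row + 1]; k++) {
            if (columnIndices[k] == col) {
                return values[k];
            }
        }
        return 0.0;
    }

    // Number of values actually held in memory.
    int storedValues() {
        return values.length;
    }
}
```

For sparse feature vectors, the stored arrays grow with the number of non-zero entries rather than with rows times columns, which is the property that matters when each task has a small heap.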

A Max Entropy classifier is also known as Multinomial Logistic Regression. It might be better to evaluate [Smile's implementation](https://github.com/haifengl/smile/blob/master/core/src/main/java/smile/classification/LogisticRegression.java#L264) as well. I need to evaluate the memory consumption of your implementation and of OpenNLP maxent, because each task of a Hadoop worker can use only a limited, relatively small amount of memory.
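
For reference, the equivalence with multinomial logistic regression means that prediction is a softmax over per-class linear scores. Below is a minimal sketch assuming dense weights with one weight vector per class. `SoftmaxSketch` and `predictProba` are hypothetical names and do not correspond to the OpenNLP or Smile APIs.

```java
// Hypothetical illustration of multinomial logistic regression prediction;
// NOT the OpenNLP maxent or Smile LogisticRegression API.
final class SoftmaxSketch {
    // weights[c] is the weight vector of class c; x is the feature vector.
    // Returns P(class = c | x) for every class c.
    static double[] predictProba(double[][] weights, double[] x) {
        int numClasses = weights.length;
        double[] scores = new double[numClasses];
        double max = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < numClasses; c++) {
            double s = 0.0;
            for (int j = 0; j < x.length; j++) {
                s += weights[c][j] * x[j];
            }
            scores[c] = s;
            max = Math.max(max, s);
        }
        // Subtract the maximum score before exponentiating for numerical stability.
        double sum = 0.0;
        for (int c = 0; c < numClasses; c++) {
            scores[c] = Math.exp(scores[c] - max);
            sum += scores[c];
        }
        for (int c = 0; c < numClasses; c++) {
            scores[c] /= sum;
        }
        return scores;
    }
}
```

Prediction itself only needs the weight matrix, so the memory concern above is about how much of the training data the trainer keeps resident, not about the model.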
