Posted to issues@hivemall.apache.org by myui <gi...@git.apache.org> on 2018/08/23 06:28:06 UTC
[GitHub] incubator-hivemall pull request #155: [HIVEMALL-201-2] Evaluate, fix and doc...
GitHub user myui opened a pull request:
https://github.com/apache/incubator-hivemall/pull/155
[HIVEMALL-201-2] Evaluate, fix and document FFM
## What changes were proposed in this pull request?
Applied some refactoring to #149
This PR closes #149
## What type of PR is it?
Hot Fix, Refactoring
## What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-201
## How was this patch tested?
unit tests, manual tests
## How to use this feature?
Will be published at: http://hivemall.incubator.apache.org/userguide/binaryclass/criteo_ffm.html
## Checklist
(Please remove this section if not needed; check `x` for YES, blank for NO)
- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/myui/incubator-hivemall HIVEMALL-201-2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-hivemall/pull/155.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #155
----
commit c4d6855d6286249e150e4c8dcd5413bcde339990
Author: Takuya Kitazawa <k....@...>
Date: 2018-05-16T08:39:32Z
Use pre-defined constants in option description
commit f7e7e1d49e5fa2e4f4f50d55f85c5cdee3bb69b1
Author: Takuya Kitazawa <k....@...>
Date: 2018-05-16T08:40:48Z
Fix mismatch between opts.addOption and cl.getOptionValue
commit 929781a982f86851e38d558bb79a239d90c90e76
Author: Takuya Kitazawa <k....@...>
Date: 2018-05-16T08:41:34Z
Support FFM feature format in `l1_normalize` and `l2_normalize`
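The FFM feature format carries a field prefix in addition to the feature index ("field:index:value"), so a normalizer must scale only the trailing value. The following is a minimal illustrative sketch in Python (not Hivemall's actual Java implementation; the function name is hypothetical):

```python
import math

def l2_normalize_ffm(features):
    """L2-normalize FFM-format features ("<field>:<index>:<value>").

    Only the trailing value is scaled; the "field:index" prefix is preserved.
    Illustrative sketch only.
    """
    parsed = []
    for f in features:
        prefix, _, value = f.rpartition(":")
        parsed.append((prefix, float(value)))
    norm = math.sqrt(sum(v * v for _, v in parsed))
    if norm == 0.0:
        return features
    return ["%s:%g" % (p, v / norm) for p, v in parsed]
```

For example, `l2_normalize_ffm(["1:101:3", "2:202:4"])` divides both values by the vector norm 5 while leaving the field and index parts untouched.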
commit a1751361f8ae2204cdc6507514945ebaa1ddf179
Author: Takuya Kitazawa <k....@...>
Date: 2018-05-21T06:02:14Z
Increase `alphaFTRL` in `testSampleEnableNorm` for convergence
commit ff049d776133d1bc0cf7e62d9740f22a3943f593
Author: Takuya Kitazawa <k....@...>
Date: 2018-05-22T02:16:51Z
Fix typo
commit 35a02451fc4e8a55bbb49b7fede3c545145b7d6e
Author: Takuya Kitazawa <k....@...>
Date: 2018-05-22T05:22:35Z
Fix bug in forward model
Due to a typo, linear weights in the model were not correctly forwarded.
commit 9782136e3059df1d334c814c9eb9455e1ec9b573
Author: Takuya Kitazawa <k....@...>
Date: 2018-05-22T06:39:22Z
Fix order of computing AdaGrad learning rate
* Gradient includes regularization term
* Get sum of squared gradient after adding the latest gradient
See:
https://github.com/guestwalk/libffm/blob/7db5b4f1ad3af7eb5bd0c224b2fa5305e1a715d2/ffm.cpp#L219-L226
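The update order described above can be sketched as follows: the L2 term is folded into the gradient first, the squared gradient is accumulated next, and only then is the per-coordinate learning rate computed, so the accumulator already includes the latest gradient. This is a minimal single-weight sketch of that ordering (names and defaults are illustrative, not Hivemall's API):

```python
import math

def adagrad_step(w, grad, sum_sq_grad, eta=0.1, lambda2=0.0001):
    """One AdaGrad step in the LIBFFM-style order:
    1. add the L2 regularization term to the gradient,
    2. accumulate the squared gradient,
    3. scale by the accumulated sum, which now includes the latest gradient.
    """
    g = grad + lambda2 * w          # gradient includes the regularization term
    sum_sq_grad += g * g            # accumulate BEFORE computing the rate
    w -= eta / math.sqrt(sum_sq_grad) * g
    return w, sum_sq_grad
```

Computing the rate before accumulating (the bug being fixed here) would divide by the sum of squared gradients from previous iterations only, which diverges on the very first step when the accumulator is zero.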
commit 2366d910581248249a4e69e1110675469a17ea99
Author: Takuya Kitazawa <k....@...>
Date: 2018-05-22T06:47:03Z
Allow specifying the initial learning rate for AdaGrad
commit f1fd20cd508a8473bd0fef037cd708d5c3379c5f
Author: Takuya Kitazawa <k....@...>
Date: 2018-05-22T08:35:36Z
Make `-max_init_value` more meaningful
Previously, the code sampled random values from [0, max_init_value / k], but
users expect each element of V to be initialized with a random value in
[0, max_init_value].
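The before/after behavior can be illustrated with a short sketch (variable names are illustrative; this is not Hivemall's initialization code):

```python
import random

k = 4                  # number of latent factors (illustrative)
max_init_value = 0.5   # hypothetical option value

# Before the fix: elements of V were effectively drawn from [0, max_init_value / k]
old = [random.uniform(0.0, max_init_value / k) for _ in range(k)]

# After the fix: each element is drawn from [0, max_init_value], as users expect
new = [random.uniform(0.0, max_init_value) for _ in range(k)]
```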
commit 478f26dab385b3835cdfbe19d40beef47336d92d
Author: Takuya Kitazawa <k....@...>
Date: 2018-05-23T05:19:17Z
Add `-l2norm` option to FeaturePairsUDTF
Users can configure whether the feature vector is L2-normalized, in the same
way as `train_ffm`.
commit 3627ca84e857210aa921fd607fed19759d26fba0
Author: Takuya Kitazawa <k....@...>
Date: 2018-05-23T06:27:02Z
Switch `-disable_wi` option to `-enable_wi`
commit e2c378f5134c67d25047169324c6aa9df62e8b8f
Author: Takuya Kitazawa <k....@...>
Date: 2018-05-23T07:01:09Z
Fix test broken by change of default learn rate for FFM+AdaGrad
commit 056dfde30437c9bbcfca4444f292698ba97dfa67
Author: Takuya Kitazawa <k....@...>
Date: 2018-05-23T07:27:34Z
FFM applies instance-wise L2 normalization by default
commit 91aed6ecdc5401d972eac534e54246c59fd15ebb
Author: Takuya Kitazawa <k....@...>
Date: 2018-05-24T00:48:37Z
Increase default number of iterations to rely more on cv_test
commit dca7e5762d664039354d00da8c3ca9adccd5d7c2
Author: Takuya Kitazawa <k....@...>
Date: 2018-05-24T04:23:24Z
Make default L2 regularization parameter smaller
The new default value 0.0001 is the same as FTRL and the general
regressor/classifier.
0.01 was too large on small data; a model could not be successfully learnt
in some cases. By contrast, LIBFFM uses the very small value 0.00002 by
default. This commit sets 0.0001, midway between these values, as a
compromise.
commit f84c960285f04ada21fb346e94ed0b5683d31289
Author: Takuya Kitazawa <k....@...>
Date: 2018-05-24T04:49:27Z
Increase default learn rate from 0.05 to 0.1
Defaults in reference implementations:
LIBFFM: 0.2 (with AdaGrad)
https://github.com/guestwalk/libffm/blob/740103e5eb920a4061dd8e977a2ede6d23c6910a/ffm.h#L31
libFM: 0.1
https://github.com/srendle/libfm/blob/4ba0e0d5646da5d00701d853d19fbbe9b236cfd7/src/libfm/libfm.cpp#L87
commit 5b9d36746d1bf432098a7a8ad02be3f5db1bef3e
Author: Takuya Kitazawa <k....@...>
Date: 2018-05-24T05:06:22Z
Update FFM unit test cases
* Remove `runIterations` method and use `run` with appropriate `-iters` option
* Follow up previous change of default options
* Drop some options and confirm that their default values work reasonably
commit 3a11ca096f1bd5287ef857f0781fa61a5e6efa4d
Author: Takuya Kitazawa <k....@...>
Date: 2018-05-24T08:34:39Z
FFM UDTF does not override train() method in FM UDF
The only difference between them is in the type of model instance; FFM
checks `_ffmModel`, while FM refers to `_model`.
Note that `adaptiveRegularization` is always false in FFM.
commit a48e8017339ba8b284fdeeb6bf21ee6ed2159983
Author: Takuya Kitazawa <k....@...>
Date: 2018-05-24T08:44:17Z
Use consistent set of validation samples over iterations
Record whether each sample is used for validation, so the same split can be
reused in later iterations.
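The idea of a consistent validation split can be sketched in a few lines: decide once, up front, which samples are held out, and reuse the same flags on every iteration instead of re-sampling. This is an illustrative Python sketch with hypothetical names, not Hivemall's implementation:

```python
import random

def split_validation(n_samples, validation_ratio=0.1, seed=42):
    """Decide once which samples are held out for validation.

    The returned boolean flags are reused on every training iteration,
    so each iteration scores the exact same validation set.
    """
    rng = random.Random(seed)
    return [rng.random() < validation_ratio for _ in range(n_samples)]

flags = split_validation(1000)
# reuse `flags` on every iteration; do not re-sample per iteration
```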
commit 38875b91287a821db5a8f3ea3c307576378ce485
Author: Takuya Kitazawa <k....@...>
Date: 2018-05-25T06:05:04Z
Support `-early_stopping` option in FM/FFM by using validation samples
This implementation is still incomplete:
If the validation loss increases at the n-th iteration, we should forward
the previous model obtained at the (n-1)-th iteration.
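The early-stopping rule in this first version can be sketched as: stop as soon as the validation loss increases, and remember which iteration's loss was best so far. A minimal illustrative sketch (hypothetical names; `losses` stands in for per-iteration validation losses):

```python
def train_with_early_stopping(losses):
    """Return (best_iteration, best_loss) under a stop-on-first-increase rule.

    Stops as soon as validation loss rises above the best seen so far;
    otherwise runs through all iterations. Illustrative sketch only.
    """
    best_iter, best_loss = 0, float("inf")
    for i, loss in enumerate(losses):
        if loss > best_loss:
            return best_iter, best_loss   # loss went up: stop early
        best_iter, best_loss = i, loss
    return best_iter, best_loss
```

As the commit message notes, knowing the best iteration is not enough by itself: forwarding the model from that iteration also requires caching its parameters, which later commits in this PR address.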
commit 3ca451c6bb7e1a74a0a309b1c3e4892f1717ba40
Author: Takuya Kitazawa <k....@...>
Date: 2018-05-25T06:27:33Z
Fix typo: validatiState -> validationState
commit eb943d935d7b91054016de83273db9f86688d853
Author: Takuya Kitazawa <k....@...>
Date: 2018-05-28T07:25:32Z
Enable setting W and V by directly pointing to a feature index
commit 2875fe98b72bda17946ca0692aef3b8c4f9af86c
Author: Takuya Kitazawa <k....@...>
Date: 2018-05-28T07:33:37Z
Enable caching/restoring FFM model parameters for early stopping
commit b670698c4d4e8188baeaffb13aac38ff55da0a03
Author: Takuya Kitazawa <k....@...>
Date: 2018-05-28T08:15:25Z
Update early stopping option test case
This version of the test cases checks that:
- early stopping works as expected
- early stopping holds the correct "best" model parameters
commit 714298f608396bb3f817059827df9e27bc34591a
Author: Takuya Kitazawa <k....@...>
Date: 2018-05-28T08:24:44Z
Fix missing `cacheCurrentModel` call
commit 700b40cb3829996a97b2caf831776e4fbaffdf51
Author: Takuya Kitazawa <k....@...>
Date: 2018-05-28T08:29:51Z
Format code
commit cc7e1010ed91930e067cfa15d6726f455bcece8e
Author: Takuya Kitazawa <k....@...>
Date: 2018-05-28T08:42:07Z
FFM fully ignores adaptive regularization option
commit 42f9b97352978f07ae0479c7a75862d490f937fc
Author: Takuya Kitazawa <k....@...>
Date: 2018-05-29T05:22:05Z
Stop caching the previous best model parameters
Caching the previous model parameters doubles memory consumption. To avoid
that cost, the `-early_stopping` option instead forwards the model obtained
at the (N+1)-th iteration as a compromise when training stops early at the
N-th iteration.
commit 9f6a761f4abe017a2fa17590e1f5e2d40fe6fcda
Author: Takuya Kitazawa <k....@...>
Date: 2018-05-29T05:28:45Z
Make `_validationState` non-null for simplicity
commit b1fc49b8a86295d7c3b0fed284e118128450180c
Author: Takuya Kitazawa <k....@...>
Date: 2018-05-29T05:53:43Z
Stop iteration only if loss increases over 2 consecutive iterations
"Immediately stop training once the loss increases" might be too
aggressive.
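The relaxed stopping rule can be sketched as a simple counter over consecutive loss increases, resetting whenever the loss improves. This is an illustrative Python sketch (hypothetical names; the patience value of 2 matches the commit message):

```python
def should_stop(losses, patience=2):
    """Return True only when validation loss has increased for `patience`
    consecutive iterations, rather than on the first increase.
    Illustrative sketch only.
    """
    increases = 0
    for prev, cur in zip(losses, losses[1:]):
        if cur > prev:
            increases += 1
            if increases >= patience:
                return True
        else:
            increases = 0   # an improvement resets the streak
    return False
```

For example, a single noisy uptick followed by an improvement no longer halts training, while two back-to-back increases do.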
----
---
[GitHub] incubator-hivemall pull request #155: [HIVEMALL-201] Evaluate, fix and docum...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/incubator-hivemall/pull/155
---
[GitHub] incubator-hivemall issue #155: [HIVEMALL-201-2] Evaluate, fix and document F...
Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/155
@takuti will merge after EMR tests. FYI
---