Posted to issues@hivemall.apache.org by takuti <gi...@git.apache.org> on 2017/05/15 09:10:11 UTC

[GitHub] incubator-hivemall pull request #79: [WIP][HIVEMALL-101] Separate optimizer ...

GitHub user takuti opened a pull request:

    https://github.com/apache/incubator-hivemall/pull/79

    [WIP][HIVEMALL-101] Separate optimizer implementation

    ## What changes were proposed in this pull request?
    
    Finalize #14 
    
    ## What type of PR is it?
    
    Improvement, Feature
    
    ## What is the Jira issue?
    
    https://issues.apache.org/jira/browse/HIVEMALL-101
    
    ## How was this patch tested?
    
    Unit test

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/takuti/incubator-hivemall HIVEMALL-101

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hivemall/pull/79.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #79
    
----
commit b1535414146ce6711d8abc4892fd1d66b8dde342
Author: Takeshi YAMAMURO <li...@gmail.com>
Date:   2016-05-02T14:43:42Z

    Add optimizer implementations

commit c2ca02aecf84cd678dc4f8729e1e01b1386826ea
Author: Takeshi YAMAMURO <li...@gmail.com>
Date:   2016-09-20T16:52:22Z

    Revert some modifications

commit d36ea05a3fc22011c0932edd7b8b3c214b4bcf65
Author: myui <my...@apache.org>
Date:   2017-01-16T11:20:42Z

    Updated license headers

commit 06404280b05ded0d947070ec847136ab898f3966
Author: myui <my...@apache.org>
Date:   2017-01-16T11:35:00Z

    Fixed imports

commit 547cda4880269b28af4c60a869409d33599b748c
Author: myui <my...@apache.org>
Date:   2017-01-30T06:55:48Z

    Add annotations

commit 2a523a72b571a694004eaa3b07355a3710427954
Author: myui <my...@apache.org>
Date:   2017-01-30T08:50:21Z

    Refactored to support Optimizer

commit d14451cfeffd11b6a342d3a5eb878adbf812410f
Author: myui <my...@apache.org>
Date:   2017-02-08T05:44:57Z

    Applied refactoring

commit fa1e8e5678f93a85c8d5fcaae01c4b3f5cf81f88
Author: Takuya Kitazawa <k....@gmail.com>
Date:   2017-05-11T08:09:48Z

    Fix build errors

commit 2d69bf5e64b374fe4d130da1c0726e0c661e1664
Author: Takuya Kitazawa <k....@gmail.com>
Date:   2017-05-11T08:44:25Z

    Remove unused import

commit c73695a70aa66afa1fd4d04115e943cf6c1c0b32
Author: Takuya Kitazawa <k....@gmail.com>
Date:   2017-05-15T08:39:23Z

    Fix OptimizerOptions
    
    * Order of short/long option names
    * Parsed option handling

commit 3e13b36c783f825590d0247c6c572cccfec4a9b2
Author: Takuya Kitazawa <k....@gmail.com>
Date:   2017-05-15T08:48:27Z

    Make loss function configureable in generic classifier/regressor

commit 9b26a22719a6d187cb4352e270ef26812de3f128
Author: Takuya Kitazawa <k....@gmail.com>
Date:   2017-05-15T08:49:21Z

    Add some messages to the LossFunction classes

commit 00af6a6ae66aeff32bfc275eef54a0f9f6c191d0
Author: Takuya Kitazawa <k....@gmail.com>
Date:   2017-05-15T08:50:15Z

    Add generic classifier/regressor UDTF test

commit 791764c9f09c77835b383ad85eaf6131a72b7ac0
Author: Takuya Kitazawa <k....@gmail.com>
Date:   2017-05-15T09:07:39Z

    Wrap IllegalArgumentException in generic classifier/regressor UDTFs

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-hivemall issue #79: [WIP][HIVEMALL-101] Separate optimizer impleme...

Posted by coveralls <gi...@git.apache.org>.
Github user coveralls commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/79
  
    
    [![Coverage Status](https://coveralls.io/builds/11579905/badge)](https://coveralls.io/builds/11579905)
    
    Changes Unknown when pulling **c57d09ee89128f20406ad34482fa7d1a4c8ffc3f on takuti:HIVEMALL-101** into ** on apache:master**.




[GitHub] incubator-hivemall issue #79: [WIP][HIVEMALL-101] Separate optimizer impleme...

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/79
  
    @takuti `-iter` support should go in a separate ticket. `-mini_batch` support can be handled within this ticket.
    
    Functional tests are required to confirm that the accuracy of `-loss logistic` matches that of the existing `logress`.



[GitHub] incubator-hivemall issue #79: [WIP][HIVEMALL-101] Separate optimizer impleme...

Posted by takuti <gi...@git.apache.org>.
Github user takuti commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/79
  
    ^^^ Since the generic regressor, like sklearn, does not accept a classification loss (e.g. logloss), I am leaving `checkTargetValue()` removed from the `GeneralRegression` class.



[GitHub] incubator-hivemall issue #79: [WIP][HIVEMALL-101] Separate optimizer impleme...

Posted by coveralls <gi...@git.apache.org>.
Github user coveralls commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/79
  
    
    [![Coverage Status](https://coveralls.io/builds/11625112/badge)](https://coveralls.io/builds/11625112)
    
    Coverage increased (+0.7%) to 39.424% when pulling **c3b89f8a671a1ccf7a0c19e9f061d61c6e0c2807 on takuti:HIVEMALL-101** into **10e7d450fa8257efc5d614957fda514b2b91fdee on apache:master**.




[GitHub] incubator-hivemall pull request #79: [WIP][HIVEMALL-101] Separate optimizer ...

Posted by myui <gi...@git.apache.org>.
Github user myui commented on a diff in the pull request:

    https://github.com/apache/incubator-hivemall/pull/79#discussion_r116510491
  
    --- Diff: core/src/main/java/hivemall/regression/GeneralRegressionUDTF.java ---
    @@ -0,0 +1,131 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +package hivemall.regression;
    +
    +import hivemall.annotations.Since;
    +import hivemall.model.FeatureValue;
    +import hivemall.optimizer.LossFunctions;
    +import hivemall.optimizer.LossFunctions.LossFunction;
    +import hivemall.optimizer.Optimizer;
    +import hivemall.optimizer.OptimizerOptions;
    +
    +import java.util.Map;
    +
    +import javax.annotation.Nonnull;
    +
    +import org.apache.commons.cli.CommandLine;
    +import org.apache.commons.cli.Options;
    +import org.apache.hadoop.hive.ql.exec.Description;
    +import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    +import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
    +
    +/**
    + * A general regression class with replaceable optimization functions.
    + */
    +@Description(name = "train_regression",
    +        value = "_FUNC_(list<string|int|bigint> features, double label [, const string options])"
    +                + " - Returns a relation consists of <string|int|bigint feature, float weight>",
    +        extended = "Build a prediction model by a generic regressor")
    +@Since(version = "0.5-rc.1")
    +public final class GeneralRegressionUDTF extends RegressionBaseUDTF {
    +
    +    @Nonnull
    +    private final Map<String, String> optimizerOptions;
    +    private Optimizer optimizer;
    +    private LossFunction lossFunction;
    +
    +    public GeneralRegressionUDTF() {
    +        super(true); // This enables new model interfaces
    +        this.optimizerOptions = OptimizerOptions.create();
    +    }
    +
    +    @Override
    +    public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {
    +        if (argOIs.length != 2 && argOIs.length != 3) {
    +            throw new UDFArgumentException(this.getClass().getSimpleName()
    +                    + " takes 2 or 3 arguments: List<Text|Int|BitInt> features, float target "
    +                    + "[, constant string options]");
    +        }
    +
    +        StructObjectInspector outputOI = super.initialize(argOIs);
    +
    +        if (lossFunction.forBinaryClassification()) {
    +            throw new UDFArgumentException("The loss function `" + lossFunction + "` is not for regression");
    +        }
    +        if (is_mini_batch) {
    +            throw new UDFArgumentException("_FUNC_ does not currently support `-mini_batch` option");
    +        }
    +
    +        try {
    +            this.optimizer = createOptimizer(optimizerOptions);
    +        } catch (Throwable e) {
    +            throw new UDFArgumentException(e.getMessage());
    +        }
    +
    +        return outputOI;
    +    }
    +
    +    @Override
    +    protected Options getOptions() {
    +        Options opts = super.getOptions();
    +        opts.addOption("loss", "loss_function", true,
    +                "Loss function [default: SquaredLoss, QuantileLoss, EpsilonInsensitiveLoss]");
    +        OptimizerOptions.setup(opts);
    +        return opts;
    +    }
    +
    +    @Override
    +    protected CommandLine processOptions(ObjectInspector[] argOIs) throws UDFArgumentException {
    +        CommandLine cl = super.processOptions(argOIs);
    +        try {
    +            if (cl.hasOption("loss_function")) {
    +                this.lossFunction = LossFunctions.getLossFunction(cl.getOptionValue("loss_function"));
    +            } else {
    +                this.lossFunction = LossFunctions.getLossFunction("SquaredLoss");
    +            }
    +        } catch (Throwable e) {
    +            throw new UDFArgumentException(e.getMessage());
    +        }
    +        OptimizerOptions.propcessOptions(cl, optimizerOptions);
    +        return cl;
    +    }
    +
    +    @Override
    +    protected final void checkTargetValue(final float target) throws UDFArgumentException {
    --- End diff --
    
    @takuti Maybe it was for logistic regression, which is actually a classifier taking 0/1 values. @maropu is not an expert in machine learning algorithms.
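
The point about 0/1 targets can be illustrated with a small range check. This is a minimal sketch, assuming that a logistic loss expects targets in the range [0, 1]; `TargetChecks` and `checkLogisticTarget` are hypothetical names and not the actual `checkTargetValue()` implementation in Hivemall.

```java
/** Hypothetical helper; not part of Hivemall. */
final class TargetChecks {
    private TargetChecks() {}

    /**
     * Rejects targets that cannot be read as a 0/1 label
     * (or a probability in between), as a logistic loss would expect.
     */
    static void checkLogisticTarget(float target) {
        if (Float.isNaN(target) || target < 0.f || target > 1.f) {
            throw new IllegalArgumentException("target must be in range 0 to 1: " + target);
        }
    }
}
```

A regressor using squared loss has no such restriction on the target, which is why a check like this only makes sense when the loss is a classification loss.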



[GitHub] incubator-hivemall issue #79: [HIVEMALL-101] Separate optimizer implementati...

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/79
  
    @maropu @takuti Finally merged this huge patch. Thank you for your contributions!



[GitHub] incubator-hivemall pull request #79: [WIP][HIVEMALL-101] Separate optimizer ...

Posted by takuti <gi...@git.apache.org>.
Github user takuti commented on a diff in the pull request:

    https://github.com/apache/incubator-hivemall/pull/79#discussion_r116588229
  
    --- Diff: core/src/main/java/hivemall/regression/GeneralRegressionUDTF.java ---
    +    protected final void checkTargetValue(final float target) throws UDFArgumentException {
    --- End diff --
    
    @myui Ah, that makes sense, since the generic regressor originally used `LossFunctions.logisticLoss(target, predicted);`. Thanks!



[GitHub] incubator-hivemall issue #79: [HIVEMALL-101] Separate optimizer implementati...

Posted by coveralls <gi...@git.apache.org>.
Github user coveralls commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/79
  
    
    [![Coverage Status](https://coveralls.io/builds/11896899/badge)](https://coveralls.io/builds/11896899)
    
    Coverage increased (+0.8%) to 40.283% when pulling **5439bd80face5ef2f69650244ea8c9f0f13bed1b on takuti:HIVEMALL-101** into **1db5358767bb30a8c433e4530c39d8591bc28a36 on apache:master**.




[GitHub] incubator-hivemall pull request #79: [HIVEMALL-101] Separate optimizer imple...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-hivemall/pull/79



[GitHub] incubator-hivemall issue #79: [HIVEMALL-101] Separate optimizer implementati...

Posted by coveralls <gi...@git.apache.org>.
Github user coveralls commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/79
  
    
    [![Coverage Status](https://coveralls.io/builds/11662166/badge)](https://coveralls.io/builds/11662166)
    
    Coverage increased (+0.5%) to 39.184% when pulling **689bdbf77c985117c2064d4a042d7d45f2971165 on takuti:HIVEMALL-101** into **10e7d450fa8257efc5d614957fda514b2b91fdee on apache:master**.




[GitHub] incubator-hivemall issue #79: [WIP][HIVEMALL-101] Separate optimizer impleme...

Posted by takuti <gi...@git.apache.org>.
Github user takuti commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/79
  
    I tested the generic classifier and regressor on EMR using the a9a dataset.
    
    ### Classifier
    
    ```
    set hivevar:n_samples=16281;
    set hivevar:total_steps=32562;
    ```
    
    #### `logress`
    
    ```sql
    drop table if exists logress_model;
    create table logress_model as
    select
     feature,
     avg(weight) as weight
    from
     (
      select
         logress(add_bias(features), label, '-total_steps ${total_steps}') as (feature, weight)
         -- logress(add_bias(features), label, '-total_steps ${total_steps} -mini_batch 10') as (feature, weight)
      from
         train_x3
     ) t
    group by feature;
    ```
    
    ```sql
    WITH test_exploded as (
      select
        rowid,
        label,
        extract_feature(feature) as feature,
        extract_weight(feature) as value
      from
        test LATERAL VIEW explode(add_bias(features)) t AS feature
    ),
    predict as (
      select
        t.rowid,
        sigmoid(sum(m.weight * t.value)) as prob,
        CAST((case when sigmoid(sum(m.weight * t.value)) >= 0.5 then 1.0 else 0.0 end) as FLOAT) as label
      from
        test_exploded t LEFT OUTER JOIN
        logress_model m ON (t.feature = m.feature)
      group by
        t.rowid
    ),
    submit as (
      select
        t.label as actual,
        pd.label as predicted,
        pd.prob as probability
      from
        test t JOIN predict pd
          on (t.rowid = pd.rowid)
    )
    select count(1) / ${n_samples} from submit
    where actual = predicted;
    ```
    
    #### `train_classifier`
    
    ```sql
    train_classifier(add_bias(features), label, '-loss logloss -opt SGD -reg no -eta simple -total_steps ${total_steps}') as (feature, weight)
    -- train_classifier(add_bias(features), label, '-loss logloss -opt SGD -reg no -eta simple -total_steps ${total_steps} -mini_batch 10') as (feature, weight)
    ```
    
    The results were exactly the same:
    
    | | online | mini-batch |
    |:--|:--:|:--:|
    |`logress`| 0.8414716540753026 | 0.848965051286776 |
    |`train_classifier`| 0.8414716540753026 | 0.848965051286776 |
    
    ### Regression
    
    I solved the a9a label prediction as a regression problem.
    
    // Since the non-generic Adagrad regressor was designed for logistic loss (i.e. classification), we cannot compare it with the generic regressor under exactly the same conditions.
    
    #### `train_adagrad_regr` (internally uses logistic loss)
    
    ```sql
    drop table if exists adagrad_model;
    create table adagrad_model as
    select
     feature,
     avg(weight) as weight
    from
     (
      select
         train_adagrad_regr(features, label) as (feature, weight)
      from
         train_x3
     ) t
    group by feature;
    ```
    
    ```sql
    WITH test_exploded as (
      select
        rowid,
        label,
        extract_feature(feature) as feature,
        extract_weight(feature) as value
      from
        test LATERAL VIEW explode(add_bias(features)) t AS feature
    ),
    predict as (
      select
        t.rowid,
        sigmoid(sum(m.weight * t.value)) as prob
      from
        test_exploded t LEFT OUTER JOIN
        adagrad_model m ON (t.feature = m.feature)
      group by
        t.rowid
    ),
    submit as (
      select
        t.label as actual,
        pd.prob as probability
      from
        test t JOIN predict pd
          on (t.rowid = pd.rowid)
    )
    select rmse(probability, actual) from submit;
    ```
    
    #### `train_regression`
    
    ```sql
    train_regression(features, label, '-loss squaredloss -opt AdaGrad -reg no') as (feature, weight)
    -- train_regression(features, label, '-loss squaredloss -opt AdaGrad -reg no -mini_batch 10') as (feature, weight)
    ```
    
    | | online | mini-batch |
    |:--|:--:|:--:|
    |`train_adagrad_regr` (logistic loss) | 0.3254586866367811 | -- |
    |`train_regression` (squared loss) | 0.3356422627079689 | 0.3348889704327727 |
    
    As I mentioned in the last comment, I was unsure whether the `-mini_batch` option works correctly for Adagrad. Fortunately, in this example the option slightly improved prediction accuracy in terms of RMSE (0.3349 vs. 0.3356).
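
The two evaluation queries above reduce to two simple metrics: the fraction of rows where `actual = predicted`, and the RMSE between the predicted probability and the actual label. The following is a minimal sketch of both in plain Java, assuming in-memory arrays rather than Hive tables; `EvalMetrics` is a hypothetical helper and not part of Hivemall.

```java
/** Hypothetical helper mirroring the accuracy and RMSE computations in the queries. */
final class EvalMetrics {
    private EvalMetrics() {}

    /** Fraction of samples whose predicted label equals the actual label. */
    static double accuracy(float[] actual, float[] predicted) {
        if (actual.length != predicted.length || actual.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        int correct = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] == predicted[i]) {
                correct++;
            }
        }
        return (double) correct / actual.length;
    }

    /** Root mean squared error between predictions and actual values. */
    static double rmse(double[] predicted, double[] actual) {
        if (actual.length != predicted.length || actual.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sumSq = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = predicted[i] - actual[i];
            sumSq += diff * diff;
        }
        return Math.sqrt(sumSq / actual.length);
    }
}
```

Higher is better for accuracy and lower is better for RMSE, which is the sense in which the mini-batch run above is a slight improvement.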



[GitHub] incubator-hivemall issue #79: [WIP][HIVEMALL-101] Separate optimizer impleme...

Posted by takuti <gi...@git.apache.org>.
Github user takuti commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/79
  
    I added support for the `-mini_batch` option to the [regressor](https://github.com/takuti/incubator-hivemall/blob/c3b89f8a671a1ccf7a0c19e9f061d61c6e0c2807/core/src/main/java/hivemall/regression/GeneralRegressionUDTF.java#L121-L181) and [classifier](https://github.com/takuti/incubator-hivemall/blob/c3b89f8a671a1ccf7a0c19e9f061d61c6e0c2807/core/src/main/java/hivemall/classifier/GeneralClassifierUDTF.java#L122-L182) (same code).
    
    The idea is simply to accumulate the `new_weight` values obtained from `optimizer.update()`. Once `miniBatchSize` samples have been observed, the mean of the accumulated `new_weight` values is set on the model via `model.setWeight`.
    
    For SGD, this is clearly equivalent to [what RegressionBaseUDTF does](https://github.com/takuti/incubator-hivemall/blob/5dc6f4eb5a8d8532201f6706673e2381d47d7e70/core/src/main/java/hivemall/regression/RegressionBaseUDTF.java#L247-L251). However, I'm a little unsure whether the same approach is valid for Adagrad, Adam, Adadelta and AdagradRDA. (Currently, it is allowed for Adagrad, Adam and Adadelta. By contrast, AdagradRDA with the `-mini_batch` option is not supported.)
    
    BTW, in practice, I observed that the naive Adagrad + `-mini_batch` implementation seems to work correctly, as shown in the next comment.
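
The mini-batch idea described in this comment can be sketched as follows. This is a minimal illustration under assumptions, not the code in the pull request: `MiniBatchAverager`, its dense `float[]` model, and the `WeightUpdater` interface standing in for `optimizer.update()` are all hypothetical names.

```java
/** Hypothetical stand-in for optimizer.update(): returns the new weight for one feature. */
interface WeightUpdater {
    float update(int feature, float currentWeight, float gradient);
}

/** Accumulates per-feature new weights and applies their mean once per mini-batch. */
final class MiniBatchAverager {
    private final float[] weights; // model weights (dense, for simplicity)
    private final float[] sums;    // accumulated new weights per feature
    private final int[] counts;    // number of accumulated values per feature
    private final int miniBatchSize;
    private final WeightUpdater updater;
    private int observed = 0;

    MiniBatchAverager(int numFeatures, int miniBatchSize, WeightUpdater updater) {
        this.weights = new float[numFeatures];
        this.sums = new float[numFeatures];
        this.counts = new int[numFeatures];
        this.miniBatchSize = miniBatchSize;
        this.updater = updater;
    }

    /** Observes one sample given as parallel arrays of feature indices and gradients. */
    void observe(int[] features, float[] gradients) {
        for (int i = 0; i < features.length; i++) {
            int f = features[i];
            // The model weight is not touched here; only the candidate weight is accumulated.
            float newWeight = updater.update(f, weights[f], gradients[i]);
            sums[f] += newWeight;
            counts[f]++;
        }
        if (++observed >= miniBatchSize) {
            flush();
        }
    }

    /** Writes the mean of the accumulated weights into the model and resets the batch. */
    void flush() {
        for (int f = 0; f < weights.length; f++) {
            if (counts[f] > 0) {
                weights[f] = sums[f] / counts[f];
                sums[f] = 0.f;
                counts[f] = 0;
            }
        }
        observed = 0;
    }

    float weight(int feature) {
        return weights[feature];
    }
}
```

With a plain SGD updater such as `(f, w, g) -> w - eta * g`, averaging the new weights is the same as stepping once with the averaged gradient, which is why the equivalence is clear for SGD. For optimizers that keep per-feature state, the sketch says nothing about whether the averaged result is still a valid update.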



[GitHub] incubator-hivemall issue #79: [WIP][HIVEMALL-101] Separate optimizer impleme...

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/79
  
    @takuti It would be better to have documentation like the following:
    http://spark.apache.org/docs/latest/mllib-optimization.html
    http://scikit-learn.org/stable/modules/sgd.html#mathematical-formulation
    
    BTW, FYI, see [1,2] for how Spark and scikit-learn incorporate regularized updates.
    [1] https://github.com/apache/spark/blob/f830bb9170f6b853565d9dd30ca7418b93a54fe3/mllib/src/main/scala/org/apache/spark/mllib/optimization/Updater.scala
    [2] https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/sgd_fast.pyx#L632
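
For orientation, a regularized update of the general kind those references discuss can be written as a plain L2-regularized SGD step, w ← w − η(g + λw). The sketch below is a generic textbook form written for illustration; it is not copied from the Spark or scikit-learn sources linked above, and `RegularizedSgd`, `l2Step` and the parameter names are hypothetical.

```java
/** Hypothetical illustration of one L2-regularized SGD step. */
final class RegularizedSgd {
    private RegularizedSgd() {}

    /**
     * Updates the weights in place: w[i] <- w[i] - eta * (grad[i] + lambda * w[i]).
     *
     * @param w      weight vector, modified in place
     * @param grad   gradient of the unregularized loss with respect to w
     * @param eta    learning rate
     * @param lambda L2 regularization strength
     */
    static void l2Step(double[] w, double[] grad, double eta, double lambda) {
        if (w.length != grad.length) {
            throw new IllegalArgumentException("w and grad must have the same length");
        }
        for (int i = 0; i < w.length; i++) {
            w[i] -= eta * (grad[i] + lambda * w[i]);
        }
    }
}
```

Setting `lambda` to 0 reduces this to an unregularized step, which roughly corresponds to the `-reg no` setting used in the comparison queries.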



[GitHub] incubator-hivemall issue #79: [HIVEMALL-101] Separate optimizer implementati...

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/79
  
    @takuti An abstract base class is preferred. Please create one:
    
    - hivemall.LearnerBase
      - hivemall.GeneralLearnerBase
         - hivemall.classifier.GeneralClassifier
         - hivemall.regression.GeneralRegression
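
The requested hierarchy can be sketched roughly as below. Only the class names and their nesting come from the list above; the method names, bodies and the classifier's default loss are assumptions made for illustration, and the real classes would extend Hive UDTF machinery rather than stand alone.

```java
/** Hypothetical root of the learner hierarchy. */
abstract class LearnerBase {}

/** Shared base for the generic learners; subclasses only state what differs. */
abstract class GeneralLearnerBase extends LearnerBase {
    /** Name of the loss used when no loss option is given. */
    abstract String defaultLossName();

    /** Rejects a loss whose type does not fit this learner. */
    abstract void checkLossType(String lossName, boolean forBinaryClassification);
}

final class GeneralClassifier extends GeneralLearnerBase {
    @Override
    String defaultLossName() {
        return "HingeLoss"; // assumed default, for illustration only
    }

    @Override
    void checkLossType(String lossName, boolean forBinaryClassification) {
        if (!forBinaryClassification) {
            throw new IllegalArgumentException(
                "The loss function `" + lossName + "` is not for classification");
        }
    }
}

final class GeneralRegression extends GeneralLearnerBase {
    @Override
    String defaultLossName() {
        return "SquaredLoss"; // same default as in the quoted GeneralRegressionUDTF diff
    }

    @Override
    void checkLossType(String lossName, boolean forBinaryClassification) {
        if (forBinaryClassification) {
            throw new IllegalArgumentException(
                "The loss function `" + lossName + "` is not for regression");
        }
    }
}
```

Keeping option parsing, optimizer creation and mini-batch handling in the shared base would leave each subclass with little more than its loss-type check.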



[GitHub] incubator-hivemall issue #79: [WIP][HIVEMALL-101] Separate optimizer impleme...

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/79
  
    @takuti I guess there are no mix-server-related issues in this PR. I will review for that, though.



[GitHub] incubator-hivemall pull request #79: [WIP][HIVEMALL-101] Separate optimizer ...

Posted by takuti <gi...@git.apache.org>.
Github user takuti commented on a diff in the pull request:

    https://github.com/apache/incubator-hivemall/pull/79#discussion_r116445570
  
    --- Diff: core/src/main/java/hivemall/regression/GeneralRegressionUDTF.java ---
    @@ -0,0 +1,131 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +package hivemall.regression;
    +
    +import hivemall.annotations.Since;
    +import hivemall.model.FeatureValue;
    +import hivemall.optimizer.LossFunctions;
    +import hivemall.optimizer.LossFunctions.LossFunction;
    +import hivemall.optimizer.Optimizer;
    +import hivemall.optimizer.OptimizerOptions;
    +
    +import java.util.Map;
    +
    +import javax.annotation.Nonnull;
    +
    +import org.apache.commons.cli.CommandLine;
    +import org.apache.commons.cli.Options;
    +import org.apache.hadoop.hive.ql.exec.Description;
    +import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    +import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
    +
    +/**
    + * A general regression class with replaceable optimization functions.
    + */
    +@Description(name = "train_regression",
    +        value = "_FUNC_(list<string|int|bigint> features, double label [, const string options])"
    +                + " - Returns a relation consists of <string|int|bigint feature, float weight>",
    +        extended = "Build a prediction model by a generic regressor")
    +@Since(version = "0.5-rc.1")
    +public final class GeneralRegressionUDTF extends RegressionBaseUDTF {
    +
    +    @Nonnull
    +    private final Map<String, String> optimizerOptions;
    +    private Optimizer optimizer;
    +    private LossFunction lossFunction;
    +
    +    public GeneralRegressionUDTF() {
    +        super(true); // This enables new model interfaces
    +        this.optimizerOptions = OptimizerOptions.create();
    +    }
    +
    +    @Override
    +    public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {
    +        if (argOIs.length != 2 && argOIs.length != 3) {
    +            throw new UDFArgumentException(this.getClass().getSimpleName()
    +                    + " takes 2 or 3 arguments: List<Text|Int|BitInt> features, float target "
    +                    + "[, constant string options]");
    +        }
    +
    +        StructObjectInspector outputOI = super.initialize(argOIs);
    +
    +        if (lossFunction.forBinaryClassification()) {
    +            throw new UDFArgumentException("The loss function `" + lossFunction + "` is not for regression");
    +        }
    +        if (is_mini_batch) {
    +            throw new UDFArgumentException("_FUNC_ does not currently support `-mini_batch` option");
    +        }
    +
    +        try {
    +            this.optimizer = createOptimizer(optimizerOptions);
    +        } catch (Throwable e) {
    +            throw new UDFArgumentException(e.getMessage());
    +        }
    +
    +        return outputOI;
    +    }
    +
    +    @Override
    +    protected Options getOptions() {
    +        Options opts = super.getOptions();
    +        opts.addOption("loss", "loss_function", true,
    +                "Loss function [default: SquaredLoss, QuantileLoss, EpsilonInsensitiveLoss]");
    +        OptimizerOptions.setup(opts);
    +        return opts;
    +    }
    +
    +    @Override
    +    protected CommandLine processOptions(ObjectInspector[] argOIs) throws UDFArgumentException {
    +        CommandLine cl = super.processOptions(argOIs);
    +        try {
    +            if (cl.hasOption("loss_function")) {
    +                this.lossFunction = LossFunctions.getLossFunction(cl.getOptionValue("loss_function"));
    +            } else {
    +                this.lossFunction = LossFunctions.getLossFunction("SquaredLoss");
    +            }
    +        } catch (Throwable e) {
    +            throw new UDFArgumentException(e.getMessage());
    +        }
    +        OptimizerOptions.propcessOptions(cl, optimizerOptions);
    +        return cl;
    +    }
    +
    +    @Override
    +    protected final void checkTargetValue(final float target) throws UDFArgumentException {
    --- End diff --
    
    @maropu This is a regressor, which simply predicts real values. Why did you create this method? Are only values in [0,1] allowed?



[GitHub] incubator-hivemall issue #79: [HIVEMALL-101] Separate optimizer implementati...

Posted by takuti <gi...@git.apache.org>.
Github user takuti commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/79
  
    @myui Finished~



[GitHub] incubator-hivemall issue #79: [WIP][HIVEMALL-101] Separate optimizer impleme...

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/79
  
    @takuti `train()` can return the current loss, and the cumulative loss should be tracked to support multiple iterations in the future, e.g., using 
    https://github.com/apache/incubator-hivemall/blob/master/core/src/main/java/hivemall/common/ConversionState.java
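
The suggestion above can be illustrated with a minimal sketch, assuming a toy one-weight learner with squared loss. `CumulativeLossTracker` and `ToySquaredLossLearner` are hypothetical names introduced only for this illustration; they are not the API of Hivemall's `ConversionState`, whose actual methods are not shown in this thread.

```java
/** Hypothetical helper; NOT the API of Hivemall's ConversionState. */
final class CumulativeLossTracker {
    private final double convergenceRate;
    private double currLosses = 0.d;        // loss accumulated in the current iteration
    private double prevLosses = Double.NaN; // NaN until the first iteration has ended

    CumulativeLossTracker(double convergenceRate) {
        this.convergenceRate = convergenceRate;
    }

    /** Adds the loss returned by a single train() call. */
    void incrLoss(double loss) {
        currLosses += loss;
    }

    double getCumulativeLoss() {
        return currLosses;
    }

    /**
     * Closes the current iteration. Returns true when the relative change of
     * the cumulative loss against the previous iteration is below the threshold.
     */
    boolean endIteration() {
        boolean converged = false;
        if (prevLosses > 0.d) { // false for NaN, i.e., for the first iteration
            double changeRate = Math.abs(prevLosses - currLosses) / prevLosses;
            converged = changeRate < convergenceRate;
        }
        prevLosses = currLosses;
        currLosses = 0.d;
        return converged;
    }
}

/** Hypothetical one-weight learner whose train() returns the current loss. */
final class ToySquaredLossLearner {
    private final float eta; // learning rate
    private float weight = 0.f;

    ToySquaredLossLearner(float eta) {
        this.eta = eta;
    }

    /** One SGD step on squared loss; returns the loss measured before the update. */
    float train(float x, float y) {
        float diff = weight * x - y;
        weight -= eta * diff * x;
        return 0.5f * diff * diff;
    }
}

final class LossTrackingDemo {
    public static void main(String[] args) {
        ToySquaredLossLearner learner = new ToySquaredLossLearner(0.1f);
        CumulativeLossTracker tracker = new CumulativeLossTracker(0.005d);
        float[][] rows = { {1.f, 2.f}, {2.f, 4.f}, {3.f, 6.f}};
        for (int iter = 1; iter <= 3; iter++) {
            for (float[] row : rows) {
                tracker.incrLoss(learner.train(row[0], row[1]));
            }
            System.out.println("iter " + iter + ": cumulative loss = " + tracker.getCumulativeLoss());
            tracker.endIteration();
        }
    }
}
```

Keeping the accumulation outside of `train()` means the learner only reports a per-example loss, while the caller decides how to aggregate it and when to stop iterating.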



[GitHub] incubator-hivemall issue #79: [HIVEMALL-101] Separate optimizer implementati...

Posted by takuti <gi...@git.apache.org>.
Github user takuti commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/79
  
    @myui It's basically done. Could you review when you get a chance?
    
    One thing I'd like to discuss here is that `GeneralClassifierUDTF` and `GeneralRegressionUDTF` currently have a lot of duplicated code. Thus, the current class structure
    
    - Learner Base 
      - Binary Online Classifier 
        - General Classifier
      - Regression Base
        - General Regression
    
    can be modified to
    
    - Learner Base
      - General Predictor Base
        - General Classifier
        - General Regression
    
    for example. 
    
    If that sounds good to you, @myui, I will do so. Of course, it's not mandatory, so keeping the current duplicated code is also fine.
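
The proposed structure can be sketched as follows. This is only an illustration under stated assumptions: `SketchLoss`, `GeneralPredictorSketch`, `GeneralClassifierSketch`, and `GeneralRegressionSketch` are hypothetical names, the default loss chosen for the classifier is an assumption, and none of this is the code in the PR. The idea is that the shared option handling lives in one base class, and each subclass supplies only its default loss and its validity check.

```java
/** Hypothetical stand-in for a loss function descriptor. */
enum SketchLoss {
    SQUARED(false), LOGISTIC(true);

    final boolean forBinaryClassification;

    SketchLoss(boolean forBinaryClassification) {
        this.forBinaryClassification = forBinaryClassification;
    }
}

/** Hypothetical common base: holds the logic that is otherwise duplicated. */
abstract class GeneralPredictorSketch {
    protected SketchLoss loss;

    /** Shared option handling: use the requested loss, or the subclass default. */
    final void initialize(SketchLoss requested) {
        this.loss = (requested != null) ? requested : defaultLoss();
        checkLoss(this.loss);
    }

    abstract SketchLoss defaultLoss();

    /** Throws IllegalArgumentException if the loss does not fit the task. */
    abstract void checkLoss(SketchLoss loss);
}

final class GeneralClassifierSketch extends GeneralPredictorSketch {
    @Override
    SketchLoss defaultLoss() {
        return SketchLoss.LOGISTIC; // assumption made for this sketch only
    }

    @Override
    void checkLoss(SketchLoss loss) {
        if (!loss.forBinaryClassification) {
            throw new IllegalArgumentException("The loss function `" + loss
                    + "` is not for binary classification");
        }
    }
}

final class GeneralRegressionSketch extends GeneralPredictorSketch {
    @Override
    SketchLoss defaultLoss() {
        return SketchLoss.SQUARED;
    }

    @Override
    void checkLoss(SketchLoss loss) {
        if (loss.forBinaryClassification) {
            throw new IllegalArgumentException("The loss function `" + loss
                    + "` is not for regression");
        }
    }
}
```

With this layout, adding a new option touches only the base class, and the two concrete classes stay small.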



[GitHub] incubator-hivemall issue #79: [WIP][HIVEMALL-101] Separate optimizer impleme...

Posted by coveralls <gi...@git.apache.org>.
Github user coveralls commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/79
  
    
    [![Coverage Status](https://coveralls.io/builds/11627493/badge)](https://coveralls.io/builds/11627493)
    
    Coverage increased (+0.7%) to 39.422% when pulling **f98bc73c89610f4b1a489c6b810752d843f9d7cc on takuti:HIVEMALL-101** into **10e7d450fa8257efc5d614957fda514b2b91fdee on apache:master**.




[GitHub] incubator-hivemall issue #79: [HIVEMALL-101] Separate optimizer implementati...

Posted by coveralls <gi...@git.apache.org>.
Github user coveralls commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/79
  
    
    [![Coverage Status](https://coveralls.io/builds/11661000/badge)](https://coveralls.io/builds/11661000)
    
    Coverage increased (+0.7%) to 39.422% when pulling **2724dbcc97218ae6237f5ff675027ad24f9501bb on takuti:HIVEMALL-101** into **10e7d450fa8257efc5d614957fda514b2b91fdee on apache:master**.




[GitHub] incubator-hivemall issue #79: [WIP][HIVEMALL-101] Separate optimizer impleme...

Posted by coveralls <gi...@git.apache.org>.
Github user coveralls commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/79
  
    
    [![Coverage Status](https://coveralls.io/builds/11594767/badge)](https://coveralls.io/builds/11594767)
    
    Coverage increased (+0.3%) to 38.968% when pulling **0f268943082be62e56c2acc86a02232b901081dd on takuti:HIVEMALL-101** into **10e7d450fa8257efc5d614957fda514b2b91fdee on apache:master**.




[GitHub] incubator-hivemall issue #79: [WIP][HIVEMALL-101] Separate optimizer impleme...

Posted by coveralls <gi...@git.apache.org>.
Github user coveralls commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/79
  
    
    [![Coverage Status](https://coveralls.io/builds/11624842/badge)](https://coveralls.io/builds/11624842)
    
    Coverage increased (+0.7%) to 39.438% when pulling **c3b89f8a671a1ccf7a0c19e9f061d61c6e0c2807 on takuti:HIVEMALL-101** into **10e7d450fa8257efc5d614957fda514b2b91fdee on apache:master**.




[GitHub] incubator-hivemall issue #79: [WIP][HIVEMALL-101] Separate optimizer impleme...

Posted by coveralls <gi...@git.apache.org>.
Github user coveralls commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/79
  
    
    [![Coverage Status](https://coveralls.io/builds/11596234/badge)](https://coveralls.io/builds/11596234)
    
    Coverage increased (+0.6%) to 39.251% when pulling **34cf8a1a7f2daa86fe3f9116a28b4497a74c2c3b on takuti:HIVEMALL-101** into **10e7d450fa8257efc5d614957fda514b2b91fdee on apache:master**.




[GitHub] incubator-hivemall issue #79: [HIVEMALL-101] Separate optimizer implementati...

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/79
  
    @takuti well done :+1: will review.



[GitHub] incubator-hivemall issue #79: [WIP][HIVEMALL-101] Separate optimizer impleme...

Posted by coveralls <gi...@git.apache.org>.
Github user coveralls commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/79
  
    
    [![Coverage Status](https://coveralls.io/builds/11645966/badge)](https://coveralls.io/builds/11645966)
    
    Coverage increased (+1.07%) to 39.767% when pulling **0d573a0cbdb66d2b3d2cf49abd0ab61eb2bda76a on takuti:HIVEMALL-101** into **10e7d450fa8257efc5d614957fda514b2b91fdee on apache:master**.




[GitHub] incubator-hivemall issue #79: [WIP][HIVEMALL-101] Separate optimizer impleme...

Posted by myui <gi...@git.apache.org>.
Github user myui commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/79
  
    @takuti `checkTargetValue()` is needed for some loss functions, e.g., logistic loss.
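
As a rough illustration of why such a check matters, here is a minimal sketch assuming a cross-entropy form of logistic loss with a probabilistic target in [0,1]. `LogisticTargetCheckSketch` and its methods are hypothetical and do not reproduce the method body from the diff, which is not shown in this thread. With a target outside [0,1], this loss can become negative and has no lower bound, so minimizing it is no longer meaningful.

```java
/** Hypothetical sketch; not the checkTargetValue() implementation from the PR. */
final class LogisticTargetCheckSketch {

    private LogisticTargetCheckSketch() {}

    /** Rejects targets outside [0,1]; written so that NaN is rejected as well. */
    static void checkTargetValue(float target) {
        if (!(target >= 0.f && target <= 1.f)) {
            throw new IllegalArgumentException("target must be in range 0 to 1: " + target);
        }
    }

    static double sigmoid(double x) {
        return 1.d / (1.d + Math.exp(-x));
    }

    /** Cross-entropy between the target and sigmoid(score), after validating the target. */
    static double crossEntropy(double score, float target) {
        checkTargetValue(target);
        return rawCrossEntropy(score, target);
    }

    /** Same formula without validation, to show what an out-of-range target does. */
    static double rawCrossEntropy(double score, float target) {
        double p = sigmoid(score);
        return -target * Math.log(p) - (1.d - target) * Math.log(1.d - p);
    }

    public static void main(String[] args) {
        System.out.println("valid target:   " + crossEntropy(0.d, 1.f));
        // A target of 2 yields a negative "loss" when the check is skipped.
        System.out.println("invalid target: " + rawCrossEntropy(2.d, 2.f));
    }
}
```

For losses such as squared loss, no such range restriction applies, which is consistent with the check being tied to the chosen loss function rather than to regression in general.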



[GitHub] incubator-hivemall issue #79: [WIP][HIVEMALL-101] Separate optimizer impleme...

Posted by coveralls <gi...@git.apache.org>.
Github user coveralls commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/79
  
    
    [![Coverage Status](https://coveralls.io/builds/11597582/badge)](https://coveralls.io/builds/11597582)
    
    Coverage increased (+0.4%) to 39.132% when pulling **5dc6f4eb5a8d8532201f6706673e2381d47d7e70 on takuti:HIVEMALL-101** into **10e7d450fa8257efc5d614957fda514b2b91fdee on apache:master**.





[GitHub] incubator-hivemall issue #79: [WIP][HIVEMALL-101] Separate optimizer impleme...

Posted by coveralls <gi...@git.apache.org>.
Github user coveralls commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/79
  
    
    [![Coverage Status](https://coveralls.io/builds/11540991/badge)](https://coveralls.io/builds/11540991)
    
    Coverage increased (+0.6%) to 39.27% when pulling **2b965fc1d1ef01a704690b920b59f71dc4d6a3d5 on takuti:HIVEMALL-101** into **68f6b465248117d085a9cdb7b532837b14e054c5 on apache:master**.




[GitHub] incubator-hivemall issue #79: [WIP][HIVEMALL-101] Separate optimizer impleme...

Posted by takuti <gi...@git.apache.org>.
Github user takuti commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/79
  
    I listed TODOs in the top comment. If there is anything else I need to take care of, please let me know.



[GitHub] incubator-hivemall issue #79: [WIP][HIVEMALL-101] Separate optimizer impleme...

Posted by takuti <gi...@git.apache.org>.
Github user takuti commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/79
  
    Yep, that's why logistic loss is not selectable for now. `checkTargetValue()` will come back later.

