Posted to dev@madlib.apache.org by mktal <gi...@git.apache.org> on 2015/11/05 01:54:51 UTC

[GitHub] incubator-madlib pull request: Feature/svm grouping

GitHub user mktal opened a pull request:

    https://github.com/apache/incubator-madlib/pull/1

    Feature/svm grouping

    Add grouping support for SVM regression
    
    Issue: https://issues.apache.org/jira/browse/MADLIB-913

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mktal/incubator-madlib feature/svm_grouping

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-madlib/pull/1.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1
    
----
commit 11be840a5c5ed12758924b49bb16ab0092cd8d35
Author: Xiaocheng Tang <xt...@pivotal.io>
Date:   2015-10-15T23:19:26Z

    SVM: Add regression for SVM

commit bcc5b3430547e64c0bdeb411d85d3fc62c6a025e
Author: Xiaocheng Tang <xi...@gmail.com>
Date:   2015-10-20T23:28:22Z

    SVM: Fix minor bugs
    
    Install-check passed

commit 96a7a874232dca9b028fc05a80247bfe66a2553b
Author: Xiaocheng Tang <xi...@gmail.com>
Date:   2015-10-22T18:47:18Z

    Refactoring GroupIterationController:
    	Add desp to unique_string()
    	Add _init_group_param() in GroupIterationController

commit f8f6d5ceba557e382ec6015b4ec299093f0f2fec
Author: Xiaocheng Tang <xi...@gmail.com>
Date:   2015-10-23T05:35:50Z

    Refactoring GroupIterationController:
    	minor changes

commit a6a3043787d4c6aef7841adc544cfc1b1924139c
Author: Xiaocheng Tang <xi...@gmail.com>
Date:   2015-10-23T23:25:57Z

    Refactoring GroupIterationController:
    	checkpoint

commit bc8e3c47f6a24a3f96b93ca017e444d332922c22
Author: Rahul Iyer <ri...@pivotal.io>
Date:   2015-10-26T22:09:39Z

    Light cleanup in utilities

commit 00de2f5c45f7a1645bc64dc763e520dc83ba1a64
Author: Xiaocheng Tang <xi...@gmail.com>
Date:   2015-10-23T23:25:57Z

    Refactoring GroupIterationController:
    	checkpoint

commit 7c7ca3b2ddfc0aa915f110505599760595749fa3
Author: Xiaocheng Tang <xi...@gmail.com>
Date:   2015-10-27T22:26:09Z

    SVM: grouping for regression
    
    Install checks added and passed

commit 12c16842a28702ac2c4357d384c53ba608a44f72
Author: Xiaocheng Tang <xi...@gmail.com>
Date:   2015-11-04T20:44:05Z

    minor fixes

commit 70afde68bfe239fe744511f9905003eb58ab4695
Author: Xiaocheng Tang <xi...@gmail.com>
Date:   2015-11-04T22:06:41Z

    minor fixes

commit 8a1d5ca1be20be4f12b16d01ad218aa832cabe86
Author: Xiaocheng Tang <xi...@gmail.com>
Date:   2015-11-04T23:12:24Z

    minor fixes

commit 90683be0a89bb3d06b008279ce2d1254fae34e9a
Author: Xiaocheng Tang <xi...@gmail.com>
Date:   2015-11-05T00:16:29Z

    minor fixes

commit c9f1582cfb1f912383226009072c106aad57afdb
Author: Xiaocheng Tang <xi...@gmail.com>
Date:   2015-11-05T00:35:22Z

    minor fixes

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-madlib pull request: Feature/svm grouping

Posted by iyerr3 <gi...@git.apache.org>.

Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/1#discussion_r44072022
  
    --- Diff: src/modules/convex/linear_svm_igd.cpp ---
    @@ -95,19 +90,16 @@ linear_svm_igd_transition::run(AnyType &args) {
         using madlib::dbal::eigen_integration::MappedColumnVector;
         GLMTuple tuple;
         tuple.indVar.rebind(args[1].getAs<MappedColumnVector>().memoryHandle(),
    -            state.task.dimension);
    -    tuple.depVar = args[2].getAs<bool>() ? 1. : -1.;
    +                        state.task.dimension);
    +    tuple.depVar = args[2].getAs<double>();
     
         // Now do the transition step
         // apply IGD with regularization
    -    if (isL2) {
    -        L2<GLMModel>::scaling(state.algo.incrModel, lambda, nTuples, state.task.stepsize);
    -        LinearSVMIGDAlgorithm::transition(state, tuple);
    -    } else {
    -        LinearSVMIGDAlgorithm::transition(state, tuple);
    -        L1<GLMModel>::clipping(state.algo.incrModel, lambda, nTuples, state.task.stepsize);
    -    }
    -    // objective function and its gradient
    +    L2<GLMModel>::scaling(state.task.model, state.algo.incrModel, state.task.stepsize);
    --- End diff --
    
    Why not check the isL2 variable and only perform the necessary computation? 
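To make the suggestion concrete, here is a minimal Python sketch (not MADlib code) of a single incremental gradient descent step that checks an `is_l2` flag and runs only the regularization it needs. The function names, the hinge-loss subgradient, and the scaling and clipping formulas are illustrative assumptions; only the ordering (L2 scaling before the update, L1 clipping after it) follows the branch removed in the diff.

```python
def hinge_gradient(model, x, y):
    """Subgradient of the hinge loss max(0, 1 - y * <model, x>)."""
    margin = y * sum(w * xi for w, xi in zip(model, x))
    if margin >= 1.0:
        return [0.0] * len(model)
    return [-y * xi for xi in x]


def l2_scale(model, lam, n_tuples, stepsize):
    """Hypothetical L2 step: shrink every coefficient toward zero."""
    factor = 1.0 - stepsize * lam / n_tuples
    return [w * factor for w in model]


def l1_clip(model, lam, n_tuples, stepsize):
    """Hypothetical L1 step: soft-threshold each coefficient."""
    t = stepsize * lam / n_tuples
    return [max(abs(w) - t, 0.0) * (1.0 if w > 0 else -1.0) for w in model]


def igd_step(model, x, y, lam, n_tuples, stepsize, is_l2):
    """One IGD update that performs only the selected regularization."""
    if is_l2:
        model = l2_scale(model, lam, n_tuples, stepsize)
    grad = hinge_gradient(model, x, y)
    model = [w - stepsize * g for w, g in zip(model, grad)]
    if not is_l2:
        model = l1_clip(model, lam, n_tuples, stepsize)
    return model
```

With this shape, each tuple pays for one regularization call instead of two.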



[GitHub] incubator-madlib pull request: Feature/svm grouping

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-madlib/pull/1



[GitHub] incubator-madlib pull request: Feature/svm grouping

Posted by iyerr3 <gi...@git.apache.org>.

Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/1#discussion_r44073931
  
    --- Diff: src/ports/postgres/modules/utilities/in_mem_group_control.py_in ---
    @@ -7,6 +7,141 @@
     import plpy
     from control import MinWarning
     from utilities import unique_string
    +from collections import namedtuple
    +from collections import Iterable
    +
    +
    +class BaseState(object):
    +    """@brief Abstraction for intermediate iteration state"""
    +    def __init__(self, **kwargs):
    +        self._state = {}
    +        self._is_none = None
    +        self.initialize(**kwargs)
    +
    +    def __len__(self):
    +        return len(self._state)
    +
    +    def __del__(self):
    +        del self._state
    +
    +    def __getitem__(self, k):
    +        return self._state[k]
    +
    +    def __setitem__(self, k, v):
    +        self._state[k] = v
    +
    +    @property
    +    def keys(self):
    +        return self._state.keys()
    +
    +    @property
    +    def values(self):
    +        if self.is_none():
    +            return []
    +        return [s for x in self._state.values() for s in x]
    +
    +    def delete(self, keys_to_remove):
    +        for k in keys_to_remove:
    +            try:
    +                del self._state[k]
    +            except KeyError:
    +                pass
    +        self._is_none = None
    +
    +    def initialize(self, 
    +                   col_grp_key='', 
    +                   col_grp_state='', 
    +                   ret_states=None, **kwargs):
    +        self.update(col_grp_key, col_grp_state, ret_states)
    +
    +    def update(self, 
    +               col_grp_key, 
    +               col_grp_state, 
    +               ret_states):
    +        failed_grp_keys = []
    +        if ret_states is None:
    +            return failed_grp_keys
    +        t0 = ret_states[0]
    +        # no key column in table ret_states
    +        if col_grp_key not in t0:
    +            return failed_grp_keys
    +        # initialize state to None
    +        if col_grp_state == '':
    +            self._is_none = True
    +            for s in ret_states:
    +                self._state[s[col_grp_key]] = None
    +            return failed_grp_keys
    +        for t in ret_states:
    +            _grp_key, _grp_state = t[col_grp_key], t[col_grp_state]
    +            if _grp_state is None: failed_grp_keys.append(_grp_key)
    --- End diff --
    
    Let's put the statement on a separate line, per [PEP 8](https://www.python.org/dev/peps/pep-0008/#other-recommendations).
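For illustration, the flagged line could be split so that the `if` body sits on its own line. The snippet below is a standalone sketch: `collect_failed_keys` and the sample rows are hypothetical, and only the loop body mirrors the code in the diff.

```python
def collect_failed_keys(ret_states, col_grp_key, col_grp_state):
    """Return the keys of groups whose state is None."""
    failed_grp_keys = []
    for t in ret_states:
        _grp_key, _grp_state = t[col_grp_key], t[col_grp_state]
        # PEP 8 discourages compound statements: keep the body on its own line.
        if _grp_state is None:
            failed_grp_keys.append(_grp_key)
    return failed_grp_keys


rows = [
    {"grp": "a", "state": [0.1, 0.2]},
    {"grp": "b", "state": None},
]
print(collect_failed_keys(rows, "grp", "state"))  # prints ['b']
```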



[GitHub] incubator-madlib pull request: Feature/svm grouping

Posted by mktal <gi...@git.apache.org>.

Github user mktal commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/1#discussion_r44073252
  
    --- Diff: src/modules/convex/linear_svm_igd.cpp ---
    @@ -95,19 +90,16 @@ linear_svm_igd_transition::run(AnyType &args) {
         using madlib::dbal::eigen_integration::MappedColumnVector;
         GLMTuple tuple;
         tuple.indVar.rebind(args[1].getAs<MappedColumnVector>().memoryHandle(),
    -            state.task.dimension);
    -    tuple.depVar = args[2].getAs<bool>() ? 1. : -1.;
    +                        state.task.dimension);
    +    tuple.depVar = args[2].getAs<double>();
     
         // Now do the transition step
         // apply IGD with regularization
    -    if (isL2) {
    -        L2<GLMModel>::scaling(state.algo.incrModel, lambda, nTuples, state.task.stepsize);
    -        LinearSVMIGDAlgorithm::transition(state, tuple);
    -    } else {
    -        LinearSVMIGDAlgorithm::transition(state, tuple);
    -        L1<GLMModel>::clipping(state.algo.incrModel, lambda, nTuples, state.task.stepsize);
    -    }
    -    // objective function and its gradient
    +    L2<GLMModel>::scaling(state.task.model, state.algo.incrModel, state.task.stepsize);
    --- End diff --
    
    L2 and L1 regularization are not mutually exclusive. We can apply both by setting the corresponding regularization parameters to non-zero values.
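A rough Python sketch of that idea follows, assuming two separate hypothetical parameters, `lambda_l2` and `lambda_l1`. Setting either one to zero turns the corresponding step into a no-op, so no flag is needed. The formulas are illustrative and are not taken from the MADlib source.

```python
def regularize(model, lambda_l2, lambda_l1, stepsize):
    """Apply L2 shrinkage and then L1 soft-thresholding to a coefficient list.

    Both parameter names are hypothetical. A value of 0.0 disables the
    corresponding step because the shrink factor becomes 1 and the
    threshold becomes 0.
    """
    shrink = 1.0 - stepsize * lambda_l2
    threshold = stepsize * lambda_l1
    out = []
    for w in model:
        w = w * shrink
        if w > threshold:
            w -= threshold
        elif w < -threshold:
            w += threshold
        else:
            w = 0.0
        out.append(w)
    return out
```

Calling `regularize(model, 0.0, 0.0, stepsize)` returns the coefficients unchanged, while non-zero values for both parameters give an elastic-net style update.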



[GitHub] incubator-madlib pull request: Feature/svm grouping

Posted by iyerr3 <gi...@git.apache.org>.

Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/1#discussion_r44483491
  
    --- Diff: CMakeLists.txt ---
    @@ -103,6 +103,8 @@ if(CMAKE_COMPILER_IS_GNUCXX)
         if(APPLE)
             set(CMAKE_INCLUDE_SYSTEM_FLAG_CXX "-isystem ")
         endif(APPLE)
    +elseif(CMAKE_C_COMPILER_ID MATCHES "Clang")
    --- End diff --
    
    @mktal let's move this out as a separate JIRA+PR since it's not related to SVM. 



[GitHub] incubator-madlib pull request: Feature/svm grouping

Posted by mktal <gi...@git.apache.org>.

Github user mktal commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/1#discussion_r44074166
  
    --- Diff: src/modules/convex/linear_svm_igd.cpp ---
    @@ -95,19 +90,16 @@ linear_svm_igd_transition::run(AnyType &args) {
         using madlib::dbal::eigen_integration::MappedColumnVector;
         GLMTuple tuple;
         tuple.indVar.rebind(args[1].getAs<MappedColumnVector>().memoryHandle(),
    -            state.task.dimension);
    -    tuple.depVar = args[2].getAs<bool>() ? 1. : -1.;
    +                        state.task.dimension);
    +    tuple.depVar = args[2].getAs<double>();
     
         // Now do the transition step
         // apply IGD with regularization
    -    if (isL2) {
    -        L2<GLMModel>::scaling(state.algo.incrModel, lambda, nTuples, state.task.stepsize);
    -        LinearSVMIGDAlgorithm::transition(state, tuple);
    -    } else {
    -        LinearSVMIGDAlgorithm::transition(state, tuple);
    -        L1<GLMModel>::clipping(state.algo.incrModel, lambda, nTuples, state.task.stepsize);
    -    }
    -    // objective function and its gradient
    +    L2<GLMModel>::scaling(state.task.model, state.algo.incrModel, state.task.stepsize);
    --- End diff --
    
    For now, users can choose only one of L1 or L2. We can support both in the future if necessary.



[GitHub] incubator-madlib pull request: Feature/svm grouping

Posted by iyerr3 <gi...@git.apache.org>.

Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/1#discussion_r44072383
  
    --- Diff: src/modules/convex/linear_svm_igd.cpp ---
    @@ -152,9 +144,18 @@ linear_svm_igd_final::run(AnyType &args) {
     
         // Aggregates that haven't seen any data just return Null.
         if (state.algo.numRows == 0) { return Null(); }
    +    
    --- End diff --
    
    Same as above - I would prefer to have one `if (isL2)` that decides between the L2 and L1 functions. 
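As a sketch of what a single branch might look like when computing the regularization term of the objective, consider the Python below. The thread does not show the body of the final function, so the function name and both penalty formulas are assumptions made for illustration.

```python
def regularization_penalty(model, lam, is_l2):
    """Return the penalty added to the loss for the chosen norm."""
    if is_l2:
        # Squared L2 norm, halved so that its gradient is lam * w.
        return 0.5 * lam * sum(w * w for w in model)
    # L1 norm of the coefficients.
    return lam * sum(abs(w) for w in model)
```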

