You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@madlib.apache.org by mktal <gi...@git.apache.org> on 2015/11/05 01:54:51 UTC
[GitHub] incubator-madlib pull request: Feature/svm grouping
GitHub user mktal opened a pull request:
https://github.com/apache/incubator-madlib/pull/1
Feature/svm grouping
Add grouping support for SVM regression
Issue: https://issues.apache.org/jira/browse/MADLIB-913
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mktal/incubator-madlib feature/svm_grouping
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-madlib/pull/1.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1
----
commit 11be840a5c5ed12758924b49bb16ab0092cd8d35
Author: Xiaocheng Tang <xt...@pivotal.io>
Date: 2015-10-15T23:19:26Z
SVM: Add regression for SVM
commit bcc5b3430547e64c0bdeb411d85d3fc62c6a025e
Author: Xiaocheng Tang <xi...@gmail.com>
Date: 2015-10-20T23:28:22Z
SVM: Fix minor bugs
Install-check passed
commit 96a7a874232dca9b028fc05a80247bfe66a2553b
Author: Xiaocheng Tang <xi...@gmail.com>
Date: 2015-10-22T18:47:18Z
Refactoring GroupIterationController:
Add desp to unique_string()
Add _init_group_param() in GroupIterationController
commit f8f6d5ceba557e382ec6015b4ec299093f0f2fec
Author: Xiaocheng Tang <xi...@gmail.com>
Date: 2015-10-23T05:35:50Z
Refactoring GroupIterationController:
minor changes
commit a6a3043787d4c6aef7841adc544cfc1b1924139c
Author: Xiaocheng Tang <xi...@gmail.com>
Date: 2015-10-23T23:25:57Z
Refactoring GroupIterationController:
checkpoint
commit bc8e3c47f6a24a3f96b93ca017e444d332922c22
Author: Rahul Iyer <ri...@pivotal.io>
Date: 2015-10-26T22:09:39Z
Light cleanup in utilities
commit 00de2f5c45f7a1645bc64dc763e520dc83ba1a64
Author: Xiaocheng Tang <xi...@gmail.com>
Date: 2015-10-23T23:25:57Z
Refactoring GroupIterationController:
checkpoint
commit 7c7ca3b2ddfc0aa915f110505599760595749fa3
Author: Xiaocheng Tang <xi...@gmail.com>
Date: 2015-10-27T22:26:09Z
SVM: grouping for regression
Install checks added and passed
commit 12c16842a28702ac2c4357d384c53ba608a44f72
Author: Xiaocheng Tang <xi...@gmail.com>
Date: 2015-11-04T20:44:05Z
minor fixes
commit 70afde68bfe239fe744511f9905003eb58ab4695
Author: Xiaocheng Tang <xi...@gmail.com>
Date: 2015-11-04T22:06:41Z
minor fixes
commit 8a1d5ca1be20be4f12b16d01ad218aa832cabe86
Author: Xiaocheng Tang <xi...@gmail.com>
Date: 2015-11-04T23:12:24Z
minor fixes
commit 90683be0a89bb3d06b008279ce2d1254fae34e9a
Author: Xiaocheng Tang <xi...@gmail.com>
Date: 2015-11-05T00:16:29Z
minor fixes
commit c9f1582cfb1f912383226009072c106aad57afdb
Author: Xiaocheng Tang <xi...@gmail.com>
Date: 2015-11-05T00:35:22Z
minor fixes
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] incubator-madlib pull request: Feature/svm grouping
Posted by iyerr3 <gi...@git.apache.org>.
Github user iyerr3 commented on a diff in the pull request:
https://github.com/apache/incubator-madlib/pull/1#discussion_r44072022
--- Diff: src/modules/convex/linear_svm_igd.cpp ---
@@ -95,19 +90,16 @@ linear_svm_igd_transition::run(AnyType &args) {
using madlib::dbal::eigen_integration::MappedColumnVector;
GLMTuple tuple;
tuple.indVar.rebind(args[1].getAs<MappedColumnVector>().memoryHandle(),
- state.task.dimension);
- tuple.depVar = args[2].getAs<bool>() ? 1. : -1.;
+ state.task.dimension);
+ tuple.depVar = args[2].getAs<double>();
// Now do the transition step
// apply IGD with regularization
- if (isL2) {
- L2<GLMModel>::scaling(state.algo.incrModel, lambda, nTuples, state.task.stepsize);
- LinearSVMIGDAlgorithm::transition(state, tuple);
- } else {
- LinearSVMIGDAlgorithm::transition(state, tuple);
- L1<GLMModel>::clipping(state.algo.incrModel, lambda, nTuples, state.task.stepsize);
- }
- // objective function and its gradient
+ L2<GLMModel>::scaling(state.task.model, state.algo.incrModel, state.task.stepsize);
--- End diff --
Why not check the isL2 variable and only perform the necessary computation?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] incubator-madlib pull request: Feature/svm grouping
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/incubator-madlib/pull/1
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] incubator-madlib pull request: Feature/svm grouping
Posted by iyerr3 <gi...@git.apache.org>.
Github user iyerr3 commented on a diff in the pull request:
https://github.com/apache/incubator-madlib/pull/1#discussion_r44073931
--- Diff: src/ports/postgres/modules/utilities/in_mem_group_control.py_in ---
@@ -7,6 +7,141 @@
import plpy
from control import MinWarning
from utilities import unique_string
+from collections import namedtuple
+from collections import Iterable
+
+
+class BaseState(object):
+ """@brief Abstraction for intermediate iteration state"""
+ def __init__(self, **kwargs):
+ self._state = {}
+ self._is_none = None
+ self.initialize(**kwargs)
+
+ def __len__(self):
+ return len(self._state)
+
+ def __del__(self):
+ del self._state
+
+ def __getitem__(self, k):
+ return self._state[k]
+
+ def __setitem__(self, k, v):
+ self._state[k] = v
+
+ @property
+ def keys(self):
+ return self._state.keys()
+
+ @property
+ def values(self):
+ if self.is_none():
+ return []
+ return [s for x in self._state.values() for s in x]
+
+ def delete(self, keys_to_remove):
+ for k in keys_to_remove:
+ try:
+ del self._state[k]
+ except KeyError:
+ pass
+ self._is_none = None
+
+ def initialize(self,
+ col_grp_key='',
+ col_grp_state='',
+ ret_states=None, **kwargs):
+ self.update(col_grp_key, col_grp_state, ret_states)
+
+ def update(self,
+ col_grp_key,
+ col_grp_state,
+ ret_states):
+ failed_grp_keys = []
+ if ret_states is None:
+ return failed_grp_keys
+ t0 = ret_states[0]
+ # no key column in table ret_states
+ if col_grp_key not in t0:
+ return failed_grp_keys
+ # initialize state to None
+ if col_grp_state == '':
+ self._is_none = True
+ for s in ret_states:
+ self._state[s[col_grp_key]] = None
+ return failed_grp_keys
+ for t in ret_states:
+ _grp_key, _grp_state = t[col_grp_key], t[col_grp_state]
+ if _grp_state is None: failed_grp_keys.append(_grp_key)
--- End diff --
Let's put the statement in a separate line per [Pep 8](https://www.python.org/dev/peps/pep-0008/#other-recommendations)
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] incubator-madlib pull request: Feature/svm grouping
Posted by mktal <gi...@git.apache.org>.
Github user mktal commented on a diff in the pull request:
https://github.com/apache/incubator-madlib/pull/1#discussion_r44073252
--- Diff: src/modules/convex/linear_svm_igd.cpp ---
@@ -95,19 +90,16 @@ linear_svm_igd_transition::run(AnyType &args) {
using madlib::dbal::eigen_integration::MappedColumnVector;
GLMTuple tuple;
tuple.indVar.rebind(args[1].getAs<MappedColumnVector>().memoryHandle(),
- state.task.dimension);
- tuple.depVar = args[2].getAs<bool>() ? 1. : -1.;
+ state.task.dimension);
+ tuple.depVar = args[2].getAs<double>();
// Now do the transition step
// apply IGD with regularization
- if (isL2) {
- L2<GLMModel>::scaling(state.algo.incrModel, lambda, nTuples, state.task.stepsize);
- LinearSVMIGDAlgorithm::transition(state, tuple);
- } else {
- LinearSVMIGDAlgorithm::transition(state, tuple);
- L1<GLMModel>::clipping(state.algo.incrModel, lambda, nTuples, state.task.stepsize);
- }
- // objective function and its gradient
+ L2<GLMModel>::scaling(state.task.model, state.algo.incrModel, state.task.stepsize);
--- End diff --
The L2 and L1 regularization are not exclusive. We can perform both of them by setting the corresponding regularization parameters to non-zero.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] incubator-madlib pull request: Feature/svm grouping
Posted by iyerr3 <gi...@git.apache.org>.
Github user iyerr3 commented on a diff in the pull request:
https://github.com/apache/incubator-madlib/pull/1#discussion_r44483491
--- Diff: CMakeLists.txt ---
@@ -103,6 +103,8 @@ if(CMAKE_COMPILER_IS_GNUCXX)
if(APPLE)
set(CMAKE_INCLUDE_SYSTEM_FLAG_CXX "-isystem ")
endif(APPLE)
+elseif(CMAKE_C_COMPILER_ID MATCHES "Clang")
--- End diff --
@mktal let's move this out as a separate JIRA+PR since it's not related to SVM.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] incubator-madlib pull request: Feature/svm grouping
Posted by mktal <gi...@git.apache.org>.
Github user mktal commented on a diff in the pull request:
https://github.com/apache/incubator-madlib/pull/1#discussion_r44074166
--- Diff: src/modules/convex/linear_svm_igd.cpp ---
@@ -95,19 +90,16 @@ linear_svm_igd_transition::run(AnyType &args) {
using madlib::dbal::eigen_integration::MappedColumnVector;
GLMTuple tuple;
tuple.indVar.rebind(args[1].getAs<MappedColumnVector>().memoryHandle(),
- state.task.dimension);
- tuple.depVar = args[2].getAs<bool>() ? 1. : -1.;
+ state.task.dimension);
+ tuple.depVar = args[2].getAs<double>();
// Now do the transition step
// apply IGD with regularization
- if (isL2) {
- L2<GLMModel>::scaling(state.algo.incrModel, lambda, nTuples, state.task.stepsize);
- LinearSVMIGDAlgorithm::transition(state, tuple);
- } else {
- LinearSVMIGDAlgorithm::transition(state, tuple);
- L1<GLMModel>::clipping(state.algo.incrModel, lambda, nTuples, state.task.stepsize);
- }
- // objective function and its gradient
+ L2<GLMModel>::scaling(state.task.model, state.algo.incrModel, state.task.stepsize);
--- End diff --
For now the users can only choose either l1 or l2. We can support both in the future if necessary
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] incubator-madlib pull request: Feature/svm grouping
Posted by iyerr3 <gi...@git.apache.org>.
Github user iyerr3 commented on a diff in the pull request:
https://github.com/apache/incubator-madlib/pull/1#discussion_r44072383
--- Diff: src/modules/convex/linear_svm_igd.cpp ---
@@ -152,9 +144,18 @@ linear_svm_igd_final::run(AnyType &args) {
// Aggregates that haven't seen any data just return Null.
if (state.algo.numRows == 0) { return Null(); }
+
--- End diff --
Same as above - I would prefer to have one `if (isL2)` that decides between the L2 and L1 functions.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---