Posted to commits@madlib.apache.org by ri...@apache.org on 2016/07/06 20:30:14 UTC
[3/4] incubator-madlib git commit: SVM: Novelty detection using 1-class SVM
SVM: Novelty detection using 1-class SVM
Jira: MADLIB-990
Additional author: Nandish Jayaram <nj...@pivotal.io>
In this implementation of a one-class SVM, we piggyback on the existing
SVM classification machinery. The input table for a one-class SVM does not
require a dependent variable: a maximum-margin classifier is learned that
separates all the data from the origin. The default kernel for one-class
SVM is Gaussian (RBF).
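A minimal sketch of that reduction (NumPy; illustrative only — the patch does
the equivalent in SQL via array_cat(features, ARRAY[1])):

    import numpy as np

    def one_class_training_pairs(X):
        """Turn a feature-only table into classification pairs: every row
        becomes a positive example and an intercept feature is appended,
        so the learned bias measures separation from the origin."""
        n = X.shape[0]
        X_aug = np.hstack([X, np.ones((n, 1))])  # appended intercept column
        y = np.ones(n)                           # all rows labeled +1
        return X_aug, y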
Closes #48
Project: http://git-wip-us.apache.org/repos/asf/incubator-madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-madlib/commit/b7484c1f
Tree: http://git-wip-us.apache.org/repos/asf/incubator-madlib/tree/b7484c1f
Diff: http://git-wip-us.apache.org/repos/asf/incubator-madlib/diff/b7484c1f
Branch: refs/heads/master
Commit: b7484c1f73fd962c1b4b725bfced6ca88b19d21f
Parents: c4f7717
Author: Rahul Iyer <ri...@apache.org>
Authored: Wed Jul 6 13:26:08 2016 -0700
Committer: Rahul Iyer <ri...@apache.org>
Committed: Wed Jul 6 13:26:08 2016 -0700
----------------------------------------------------------------------
doc/design/modules/SVM.tex | 14 +-
doc/etc/madlib_extra.css | 2 +-
doc/literature.bib | 10 +
src/config/Version.yml | 2 +-
src/modules/convex/algo/igd.hpp | 4 +-
src/modules/convex/linear_svm_igd.cpp | 15 +-
src/modules/convex/type/tuple.hpp | 5 +-
.../modules/svm/kernel_approximation.py_in | 313 +++++---
src/ports/postgres/modules/svm/svm.py_in | 764 ++++++++++++++-----
src/ports/postgres/modules/svm/svm.sql_in | 402 ++++++++--
src/ports/postgres/modules/svm/test/svm.sql_in | 205 +++--
.../utilities/in_mem_group_control.py_in | 17 +-
.../postgres/modules/utilities/utilities.py_in | 5 +-
.../validation/internal/cross_validation.py_in | 94 +--
.../validation/test/cross_validation.sql_in | 4 -
15 files changed, 1288 insertions(+), 568 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/incubator-madlib/blob/b7484c1f/doc/design/modules/SVM.tex
----------------------------------------------------------------------
diff --git a/doc/design/modules/SVM.tex b/doc/design/modules/SVM.tex
index 73e97aa..c670694 100644
--- a/doc/design/modules/SVM.tex
+++ b/doc/design/modules/SVM.tex
@@ -102,7 +102,7 @@ See \cite{ShSS07} (extended version) for details.
\subsection{$\epsilon$-Regression}
-SVM can also be used to predict the values of an affine function $f(x) = \langle w, x\rangle+b$, given sample input-output pairs $(x_1,y_1),\ldots,(x_n,y_n)$. If we allow ourselves an error bound of $\epsilon>0$, and some error controlled by the slack variables $\xi^*$, it is a matter simply modifying of the above convex problem. By demanding that our function is relatively ``flat," and that it approximates the true $f$ reasonably, the relevant optimization problem is
+SVM can also be used to predict the values of an affine function $f(x) = \langle w, x\rangle+b$, given sample input-output pairs $(x_1,y_1),\ldots,(x_n,y_n)$. If we allow ourselves an error bound of $\epsilon>0$, and some error controlled by the slack variables $\xi^*$, it is a matter of simply modifying the above convex problem. By demanding that our function is relatively ``flat," and that it approximates the true $f$ reasonably, the relevant optimization problem is:
\begin{align*}
\underset{w,\vec{\xi},\vec{\xi^*_i},b}{\text{Minimize }} & \frac{1}{2}||w||^2 + \frac{C}{n}\sum_{i=1}^n \xi_i + \xi^*_i \\
@@ -370,13 +370,11 @@ In the above algorithm, Step \ref{alg:x-omega} is done by broadcasting the matri
The Nystr{\"o}m method approximates the kernel matrix $K=(k(x_i,x_j)_{i,j=1\ldots,N})$ by randomly sampling $m \ll N$ training data points $\hat{x}_1,\ldots, \hat{x}_m$ and constructing a new, low rank matrix from the data. One constructs a low-dimensional feature representation of the form $x\mapsto A(k(x,\hat{x}_1),\ldots, k(x,\hat{x}_m))^{\textbf{T}}$, where $A$ is some matrix constructed from the eigenvectors and eigenvalues of the submatrix of the Gram matrix determined by $\hat{x}_1,\ldots, \hat{x}_m$. The computational complexity of constructing this predictor is $O(m^2n)$, which is much less than the cost of computing the full Gram matrix.
-
-
-
-
-
-
-
+\section{Novelty Detection}
+Suppose we have training data $x_1, x_2, \ldots, x_n \in \R^d$. The goal of novelty detection is to learn a hyperplane in $\R^d$ that separates the training data from the origin with maximum margin. We model this problem as a
+one-class classification problem by transforming the training data to $(x_1,y_1),\ldots,(x_n,y_n) \in \R^d\times \{1\}$, indicating that the dependent variable of each training instance is assumed to be the same.
+Given such a mapping, we use the SVM classification mechanisms detailed in Sections~\ref{sec:linear} and~\ref{sec:nonlinear} to learn a one-class classification model. See the paper by Sch\"{o}lkopf for more details on one-class
+SVM~\cite{Scholkopf}.
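For reference, the standard one-class primal from the cited paper (a
transcription of \cite{Scholkopf}, not text added by this patch) makes the
maximum-margin-from-the-origin objective explicit:

\begin{align*}
\underset{w,\vec{\xi},\rho}{\text{Minimize }} & \frac{1}{2}\|w\|^2 + \frac{1}{\nu n}\sum_{i=1}^n \xi_i - \rho \\
\text{subject to } & \langle w, \Phi(x_i)\rangle \ge \rho - \xi_i,\quad \xi_i \ge 0,\quad i=1,\ldots,n.
\end{align*}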
http://git-wip-us.apache.org/repos/asf/incubator-madlib/blob/b7484c1f/doc/etc/madlib_extra.css
----------------------------------------------------------------------
diff --git a/doc/etc/madlib_extra.css b/doc/etc/madlib_extra.css
index 2b88a35..bbc884d 100644
--- a/doc/etc/madlib_extra.css
+++ b/doc/etc/madlib_extra.css
@@ -99,7 +99,7 @@ td.paramname {
/* Style parameter lists formatted with definition lists. */
dl.arglist {
- margin-left: 20px;
+ margin-left: 40px;
margin-top: 0px;
}
http://git-wip-us.apache.org/repos/asf/incubator-madlib/blob/b7484c1f/doc/literature.bib
----------------------------------------------------------------------
diff --git a/doc/literature.bib b/doc/literature.bib
index d30023e..0aee99a 100644
--- a/doc/literature.bib
+++ b/doc/literature.bib
@@ -869,4 +869,14 @@ Applied Survival Analysis},
journal={Proceedings of the 24th International Conference on Machine Learning},
year={2007}
}
+
+@article{Scholkopf,
+ author = {Sch\"{o}lkopf, Bernhard and Platt, John C. and Shawe-Taylor, John C. and Smola, Alex J. and Williamson, Robert C.},
+ title = {Estimating the Support of a High-Dimensional Distribution},
+ journal = {Neural Computation},
+ volume = {13},
+ number = {7},
+ year = {2001},
+ pages = {1443--1471},
+}
http://git-wip-us.apache.org/repos/asf/incubator-madlib/blob/b7484c1f/src/config/Version.yml
----------------------------------------------------------------------
diff --git a/src/config/Version.yml b/src/config/Version.yml
index 2a745f2..cea24a3 100644
--- a/src/config/Version.yml
+++ b/src/config/Version.yml
@@ -1 +1 @@
-version: 1.9dev
+version: 1.9.1dev
http://git-wip-us.apache.org/repos/asf/incubator-madlib/blob/b7484c1f/src/modules/convex/algo/igd.hpp
----------------------------------------------------------------------
diff --git a/src/modules/convex/algo/igd.hpp b/src/modules/convex/algo/igd.hpp
index e933702..cd17e64 100644
--- a/src/modules/convex/algo/igd.hpp
+++ b/src/modules/convex/algo/igd.hpp
@@ -21,7 +21,7 @@ namespace convex {
// use Eigen
using namespace madlib::dbal::eigen_integration;
-
+
// The reason for using ConstState instead of const State to reduce the
// template type list: flexibility to high-level for mutability control
// More: cast<ConstState>(MutableState) may not always work
@@ -53,7 +53,7 @@ IGD<State, ConstState, Task>::transition(state_type &state,
state.algo.incrModel,
tuple.indVar,
tuple.depVar,
- state.task.stepsize);
+ state.task.stepsize * tuple.weight);
}
template <class State, class ConstState, class Task>
http://git-wip-us.apache.org/repos/asf/incubator-madlib/blob/b7484c1f/src/modules/convex/linear_svm_igd.cpp
----------------------------------------------------------------------
diff --git a/src/modules/convex/linear_svm_igd.cpp b/src/modules/convex/linear_svm_igd.cpp
index 65c0600..f396250 100644
--- a/src/modules/convex/linear_svm_igd.cpp
+++ b/src/modules/convex/linear_svm_igd.cpp
@@ -92,19 +92,18 @@ linear_svm_igd_transition::run(AnyType &args) {
// tuple
using madlib::dbal::eigen_integration::MappedColumnVector;
- GLMTuple tuple;
// each tuple can be weighted - this can be combination of the sample weight
// and the class weight. Calling function is responsible for combining the two
- // into a single tuple weight. The default value for this parameter should be 1.
- const double tuple_weight = args[11].getAs<double>();
-
+ // into a single tuple weight. The default value for this parameter is 1, set
+ // into the definition of "tuple".
+ // The weight is used to increase the value of a particular tuple for the online
+ // learning. The weight is not used for the loss computation.
+ GLMTuple tuple;
tuple.indVar.rebind(args[1].getAs<MappedColumnVector>().memoryHandle(),
state.task.dimension);
-
- // tuple weight is multiplied to the gradient update. That is equivalent to
- // multiplying with the dependent variable
- tuple.depVar = args[2].getAs<double>() * tuple_weight;
+ tuple.depVar = args[2].getAs<double>();
+ tuple.weight = args[11].getAs<double>();
// Now do the transition step
// apply IGD with regularization
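The net effect of the igd.hpp and linear_svm_igd.cpp changes is that the
per-tuple weight now scales the gradient step instead of the label, leaving
the loss value untouched; a minimal NumPy sketch of the intended update
(illustrative names, not code from this patch):

    import numpy as np

    def igd_transition(w, x, y, stepsize, weight=1.0):
        """One weighted IGD step for the linear-SVM hinge loss.
        The weight scales the update (stepsize * weight) but is
        deliberately excluded from the loss itself."""
        margin = y * np.dot(w, x)
        if margin < 1:                        # hinge subgradient is -y*x here
            w = w + stepsize * weight * y * x
        loss = max(0.0, 1.0 - margin)         # unweighted loss, as in the patch
        return w, loss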
http://git-wip-us.apache.org/repos/asf/incubator-madlib/blob/b7484c1f/src/modules/convex/type/tuple.hpp
----------------------------------------------------------------------
diff --git a/src/modules/convex/type/tuple.hpp b/src/modules/convex/type/tuple.hpp
index 7901f2e..7ddb1b7 100644
--- a/src/modules/convex/type/tuple.hpp
+++ b/src/modules/convex/type/tuple.hpp
@@ -31,18 +31,21 @@ struct ExampleTuple {
int id;
independent_variables_type indVar;
dependent_variable_type depVar;
+ double weight;
- ExampleTuple() { id = 0; }
+ ExampleTuple() { id = 0; weight = 1;}
ExampleTuple(const ExampleTuple &rhs) {
id = rhs.id;
indVar = rhs.indVar;
depVar = rhs.depVar;
+ weight = rhs.weight;
}
ExampleTuple& operator=(const ExampleTuple &rhs) {
if (this != &rhs) {
id = rhs.id;
indVar = rhs.indVar;
depVar = rhs.depVar;
+ weight = rhs.weight;
}
return *this;
}
http://git-wip-us.apache.org/repos/asf/incubator-madlib/blob/b7484c1f/src/ports/postgres/modules/svm/kernel_approximation.py_in
----------------------------------------------------------------------
diff --git a/src/ports/postgres/modules/svm/kernel_approximation.py_in b/src/ports/postgres/modules/svm/kernel_approximation.py_in
index 0a09fbf..5fa449b 100644
--- a/src/ports/postgres/modules/svm/kernel_approximation.py_in
+++ b/src/ports/postgres/modules/svm/kernel_approximation.py_in
@@ -19,37 +19,117 @@ from __future__ import division
import plpy
-from utilities.utilities import unique_string
-from utilities.utilities import extract_keyvalue_params
-from utilities.utilities import num_features
+from utilities.utilities import unique_string, num_features
-from math import sqrt
-from math import pi
-from math import log
-from math import factorial
+import collections
+import functools
+from math import sqrt, pi, log, factorial
+import operator
+from random import random, seed
-from random import random
-from random import seed
-from operator import mul
-from collections import namedtuple
+PolyRandOperator = collections.namedtuple(
+ 'PolyRandOperator', 'weights, coefs, reps, other_features, rd_id, rd_val')
-PolyRandOperator = namedtuple('PolyRandOperator',
- 'weights, coefs, reps, '
- 'other_features, rd_id, rd_val')
+class LinearKernel(object):
+ """ Simple no-op kernel that has functionality to add an intercept to the
+ feature list during transformation.
+ """
+ def __init__(self, schema_madlib,
+ create_view=True, fit_intercept=True, **kwargs):
+ self.schema_madlib = schema_madlib
+ self.kernel_func = 'linear'
+ self.fit_intercept = fit_intercept
+ self.create_view = create_view
+ self.transformed_table = None
+ self.original_table = None
+
+ def clear(self):
+ if self.transformed_table:
+ data_type = 'view' if self.create_view else 'table'
+ plpy.execute("DROP {data_type} IF EXISTS {rel} CASCADE".
+ format(data_type=data_type,
+ rel=self.transformed_table['source_table']))
+
+ def save_as(self, _):
+ # nothing to save in a linear kernel
+ pass
+
+ @classmethod
+ def _get_default_params(cls):
+ return {'fit_intercept': False}
+
+ @classmethod
+ def create(cls, schema_madlib, params=None):
+ if not params:
+ params = cls._get_default_params()
+ return cls(schema_madlib, **params)
+
+ @property
+ def kernel_params(self):
+ return ('fit_intercept={fit_intercept}'
+ .format(fit_intercept=self.fit_intercept))
+
+ def fit(self, _):
+ self.clear()
+ return self
+
+ def transform(self, source_table, independent_varname,
+ dependent_varname=None, grouping_col=None, id_col=None,
+ transformed_name='linear_transformed'):
+ self.original_table = dict(source_table=source_table,
+ independent_varname=independent_varname,
+ dependent_varname=dependent_varname)
+ self.transformed_table = None
+ if self.fit_intercept:
+ schema_madlib = self.schema_madlib
+
+ def _cast_if_null(input, alias=''):
+ if input:
+ return str(input)
+ else:
+ null_str = "NULL::text"
+ return null_str + " as " + alias if alias else null_str
+
+ data_type = 'VIEW' if self.create_view else 'TABLE'
+ id_col = _cast_if_null(id_col, unique_string('id_col'))
+ grouping_col = _cast_if_null(grouping_col, unique_string('grp_col'))
+ dependent_varname = _cast_if_null(dependent_varname)
+ features_col = unique_string(desp='features_col')
+ target_col = unique_string(desp='target_col')
+ transformed_rel = unique_string(desp='source_copied')
+ intercept_str = "NULL" if not self.fit_intercept else "ARRAY[1]::float[]"
+ run_sql = """
+ DROP {data_type} IF EXISTS {transformed_rel};
+ CREATE {data_type} {transformed_rel} AS
+ SELECT
+ array_cat({independent_varname}, {intercept_str})::float[] as {features_col},
+ {dependent_varname} as {target_col},
+ {id_col},
+ {grouping_col}
+ FROM {source_table}
+ WHERE NOT {schema_madlib}.array_contains_null({independent_varname})
+ """.format(**locals())
+ plpy.execute(run_sql)
+ self.transformed_table = dict(source_table=transformed_rel,
+ dependent_varname=target_col,
+ independent_varname=features_col)
+ return self
class PolyKernel(object):
"""docstring for PolyKernel"""
def __init__(self, schema_madlib, degree=2, coef0=1, n_components=100,
- random_state=1, poly_operator=None, orig_data=None):
+ random_state=1, poly_operator=None, orig_data=None,
+ fit_intercept=True, **kwargs):
self.schema_madlib = schema_madlib
self.kernel_func = 'polynomial'
self.degree = degree
self.coef0 = coef0
self.n_components = n_components
self.random_state = random_state
+ self.fit_intercept = fit_intercept
# polynomial random mapping operator
self.pro = poly_operator
self.orig_data = orig_data
@@ -69,16 +149,12 @@ class PolyKernel(object):
""".format(pro=self.pro, data_type=data_type)
plpy.execute(run_sql)
- def __del__(self):
- self.clear()
-
def save_as(self, name):
if self.orig_data:
plpy.warning("Polynomial Kernel Warning: no need to save."
"Original data table exists: {0}"
.format(self.orig_data))
return
-
run_sql = """
create table {name} as
select {pro.rd_id} as id, {pro.rd_val} as val,
@@ -100,18 +176,21 @@ class PolyKernel(object):
plpy.execute(run_sql)
@classmethod
- def create(cls, schema_madlib, n_features, kernel_params):
- params = cls.parse_params(kernel_params, n_features)
+ def create(cls, schema_madlib, n_features, params=None):
+ if not params:
+ params = cls._get_default_params(n_features)
return cls(schema_madlib, **params)
@classmethod
- def load_from(cls, schema_madlib, data, kernel_params=''):
+ def load_from(cls, schema_madlib, data, params=None):
other_features = unique_string(desp='other_features')
rd_weights = unique_string(desp='random_weights')
rd_coefs = unique_string(desp='random_coefs')
rd_reps = unique_string(desp='random_reps')
rd_val = unique_string(desp='val')
rd_id = unique_string(desp='id')
+ if not params:
+ params = cls._get_default_params()
plpy.execute("""
drop view if exists {rd_weights};
create temp view {rd_weights} as
@@ -133,35 +212,29 @@ class PolyKernel(object):
select id as {rd_id}, val as {rd_val} from {data}
where desp = 'other_features';
""".format(**locals()))
- params = cls.parse_params(kernel_params)
pro = PolyRandOperator(weights=rd_weights, coefs=rd_coefs,
reps=rd_reps, other_features=other_features,
rd_id=rd_id, rd_val=rd_val)
return cls(schema_madlib, poly_operator=pro, orig_data=data, **params)
- @property
- def kernel_params(self):
- return ('degree={degree}, coef0={coef0}, '
- 'n_components={n_components}, '
- 'random_state={random_state}'
- .format(degree=self.degree, coef0=self.coef0,
- n_components=self.n_components,
- random_state=self.random_state))
-
@classmethod
- def parse_params(cls, kernel_params='', n_features=10):
- params_default = {
+ def _get_default_params(cls, n_features=10):
+ return {
+ 'n_components': 2 * n_features,
+ 'fit_intercept': False,
+ 'random_state': 1,
'degree': 3,
- 'n_components': 2*n_features,
'coef0': 1,
- 'random_state': 1}
- params_types = {
- 'degree': int,
- 'n_components': int,
- 'coef0': float,
- 'random_state': int}
- return extract_keyvalue_params(kernel_params, params_types, params_default)
+ }
+
+ @property
+ def kernel_params(self):
+ return ('degree={self.degree}, coef0={self.coef0}, '
+ 'n_components={self.n_components}, '
+ 'random_state={self.random_state}, '
+ 'fit_intercept={self.fit_intercept}'
+ .format(self=self))
def fit(self, n_features):
# fast way to compute nCr
@@ -170,7 +243,7 @@ class PolyKernel(object):
r = min(r, n-r)
if r == 0:
return 1
- numer = reduce(mul, range(n, n-r, -1))
+ numer = functools.reduce(operator.mul, range(n, n-r, -1))
denom = factorial(r + 1)
return numer // denom
@@ -212,8 +285,7 @@ class PolyKernel(object):
select
$1 as {val}, id as {id}
from generate_series(1, 1) as id
- """.format(data=rd_coefs_,
- val=rd_val_, id=rd_id_)
+ """.format(data=rd_coefs_, val=rd_val_, id=rd_id_)
plpy.execute(plpy.prepare(run_sql, ["float[]"]), [vals_])
rd_reps_ = unique_string(desp='reps_nz')
@@ -257,10 +329,10 @@ class PolyKernel(object):
schema_madlib = self.schema_madlib
def _cast_if_null(input, alias):
- null_str = "NULL::integer"
if input:
return str(input)
else:
+ null_str = "NULL::text"
return null_str + " as " + alias if alias else null_str
grouping_col = _cast_if_null(grouping_col, unique_string('grp_col'))
@@ -270,28 +342,31 @@ class PolyKernel(object):
features_col = unique_string(desp='features_col')
target_col = unique_string(desp='target_col')
transformed = unique_string(desp=transformed_name)
-
+ intercept = "NULL" if not self.fit_intercept else "ARRAY[1]::float[]"
# X = a * cos (X*C + b)
pro, multiplier = self.pro, sqrt(1. / self.n_components)
run_sql = """
drop table if exists {transformed};
create temp table {transformed} as
select
- {schema_madlib}.array_scalar_mult(
- array_cat(
- {schema_madlib}.array_mult(
- {schema_madlib}.__row_fold(
- {schema_madlib}.__matrix_vec_mult_in_mem(
- q.{features_col}::float[],
- weights.{pro.rd_val}::float[]
+ array_cat(
+ {schema_madlib}.array_scalar_mult(
+ array_cat(
+ {schema_madlib}.array_mult(
+ {schema_madlib}.__row_fold(
+ {schema_madlib}.__matrix_vec_mult_in_mem(
+ q.{features_col}::float[],
+ weights.{pro.rd_val}::float[]
+ )::float[],
+ reps.{pro.rd_val}::integer[]
)::float[],
- reps.{pro.rd_val}::integer[]
+ coefs.{pro.rd_val}::float[]
)::float[],
- coefs.{pro.rd_val}::float[]
+ of.{pro.rd_val}::float[]
)::float[],
- of.{pro.rd_val}::float[]
+ {multiplier}::float
)::float[],
- {multiplier}::float
+ {intercept}
) as {features_col},
q.{target_col} as {target_col},
{id_col},
@@ -323,12 +398,13 @@ class GaussianKernelBase(object):
def __init__(self, schema_madlib, gamma, n_components, random_state,
random_weights, random_offset, id_col, val_col,
- orig_data, **kwargs):
+ orig_data, fit_intercept=True, **kwargs):
self.kernel_func = 'gaussian'
self.gamma = gamma
self.n_components = n_components
# int32 seed used by boost::minstd_rand
self.random_state = random_state
+ self.fit_intercept = fit_intercept
# random operators
self.rd_weights = random_weights
self.rd_offset = random_offset
@@ -385,9 +461,6 @@ class GaussianKernelBase(object):
data=self.rd_offset,
data_type=data_type))
- def __del__(self):
- self.clear()
-
def save_as(self, name):
if self.orig_data:
plpy.warning("Gaussian Kernel Warning: no need to save."
@@ -410,25 +483,10 @@ class GaussianKernelBase(object):
plpy.execute(run_sql)
@classmethod
- def parse_params(cls, kernel_params='', n_features=10):
- params_default = {
- 'in_memory': 1,
- 'gamma': 1 / n_features,
- 'random_state': 1,
- 'n_components': 2 * n_features}
- params_types = {
- 'in_memory': int,
- 'gamma': float,
- 'random_state': int,
- 'n_components': int}
- return extract_keyvalue_params(kernel_params,
- params_types,
- params_default)
-
- @classmethod
- def create(cls, schema_madlib, n_features, kernel_params):
- params = cls.parse_params(kernel_params, n_features)
- in_memory = params.pop('in_memory', True)
+ def create(cls, schema_madlib, n_features, params=None):
+ if not params:
+ params = cls._get_default_params(n_features)
+ in_memory = params.pop('fit_in_memory', True)
# according to the 1gb limit on each entry of the table
n_elems = params['n_components'] * n_features
if in_memory and n_elems <= 1e8:
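(For scale: 1e8 float8 elements is roughly 8 x 10^8 bytes, about 800 MB, so a
single array entry stays under the 1 GB per-entry limit referenced in the
comment above.)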
@@ -437,7 +495,17 @@ class GaussianKernelBase(object):
return GaussianKernel(schema_madlib, **params)
@classmethod
- def load_from(cls, schema_madlib, data, kernel_params=''):
+ def _get_default_params(cls, n_features=10):
+ return {
+ 'n_components': 2 * n_features,
+ 'fit_intercept': False,
+ 'random_state': 1,
+ 'fit_in_memory': True,
+ 'gamma': 1 / n_features,
+ }
+
+ @classmethod
+ def load_from(cls, schema_madlib, data, params=None):
rd_weights = unique_string(desp='random_weights')
rd_offset = unique_string(desp='random_offsets')
rd_val = unique_string(desp='val')
@@ -453,8 +521,9 @@ class GaussianKernelBase(object):
select id as {rd_id}, val as {rd_val} from {data}
where desp = 'offsets';
""".format(**locals()))
- params = cls.parse_params(kernel_params)
- in_memory = params.pop('in_memory', True)
+ if not params:
+ params = cls._get_default_params()
+ in_memory = params.pop('fit_in_memory', True)
if in_memory:
return GaussianKernelInMemory(schema_madlib,
random_weights=rd_weights,
@@ -476,7 +545,7 @@ class GaussianKernel(GaussianKernelBase):
def __init__(self, schema_madlib, gamma=1, n_components=100,
random_state=1, random_weights=None,
random_offset=None, id_col=None, val_col=None,
- orig_data=None, **kwargs):
+ orig_data=None, fit_intercept=True, **kwargs):
params = locals()
params.pop('self')
super(GaussianKernel, self).__init__(**params)
@@ -484,10 +553,11 @@ class GaussianKernel(GaussianKernelBase):
@property
def kernel_params(self):
return ('gamma={gamma}, n_components={n_components},'
- 'random_state={random_state}, in_memory=0'
+ 'random_state={random_state}, fit_intercept={fit_intercept}, fit_in_memory=False'
.format(gamma=self.gamma,
n_components=self.n_components,
- random_state=self.random_state))
+ random_state=self.random_state,
+ fit_intercept=self.fit_intercept))
def fit(self, n_features):
self.clear()
@@ -511,10 +581,10 @@ class GaussianKernel(GaussianKernelBase):
schema_madlib = self.schema_madlib
def _cast_if_null(input, alias):
- null_str = "NULL::integer"
if input:
return str(input)
else:
+ null_str = "NULL::text"
return null_str + " as " + alias if alias else null_str
grouping_col = _cast_if_null(grouping_col, unique_string('grp_col'))
@@ -530,6 +600,7 @@ class GaussianKernel(GaussianKernelBase):
features_col = unique_string(desp='features_col')
target_col = unique_string(desp='target_col')
index_col = unique_string(desp='index_col')
+
run_sql = """
select setseed(0.5);
drop table if exists {source_with_id};
@@ -549,6 +620,7 @@ class GaussianKernel(GaussianKernelBase):
independent_varname = features_col
temp_transformed = unique_string(desp='temp_transformed')
+
# X = X * weights
run_sql = """
drop table if exists {temp_transformed};
@@ -575,15 +647,17 @@ class GaussianKernel(GaussianKernelBase):
# X = a * cos (X + b)
multiplier = sqrt(2. / self.n_components)
+ intercept = "NULL" if not self.fit_intercept else "ARRAY[1]::float[]"
run_sql = """
drop table if exists {transformed};
create temp table {transformed} as
select
- {index_col},
- {schema_madlib}.array_scalar_mult(
- {schema_madlib}.array_cos(
- q.{independent_varname}::float[])::float[],
- {multiplier}::float) as {independent_varname},
+ array_cat({schema_madlib}.array_scalar_mult(
+ {schema_madlib}.array_cos(
+ q.{independent_varname}::float[])::float[],
+ {multiplier}::float)::float[],
+ {intercept}
+ ) as {independent_varname},
{dependent_varname},
{id_col},
{grouping_col}
@@ -613,18 +687,17 @@ class GaussianKernelInMemory(GaussianKernelBase):
def __init__(self, schema_madlib, gamma=1, n_components=100,
random_state=1, random_weights=None,
random_offset=None, id_col=None,
- val_col=None, orig_data=None, **kwargs):
+ val_col=None, orig_data=None, fit_intercept=True, **kwargs):
params = locals()
params.pop('self')
super(GaussianKernelInMemory, self).__init__(**params)
@property
def kernel_params(self):
- return ('gamma={gamma}, n_components={n_components},'
- 'random_state={random_state}, in_memory=1'
- .format(gamma=self.gamma,
- n_components=self.n_components,
- random_state=self.random_state))
+ return ('gamma={self.gamma}, n_components={self.n_components},'
+ 'random_state={self.random_state}, '
+ 'fit_intercept={self.fit_intercept}, fit_in_memory=True'
+ .format(self=self))
def fit(self, n_features):
self.clear()
@@ -664,23 +737,27 @@ class GaussianKernelInMemory(GaussianKernelBase):
target_col = unique_string(desp='target_col')
transformed = unique_string(desp=transformed_name)
- # X <- a * cos (X*C + b)
+ # X <- 1 + a * cos (X*C + b)
multiplier = sqrt(2. / self.n_components)
+ intercept = "NULL" if not self.fit_intercept else "ARRAY[1]::float[]"
run_sql = """
drop table if exists {transformed};
create temp table {transformed} as
select
- {schema_madlib}.array_scalar_mult(
- {schema_madlib}.array_cos(
- {schema_madlib}.array_add(
- {schema_madlib}.__matrix_vec_mult_in_mem(
- q.{features_col}::float[],
- rw.{self.rd_val}::float[]
- )::float[],
- ro.{self.rd_val}::float[]
- )::float[]
+ array_cat(
+ {schema_madlib}.array_scalar_mult(
+ {schema_madlib}.array_cos(
+ {schema_madlib}.array_add(
+ {schema_madlib}.__matrix_vec_mult_in_mem(
+ q.{features_col}::float[],
+ rw.{self.rd_val}::float[]
+ )::float[],
+ ro.{self.rd_val}::float[]
+ )::float[]
+ )::float[],
+ {multiplier}::float
)::float[],
- {multiplier}::float
+ {intercept}
) as {features_col},
q.{target_col} as {target_col},
{id_col},
@@ -704,19 +781,19 @@ class GaussianKernelInMemory(GaussianKernelBase):
return self
-def create_kernel(schema_madlib, n_features, kernel_func, kernel_params):
+def create_kernel(schema_madlib, n_features, kernel_func, kernel_params_dict):
if kernel_func == 'linear':
- return None
+ return LinearKernel.create(schema_madlib, kernel_params_dict)
elif kernel_func == 'gaussian':
- return GaussianKernelBase.create(schema_madlib, n_features, kernel_params)
+ return GaussianKernelBase.create(schema_madlib, n_features, kernel_params_dict)
elif kernel_func == 'polynomial':
- return PolyKernel.create(schema_madlib, n_features, kernel_params)
+ return PolyKernel.create(schema_madlib, n_features, kernel_params_dict)
-def load_kernel(schema_madlib, data, kernel_func, kernel_params):
+def load_kernel(schema_madlib, data, kernel_func, kernel_params_dict):
if kernel_func == 'linear':
- return None
+ return LinearKernel.create(schema_madlib, kernel_params_dict)
elif kernel_func == 'gaussian':
- return GaussianKernelBase.load_from(schema_madlib, data, kernel_params)
+ return GaussianKernelBase.load_from(schema_madlib, data, kernel_params_dict)
elif kernel_func == 'polynomial':
- return PolyKernel.load_from(schema_madlib, data, kernel_params)
+ return PolyKernel.load_from(schema_madlib, data, kernel_params_dict)
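The Gaussian transforms above implement the random-feature map
X -> a * cos(X*C + b) in SQL; a NumPy sketch of the same map (a reference
sketch under the standard Rahimi-Recht random Fourier feature construction,
not code from this patch):

    import numpy as np

    def gaussian_random_features(X, gamma, n_components, random_state=1,
                                 fit_intercept=True):
        """Approximate k(x, y) = exp(-gamma * ||x - y||^2) so that
        z(x) . z(y) ~= k(x, y), mirroring the SQL pipeline above."""
        rng = np.random.RandomState(random_state)
        d = X.shape[1]
        W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(d, n_components))
        b = rng.uniform(0.0, 2.0 * np.pi, size=n_components)
        Z = np.sqrt(2.0 / n_components) * np.cos(X.dot(W) + b)
        if fit_intercept:  # analogous to array_cat(..., ARRAY[1]::float[])
            Z = np.hstack([Z, np.ones((Z.shape[0], 1))])
        return Z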
http://git-wip-us.apache.org/repos/asf/incubator-madlib/blob/b7484c1f/src/ports/postgres/modules/svm/svm.py_in
----------------------------------------------------------------------
diff --git a/src/ports/postgres/modules/svm/svm.py_in b/src/ports/postgres/modules/svm/svm.py_in
index c43d3a8..93431e3 100644
--- a/src/ports/postgres/modules/svm/svm.py_in
+++ b/src/ports/postgres/modules/svm/svm.py_in
@@ -9,13 +9,13 @@ from kernel_approximation import create_kernel, load_kernel
from utilities.control import MinWarning
from utilities.in_mem_group_control import GroupIterationController
from utilities.validate_args import explicit_bool_to_text
-from utilities.utilities import unique_string
from utilities.utilities import extract_keyvalue_params
from utilities.utilities import preprocess_keyvalue_params
from utilities.utilities import add_postfix
from utilities.utilities import _string_to_array_with_quotes
from utilities.utilities import _string_to_array
from utilities.utilities import _assert
+from utilities.utilities import unique_string
from utilities.utilities import num_features, num_samples
from utilities.validate_args import cols_in_tbl_valid
@@ -69,32 +69,34 @@ def _compute_svm(args):
""")
it.final()
return iterationCtrl.iteration
-# ---------------------------------------------------
+# ------------------------------------------------------------------------------
def _verify_table(source_table, model_table, dependent_varname,
- independent_varname, **kwargs):
+ independent_varname, verify_dep=True, **kwargs):
# validate input
input_tbl_valid(source_table, 'SVM')
- _assert(is_var_valid(source_table, dependent_varname),
- "SVM error: invalid dependent_varname "
- "('{dependent_varname}') for source_table "
- "({source_table})!".format(dependent_varname=dependent_varname,
- source_table=source_table))
_assert(is_var_valid(source_table, independent_varname),
"SVM error: invalid independent_varname "
"('{independent_varname}') for source_table "
"({source_table})!".format(independent_varname=independent_varname,
source_table=source_table))
- dep_type = get_expr_type(dependent_varname, source_table)
- if '[]' in dep_type:
- plpy.error("SVM error: dependent_varname cannot be of array type!")
+ if verify_dep:
+ _assert(is_var_valid(source_table, dependent_varname),
+ "SVM error: invalid dependent_varname "
+ "('{dependent_varname}') for source_table "
+ "({source_table})!".format(dependent_varname=dependent_varname,
+ source_table=source_table))
+ dep_type = get_expr_type(dependent_varname, source_table)
+ if '[]' in dep_type:
+ plpy.error("SVM error: dependent_varname cannot be of array type!")
# validate output tables
output_tbl_valid(model_table, 'SVM')
summary_table = add_postfix(model_table, "_summary")
output_tbl_valid(summary_table, 'SVM')
+# ------------------------------------------------------------------------------
def _get_grouping_col_str(schema_madlib, source_table, grouping_col):
@@ -125,6 +127,7 @@ def _get_grouping_col_str(schema_madlib, source_table, grouping_col):
grouping_col = None
return grouping_str, grouping_col
+# ------------------------------------------------------------------------------
def _verify_get_params_dict(params_dict):
@@ -141,35 +144,50 @@ def _verify_get_params_dict(params_dict):
_assert(not hasattr(params_dict['max_iter'], '__len__'),
"SVM Error: max_iter should not be a list after cross validation!")
return params_dict
+# ------------------------------------------------------------------------------
-def _build_output_tables(n_iters_run, model_table, args, transformer, **kwargs):
- if transformer is None:
- dependent_varname = args['col_dep_var']
- independent_varname = args['col_ind_var']
- source_table = args['rel_source']
- kernel_func = "linear"
- kernel_params = "NULL"
+def _build_output_tables(n_iters_run, args, **kwargs):
+
+ transformer = args['transformer']
+ use_transformer_for_output = args['use_transformer_for_output']
+ if use_transformer_for_output:
+ # transformer should always be a valid object created using the transform function.
+ ot = transformer.original_table
+ independent_varname = ot['independent_varname']
+ dependent_varname = ot['dependent_varname']
+ source_table = ot['source_table']
+ if not dependent_varname:
+ # an exception added for the svm_one_class where dependent_varname
+ # is artificially injected into the transformed table and does not
+ # exist in the original table. Hence we use transformed table
+ # to get the expression type
+ tt = transformer.transformed_table
+ dep_type = get_expr_type(tt['dependent_varname'], tt['source_table'])
+ else:
+ dep_type = get_expr_type(dependent_varname, source_table)
else:
- original_table = transformer.original_table
- dependent_varname = original_table['dependent_varname']
- independent_varname = original_table['independent_varname']
- source_table = original_table['source_table']
- random_table = add_postfix(model_table, "_random")
- transformer.save_as(random_table)
- kernel_func = transformer.kernel_func
- kernel_params = transformer.kernel_params
+ source_table = args['source_table']
+ independent_varname = args['independent_varname']
+ dependent_varname = args['dependent_varname']
+ dep_type = get_expr_type(dependent_varname, source_table)
+
+ model_table = args['model_table']
+ random_table = add_postfix(model_table, "_random")
+ transformer.save_as(random_table)
+ kernel_func = transformer.kernel_func
+ kernel_params = transformer.kernel_params
grouping_col = args['grouping_col']
col_grp_key = args['col_grp_key']
- groupby_str, grouping_str1, using_str = "", "", "ON TRUE"
if grouping_col:
- groupby_str = "GROUP BY {grouping_col}, {col_grp_key}".format(
- grouping_col=grouping_col, col_grp_key=col_grp_key)
+ groupby_str = "GROUP BY {0}, {1}".format(grouping_col, col_grp_key)
grouping_str1 = grouping_col + ","
using_str = "USING ({col_grp_key})".format(col_grp_key=col_grp_key)
+ else:
+ groupby_str, grouping_str1, using_str = "", "", "ON TRUE"
# organizing results
- dep_type = get_expr_type(dependent_varname, source_table)
+ args.update(locals())
model_table_query = """
CREATE TABLE {model_table} AS
SELECT
@@ -204,13 +222,7 @@ def _build_output_tables(n_iters_run, model_table, args, transformer, **kwargs):
{groupby_str}
) n_tuples_including_nulls_subq
{using_str}
- """.format(n_iters_run=n_iters_run,
- groupby_str=groupby_str,
- grouping_str1=grouping_str1,
- using_str=using_str,
- source_table=source_table,
- model_table=model_table,
- dep_type=dep_type, **args)
+ """.format(**args)
plpy.execute(model_table_query)
# summary table
@@ -219,10 +231,8 @@ def _build_output_tables(n_iters_run, model_table, args, transformer, **kwargs):
FROM {0}
WHERE coef IS NULL
""".format(model_table))[0]['num_failed_groups']
-
summary_table = add_postfix(model_table, "_summary")
grouping_text = "NULL" if not grouping_col else grouping_col
- args.update(locals())
plpy.execute("""
CREATE TABLE {summary_table} AS
SELECT
@@ -246,11 +256,15 @@ def _build_output_tables(n_iters_run, model_table, args, transformer, **kwargs):
'lambda={lambda}, norm={norm}, n_folds={n_folds}'::text
AS reg_params,
count(*)::integer AS num_all_groups,
- {n_failed_groups}::integer AS num_failed_groups,
+ {n_failed_groups}::integer AS num_failed_groups,
sum(num_rows_processed)::bigint AS total_rows_processed,
sum(num_rows_skipped)::bigint AS total_rows_skipped
FROM {model_table};
- """.format(**args))
+ """.format(summary_table=summary_table,
+ grouping_text=grouping_text,
+ n_failed_groups=n_failed_groups,
+ **args))
+# ------------------------------------------------------------------------------
def svm_predict_help(schema_madlib, message, **kwargs):
@@ -379,11 +393,137 @@ def svm_predict_help(schema_madlib, message, **kwargs):
return """
No such option. Use "SELECT {schema_madlib}.svm_predict()" for help.
""".format(**args)
+# ------------------------------------------------------------------------------
-def svm_help(schema_madlib, message, is_svc, **kwargs):
- method = 'svm_classification' if is_svc else 'svm_regression'
+def svm_one_class(schema_madlib, source_table, model_table, independent_varname,
+ kernel_func, kernel_params, grouping_col, params,
+ verbose, **kwargs):
+ """ Execute the support vector one-class classification algorithm.
+
+ The data in 'source_table' contains only independent variables. The algorithm
+ works by learning a classifier between these independent features
+ and the origin. The given data is treated as positive data and the origin
+ is treated as negative, with higher weight given to the origin to ensure
+ a balanced learning update.
+ """
+ is_svc = True
+ dependent_varname = None
+ verbosity_level = "info" if verbose else "error"
+ with MinWarning(verbosity_level):
+ _verify_table(source_table, model_table,
+ dependent_varname, independent_varname, verify_dep=False)
+ grouping_str, grouping_col = _get_grouping_col_str(schema_madlib,
+ source_table, grouping_col)
+ if not kernel_func:
+ kernel_func = 'gaussian'
+ else:
+ kernel_func = _get_kernel_name(kernel_func)
+ # _transform_w_kernel should always return a transformer. Since
+ # override_fit_intercept=True, it should always create a transformed_table
+ # containing a intercept along with any kernel transformation in the
+ # independent variable array
+ transformer = _transform_w_kernel(schema_madlib, source_table,
+ dependent_varname, independent_varname,
+ kernel_func, kernel_params,
+ grouping_col, override_fit_intercept=True)
+ params_dict = _extract_params(schema_madlib, params)
+ if not params_dict['class_weight']:
+ params_dict['class_weight'] = 'balanced'
+
+ source_table = transformer.transformed_table['source_table']
+ independent_varname = transformer.transformed_table['independent_varname']
+ dependent_varname = transformer.transformed_table['dependent_varname']
+ update_source_for_one_class = True
+ args = locals()
+ _cross_validate_svm(args)
+ _svm_parsed_params(use_transformer_for_output=True, **args)
+ transformer.clear()
+# ------------------------------------------------------------------------------
+
+def get_svc_params_usage_string():
+ return """
+ ---------------------------------------------------------------------------
+ OTHER PARAMETERS
+ ---------------------------------------------------------------------------
+ Parameters are supplied in params argument as a string
+ containing a comma-delimited list of name-value pairs.
+
+ Hyperparameter optimization can be carried out through
+ the built-in cross validation mechanism
+
+ init_stepsize -- Default: [0.01]. Also known as the initial learning rate.
+ decay_factor -- Default: [0.9].
+ Control the learning rate schedule:
+ 0 means constant rate; -1 means inverse scaling, i.e.,
+ stepsize = init_stepsize / iteration;
+ > 0 means exponential decay, i.e.,
+ stepsize = init_stepsize * decay_factor^iteration.
+ max_iter -- Default: [100].
+ The maximum number of iterations allowed.
+ tolerance -- Default: 1e-10. The criterion for ending iterations.
+ lambda -- Default: [0.01]. Regularization parameter, positive.
+ norm -- Default: 'L2'.
+ Name of the regularization, either 'L2' or 'L1'.
+ epsilon -- Default: [0.01].
+ Determines the $\epsilon$ for $\epsilon$-regression.
+ Ignored during classification.
+ eps_tabl -- Default: NULL.
+ Name of the table that contains values of epsilon for
+ different groups. Ignored when grouping_col is NULL.
+ validation_result -- Default: NULL.
+ Name of the table to store the cross validation results
+ including the values of parameters and
+ their averaged error values.
+ n_folds -- Default: 0. Number of folds.
+ Must be at least 2 to activate cross validation.
+ """
+# ------------------------------------------------------------------------------
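The decay_factor semantics listed above reduce to a small schedule function;
a Python sketch (illustrative only, not code from this patch):

    def stepsize_schedule(init_stepsize, decay_factor, iteration):
        """Learning-rate schedule described in the params help: 0 is a
        constant rate, negative is inverse scaling, positive is
        exponential decay. Assumes iteration >= 1."""
        if decay_factor == 0:
            return init_stepsize
        if decay_factor < 0:
            return init_stepsize / iteration
        return init_stepsize * decay_factor ** iteration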
+
+
+def get_svc_gaussian_usage_string():
+ return """
+ ---------------------------------------------------------------------------
+ GAUSSIAN PARAMETERS
+ ---------------------------------------------------------------------------
+ Parameters are supplied in kernel_params argument as a string
+ containing a comma-delimited list of name-value pairs.
+ gamma -- Default: 1/num_features.
+ The parameter $\gamma$ in the Radial Basis
+ Function kernel.
+ n_components -- Default: 2*num_features.
+ The dimensionality of the transformed feature space.
+ random_state -- Default: 1. Seed used by the random number generator.
+ """
+# ------------------------------------------------------------------------------
+
+
+def get_svc_poly_usage_string():
+ return """
+ ---------------------------------------------------------------------------
+ POLYNOMIAL PARAMETERS
+ ---------------------------------------------------------------------------
+ Parameters are supplied in kernel_params argument as a string
+ containing a comma-delimited list of name-value pairs.
+
+ coef0 -- Default: 1.0.
+ The independent term q in (xTy + q)^r.
+ Must be greater than or equal to 0. When it is 0,
+ the polynomial kernel is in homogeneous form.
+ degree -- Default: 3.
+ The parameter r in (xTy + q)^r.
+ n_components -- Default: 2*num_features.
+ The dimensionality of the transformed feature space.
+ A larger value lowers the variance of the estimate of
+ kernel but requires more memory and
+ takes longer to train.
+ random_state -- Default: 1. Seed used by the random number generator.
+ """
+
+
+def svm_one_class_help(schema_madlib, message, is_svc, **kwargs):
+ method = 'svm_one_class'
args = dict(schema_madlib=schema_madlib, method=method)
summary = """
@@ -411,7 +551,6 @@ def svm_help(schema_madlib, message, is_svc, **kwargs):
SELECT {schema_madlib}.{method}(
source_table, -- name of input table
model_table, -- name of output model table
- dependent_varname, -- name of dependent variable
independent_varname, -- names of independent variables
kernel_func, -- optional, default: 'linear'.
supported type of kernel: 'linear', 'gaussian',
@@ -473,6 +612,10 @@ def svm_help(schema_madlib, message, is_svc, **kwargs):
__dep_var_mapping TEXT[], -- vector of dependendent variable labels.
The first entry will correspond to -1
and the second to +1, for internal use.
+ Since the input table does not have a
+ dependent variable, a new column is
+ created while learning the one-class SVM
+ model.
An auxiliary table named <model_table>_random is created if the kernel is not
linear. It contains data needed to embed test data into random feature space
@@ -486,7 +629,7 @@ def svm_help(schema_madlib, message, is_svc, **kwargs):
generate the model.
source_table varchar, -- the data source table name.
model_table varchar, -- the model table name.
- dependent_varname varchar, -- the dependent variable.
+ dependent_varname varchar, -- the dependent variable, created automatically.
independent_varname varchar, -- the independent variables.
kernel_func varchar, -- the kernel function.
kernel_parameters varchar, -- the kernel parameters.
@@ -503,85 +646,231 @@ def svm_help(schema_madlib, message, is_svc, **kwargs):
due to missing values or failures.
""".format(**args)
- params_usage = """
+ params_usage = get_svc_params_usage_string()
+ gaussian_usage = get_svc_gaussian_usage_string()
+ poly_usage = get_svc_poly_usage_string()
+
+ example_usage = """
---------------------------------------------------------------------------
- OTHER PARAMETERS
+ EXAMPLES
---------------------------------------------------------------------------
- Parameters are supplied in params argument as a string
- containing a comma-delimited list of name-value pairs.
-
- Hyperparameter optimization can be carried out through
- the built-in cross validation mechanism
-
- init_stepsize -- Default: [0.01]. Also known as the inital learning rate.
- decay_factor -- Default: [0.9].
- Control the learning rate schedule:
- 0 means constant rate; -1 means inverse scaling, i.e.,
- stepsize = init_stepsize / iteration;
- > 0 means exponential decay, i.e.,
- stepsize = init_stepsize * decay_factor^iteration.
- max_iter -- Default: [100].
- The maximum number of iterations allowed.
- tolerance -- Default: 1e-10. The criteria to end iterations.
- lambda -- Default: [0.01]. Regularization parameter, positive.
- norm -- Default: 'L2'.
- Name of the regularization, either 'L2' or 'L1'.
- epsilon -- Default: [0.01].
- Determines the $\epsilon$ for $\epsilon$-regression.
- Ignored during classification.
- eps_tabl -- Default: NULL.
- Name of the table that contains values of epsilon for
- different groups. Ignored when grouping_col is NULL.
- validation_result -- Default: NULL.
- Name of the table to store the cross validation results
- including the values of parameters and
- their averaged error values.
- n_folds -- Default: 0. Number of folds.
- Must be at least 2 to activate cross validation.
- class_weight -- Default: 1 for each class
- The weights for each class.
- If 'balanced', values of y are automatically adjusted
- as inversely proportional to class frequencies.
- Alternatively, can be a mapping giving the weight
- for each class. Eg. For dependent variable values
- 'a' and 'b', the class_weight can be {a: 2, b: 3}.
- """
+ - Create an input data set.
+
+ CREATE TABLE houses (id INT, tax INT, bedroom INT, bath FLOAT, price INT,
+ size INT, lot INT);
+ COPY houses FROM STDIN WITH DELIMITER '|';
+ 1 | 590 | 2 | 1 | 50000 | 770 | 22100
+ 2 | 1050 | 3 | 2 | 85000 | 1410 | 12000
+ 3 | 20 | 3 | 1 | 22500 | 1060 | 3500
+ 4 | 870 | 2 | 2 | 90000 | 1300 | 17500
+ 5 | 1320 | 3 | 2 | 133000 | 1500 | 30000
+ 6 | 1350 | 2 | 1 | 90500 | 820 | 25700
+ 7 | 2790 | 3 | 2.5 | 260000 | 2130 | 25000
+ 8 | 680 | 2 | 1 | 142500 | 1170 | 22000
+ 9 | 1840 | 3 | 2 | 160000 | 1500 | 19000
+ 10 | 3680 | 4 | 2 | 240000 | 2790 | 20000
+ 11 | 1660 | 3 | 1 | 87000 | 1030 | 17500
+ 12 | 1620 | 3 | 2 | 118600 | 1250 | 20000
+ 13 | 3100 | 3 | 2 | 140000 | 1760 | 38000
+ 14 | 2070 | 2 | 3 | 148000 | 1550 | 14000
+ 15 | 650 | 3 | 1.5 | 65000 | 1450 | 12000
+ \.
+
+ - Generate a non-linear one-class SVM using a Gaussian kernel. We
+ specify the initial step size and maximum number of iterations to run.
+ As part of the kernel parameters, we choose 10 as the dimension of the
+ space in which we train the SVM. A larger number leads to a more powerful
+ model but runs the risk of overfitting. As a result, the model will be a
+ 10-dimensional vector.
+
+ select {schema_madlib}.svm_one_class('houses',
+ 'houses_one_class_gaussian',
+ 'ARRAY[1,tax,bedroom,bath,size,lot,price]',
+ 'gaussian',
+ 'gamma=0.01,n_components=10',
+ NULL,
+ 'max_iter=250, init_stepsize=100,lambda=0.9'
+ );
+
+ - Create a test data set.
+ DROP TABLE IF EXISTS houses_novelty_test;
+ CREATE TABLE houses_novelty_test (id INT, tax INT, bedroom INT, bath FLOAT, price INT,
+ size INT, lot INT);
+ COPY houses_novelty_test FROM STDIN WITH DELIMITER '|';
+ 1 | 33590 | 12 | 11 | 5000000 | 12770 | 221100
+ 2 | 1050 | 31 | 21 | 85000000 | 141210 | 120010
+ 3 | 233330 | 13 | 11 | 22500000 | 112060 | 351100
+ 4 | 833370 | 12 | 12 | 9000000 | 130120 | 1751100
+ 5 | 132330 | 31 | 12 | 133000000 | 150120 | 30011100
+ 6 | 135330 | 21 | 11 | 90500000 | 8212120 | 25711100
+ 7 | 279330 | 31 | 21.5 | 260000000 | 213012 | 25011100
+ 8 | 6803333 | 12 | 11 | 142500000 | 117012 | 22111000
+ 9 | 33331840 | 31 | 12 | 160000000 | 150120 | 19011100
+ 10 | 3780 | 4 | 2 | 220000 | 2790 | 21000
+ 11 | 1760 | 3 | 1 | 77000 | 1030 | 18500
+ 12 | 1520 | 3 | 2 | 128600 | 1250 | 21000
+ 13 | 3000 | 3 | 2 | 130000 | 1760 | 37000
+ 14 | 2170 | 2 | 3 | 138000 | 1550 | 13000
+ 15 | 750 | 3 | 1.5 | 75000 | 1450 | 13000
+ \.
+
+ - Use the prediction function to evaluate the models. The predicted
+ results are in the prediction column and the actual data is in the
+ target column.
+ -- For the Gaussian model:
+ SELECT {schema_madlib}.svm_predict('houses_one_class_gaussian',
+ 'houses_novelty_test',
+ 'id',
+ 'houses_pred_gaussian');
+ -- View the results of the prediction function:
+ SELECT * FROM houses_novelty_test JOIN houses_pred_gaussian USING (id) ORDER BY id;
+
+ """.format(**args)
+
+ if not message:
+ return summary
+ elif message.lower() in ('usage', 'help', '?'):
+ return usage
+ elif message.lower() == 'example':
+ return example_usage
+ elif message.lower() == 'params':
+ return params_usage
+ elif message.lower() == 'gaussian':
+ return gaussian_usage
+ elif message.lower() == 'polynomial':
+ return poly_usage
+ else:
+ return """
+ No such option. Use "SELECT {schema_madlib}.{method}()" for help.
+ """.format(**args)
+# ------------------------------------------------------------------------------
+
+
+def svm_help(schema_madlib, message, is_svc, **kwargs):
+ method = 'svm_classification' if is_svc else 'svm_regression'
+
+ args = dict(schema_madlib=schema_madlib, method=method)
+
+ summary = """
+ ----------------------------------------------------------------
+ SUMMARY
+ ----------------------------------------------------------------
+ Support Vector Machines (SVMs) are models for regression
+ and classification tasks.
+
+ SVM models have two particularly desirable features:
+ robustness in the presence of noisy data and applicability
+ to a variety of data configurations.
+
+ For more details on function usage:
+ SELECT {schema_madlib}.{method}('usage')
- gaussian_usage = """
+ For a small example on using the function:
+ SELECT {schema_madlib}.{method}('example')
+ """.format(**args)
+
+ usage = """
---------------------------------------------------------------------------
- GAUSSIAN PARAMETERS
+ USAGE
---------------------------------------------------------------------------
- Parameters are supplied in kernel_params argument as a string
- containing a comma-delimited list of name-value pairs.
-
- gamma -- Default: 1/num_features.
- The parameter $\gamma$ in the Radius Basis
- Function kernel,
- n_components -- Default: 2*num_features.
- The dimensionality of the transformed feature space.
- random_state -- Default: 1. Seed used by the random number generator.
- """
+ SELECT {schema_madlib}.{method}(
+ source_table, -- name of input table
+ model_table, -- name of output model table
+ dependent_varname, -- name of dependent variable
+ independent_varname, -- names of independent variables
+ kernel_func, -- optional, default: 'linear'.
+ supported type of kernel: 'linear', 'gaussian',
+ and 'polynomial'
+ kernel_params, -- optional, default: NULL
+ parameters for non-linear kernel in a
+ comma-separated string of key-value pairs. The
+ parameters differ depending on the value of
+ kernel_func.
+ to find out more:
+
+ SELECT {schema_madlib}.{method}('kernel_func')
+
+ where you replace 'kernel_func' with the kernel
+ you are interested in, e.g.,
+
+ SELECT {schema_madlib}.{method}('gaussian')
+
+ grouping_cols, -- optional, default NULL
+ names of columns to group-by
+ params, -- optional, default NULL
+ parameters for optimization and regularization in
+ a comma-separated string of key-value pairs. If a
+ list of values are provided, then cross-
+ validation will be performed to select the best
+ value from the list.
+ to find out more:
+
+ SELECT {schema_madlib}.{method}('params')
+
+ verbose -- optional, default FALSE
+ whether to print useful info
+ );
+
- poly_usage = """
---------------------------------------------------------------------------
- POLYNOMIAL PARAMETERS
+ OUTPUT
---------------------------------------------------------------------------
- Parameters are supplied in kernel_params argument as a string
- containing a comma-delimited list of name-value pairs.
-
- coef0 -- Default: 1.0.
- The independent term q in (xTy + q)^r.
- Must be larger or equal to 0. When it is 0,
- the polynomial kernel is in homogeneous form.
- degree -- Default: 3.
- The parameter r in (xTy + q)^r.
- n_components -- Default: 2*num_features.
- The dimensionality of the transformed feature space.
- A larger value lowers the variance of the estimate of
- kernel but requires more memory and
- takes longer to train.
- random_state -- Default: 1. Seed used by the random number generator.
- """
+ The model table produced by svm contains the following columns:
+
+ coef FLOAT8, -- vector of the coefficients.
+ grouping_key TEXT, -- identifies the group to which
+ the datum belongs.
+ num_rows_processed BIGINT, -- numbers of rows processed.
+ num_rows_skipped BIGINT, -- numbers of rows skipped due
+ to missing values or failures.
+ num_iterations INTEGER, -- number of iterations completed by
+ the optimization algorithm.
+ The algorithm either converged in this
+ number of iterations or hit the maximum
+ number specified in the
+ optimization parameters.
+ loss FLOAT8, -- value of the objective function of
+ SVM. See Technical Background section
+ below for more details.
+ norm_of_gradient FLOAT8, -- value of the L2-norm of the
+ (sub)-gradient of the objective
+ function.
+ __dep_var_mapping TEXT[], -- vector of dependent variable labels.
+ The first entry will correspond to -1
+ and the second to +1, for internal use.
+
+ An auxiliary table named <model_table>_random is created if the kernel is not
+ linear. It contains data needed to embed test data into random feature space
+ (see reference [2,3]). This data is used internally by svm_predict and not
+ meaningful on its own.
+
+ A summary table named <model_table>_summary is also created at the same time,
+ which has the following columns:
+ method varchar, -- 'svm'
+ version_number varchar, -- version of madlib which was used to
+ generate the model.
+ source_table varchar, -- the data source table name.
+ model_table varchar, -- the model table name.
+ dependent_varname varchar, -- the dependent variable.
+ independent_varname varchar, -- the independent variables.
+ kernel_func varchar, -- the kernel function.
+ kernel_parameters varchar, -- the kernel parameters.
+ grouping_col varchar, -- columns on which to group.
+ optim_params varchar, -- a string containing the
+ optimization parameters.
+ reg_params varchar, -- a string containing the
+ regularization parameters.
+ num_all_groups integer, -- number of groups in glm training.
+ num_failed_groups integer, -- number of failed groups in glm training.
+ total_rows_processed integer, -- total numbers of rows processed
+ in all groups.
+ total_rows_skipped integer, -- numbers of rows skipped in all groups
+ due to missing values or failures.
+ """.format(**args)
+
+ params_usage = get_svc_params_usage_string()
+ gaussian_usage = get_svc_gaussian_usage_string()
+ poly_usage = get_svc_poly_usage_string()
example_usage = """
---------------------------------------------------------------------------
@@ -659,7 +948,7 @@ def svm_help(schema_madlib, message, is_svc, **kwargs):
return summary
elif message.lower() in ('usage', 'help', '?'):
return usage
- elif message.lower() == 'example':
+ elif message.lower() in ('example', 'examples'):
return example_usage
elif message.lower() == 'params':
return params_usage
@@ -671,32 +960,37 @@ def svm_help(schema_madlib, message, is_svc, **kwargs):
return """
No such option. Use "SELECT {schema_madlib}.{method}()" for help.
""".format(**args)
+# ------------------------------------------------------------------------------
def svm(schema_madlib, source_table, model_table,
dependent_varname, independent_varname, kernel_func,
kernel_params, grouping_col, params, is_svc,
- verbose, detect_novelty=False, **kwargs):
+ verbose, **kwargs):
"""
Executes the linear support vector classification algorithm.
"""
# verbosing
- verbosity_level = "info" if verbose else "error"
+ verbosity_level = "warning" if verbose else "error"
with MinWarning(verbosity_level):
_verify_table(source_table, model_table,
dependent_varname, independent_varname)
grouping_str, grouping_col = \
_get_grouping_col_str(schema_madlib, source_table, grouping_col)
kernel_func = _get_kernel_name(kernel_func)
- transformer = _random_feature_map(schema_madlib, source_table,
+ transformer = _transform_w_kernel(schema_madlib, source_table,
dependent_varname, independent_varname,
- kernel_func, kernel_params, grouping_col)
+ kernel_func, kernel_params,
+ grouping_col)
params_dict = _extract_params(schema_madlib, params)
args = locals()
- if transformer is not None:
+ if transformer.transformed_table:
args.update(transformer.transformed_table)
+
_cross_validate_svm(args)
- _svm_parsed_params(**args)
+ _svm_parsed_params(use_transformer_for_output=True, **args)
+ transformer.clear()
+# ------------------------------------------------------------------------------
def _cross_validate_svm(args):
@@ -744,12 +1038,16 @@ def _cross_validate_svm(args):
scorer = 'classification' if args['is_svc'] else 'regression'
sub_args = {'params_dict': cv_params}
- transformer = args.get('transformer', None)
- # we want svm in cross validation to behave as if transformer is None
- # if it is not, then svm_predict will transform the test data again,
- # which will not be correct since test data in cross validation
- # comes from training data which has already been transformed
- args.update(dict(transformer=None))
+ # During cross validation, svm must not transform the data again:
+ # the test folds come from the already-transformed source table.
+ # A linear transformer that adds no intercept is a no-op transformer
+ # (see the sketch after this hunk).
+ no_op_kernel = create_kernel(args['schema_madlib'], 0,
+ 'linear', {'fit_intercept': False})
+ no_op_transformer = no_op_kernel.transform(args['source_table'],
+ args['independent_varname'],
+ args['dependent_varname'])
+ transformer = args.get('transformer', no_op_transformer)
+ args.update(dict(transformer=no_op_transformer))
cv = CrossValidator(_svm_parsed_params, svm_predict, scorer, args)
val_res = cv.validate(sub_args, params_dict['n_folds']).sorted()
val_res.output_tbl(params_dict['validation_result'])
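A minimal sketch of the no-op transformer used above, assuming the
create_kernel/transform signatures shown in this commit (the table and
column names here are hypothetical):

    # A linear kernel with fit_intercept=False maps each feature vector
    # to itself, so a cross-validation fold built from the already
    # transformed source table passes through unchanged.
    no_op_kernel = create_kernel(schema_madlib, 0, 'linear',
                                 {'fit_intercept': False})
    no_op_transformer = no_op_kernel.transform('fold_table',   # hypothetical
                                               'ind_var_col',  # hypothetical
                                               'dep_var_col')  # hypothetical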
@@ -776,20 +1074,37 @@ def _get_kernel_name(kernel_func):
"{0}. Supported kernel functions are ({1})"
.format(kernel_func, ','.join(sorted(supported_kernels))))
return kernel_func
+# ------------------------------------------------------------------------------
-def _random_feature_map(schema_madlib, source_table, dependent_varname,
+def _transform_w_kernel(schema_madlib, source_table, dependent_varname,
independent_varname, kernel_func,
- kernel_params, grouping_col):
- if kernel_func == 'linear':
- return None
+ kernel_params, grouping_col, override_fit_intercept=False):
+ """ Transform source table with a kernel function and return the transfomer.
+ Args:
+ @param schema_madlib: str, Name of the MADlib schema
+ @param source_table: str, Name of the table with input data
+ @param dependent_varname: str, Name of the column containing response variable
+ @param independent_varname: str, Name of the column containing feature variables
+ @param kernel_func: str, Name of the kernel to apply
+ @param kernel_params: str, Key-value set of parameters for the kernel class
+ @param grouping_col: str, Comma-separated list of grouping column names
+ @param override_fit_intercept: bool, If True, the fit_intercept parameter
+ in kernel_params is always set to True
+ independent of user input. No-op if
+ this is False.
+ """
n_features = num_features(source_table, independent_varname)
+ kernel_params_dict = _extract_kernel_params(kernel_params, n_features)
+ if override_fit_intercept:
+ kernel_params_dict['fit_intercept'] = True
transformer = create_kernel(schema_madlib, n_features,
- kernel_func, kernel_params)
- return (transformer.fit(n_features)
- .transform(source_table, independent_varname,
- dependent_varname, grouping_col))
+ kernel_func, kernel_params_dict)
+ return (transformer.fit(n_features).
+ transform(source_table, independent_varname,
+ dependent_varname, grouping_col))
+# ------------------------------------------------------------------------------
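As a sketch only, the one-class entry point (svm_one_class, added
elsewhere in this commit) would be expected to call this wrapper with the
override enabled; the call site below is an assumption, not the committed
code:

    # Force fit_intercept=True: the appended intercept coordinate is what
    # allows the synthetic origin row of one-class SVM to be represented.
    transformer = _transform_w_kernel(schema_madlib, source_table,
                                      None,  # one-class input has no label
                                      independent_varname,
                                      'gaussian', kernel_params,
                                      grouping_col,
                                      override_fit_intercept=True)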
def _compute_class_weight_sql(source_table, dependent_varname,
@@ -829,22 +1144,71 @@ def _compute_class_weight_sql(source_table, dependent_varname,
format(dep=dependent_varname, k=k, v=v))
class_weight_sql += "ELSE 1.0 END"
return class_weight_sql
+# -------------------------------------------------------------------------
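The WHEN clauses assembled above give the weight expression roughly the
following shape; the label and weight below are hypothetical:

    # For a dependent column y with class_weight mapping {'1': 2.0},
    # every other label falls through to the default weight of 1.0:
    #
    #   CASE WHEN (y) = 1 THEN 2.0
    #   ELSE 1.0 END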
def _svm_parsed_params(schema_madlib, source_table, model_table,
dependent_varname, independent_varname, transformer,
grouping_str, grouping_col, params_dict, is_svc,
- verbose, **kwargs):
+ use_transformer_for_output=False,
+ update_source_for_one_class=False,
+ verbose=False, **kwargs):
"""
Executes the linear support vector algorithm.
+
+ Args:
+ @param use_transformer_for_output: bool,
+ This flag decides whether the output tables are created using
+ the 'args' supplied to this function or the 'original_table'
+ structure stored in the transformer. The distinction is needed
+ so that cross validation can create temporary output tables
+ that differ from the 'original_table' used in the transformer.
+ @param update_source_for_one_class: bool,
+ A special indicator for svm_one_class. It is handled here
+ instead of in the svm_one_class function so that cross
+ validation applies the same transformation to its split
+ datasets.
+
"""
n_features = num_features(source_table, independent_varname)
+ if update_source_for_one_class:
+ # This block is run only when the caller is svm_one_class
+
+ # Create a temporary view that adds a dependent variable and inserts
+ # the origin into kernel space. The kernel appends an intercept at the
+ # end of independent_varname, so the synthetic origin row is all zeros
+ # with the final (intercept) value set to 1 (illustrated below).
+ dependent_varname = unique_string(desp='dep_var')
+ source_w_origin = unique_string(desp='src_tbl')
+ plpy.execute("""
+ CREATE TEMP VIEW {source_w_origin} AS
+ SELECT {independent_varname},
+ 1.0 AS {dependent_varname}
+ FROM {source_table}
+ UNION
+ SELECT
+ array_append(
+ {schema_madlib}.array_fill(
+ {schema_madlib}.array_of_float({n_features} - 1),
+ 0::float)::float[],
+ 1::float
+ ) as {independent_varname},
+ -1::float as {dependent_varname}
+ """.format(**locals()))
+ source_table = source_w_origin
+ if transformer.transformed_table:
+ transformer.transformed_table.update(
+ dict(source_table=source_w_origin,
+ dependent_varname=dependent_varname))
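As a concrete illustration of the view built above (values hypothetical),
with three features in kernel space every input row keeps its transformed
features and receives the label +1, while one synthetic origin row is
appended:

    # transformed input row:   ([x1, x2, 1.0], +1.0)   # intercept is last
    # synthetic origin row:    ([0.0, 0.0, 1.0], -1.0)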
class_weight_sql = _compute_class_weight_sql(source_table,
dependent_varname,
is_svc,
params_dict['class_weight'])
- args = {
+
+ args = locals()
+ args.update({
'rel_args': unique_string(desp='rel_args'),
'rel_state': unique_string(desp='rel_state'),
'col_grp_iteration': unique_string(desp='col_grp_iteration'),
@@ -852,17 +1216,10 @@ def _svm_parsed_params(schema_madlib, source_table, model_table,
'col_grp_key': unique_string(desp='col_grp_key'),
'col_n_tuples': unique_string(desp='col_n_tuples'),
'state_type': "double precision[]",
- 'n_features': n_features,
- 'verbose': verbose,
- 'is_svc': is_svc,
- 'schema_madlib': schema_madlib,
- 'grouping_str': grouping_str,
- 'grouping_col': grouping_col,
- 'rel_source': source_table,
- 'col_ind_var': independent_varname,
- 'col_dep_var': dependent_varname,
- 'class_weight_sql': class_weight_sql
- }
+ 'rel_source': args['source_table'],
+ 'col_ind_var': args['independent_varname'],
+ 'col_dep_var': args['dependent_varname'],
+ })
args.update(_verify_get_params_dict(params_dict))
args.update(_process_epsilon(is_svc, args))
@@ -872,22 +1229,21 @@ def _svm_parsed_params(schema_madlib, source_table, model_table,
plpy.execute("CREATE TABLE pg_temp.{0} AS SELECT 1".format(args['rel_args']))
# actual iterative algorithm computation
n_iters_run = _compute_svm(args)
- _build_output_tables(n_iters_run, model_table, args, transformer, **kwargs)
+ _build_output_tables(n_iters_run, args, **kwargs)
+# -----------------------------------------------------------------------------
def svm_predict(schema_madlib, model_table, new_data_table, id_col_name,
output_table, **kwargs):
- """ Scores the data points stored in a table using a
- learned support vector model.
+ """ Score data points stored in a table using a learned support vector model.
@param model_table Name of learned model
@param new_data_table Name of table/view containing the data
- points to be scored
+ points to be scored
@param id_col_name Name of column in new_data_table containing
- (integer) identifier for data point
+ (integer) identifier for data point
@param output_table Name of table to store the results
"""
- # suppress warnings
with MinWarning("warning"):
# model table
input_tbl_valid(model_table, 'SVM')
@@ -903,12 +1259,8 @@ def svm_predict(schema_madlib, model_table, new_data_table, id_col_name,
# read necessary info from summary
summary = plpy.execute("""
SELECT
- method,
- dependent_varname,
- independent_varname,
- kernel_func,
- kernel_params,
- grouping_col
+ method, dependent_varname, independent_varname,
+ kernel_func, kernel_params, grouping_col
FROM {summary_table}
""".format(**locals()))[0]
method = summary['method']
@@ -932,18 +1284,27 @@ def svm_predict(schema_madlib, model_table, new_data_table, id_col_name,
"') is invalid for new_data_table (" + new_data_table + ")!")
output_tbl_valid(output_table, 'SVM')
+ kernel_params_dict = _extract_kernel_params(kernel_params)
+ random_table = add_postfix(model_table, '_random')
if kernel_func.lower() != 'linear':
- random_table = add_postfix(model_table, '_random')
+ # the random table is not created with the linear kernel and is
+ # ignored in the load_kernel call, hence we skip this check for 'linear'
input_tbl_valid(random_table, 'SVM')
- transformer = load_kernel(schema_madlib, random_table,
- kernel_func, kernel_params)
- transformer.transform(new_data_table, independent_varname,
- grouping_col=grouping_col, id_col=id_col_name)
- transformed_table = transformer.transformed_table
- new_data_table = transformed_table['source_table']
- independent_varname = transformed_table['independent_varname']
- dependent_varname = transformed_table['dependent_varname']
-
+ transformer = load_kernel(schema_madlib, random_table,
+ kernel_func, kernel_params_dict)
+ transformer.transform(new_data_table, independent_varname,
+ grouping_col=grouping_col, id_col=id_col_name)
+ if transformer.transformed_table:
+ data_rel_info = transformer.transformed_table
+ else:
+ data_rel_info = transformer.original_table
+ new_data_table = data_rel_info['source_table']
+ independent_varname = data_rel_info['independent_varname']
+ dependent_varname = data_rel_info['dependent_varname']
+
+ pred_dist = """{0}.array_dot(coef::double precision [],
+ {1}::double precision [])
+ """.format(schema_madlib, independent_varname)
if method.upper() == 'SVC':
pred_query = """
CASE WHEN {schema_madlib}.array_dot(
@@ -956,12 +1317,7 @@ def svm_predict(schema_madlib, model_table, new_data_table, id_col_name,
""".format(schema_madlib=schema_madlib,
independent_varname=independent_varname)
elif method.upper() == 'SVR':
- pred_query = """
- {schema_madlib}.array_dot(
- coef::double precision [],
- {independent_varname}::double precision [])
- """.format(schema_madlib=schema_madlib,
- independent_varname=independent_varname)
+ pred_query = pred_dist
else:
plpy.error("SVM Error: Invalid 'method' value in summary table. "
"'method' can only be SVC or SVR!")
@@ -972,6 +1328,7 @@ def svm_predict(schema_madlib, model_table, new_data_table, id_col_name,
SELECT
{id_col_name} AS {id_col_name},
{pred_query} AS prediction,
+ {pred_dist} AS decision_function,
ARRAY[{grouping_str}] as grouping_col,
{grouping_col}
FROM {model_table}
@@ -985,7 +1342,8 @@ def svm_predict(schema_madlib, model_table, new_data_table, id_col_name,
CREATE TABLE {output_table} AS
SELECT
{id_col_name} AS {id_col_name},
- {pred_query} as prediction
+ {pred_query} as prediction,
+ {pred_dist} AS decision_function
FROM
{model_table},
{new_data_table}
@@ -993,6 +1351,7 @@ def svm_predict(schema_madlib, model_table, new_data_table, id_col_name,
not {schema_madlib}.array_contains_null({independent_varname})
""".format(**locals())
plpy.execute(sql)
+# -----------------------------------------------------------------------------
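With the decision_function column added above, every prediction row now
carries both the thresholded label (for SVC) and the raw margin, i.e. the
dot product of the coefficients with the (possibly kernel-transformed)
features. A hypothetical output table:

    #  id | prediction | decision_function
    # ----+------------+------------------
    #   1 |        1.0 |             0.83
    #   7 |       -1.0 |            -2.41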
def _svc_or_svr(is_svc, source_table, dependent_varname):
@@ -1016,28 +1375,25 @@ def _svc_or_svr(is_svc, source_table, dependent_varname):
if isinstance(d['y'], basestring)
else str(d['y']) for d in dep_labels]
- _assert(len(dep_var_mapping) == 2,
+ _assert(1 <= len(dep_var_mapping) <= 2,
"SVM Error: Classification currently "
- "only supports binary output!")
+ "only supports unary or binary output!. Found values {0}".
+ format(dep_var_mapping))
- col_dep_var_trans = (
- """
+ col_dep_var_trans = ("""
CASE WHEN ({col_dep_var}) IS NULL THEN NULL
WHEN ({col_dep_var}) = {mapped_value_for_negative} THEN -1.0
ELSE 1.0
END
- """
- .format(col_dep_var=dependent_varname,
- mapped_value_for_negative=dep_var_mapping[0])
- )
-
+ """.format(col_dep_var=dependent_varname,
+ mapped_value_for_negative=dep_var_mapping[0]))
_args.update({
'mapped_value_for_negative': dep_var_mapping[0],
'col_dep_var_trans': col_dep_var_trans,
'mapping': dep_var_mapping[0] + "," + dep_var_mapping[1],
'method': 'SVC'})
-
return _args
+# -----------------------------------------------------------------------------
def _process_epsilon(is_svc, args):
@@ -1101,6 +1457,35 @@ def _process_epsilon(is_svc, args):
'epsilon': epsilon,
'rel_epsilon': rel_epsilon,
'as_rel_source': as_rel_source}
+# -----------------------------------------------------------------------------
+
+
+def _extract_kernel_params(kernel_params='', n_features=10):
+ params_default = {
+ # common params
+ 'n_components': 2 * n_features,
+ 'fit_intercept': False,
+ 'random_state': 1,
+
+ # polynomial params
+ 'degree': 3,
+ 'coef0': 1,
+
+ # gaussian params
+ 'fit_in_memory': True,
+ 'gamma': 1.0 / n_features,  # true division (1 / n_features is 0 in Python 2)
+ }
+ params_types = {
+ 'n_components': int,
+ 'fit_intercept': bool,
+ 'random_state': int,
+ 'degree': int,
+ 'coef0': float,
+ 'fit_in_memory': bool,
+ 'gamma': float,
+ }
+ return extract_keyvalue_params(kernel_params, params_types, params_default)
+# -----------------------------------------------------------------------------
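A sketch of the fall-back behavior, assuming extract_keyvalue_params
parses comma-separated key=value pairs as it does elsewhere in the module:

    params = _extract_kernel_params('gamma=0.5, degree=2', n_features=4)
    # params['gamma']         -> 0.5    (user-supplied)
    # params['degree']        -> 2      (user-supplied)
    # params['n_components']  -> 8      (default: 2 * n_features)
    # params['fit_intercept'] -> False  (default)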
def _extract_params(schema_madlib, params, module='SVM'):
@@ -1198,6 +1583,5 @@ class SVMTestCase(unittest.TestCase):
['max_iter=10', 'optimizer="irls"', 'precision=0.02', 'lambda={1,2,3,4}'])
-
if __name__ == '__main__':
unittest.main()