Posted to dev@spark.apache.org by Debasish Das <de...@gmail.com> on 2017/12/13 08:20:46 UTC

Hinge Gradient

Hi,

I looked into the LinearSVC flow and found the gradient for hinge as
follows:

Our loss function with {0, 1} labels is max(0, 1 - (2y - 1) f_w(x)).
The gradient is therefore -(2y - 1) * x when 1 - (2y - 1) f_w(x) > 0, and 0 otherwise.

max is a non-smooth function.

Have we tried replacing the ReLU-style max with a soft-max function to smooth the
hinge loss?

The loss function would change to SoftMax(0, 1 - (2y - 1) f_w(x)), which for two arguments is the softplus log(1 + exp(1 - (2y - 1) f_w(x))).

Since this function is smooth, the gradient is well defined everywhere, and
LBFGS/OWLQN should behave well.
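
To make the idea concrete, here is a rough per-example sketch of both losses
(illustrative Scala, not the actual LinearSVC aggregator code):

  // Sketch: hinge vs. softplus-smoothed hinge for one example with label y in {0, 1}.
  object HingeSmoothingSketch {
    private def dot(a: Array[Double], b: Array[Double]): Double =
      a.iterator.zip(b.iterator).map { case (u, v) => u * v }.sum

    // Standard hinge: loss = max(0, margin); subgradient is -(2y - 1) * x when margin > 0, else 0.
    def hinge(w: Array[Double], x: Array[Double], y: Double): (Double, Array[Double]) = {
      val ys = 2 * y - 1
      val margin = 1 - ys * dot(w, x)
      if (margin > 0) (margin, x.map(v => -ys * v))
      else (0.0, Array.fill(x.length)(0.0))
    }

    // Smoothed hinge: loss = log(1 + exp(margin)) = SoftMax(0, margin) in the log-sum-exp sense.
    // Gradient = -sigmoid(margin) * (2y - 1) * x, defined everywhere.
    def smoothedHinge(w: Array[Double], x: Array[Double], y: Double): (Double, Array[Double]) = {
      val ys = 2 * y - 1
      val margin = 1 - ys * dot(w, x)
      val loss =                                   // numerically stable log(1 + e^margin)
        if (margin > 0) margin + math.log1p(math.exp(-margin))
        else math.log1p(math.exp(margin))
      val sigma = 1.0 / (1.0 + math.exp(-margin))
      (loss, x.map(v => -sigma * ys * v))
    }
  }

The smoothed version feeds straight into LBFGS/OWLQN since the gradient has no kink.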

Please let me know if this has been tried already. If not, I can run some
benchmarks.

We already have soft-max in multinomial regression, and it could be reused for the
LinearSVC flow.

Thanks.
Deb

Re: Hinge Gradient

Posted by Debasish Das <de...@gmail.com>.
If you can point me to previous benchmarks that have been done, I would like to
apply the smoothing and see whether LBFGS convergence improves without hurting the
linear SVC loss.

Thanks.
Deb

Re: Hinge Gradient

Posted by Debasish Das <de...@gmail.com>.
Hi Weichen,

Traditionally SVMs are solved with quadratic programming solvers, which is most
likely why this idea is not so popular. But since MLlib uses smooth methods to
optimize linear SVM, the idea of smoothing the SVM loss becomes relevant.

The paper also applies the same idea to kernel SVM. In place of the full kernel,
we can use random kitchen sinks (see the sketch after the link below).

http://research.cs.wisc.edu/dmi/svm/ssvm/
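
For reference, a rough sketch of the random-kitchen-sink idea (random Fourier
features approximating an RBF kernel; illustrative code, not an existing MLlib API):

  import scala.util.Random

  // Sketch: random Fourier features ("random kitchen sinks") for an RBF kernel with bandwidth sigma.
  // phi(x) = sqrt(2 / D) * cos(W x + b), with entries of W ~ N(0, 1/sigma^2) and b ~ Uniform[0, 2*pi].
  // Training the existing linear SVM on phi(x) approximates a kernel SVM.
  class RandomKitchenSink(numFeatures: Int, numRandom: Int, sigma: Double, seed: Long = 42L) {
    private val rng = new Random(seed)
    private val w: Array[Array[Double]] =
      Array.fill(numRandom, numFeatures)(rng.nextGaussian() / sigma)
    private val b: Array[Double] =
      Array.fill(numRandom)(rng.nextDouble() * 2 * math.Pi)

    def transform(x: Array[Double]): Array[Double] = {
      val scale = math.sqrt(2.0 / numRandom)
      Array.tabulate(numRandom) { j =>
        val proj = w(j).iterator.zip(x.iterator).map { case (a, v) => a * v }.sum + b(j)
        scale * math.cos(proj)
      }
    }
  }

Each training vector would be mapped through transform before being handed to LinearSVC.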

I will go through Yuhao's work as well.

Thanks.
Deb



Re: Hinge Gradient

Posted by Weichen Xu <we...@databricks.com>.
Hi Deb,

Which library or paper did you find that uses this loss function for SVM?

But I prefer the implementation in LIBLINEAR, which uses a coordinate descent
optimizer.
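
For context, the dual coordinate descent used there looks roughly like this (a
simplified sketch of the L1-loss update from Hsieh et al. 2008, not LIBLINEAR's
actual code):

  // Sketch: one pass of dual coordinate descent for L2-regularized hinge-loss linear SVM.
  // Labels y(i) are in {-1, +1}; C is the regularization parameter; w = sum_i alpha_i * y_i * x_i.
  def dualCoordinateDescentPass(
      xs: Array[Array[Double]],
      y: Array[Double],
      alpha: Array[Double],
      w: Array[Double],
      c: Double): Unit = {
    for (i <- xs.indices) {
      val xi = xs(i)
      val qii = xi.map(v => v * v).sum                         // Q_ii = x_i^T x_i
      val g = y(i) * xi.iterator.zip(w.iterator).map { case (a, b) => a * b }.sum - 1.0
      val pg =                                                 // projected gradient for 0 <= alpha_i <= C
        if (alpha(i) == 0.0) math.min(g, 0.0)
        else if (alpha(i) == c) math.max(g, 0.0)
        else g
      if (pg != 0.0 && qii > 0.0) {
        val newAlpha = math.min(math.max(alpha(i) - g / qii, 0.0), c)
        val delta = (newAlpha - alpha(i)) * y(i)
        var j = 0
        while (j < w.length) { w(j) += delta * xi(j); j += 1 } // keep w in sync with alpha
        alpha(i) = newAlpha
      }
    }
  }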

Thanks.


Re: Hinge Gradient

Posted by Yanbo Liang <yb...@gmail.com>.
Hello Deb,

Optimizing a non-smooth function with LBFGS really should be considered carefully.
Is there any literature showing that changing max to soft-max behaves well?
I’m more than happy to see some benchmarks if you have them.

+ Yuhao, who made a similar effort in this PR: https://github.com/apache/spark/pull/17862

Regards
Yanbo   
