Posted to dev@spark.apache.org by "Ulanov, Alexander" <al...@hp.com> on 2014/08/26 12:53:10 UTC

Gradient descent and runMiniBatchSGD

Hi,

I've implemented the back propagation algorithm using the Gradient class and a simple update using the Updater class. Then I run the algorithm with mllib's GradientDescent class. I'm having trouble scaling out this implementation. I thought that if I partitioned my data across the workers, performance would increase, because each worker would run a step of gradient descent on its own partition of the data. But this does not happen, and each worker seems to process all of the data (when miniBatchFraction == 1.0, as in mllib's logistic regression implementation). This doesn't make sense to me, because then a single Worker would provide the same performance. Could someone elaborate on this and correct me if I am wrong? How can I scale out the algorithm across many Workers?
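
A minimal sketch of this kind of wiring, assuming the Spark 1.x mllib.optimization API; ToyGradient below is an illustrative squared-error stand-in rather than the actual back-propagation gradient, and train and numFeatures are made-up names for the example:

import org.apache.spark.mllib.linalg.{DenseVector, Vector, Vectors}
import org.apache.spark.mllib.optimization.{Gradient, GradientDescent, SimpleUpdater}
import org.apache.spark.rdd.RDD

// Illustrative gradient: plain squared error for a linear model, standing in
// for the real back-propagation gradient.
class ToyGradient extends Gradient {
  override def compute(data: Vector, label: Double, weights: Vector): (Vector, Double) = {
    val prediction = data.toArray.zip(weights.toArray).map { case (x, w) => x * w }.sum
    val diff = prediction - label
    (Vectors.dense(data.toArray.map(_ * diff)), 0.5 * diff * diff)
  }
  override def compute(data: Vector, label: Double, weights: Vector,
                       cumGradient: Vector): Double = {
    val (grad, loss) = compute(data, label, weights)
    // Assumes a dense accumulator is passed in; accumulate into it in place.
    val cum = cumGradient.asInstanceOf[DenseVector].values
    grad.toArray.zipWithIndex.foreach { case (g, i) => cum(i) += g }
    loss
  }
}

def train(data: RDD[(Double, Vector)], numFeatures: Int): Vector = {
  val (weights, _) = GradientDescent.runMiniBatchSGD(
    data,                                   // RDD of (label, features)
    new ToyGradient,
    new SimpleUpdater,                      // plain gradient step, no regularization
    0.1,                                    // stepSize
    100,                                    // numIterations
    0.0,                                    // regParam
    1.0,                                    // miniBatchFraction: use every point each iteration
    Vectors.dense(new Array[Double](numFeatures)))  // initial weights (zeros)
  weights
}

With miniBatchFraction set to 1.0, every labeled point contributes to the gradient of every iteration.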

Best regards, Alexander

Re: Gradient descent and runMiniBatchSGD

Posted by "Ulanov, Alexander" <al...@hp.com>.
Hi Xiangrui,

Thanks for the explanation, but I'm still missing something. In my experiments, if miniBatchFraction == 1.0, then no matter how the data is partitioned (2, 4, 8, or 16 partitions), the algorithm executes in more or less the same time (I have 16 Workers). The reduce in runMiniBatchSGD takes most of the time with 2 partitions, and mapPartitionsWithIndex does with 16. What I would expect is that the time decreases in proportion to the number of data partitions, because each partition should, hopefully, be processed on a separate Worker. Why doesn't the time decrease?

Btw, processing one instance in my algorithm is a heavy computation; that is exactly why I want to parallelize it.
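
For reference, the reduce and mapPartitions stages mentioned above come from a pattern roughly like the sketch below; this is a simplification, not the actual mllib code, and gradientOf stands in for a Gradient.compute call. Each iteration samples the RDD, sums gradients per partition, and combines the partial sums back on the driver, so with miniBatchFraction == 1.0 every worker still scans its entire partition on every iteration:

import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// Simplified sketch of one runMiniBatchSGD-style iteration (not the actual mllib
// code): gradients are summed per partition where the data lives, then the partial
// sums are combined back on the driver. With miniBatchFraction == 1.0, sample()
// still feeds every element of every partition through the seqOp.
def oneIteration(
    data: RDD[(Double, Vector)],
    weights: Vector,
    miniBatchFraction: Double,
    seed: Long,
    gradientOf: (Double, Vector, Vector) => Array[Double]): Array[Double] = {
  val n = weights.size
  val (gradSum, count) = data
    .sample(false, miniBatchFraction, seed)
    .aggregate((new Array[Double](n), 0L))(
      (acc, point) => {
        val (sum, c) = acc
        val (label, features) = point
        val g = gradientOf(label, features, weights)
        var i = 0
        while (i < n) { sum(i) += g(i); i += 1 }
        (sum, c + 1)
      },
      (a, b) => {
        var i = 0
        while (i < n) { a._1(i) += b._1(i); i += 1 }
        (a._1, a._2 + b._2)
      })
  gradSum.map(_ / math.max(count, 1L))  // averaged gradient for this step
}

The per-element work does run in parallel across partitions, but the sampling pass and the combine step are paid on every iteration regardless of how many partitions there are.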

Best regards, Alexander

On 26.08.2014, at 20:54, "Xiangrui Meng" <me...@gmail.com> wrote:

miniBatchFraction uses RDD.sample to get the mini-batch, and sample
still needs to visit the elements one after another. So it is not
efficient if the task is not computation heavy, which is why
setMiniBatchFraction is marked as experimental. If we can detect that
the partition iterator is backed by an ArrayBuffer, maybe we can use
a skip iterator to skip over elements. -Xiangrui

On Tue, Aug 26, 2014 at 8:15 AM, Ulanov, Alexander
<al...@hp.com> wrote:
Hi, RJ

https://github.com/avulanov/spark/blob/neuralnetwork/mllib/src/main/scala/org/apache/spark/mllib/classification/NeuralNetwork.scala

Unit tests are in the same branch.

Alexander

From: RJ Nowling [mailto:rnowling@gmail.com]
Sent: Tuesday, August 26, 2014 6:59 PM
To: Ulanov, Alexander
Cc: dev@spark.apache.org
Subject: Re: Gradient descent and runMiniBatchSGD

Hi Alexander,

Can you post a link to the code?

RJ

On Tue, Aug 26, 2014 at 6:53 AM, Ulanov, Alexander <al...@hp.com> wrote:
Hi,

I've implemented the back propagation algorithm using the Gradient class and a simple update using the Updater class. Then I run the algorithm with mllib's GradientDescent class. I'm having trouble scaling out this implementation. I thought that if I partitioned my data across the workers, performance would increase, because each worker would run a step of gradient descent on its own partition of the data. But this does not happen, and each worker seems to process all of the data (when miniBatchFraction == 1.0, as in mllib's logistic regression implementation). This doesn't make sense to me, because then a single Worker would provide the same performance. Could someone elaborate on this and correct me if I am wrong? How can I scale out the algorithm across many Workers?

Best regards, Alexander



--
em rnowling@gmail.com
c 954.496.2314

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: Gradient descent and runMiniBatchSGD

Posted by RJ Nowling <rn...@gmail.com>.
Also, another idea: many algorithms that use sampling tend to do so multiple
times. It may be beneficial to allow a transformation to a representation
that is more efficient for multiple rounds of sampling.
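
One illustrative way to read that suggestion, as a sketch only: materialize each partition as an array once, then draw every later mini-batch by index instead of rescanning the partition. Only glom, cache, and mapPartitionsWithIndex are existing Spark API here; toSampleable and sampleRound are made-up names for the example, and the data is assumed to fit in cache.

import scala.reflect.ClassTag
import scala.util.Random
import org.apache.spark.rdd.RDD

// Pay the per-partition materialization cost once (glom + cache), then draw each
// mini-batch by index so a round of sampling costs roughly O(sample size) per
// partition instead of a full scan.
def toSampleable[T](data: RDD[T]): RDD[Array[T]] =
  data.glom().cache()   // glom() turns each partition into a single Array[T]

def sampleRound[T: ClassTag](indexed: RDD[Array[T]], fraction: Double, seed: Long): RDD[T] =
  indexed.mapPartitionsWithIndex { (pid, iter) =>
    val rng = new Random(seed ^ pid)
    iter.flatMap { arr =>
      if (arr.isEmpty) Iterator.empty
      else {
        val k = math.max(1, (fraction * arr.length).toInt)
        Iterator.fill(k)(arr(rng.nextInt(arr.length)))  // draw by index, with replacement
      }
    }
  }

The trade-off is that each partition stays fully materialized in executor memory, which only pays off when many rounds of sampling reuse the same cached data.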


On Tue, Aug 26, 2014 at 4:36 PM, RJ Nowling <rn...@gmail.com> wrote:

> Xiangrui,
>
> I posted a note on my JIRA for MiniBatch KMeans about the same problem --
> sampling running in O(n).
>
> Can you elaborate on ways to get more efficient sampling?  I think this
> will be important for a variety of stochastic algorithms.
>
> RJ
>
>
> On Tue, Aug 26, 2014 at 12:54 PM, Xiangrui Meng <me...@gmail.com> wrote:
>
>> miniBatchFraction uses RDD.sample to get the mini-batch, and sample
>> still needs to visit the elements one after another. So it is not
>> efficient if the task is not computation heavy, which is why
>> setMiniBatchFraction is marked as experimental. If we can detect that
>> the partition iterator is backed by an ArrayBuffer, maybe we can use
>> a skip iterator to skip over elements. -Xiangrui
>>
>> On Tue, Aug 26, 2014 at 8:15 AM, Ulanov, Alexander
>> <al...@hp.com> wrote:
>> > Hi, RJ
>> >
>> >
>> https://github.com/avulanov/spark/blob/neuralnetwork/mllib/src/main/scala/org/apache/spark/mllib/classification/NeuralNetwork.scala
>> >
>> > Unit tests are in the same branch.
>> >
>> > Alexander
>> >
>> > From: RJ Nowling [mailto:rnowling@gmail.com]
>> > Sent: Tuesday, August 26, 2014 6:59 PM
>> > To: Ulanov, Alexander
>> > Cc: dev@spark.apache.org
>> > Subject: Re: Gradient descent and runMiniBatchSGD
>> >
>> > Hi Alexander,
>> >
>> > Can you post a link to the code?
>> >
>> > RJ
>> >
>> > On Tue, Aug 26, 2014 at 6:53 AM, Ulanov, Alexander <
>> alexander.ulanov@hp.com> wrote:
>> > Hi,
>> >
>> > I've implemented the back propagation algorithm using the Gradient class
>> and a simple update using the Updater class. Then I run the algorithm with
>> mllib's GradientDescent class. I'm having trouble scaling out this
>> implementation. I thought that if I partitioned my data across the workers,
>> performance would increase, because each worker would run a step of gradient
>> descent on its own partition of the data. But this does not happen, and each
>> worker seems to process all of the data (when miniBatchFraction == 1.0, as in
>> mllib's logistic regression implementation). This doesn't make sense to me,
>> because then a single Worker would provide the same performance. Could
>> someone elaborate on this and correct me if I am wrong? How can I scale out
>> the algorithm across many Workers?
>> >
>> > Best regards, Alexander
>> >
>> >
>> >
>> > --
>> > em rnowling@gmail.com
>> > c 954.496.2314
>>
>
>
>
> --
> em rnowling@gmail.com
> c 954.496.2314
>



-- 
em rnowling@gmail.com
c 954.496.2314

Re: Gradient descent and runMiniBatchSGD

Posted by RJ Nowling <rn...@gmail.com>.
Xiangrui,

I posted a note on my JIRA for MiniBatch KMeans about the same problem --
sampling running in O(n).

Can you elaborate on ways to get more efficient sampling?  I think this
will be important for a variety of stochastic algorithms.

RJ


On Tue, Aug 26, 2014 at 12:54 PM, Xiangrui Meng <me...@gmail.com> wrote:

> miniBatchFraction uses RDD.sample to get the mini-batch, and sample
> still needs to visit the elements one after another. So it is not
> efficient if the task is not computation heavy, which is why
> setMiniBatchFraction is marked as experimental. If we can detect that
> the partition iterator is backed by an ArrayBuffer, maybe we can use
> a skip iterator to skip over elements. -Xiangrui
>
> On Tue, Aug 26, 2014 at 8:15 AM, Ulanov, Alexander
> <al...@hp.com> wrote:
> > Hi, RJ
> >
> >
> https://github.com/avulanov/spark/blob/neuralnetwork/mllib/src/main/scala/org/apache/spark/mllib/classification/NeuralNetwork.scala
> >
> > Unit tests are in the same branch.
> >
> > Alexander
> >
> > From: RJ Nowling [mailto:rnowling@gmail.com]
> > Sent: Tuesday, August 26, 2014 6:59 PM
> > To: Ulanov, Alexander
> > Cc: dev@spark.apache.org
> > Subject: Re: Gradient descent and runMiniBatchSGD
> >
> > Hi Alexander,
> >
> > Can you post a link to the code?
> >
> > RJ
> >
> > On Tue, Aug 26, 2014 at 6:53 AM, Ulanov, Alexander <
> alexander.ulanov@hp.com> wrote:
> > Hi,
> >
> > I've implemented the back propagation algorithm using the Gradient class
> and a simple update using the Updater class. Then I run the algorithm with
> mllib's GradientDescent class. I'm having trouble scaling out this
> implementation. I thought that if I partitioned my data across the workers,
> performance would increase, because each worker would run a step of gradient
> descent on its own partition of the data. But this does not happen, and each
> worker seems to process all of the data (when miniBatchFraction == 1.0, as in
> mllib's logistic regression implementation). This doesn't make sense to me,
> because then a single Worker would provide the same performance. Could
> someone elaborate on this and correct me if I am wrong? How can I scale out
> the algorithm across many Workers?
> >
> > Best regards, Alexander
> >
> >
> >
> > --
> > em rnowling@gmail.com
> > c 954.496.2314
>



-- 
em rnowling@gmail.com
c 954.496.2314

Re: Gradient descent and runMiniBatchSGD

Posted by Xiangrui Meng <me...@gmail.com>.
miniBatchFraction uses RDD.sample to get the mini-batch, and sample
still needs to visit the elements one after another. So it is not
efficient if the task is not computation heavy, which is why
setMiniBatchFraction is marked as experimental. If we can detect that
the partition iterator is backed by an ArrayBuffer, maybe we can use
a skip iterator to skip over elements. -Xiangrui
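
A sketch of what such a skip iterator could look like, purely illustrative and not existing Spark code: assuming the partition has already been materialized into an ArrayBuffer, geometrically distributed gaps reproduce Bernoulli sampling with probability fraction while touching only the selected elements.

import scala.collection.mutable.ArrayBuffer
import scala.util.Random

// Sketch of a "skip iterator" over an indexed partition: instead of testing every
// element, jump ahead by a geometrically distributed gap so that only the sampled
// elements are ever touched.
def skipSample[T](partition: ArrayBuffer[T], fraction: Double, rng: Random): Iterator[T] = {
  require(fraction > 0.0 && fraction <= 1.0)
  new Iterator[T] {
    // Number of elements skipped before the next selected one is geometric with
    // success probability `fraction`, drawn by inverse transform sampling.
    private def gap(): Int =
      math.floor(math.log(1.0 - rng.nextDouble()) / math.log(1.0 - fraction)).toInt
    private var pos = gap()
    def hasNext: Boolean = pos < partition.length
    def next(): T = {
      val elem = partition(pos)
      pos += gap() + 1
      elem
    }
  }
}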

On Tue, Aug 26, 2014 at 8:15 AM, Ulanov, Alexander
<al...@hp.com> wrote:
> Hi, RJ
>
> https://github.com/avulanov/spark/blob/neuralnetwork/mllib/src/main/scala/org/apache/spark/mllib/classification/NeuralNetwork.scala
>
> Unit tests are in the same branch.
>
> Alexander
>
> From: RJ Nowling [mailto:rnowling@gmail.com]
> Sent: Tuesday, August 26, 2014 6:59 PM
> To: Ulanov, Alexander
> Cc: dev@spark.apache.org
> Subject: Re: Gradient descent and runMiniBatchSGD
>
> Hi Alexander,
>
> Can you post a link to the code?
>
> RJ
>
> On Tue, Aug 26, 2014 at 6:53 AM, Ulanov, Alexander <al...@hp.com> wrote:
> Hi,
>
> I've implemented the back propagation algorithm using the Gradient class and a simple update using the Updater class. Then I run the algorithm with mllib's GradientDescent class. I'm having trouble scaling out this implementation. I thought that if I partitioned my data across the workers, performance would increase, because each worker would run a step of gradient descent on its own partition of the data. But this does not happen, and each worker seems to process all of the data (when miniBatchFraction == 1.0, as in mllib's logistic regression implementation). This doesn't make sense to me, because then a single Worker would provide the same performance. Could someone elaborate on this and correct me if I am wrong? How can I scale out the algorithm across many Workers?
>
> Best regards, Alexander
>
>
>
> --
> em rnowling@gmail.com
> c 954.496.2314

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


RE: Gradient descent and runMiniBatchSGD

Posted by "Ulanov, Alexander" <al...@hp.com>.
Hi, RJ

https://github.com/avulanov/spark/blob/neuralnetwork/mllib/src/main/scala/org/apache/spark/mllib/classification/NeuralNetwork.scala

Unit tests are in the same branch.

Alexander

From: RJ Nowling [mailto:rnowling@gmail.com]
Sent: Tuesday, August 26, 2014 6:59 PM
To: Ulanov, Alexander
Cc: dev@spark.apache.org
Subject: Re: Gradient descent and runMiniBatchSGD

Hi Alexander,

Can you post a link to the code?

RJ

On Tue, Aug 26, 2014 at 6:53 AM, Ulanov, Alexander <al...@hp.com> wrote:
Hi,

I've implemented the back propagation algorithm using the Gradient class and a simple update using the Updater class. Then I run the algorithm with mllib's GradientDescent class. I'm having trouble scaling out this implementation. I thought that if I partitioned my data across the workers, performance would increase, because each worker would run a step of gradient descent on its own partition of the data. But this does not happen, and each worker seems to process all of the data (when miniBatchFraction == 1.0, as in mllib's logistic regression implementation). This doesn't make sense to me, because then a single Worker would provide the same performance. Could someone elaborate on this and correct me if I am wrong? How can I scale out the algorithm across many Workers?

Best regards, Alexander



--
em rnowling@gmail.com
c 954.496.2314

Re: Gradient descent and runMiniBatchSGD

Posted by RJ Nowling <rn...@gmail.com>.
Hi Alexander,

Can you post a link to the code?

RJ


On Tue, Aug 26, 2014 at 6:53 AM, Ulanov, Alexander <al...@hp.com>
wrote:

> Hi,
>
> I've implemented the back propagation algorithm using the Gradient class and
> a simple update using the Updater class. Then I run the algorithm with
> mllib's GradientDescent class. I'm having trouble scaling out this
> implementation. I thought that if I partitioned my data across the workers,
> performance would increase, because each worker would run a step of gradient
> descent on its own partition of the data. But this does not happen, and each
> worker seems to process all of the data (when miniBatchFraction == 1.0, as in
> mllib's logistic regression implementation). This doesn't make sense to me,
> because then a single Worker would provide the same performance. Could
> someone elaborate on this and correct me if I am wrong? How can I scale out
> the algorithm across many Workers?
>
> Best regards, Alexander
>



-- 
em rnowling@gmail.com
c 954.496.2314