Posted to user@spark.apache.org by Debasish Das <de...@gmail.com> on 2014/03/26 07:06:25 UTC

ALS memory limits

Hi,

For our use cases we are looking at 20M x 1M matrices, which is in a similar
range to what is outlined in the paper discussed here:

http://sandeeptata.blogspot.com/2012/12/sparkler-large-scale-matrix.html

Does the exponential runtime growth in Spark ALS, as outlined in the blog,
still exist in recommendation.ALS?

I am running a Spark cluster of 10 nodes with around 1 TB of total memory
and 80 cores.

With rank = 50, the memory requirement for ALS should be 20M x 50 doubles on
every worker, which is around 8 GB.

Even if both factor matrices are cached in memory I should be bounded
by ~9 GB, but even with 32 GB per worker I see GC errors.
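
For reference, here is the back-of-envelope arithmetic behind those numbers,
as a rough sketch in Scala that assumes dense double storage and ignores JVM
object and GC overhead (the object name is just for illustration):

    // Raw size of the ALS factor matrices for a 20M x 1M rating matrix at rank 50.
    // Ignores JVM object headers, shuffle buffers and any cached RDD copies.
    object AlsSizing {
      def main(args: Array[String]): Unit = {
        val users = 20000000L          // 20M users
        val items = 1000000L           // 1M items
        val rank  = 50L
        val bytesPerDouble = 8L
        val userGB = users * rank * bytesPerDouble / 1e9   // ~8.0 GB
        val itemGB = items * rank * bytesPerDouble / 1e9   // ~0.4 GB
        println(f"user factors: $userGB%.1f GB, item factors: $itemGB%.1f GB")
      }
    }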

I am debugging the scalability and memory requirements of the algorithm
further, but any insights would be very helpful.

Also there are two other issues:

1. If GC errors are hit, the worker JVM goes down and I have to restart it
manually. Is this expected?

2. When I try to make use of all 80 cores on the cluster, I get a
java.io.FileNotFoundException on /tmp/. Is there some OS limit on how many
cores can simultaneously access /tmp from a single process?

Thanks.
Deb

On Sun, Mar 16, 2014 at 2:20 PM, Sean Owen <so...@cloudera.com> wrote:

> Good point -- there's been another optimization for ALS in HEAD (
> https://github.com/apache/spark/pull/131), but yes the better place to
> pick up just essential changes since 0.9.0 including the previous one is
> the 0.9 branch.
>
> --
> Sean Owen | Director, Data Science | London
>
>
> On Sun, Mar 16, 2014 at 2:18 PM, Patrick Wendell <pw...@gmail.com> wrote:
>
>> Sean - was this merged into the 0.9 branch as well (it seems so based
>> on the message from rxin). If so it might make sense to try out the
>> head of branch-0.9 as well. Unless there are *also* other changes
>> relevant to this in master.
>>
>> - Patrick
>>
>> On Sun, Mar 16, 2014 at 12:24 PM, Sean Owen <so...@cloudera.com> wrote:
>> > You should simply use a snapshot built from HEAD of
>> github.com/apache/spark
>> > if you can. The key change is in MLlib and with any luck you can just
>> > replace that bit. See the PR I referenced.
>> >
>> > Sure with enough memory you can get it to run even with the memory
>> issue,
>> > but it could be hundreds of GB at your scale. Not sure I take the point
>> > about the JVM; you can give it 64GB of heap and executors can use that
>> much,
>> > sure.
>> >
>> > You could reduce the number of features a lot to work around it too, or
>> > reduce the input size. (If anyone saw my blog post about StackOverflow
>> and
>> > ALS -- that's why I snuck in a relatively paltry 40 features and pruned
>> > questions with <4 tags :) )
>> >
>> > I don't think jblas has anything to do with it per se, and the
>> allocation
>> > fails in Java code, not native code. This should be exactly what that
>> PR I
>> > mentioned fixes.
>> >
>> > --
>> > Sean Owen | Director, Data Science | London
>> >
>> >
>> > On Sun, Mar 16, 2014 at 11:48 AM, Debasish Das <
>> debasish.das83@gmail.com>
>> > wrote:
>> >>
>> >> Thanks Sean...let me get the latest code..do you know which PR was it ?
>> >>
>> >> But will the executors run fine with say 32 gb or 64 gb of memory ?
>> Does
>> >> not JVM shows up issues when the max memory goes beyond certain
>> limit...
>> >>
>> >> Also the failure is due to GC limits from jblas...and I was thinking
>> that
>> >> jblas is going to call native malloc right ? May be 64 gb is not a big
>> deal
>> >> then...I will try increasing to 32 and then 64...
>> >>
>> >> java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit exceeded)
>> >>
>> >> org.jblas.DoubleMatrix.<init>(DoubleMatrix.java:323)
>> >> org.jblas.DoubleMatrix.zeros(DoubleMatrix.java:471)
>> >> org.jblas.DoubleMatrix.zeros(DoubleMatrix.java:476)
>> >> com.verizon.bigdata.mllib.recommendation.ALSQR$$anonfun$17.apply(ALSQR.scala:366)
>> >> com.verizon.bigdata.mllib.recommendation.ALSQR$$anonfun$17.apply(ALSQR.scala:366)
>> >> scala.Array$.fill(Array.scala:267)
>> >> com.verizon.bigdata.mllib.recommendation.ALSQR.updateBlock(ALSQR.scala:366)
>> >> com.verizon.bigdata.mllib.recommendation.ALSQR$$anonfun$com$verizon$bigdata$mllib$recommendation$ALSQR$$updateFeatures$2.apply(ALSQR.scala:346)
>> >> com.verizon.bigdata.mllib.recommendation.ALSQR$$anonfun$com$verizon$bigdata$mllib$recommendation$ALSQR$$updateFeatures$2.apply(ALSQR.scala:345)
>> >> org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:32)
>> >> org.apache.spark.rdd.MappedValuesRDD$$anonfun$compute$1.apply(MappedValuesRDD.scala:32)
>> >> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>> >> org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:149)
>> >> org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$4.apply(CoGroupedRDD.scala:147)
>> >> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>> >> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>> >> org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:147)
>> >> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:242)
>> >> org.apache.spark.rdd.RDD.iterator(RDD.scala:233)
>> >> org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:32)
>> >> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:242)
>> >> org.apache.spark.rdd.RDD.iterator(RDD.scala:233)
>> >> org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:32)
>> >> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:242)
>> >> org.apache.spark.rdd.RDD.iterator(RDD.scala:233)
>> >> org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
>> >> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:242)
>> >> org.apache.spark.rdd.RDD.iterator(RDD.scala:233)
>> >> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161)
>> >> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
>> >> org.apache.spark.scheduler.Task.run(Task.scala:53)
>> >> org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
>> >>
>> >>
>> >>
>> >> On Sun, Mar 16, 2014 at 11:42 AM, Sean Owen <so...@cloudera.com>
>> wrote:
>> >>>
>> >>> Are you using HEAD or 0.9.0? I know there was a memory issue fixed a
>> few
>> >>> weeks ago that made ALS need a lot more memory than is needed.
>> >>>
>> >>> https://github.com/apache/incubator-spark/pull/629
>> >>>
>> >>> Try the latest code.
>> >>>
>> >>> --
>> >>> Sean Owen | Director, Data Science | London
>> >>>
>> >>>
>> >>> On Sun, Mar 16, 2014 at 11:40 AM, Debasish Das <
>> debasish.das83@gmail.com>
>> >>> wrote:
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>> I gave my spark job 16 gb of memory and it is running on 8 executors.
>> >>>>
>> >>>> The job needs more memory due to ALS requirements (20M x 1M matrix)
>> >>>>
>> >>>> On each node I do have 96 gb of memory and I am using 16 gb out of
>> it. I
>> >>>> want to increase the memory but I am not sure what is the right way
>> to
>> >>>> do
>> >>>> that...
>> >>>>
>> >>>> On 8 executor if I give 96 gb it might be a issue due to GC...
>> >>>>
>> >>>> Ideally on 8 nodes, I would run with 48 executors and each executor
>> will
>> >>>> get 16 gb of memory..Total  48 JVMs...
>> >>>>
>> >>>> Is it possible to increase executors per node ?
>> >>>>
>> >>>> Thanks.
>> >>>> Deb
>> >>>
>> >>>
>> >>
>> >
>>
>
>

Re: ALS memory limits

Posted by Debasish Das <de...@gmail.com>.
Just a clarification: I am using Spark ALS with explicit feedback on a
standalone cluster, without deploying the ZooKeeper master HA option yet...

When a GC error is hit on the standalone Spark cluster, the worker process
dies as well and I have to restart the worker manually... Understanding this
issue will be useful as we deploy the solution...


On Wed, Mar 26, 2014 at 7:31 AM, Debasish Das <de...@gmail.com> wrote:

> Thanks Sean. Looking into executor memory options now...
>
> I am at incubator_spark head. Does that has all the fixes or I need spark
> head ? I can deploy the spark head as well...
>
> I am not running implicit feedback yet...I remember memory enhancements
> were mainly for implicit right ?
>
> For ulimit let me look into centos settings....I am curious how map-reduce
> resolves it....by using 1 core from 1 process ? I am running 2 tb yarn jobs
> as well for etl, pre processing etc....have not seen the too many files
> opened yet....
>
> when there is gc error, the worker dies....that's mystery as well...any
> insights from spark core team ? Yarn container gets killed if gc boundaries
> are about to hit....similar ideas can be used here as well ? Also which
> tool do we use for memory debugging in spark ?
>  On Mar 26, 2014 1:45 AM, "Sean Owen" <so...@cloudera.com> wrote:
>
>> Much of this sounds related to the memory issue mentioned earlier in this
>> thread. Are you using a build that has fixed that? That would be by far
>> most important here.
>>
>> If the raw memory requirement is 8GB, the actual heap size necessary could
>> be a lot larger -- object overhead, all the other stuff in memory,
>> overheads within the heap allocation, etc. So I would expect total memory
>> requirement to be significantly more than 9GB.
>>
>> Still, this is the *total* requirement across the cluster. Each worker is
>> just loading part of the matrix. If you have 10 workers I would imagine it
>> roughly chops the per-worker memory requirement by 10x.
>>
>> This in turn depends on also letting workers use more than their default
>> amount of memory. May need to increase executor memory here.
>>
>> Separately, I have observed issues with too many files open and lots of
>> /tmp files. You may have to use ulimit to increase the number of open
>> files
>> allowed.
>>
>> On Wed, Mar 26, 2014 at 6:06 AM, Debasish Das <debasish.das83@gmail.com
>> >wrote:
>>
>> > Hi,
>> >
>> > For our usecases we are looking into 20 x 1M matrices which comes in the
>> > similar ranges as outlined by the paper over here:
>> >
>> >
>> http://sandeeptata.blogspot.com/2012/12/sparkler-large-scale-matrix.html
>> >
>> > Is the exponential runtime growth in spark ALS as outlined by the blog
>> > still exists in recommendation.ALS ?
>> >
>> > I am running a spark cluster of 10 nodes with total memory of around 1
>> TB
>> > with 80 cores....
>> >
>> > With rank = 50, the memory requirements for ALS should be 20Mx50
>> doubles on
>> > every worker which is around 8 GB....
>> >
>> > Even if both the factor matrices are cached in memory I should be
>> bounded
>> > by ~ 9 GB but even with 32 GB per worker I see GC errors...
>> >
>> > I am debugging the scalability and memory requirements of the algorithm
>> > further but any insights will be very helpful...
>> >
>> > Also there are two other issues:
>> >
>> > 1. If GC errors are hit, that worker JVM goes down and I have to
>> restart it
>> > manually. Is this expected ?
>> >
>> > 2. When I try to make use of all 80 cores on the cluster I get some
>> issues
>> > related to java.io.File not found exception on /tmp/ ? Is there some OS
>> > limit that how many cores can simultaneously access /tmp from a process
>> ?
>> >
>> > Thanks.
>> > Deb
>> >
>> >
>>
>

Re: ALS memory limits

Posted by Debasish Das <de...@gmail.com>.
Thanks Sean. Looking into executor memory options now...

I am at incubator-spark HEAD. Does that have all the fixes, or do I need
Spark HEAD? I can deploy Spark HEAD as well...

I am not running implicit feedback yet... I remember the memory enhancements
were mainly for implicit feedback, right?

For ulimit, let me look into the CentOS settings... I am curious how
MapReduce resolves it... by using 1 core per process? I am running 2 TB YARN
jobs as well for ETL, preprocessing, etc., and have not seen the "too many
files open" issue yet...

When there is a GC error, the worker dies... that is a mystery as well... any
insights from the Spark core team? A YARN container gets killed if GC
boundaries are about to be hit... could similar ideas be used here as well?
Also, which tool do we use for memory debugging in Spark?
 On Mar 26, 2014 1:45 AM, "Sean Owen" <so...@cloudera.com> wrote:

> Much of this sounds related to the memory issue mentioned earlier in this
> thread. Are you using a build that has fixed that? That would be by far
> most important here.
>
> If the raw memory requirement is 8GB, the actual heap size necessary could
> be a lot larger -- object overhead, all the other stuff in memory,
> overheads within the heap allocation, etc. So I would expect total memory
> requirement to be significantly more than 9GB.
>
> Still, this is the *total* requirement across the cluster. Each worker is
> just loading part of the matrix. If you have 10 workers I would imagine it
> roughly chops the per-worker memory requirement by 10x.
>
> This in turn depends on also letting workers use more than their default
> amount of memory. May need to increase executor memory here.
>
> Separately, I have observed issues with too many files open and lots of
> /tmp files. You may have to use ulimit to increase the number of open files
> allowed.
>
> On Wed, Mar 26, 2014 at 6:06 AM, Debasish Das <debasish.das83@gmail.com
> >wrote:
>
> > Hi,
> >
> > For our usecases we are looking into 20 x 1M matrices which comes in the
> > similar ranges as outlined by the paper over here:
> >
> > http://sandeeptata.blogspot.com/2012/12/sparkler-large-scale-matrix.html
> >
> > Is the exponential runtime growth in spark ALS as outlined by the blog
> > still exists in recommendation.ALS ?
> >
> > I am running a spark cluster of 10 nodes with total memory of around 1 TB
> > with 80 cores....
> >
> > With rank = 50, the memory requirements for ALS should be 20Mx50 doubles
> on
> > every worker which is around 8 GB....
> >
> > Even if both the factor matrices are cached in memory I should be bounded
> > by ~ 9 GB but even with 32 GB per worker I see GC errors...
> >
> > I am debugging the scalability and memory requirements of the algorithm
> > further but any insights will be very helpful...
> >
> > Also there are two other issues:
> >
> > 1. If GC errors are hit, that worker JVM goes down and I have to restart
> it
> > manually. Is this expected ?
> >
> > 2. When I try to make use of all 80 cores on the cluster I get some
> issues
> > related to java.io.File not found exception on /tmp/ ? Is there some OS
> > limit that how many cores can simultaneously access /tmp from a process ?
> >
> > Thanks.
> > Deb
> >
> >
>

Re: ALS memory limits

Posted by Sean Owen <so...@cloudera.com>.
Much of this sounds related to the memory issue mentioned earlier in this
thread. Are you using a build that has fixed that? That would be by far the
most important thing here.

If the raw memory requirement is 8 GB, the actual heap size necessary could
be a lot larger -- object overhead, all the other stuff in memory,
overheads within the heap allocation, etc. So I would expect the total memory
requirement to be significantly more than 9 GB.

Still, this is the *total* requirement across the cluster. Each worker is
just loading part of the matrix. If you have 10 workers I would imagine it
roughly chops the per-worker memory requirement by 10x.

This in turn depends on also letting workers use more than their default
amount of memory; you may need to increase executor memory here.
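
As a minimal sketch of what that could look like with the 0.9-era SparkConf
API, assuming a standalone cluster (the master URL and the 32g figure are
placeholders to adapt to your setup):

    import org.apache.spark.{SparkConf, SparkContext}

    // Ask for more memory per executor before the context is created;
    // the default executor memory is far too small for factors of this size.
    val conf = new SparkConf()
      .setMaster("spark://master:7077")      // placeholder master URL
      .setAppName("ALS")
      .set("spark.executor.memory", "32g")   // heap per executor on each worker
    val sc = new SparkContext(conf)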

Separately, I have observed issues with too many files open and lots of
/tmp files. You may have to use ulimit to increase the number of open files
allowed.

On Wed, Mar 26, 2014 at 6:06 AM, Debasish Das <de...@gmail.com> wrote:

> Hi,
>
> For our usecases we are looking into 20 x 1M matrices which comes in the
> similar ranges as outlined by the paper over here:
>
> http://sandeeptata.blogspot.com/2012/12/sparkler-large-scale-matrix.html
>
> Is the exponential runtime growth in spark ALS as outlined by the blog
> still exists in recommendation.ALS ?
>
> I am running a spark cluster of 10 nodes with total memory of around 1 TB
> with 80 cores....
>
> With rank = 50, the memory requirements for ALS should be 20Mx50 doubles on
> every worker which is around 8 GB....
>
> Even if both the factor matrices are cached in memory I should be bounded
> by ~ 9 GB but even with 32 GB per worker I see GC errors...
>
> I am debugging the scalability and memory requirements of the algorithm
> further but any insights will be very helpful...
>
> Also there are two other issues:
>
> 1. If GC errors are hit, that worker JVM goes down and I have to restart it
> manually. Is this expected ?
>
> 2. When I try to make use of all 80 cores on the cluster I get some issues
> related to java.io.File not found exception on /tmp/ ? Is there some OS
> limit that how many cores can simultaneously access /tmp from a process ?
>
> Thanks.
> Deb
>
>