Posted to user@spark.apache.org by sim <si...@swoop.com> on 2015/07/02 21:40:47 UTC

1.4.0 regression: out-of-memory errors on small data

A very simple Spark SQL COUNT operation succeeds in spark-shell for 1.3.1 and
fails with a series of out-of-memory errors in 1.4.0. 

This gist <https://gist.github.com/ssimeonov/a49b75dc086c3ac6f3c4>  
includes the code and the full output from the 1.3.1 and 1.4.0 runs,
including the command line showing how spark-shell is started.
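
For convenience, the failing operation boils down to loading the gzipped
JSON file, registering it as a temp table, and counting it. Roughly, as a
sketch (the path is illustrative; the exact code is in the gist):

val df = sqlContext.jsonFile("file:///path/to/part-00000.gz")
df.registerTempTable("training")
val dfCount = sqlContext.sql("select count(*) as cnt from training")
println(dfCount.first.getLong(0))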

Should the 1.4.0 spark-shell be started with different options to avoid this
problem?

Thanks,
Sim




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/1-4-0-regression-out-of-memory-errors-on-small-data-tp23595.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: 1.4.0 regression: out-of-memory errors on small data

Posted by Yin Huai <yh...@databricks.com>.
You meant "SPARK_REPL_OPTS"? I did a quick search. Looks like it has been
removed since 1.0. I think it did not affect the behavior of the shell.
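
If you want to double-check what options the driver JVM actually received,
you can print its input arguments from inside the shell. A quick sketch
using the standard JMX runtime bean (nothing Spark-specific assumed):

import java.lang.management.ManagementFactory
import scala.collection.JavaConverters._

// Prints the flags the driver JVM was launched with; -XX:MaxPermSize=...
// appears here only if the option actually took effect.
ManagementFactory.getRuntimeMXBean.getInputArguments.asScala.foreach(println)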

On Mon, Jul 6, 2015 at 9:04 AM, Simeon Simeonov <si...@swoop.com> wrote:

>   Yin, that did the trick.
>
>  I'm curious what was the effect of the environment variable, however, as
> the behavior of the shell changed from hanging to quitting when the env var
> value got to 1g.
>
>  /Sim
>
>  Simeon Simeonov, Founder & CTO, Swoop <http://swoop.com/>
> @simeons <http://twitter.com/simeons> | blog.simeonov.com | 617.299.6746
>
>
>   From: Yin Huai <yh...@databricks.com>
> Date: Monday, July 6, 2015 at 11:41 AM
> To: Denny Lee <de...@gmail.com>
> Cc: Simeon Simeonov <si...@swoop.com>, Andy Huang <an...@servian.com.au>,
> user <us...@spark.apache.org>
>
> Subject: Re: 1.4.0 regression: out-of-memory errors on small data
>
>   Hi Sim,
>
>  I think the right way to set the PermGen Size is through driver extra
> JVM options, i.e.
>
>  --conf "spark.driver.extraJavaOptions=-XX:MaxPermSize=256m"
>
>  Can you try it? Without this conf, your driver's PermGen size is still
> 128m.
>
>  Thanks,
>
>  Yin
>
> On Mon, Jul 6, 2015 at 4:07 AM, Denny Lee <de...@gmail.com> wrote:
>
>>  I went ahead and tested your file and the results from the tests can be
>> seen in the gist: https://gist.github.com/dennyglee/c933b5ae01c57bd01d94.
>>
>>  Basically, when running {Java 7, MaxPermSize = 256} or {Java 8,
>> default} the query ran without any issues.  I was able to recreate the
>> issue with {Java 7, default}.  I included the commands I used to start the
>> spark-shell but basically I just used all defaults (no alteration to driver
>> or executor memory) with the only additional call was with
>> driver-class-path to connect to MySQL Hive metastore.  This is on OSX
>> Macbook Pro.
>>
>>  One thing I did notice is that your version of Java 7 is version 51
>> while my version of Java 7 version 79.  Could you see if updating to Java 7
>> version 79 perhaps allows you to use the MaxPermSize call?
>>
>>
>>
>>
>>  On Mon, Jul 6, 2015 at 1:36 PM Simeon Simeonov <si...@swoop.com> wrote:
>>
>>>  The file is at
>>> https://www.dropbox.com/s/a00sd4x65448dl2/apache-spark-failure-data-part-00000.gz?dl=1
>>>
>>>  The command was included in the gist
>>>
>>>  SPARK_REPL_OPTS="-XX:MaxPermSize=256m"
>>> spark-1.4.0-bin-hadoop2.6/bin/spark-shell --packages
>>> com.databricks:spark-csv_2.10:1.0.3 --driver-memory 4g --executor-memory 4g
>>>
>>>  /Sim
>>>
>>>  Simeon Simeonov, Founder & CTO, Swoop <http://swoop.com/>
>>> @simeons <http://twitter.com/simeons> | blog.simeonov.com | 617.299.6746
>>>
>>>
>>>   From: Yin Huai <yh...@databricks.com>
>>> Date: Monday, July 6, 2015 at 12:59 AM
>>> To: Simeon Simeonov <si...@swoop.com>
>>> Cc: Denny Lee <de...@gmail.com>, Andy Huang <
>>> andy.huang@servian.com.au>, user <us...@spark.apache.org>
>>>
>>> Subject: Re: 1.4.0 regression: out-of-memory errors on small data
>>>
>>>   I have never seen issue like this. Setting PermGen size to 256m
>>> should solve the problem. Can you send me your test file and the command
>>> used to launch the spark shell or your application?
>>>
>>>  Thanks,
>>>
>>>  Yin
>>>
>>> On Sun, Jul 5, 2015 at 9:17 PM, Simeon Simeonov <si...@swoop.com> wrote:
>>>
>>>>   Yin,
>>>>
>>>>  With 512Mb PermGen, the process still hung and had to be kill -9ed.
>>>>
>>>>  At 1Gb the spark shell & associated processes stopped hanging and
>>>> started exiting with
>>>>
>>>>  scala> println(dfCount.first.getLong(0))
>>>> 15/07/06 00:10:07 INFO storage.MemoryStore: ensureFreeSpace(235040)
>>>> called with curMem=0, maxMem=2223023063
>>>> 15/07/06 00:10:07 INFO storage.MemoryStore: Block broadcast_2 stored as
>>>> values in memory (estimated size 229.5 KB, free 2.1 GB)
>>>> 15/07/06 00:10:08 INFO storage.MemoryStore: ensureFreeSpace(20184)
>>>> called with curMem=235040, maxMem=2223023063
>>>> 15/07/06 00:10:08 INFO storage.MemoryStore: Block broadcast_2_piece0
>>>> stored as bytes in memory (estimated size 19.7 KB, free 2.1 GB)
>>>> 15/07/06 00:10:08 INFO storage.BlockManagerInfo: Added
>>>> broadcast_2_piece0 in memory on localhost:65464 (size: 19.7 KB, free: 2.1
>>>> GB)
>>>> 15/07/06 00:10:08 INFO spark.SparkContext: Created broadcast 2 from
>>>> first at <console>:30
>>>> java.lang.OutOfMemoryError: PermGen space
>>>> Stopping spark context.
>>>> Exception in thread "main"
>>>> Exception: java.lang.OutOfMemoryError thrown from the
>>>> UncaughtExceptionHandler in thread "main"
>>>> 15/07/06 00:10:14 INFO storage.BlockManagerInfo: Removed
>>>> broadcast_2_piece0 on localhost:65464 in memory (size: 19.7 KB, free: 2.1
>>>> GB)
>>>>
>>>>  That did not change up until 4Gb of PermGen space and 8Gb for driver
>>>> & executor each.
>>>>
>>>>  I stopped at this point because the exercise started looking silly.
>>>> It is clear that 1.4.0 is using memory in a substantially different manner.
>>>>
>>>>  I'd be happy to share the test file so you can reproduce this in your
>>>> own environment.
>>>>
>>>>  /Sim
>>>>
>>>>  Simeon Simeonov, Founder & CTO, Swoop <http://swoop.com/>
>>>> @simeons <http://twitter.com/simeons> | blog.simeonov.com |
>>>> 617.299.6746
>>>>
>>>>
>>>>   From: Yin Huai <yh...@databricks.com>
>>>> Date: Sunday, July 5, 2015 at 11:04 PM
>>>> To: Denny Lee <de...@gmail.com>
>>>> Cc: Andy Huang <an...@servian.com.au>, Simeon Simeonov <
>>>> sim@swoop.com>, user <us...@spark.apache.org>
>>>> Subject: Re: 1.4.0 regression: out-of-memory errors on small data
>>>>
>>>>   Sim,
>>>>
>>>>  Can you increase the PermGen size? Please let me know what is your
>>>> setting when the problem disappears.
>>>>
>>>>  Thanks,
>>>>
>>>>  Yin
>>>>
>>>> On Sun, Jul 5, 2015 at 5:59 PM, Denny Lee <de...@gmail.com>
>>>> wrote:
>>>>
>>>>>  I had run into the same problem where everything was working
>>>>> swimmingly with Spark 1.3.1.  When I switched to Spark 1.4, either by
>>>>> upgrading to Java8 (from Java7) or by knocking up the PermGenSize had
>>>>> solved my issue.  HTH!
>>>>>
>>>>>
>>>>>
>>>>>  On Mon, Jul 6, 2015 at 8:31 AM Andy Huang <an...@servian.com.au>
>>>>> wrote:
>>>>>
>>>>>> We have hit the same issue in spark shell when registering a temp
>>>>>> table. We observed it happening with those who had JDK 6. The problem went
>>>>>> away after installing jdk 8. This was only for the tutorial materials which
>>>>>> was about loading a parquet file.
>>>>>>
>>>>>>  Regards
>>>>>> Andy
>>>>>>
>>>>>> On Sat, Jul 4, 2015 at 2:54 AM, sim <si...@swoop.com> wrote:
>>>>>>
>>>>>>> @bipin, in my case the error happens immediately in a fresh shell in
>>>>>>> 1.4.0.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/1-4-0-regression-out-of-memory-errors-on-small-data-tp23595p23614.html
>>>>>>>  Sent from the Apache Spark User List mailing list archive at
>>>>>>> Nabble.com.
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>>>>>> For additional commands, e-mail: user-help@spark.apache.org
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>  --
>>>>>>  Andy Huang | Managing Consultant | Servian Pty Ltd | t: 02 9376
>>>>>> 0700 | f: 02 9376 0730| m: 0433221979
>>>>>>
>>>>>
>>>>
>>>
>

Re: 1.4.0 regression: out-of-memory errors on small data

Posted by Simeon Simeonov <si...@swoop.com>.
Yin, that did the trick.

I'm curious what effect the environment variable had, though, since the shell's behavior changed from hanging to quitting once the value reached 1g.

/Sim

Simeon Simeonov, Founder & CTO, Swoop<http://swoop.com/>
@simeons<http://twitter.com/simeons> | blog.simeonov.com<http://blog.simeonov.com/> | 617.299.6746


From: Yin Huai <yh...@databricks.com>
Date: Monday, July 6, 2015 at 11:41 AM
To: Denny Lee <de...@gmail.com>
Cc: Simeon Simeonov <si...@swoop.com>, Andy Huang <an...@servian.com.au>, user <us...@spark.apache.org>
Subject: Re: 1.4.0 regression: out-of-memory errors on small data

Hi Sim,

I think the right way to set the PermGen Size is through driver extra JVM options, i.e.

--conf "spark.driver.extraJavaOptions=-XX:MaxPermSize=256m"

Can you try it? Without this conf, your driver's PermGen size is still 128m.

Thanks,

Yin

On Mon, Jul 6, 2015 at 4:07 AM, Denny Lee <de...@gmail.com> wrote:
I went ahead and tested your file and the results from the tests can be seen in the gist: https://gist.github.com/dennyglee/c933b5ae01c57bd01d94.

Basically, when running {Java 7, MaxPermSize = 256} or {Java 8, default} the query ran without any issues.  I was able to recreate the issue with {Java 7, default}.  I included the commands I used to start the spark-shell but basically I just used all defaults (no alteration to driver or executor memory) with the only additional call was with driver-class-path to connect to MySQL Hive metastore.  This is on OSX Macbook Pro.

One thing I did notice is that your version of Java 7 is version 51 while my version of Java 7 version 79.  Could you see if updating to Java 7 version 79 perhaps allows you to use the MaxPermSize call?




On Mon, Jul 6, 2015 at 1:36 PM Simeon Simeonov <si...@swoop.com> wrote:
The file is at https://www.dropbox.com/s/a00sd4x65448dl2/apache-spark-failure-data-part-00000.gz?dl=1

The command was included in the gist

SPARK_REPL_OPTS="-XX:MaxPermSize=256m" spark-1.4.0-bin-hadoop2.6/bin/spark-shell --packages com.databricks:spark-csv_2.10:1.0.3 --driver-memory 4g --executor-memory 4g

/Sim

Simeon Simeonov, Founder & CTO, Swoop<http://swoop.com/>
@simeons<http://twitter.com/simeons> | blog.simeonov.com<http://blog.simeonov.com/> | 617.299.6746


From: Yin Huai <yh...@databricks.com>
Date: Monday, July 6, 2015 at 12:59 AM
To: Simeon Simeonov <si...@swoop.com>
Cc: Denny Lee <de...@gmail.com>, Andy Huang <an...@servian.com.au>, user <us...@spark.apache.org>

Subject: Re: 1.4.0 regression: out-of-memory errors on small data

I have never seen issue like this. Setting PermGen size to 256m should solve the problem. Can you send me your test file and the command used to launch the spark shell or your application?

Thanks,

Yin

On Sun, Jul 5, 2015 at 9:17 PM, Simeon Simeonov <si...@swoop.com> wrote:
Yin,

With 512Mb PermGen, the process still hung and had to be kill -9ed.

At 1Gb the spark shell & associated processes stopped hanging and started exiting with

scala> println(dfCount.first.getLong(0))
15/07/06 00:10:07 INFO storage.MemoryStore: ensureFreeSpace(235040) called with curMem=0, maxMem=2223023063
15/07/06 00:10:07 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 229.5 KB, free 2.1 GB)
15/07/06 00:10:08 INFO storage.MemoryStore: ensureFreeSpace(20184) called with curMem=235040, maxMem=2223023063
15/07/06 00:10:08 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 19.7 KB, free 2.1 GB)
15/07/06 00:10:08 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:65464 (size: 19.7 KB, free: 2.1 GB)
15/07/06 00:10:08 INFO spark.SparkContext: Created broadcast 2 from first at <console>:30
java.lang.OutOfMemoryError: PermGen space
Stopping spark context.
Exception in thread "main"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "main"
15/07/06 00:10:14 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on localhost:65464 in memory (size: 19.7 KB, free: 2.1 GB)

That did not change up until 4Gb of PermGen space and 8Gb for driver & executor each.

I stopped at this point because the exercise started looking silly. It is clear that 1.4.0 is using memory in a substantially different manner.

I'd be happy to share the test file so you can reproduce this in your own environment.

/Sim

Simeon Simeonov, Founder & CTO, Swoop<http://swoop.com/>
@simeons<http://twitter.com/simeons> | blog.simeonov.com<http://blog.simeonov.com/> | 617.299.6746


From: Yin Huai <yh...@databricks.com>
Date: Sunday, July 5, 2015 at 11:04 PM
To: Denny Lee <de...@gmail.com>
Cc: Andy Huang <an...@servian.com.au>, Simeon Simeonov <si...@swoop.com>, user <us...@spark.apache.org>
Subject: Re: 1.4.0 regression: out-of-memory errors on small data

Sim,

Can you increase the PermGen size? Please let me know what is your setting when the problem disappears.

Thanks,

Yin

On Sun, Jul 5, 2015 at 5:59 PM, Denny Lee <de...@gmail.com> wrote:
I had run into the same problem where everything was working swimmingly with Spark 1.3.1.  When I switched to Spark 1.4, either by upgrading to Java8 (from Java7) or by knocking up the PermGenSize had solved my issue.  HTH!



On Mon, Jul 6, 2015 at 8:31 AM Andy Huang <an...@servian.com.au> wrote:
We have hit the same issue in spark shell when registering a temp table. We observed it happening with those who had JDK 6. The problem went away after installing jdk 8. This was only for the tutorial materials which was about loading a parquet file.

Regards
Andy

On Sat, Jul 4, 2015 at 2:54 AM, sim <si...@swoop.com> wrote:
@bipin, in my case the error happens immediately in a fresh shell in 1.4.0.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/1-4-0-regression-out-of-memory-errors-on-small-data-tp23595p23614.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org




--
Andy Huang | Managing Consultant | Servian Pty Ltd | t: 02 9376 0700 | f: 02 9376 0730| m: 0433221979




Re: 1.4.0 regression: out-of-memory errors on small data

Posted by Yin Huai <yh...@databricks.com>.
Hi Sim,

I think the right way to set the PermGen Size is through driver extra JVM
options, i.e.

--conf "spark.driver.extraJavaOptions=-XX:MaxPermSize=256m"

Can you try it? Without this conf, your driver's PermGen size is still 128m.
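
For reference, folded into the launch command from the gist it would look
roughly like this (same packages and memory settings as in Sim's command;
adjust paths to your layout):

spark-1.4.0-bin-hadoop2.6/bin/spark-shell \
  --conf "spark.driver.extraJavaOptions=-XX:MaxPermSize=256m" \
  --packages com.databricks:spark-csv_2.10:1.0.3 \
  --driver-memory 4g --executor-memory 4g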

Thanks,

Yin

On Mon, Jul 6, 2015 at 4:07 AM, Denny Lee <de...@gmail.com> wrote:

> I went ahead and tested your file and the results from the tests can be
> seen in the gist: https://gist.github.com/dennyglee/c933b5ae01c57bd01d94.
>
> Basically, when running {Java 7, MaxPermSize = 256} or {Java 8, default}
> the query ran without any issues.  I was able to recreate the issue with
> {Java 7, default}.  I included the commands I used to start the spark-shell
> but basically I just used all defaults (no alteration to driver or executor
> memory) with the only additional call was with driver-class-path to connect
> to MySQL Hive metastore.  This is on OSX Macbook Pro.
>
> One thing I did notice is that your version of Java 7 is version 51 while
> my version of Java 7 version 79.  Could you see if updating to Java 7
> version 79 perhaps allows you to use the MaxPermSize call?
>
>
>
>
> On Mon, Jul 6, 2015 at 1:36 PM Simeon Simeonov <si...@swoop.com> wrote:
>
>>  The file is at
>> https://www.dropbox.com/s/a00sd4x65448dl2/apache-spark-failure-data-part-00000.gz?dl=1
>>
>>  The command was included in the gist
>>
>>  SPARK_REPL_OPTS="-XX:MaxPermSize=256m"
>> spark-1.4.0-bin-hadoop2.6/bin/spark-shell --packages
>> com.databricks:spark-csv_2.10:1.0.3 --driver-memory 4g --executor-memory 4g
>>
>>  /Sim
>>
>>  Simeon Simeonov, Founder & CTO, Swoop <http://swoop.com/>
>> @simeons <http://twitter.com/simeons> | blog.simeonov.com | 617.299.6746
>>
>>
>>   From: Yin Huai <yh...@databricks.com>
>> Date: Monday, July 6, 2015 at 12:59 AM
>> To: Simeon Simeonov <si...@swoop.com>
>> Cc: Denny Lee <de...@gmail.com>, Andy Huang <
>> andy.huang@servian.com.au>, user <us...@spark.apache.org>
>>
>> Subject: Re: 1.4.0 regression: out-of-memory errors on small data
>>
>>   I have never seen issue like this. Setting PermGen size to 256m should
>> solve the problem. Can you send me your test file and the command used to
>> launch the spark shell or your application?
>>
>>  Thanks,
>>
>>  Yin
>>
>> On Sun, Jul 5, 2015 at 9:17 PM, Simeon Simeonov <si...@swoop.com> wrote:
>>
>>>   Yin,
>>>
>>>  With 512Mb PermGen, the process still hung and had to be kill -9ed.
>>>
>>>  At 1Gb the spark shell & associated processes stopped hanging and
>>> started exiting with
>>>
>>>  scala> println(dfCount.first.getLong(0))
>>> 15/07/06 00:10:07 INFO storage.MemoryStore: ensureFreeSpace(235040)
>>> called with curMem=0, maxMem=2223023063
>>> 15/07/06 00:10:07 INFO storage.MemoryStore: Block broadcast_2 stored as
>>> values in memory (estimated size 229.5 KB, free 2.1 GB)
>>> 15/07/06 00:10:08 INFO storage.MemoryStore: ensureFreeSpace(20184)
>>> called with curMem=235040, maxMem=2223023063
>>> 15/07/06 00:10:08 INFO storage.MemoryStore: Block broadcast_2_piece0
>>> stored as bytes in memory (estimated size 19.7 KB, free 2.1 GB)
>>> 15/07/06 00:10:08 INFO storage.BlockManagerInfo: Added
>>> broadcast_2_piece0 in memory on localhost:65464 (size: 19.7 KB, free: 2.1
>>> GB)
>>> 15/07/06 00:10:08 INFO spark.SparkContext: Created broadcast 2 from
>>> first at <console>:30
>>> java.lang.OutOfMemoryError: PermGen space
>>> Stopping spark context.
>>> Exception in thread "main"
>>> Exception: java.lang.OutOfMemoryError thrown from the
>>> UncaughtExceptionHandler in thread "main"
>>> 15/07/06 00:10:14 INFO storage.BlockManagerInfo: Removed
>>> broadcast_2_piece0 on localhost:65464 in memory (size: 19.7 KB, free: 2.1
>>> GB)
>>>
>>>  That did not change up until 4Gb of PermGen space and 8Gb for driver &
>>> executor each.
>>>
>>>  I stopped at this point because the exercise started looking silly. It
>>> is clear that 1.4.0 is using memory in a substantially different manner.
>>>
>>>  I'd be happy to share the test file so you can reproduce this in your
>>> own environment.
>>>
>>>  /Sim
>>>
>>>  Simeon Simeonov, Founder & CTO, Swoop <http://swoop.com/>
>>> @simeons <http://twitter.com/simeons> | blog.simeonov.com | 617.299.6746
>>>
>>>
>>>   From: Yin Huai <yh...@databricks.com>
>>> Date: Sunday, July 5, 2015 at 11:04 PM
>>> To: Denny Lee <de...@gmail.com>
>>> Cc: Andy Huang <an...@servian.com.au>, Simeon Simeonov <
>>> sim@swoop.com>, user <us...@spark.apache.org>
>>> Subject: Re: 1.4.0 regression: out-of-memory errors on small data
>>>
>>>   Sim,
>>>
>>>  Can you increase the PermGen size? Please let me know what is your
>>> setting when the problem disappears.
>>>
>>>  Thanks,
>>>
>>>  Yin
>>>
>>> On Sun, Jul 5, 2015 at 5:59 PM, Denny Lee <de...@gmail.com> wrote:
>>>
>>>>  I had run into the same problem where everything was working
>>>> swimmingly with Spark 1.3.1.  When I switched to Spark 1.4, either by
>>>> upgrading to Java8 (from Java7) or by knocking up the PermGenSize had
>>>> solved my issue.  HTH!
>>>>
>>>>
>>>>
>>>>  On Mon, Jul 6, 2015 at 8:31 AM Andy Huang <an...@servian.com.au>
>>>> wrote:
>>>>
>>>>> We have hit the same issue in spark shell when registering a temp
>>>>> table. We observed it happening with those who had JDK 6. The problem went
>>>>> away after installing jdk 8. This was only for the tutorial materials which
>>>>> was about loading a parquet file.
>>>>>
>>>>>  Regards
>>>>> Andy
>>>>>
>>>>> On Sat, Jul 4, 2015 at 2:54 AM, sim <si...@swoop.com> wrote:
>>>>>
>>>>>> @bipin, in my case the error happens immediately in a fresh shell in
>>>>>> 1.4.0.
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/1-4-0-regression-out-of-memory-errors-on-small-data-tp23595p23614.html
>>>>>>  Sent from the Apache Spark User List mailing list archive at
>>>>>> Nabble.com.
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>>>>> For additional commands, e-mail: user-help@spark.apache.org
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>  --
>>>>>  Andy Huang | Managing Consultant | Servian Pty Ltd | t: 02 9376
>>>>> 0700 | f: 02 9376 0730| m: 0433221979
>>>>>
>>>>
>>>
>>

Re: 1.4.0 regression: out-of-memory errors on small data

Posted by Denny Lee <de...@gmail.com>.
I went ahead and tested your file and the results from the tests can be
seen in the gist: https://gist.github.com/dennyglee/c933b5ae01c57bd01d94.

Basically, when running {Java 7, MaxPermSize = 256} or {Java 8, default}
the query ran without any issues.  I was able to recreate the issue with
{Java 7, default}.  I included the commands I used to start the spark-shell,
but basically I just used all defaults (no alteration to driver or executor
memory); the only additional setting was driver-class-path, to connect
to the MySQL Hive metastore.  This is on an OSX MacBook Pro.

One thing I did notice is that your Java 7 is update 51 while mine is
update 79.  Could you see if updating to Java 7 update 79 perhaps allows
you to use the MaxPermSize setting?
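
If it helps, the update number is the trailing part of the java -version
output, for example (illustrative output for 7u79):

$ java -version
java version "1.7.0_79"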




On Mon, Jul 6, 2015 at 1:36 PM Simeon Simeonov <si...@swoop.com> wrote:

>  The file is at
> https://www.dropbox.com/s/a00sd4x65448dl2/apache-spark-failure-data-part-00000.gz?dl=1
>
>  The command was included in the gist
>
>  SPARK_REPL_OPTS="-XX:MaxPermSize=256m"
> spark-1.4.0-bin-hadoop2.6/bin/spark-shell --packages
> com.databricks:spark-csv_2.10:1.0.3 --driver-memory 4g --executor-memory 4g
>
>  /Sim
>
>  Simeon Simeonov, Founder & CTO, Swoop <http://swoop.com/>
> @simeons <http://twitter.com/simeons> | blog.simeonov.com | 617.299.6746
>
>
>   From: Yin Huai <yh...@databricks.com>
> Date: Monday, July 6, 2015 at 12:59 AM
> To: Simeon Simeonov <si...@swoop.com>
> Cc: Denny Lee <de...@gmail.com>, Andy Huang <
> andy.huang@servian.com.au>, user <us...@spark.apache.org>
>
> Subject: Re: 1.4.0 regression: out-of-memory errors on small data
>
>   I have never seen issue like this. Setting PermGen size to 256m should
> solve the problem. Can you send me your test file and the command used to
> launch the spark shell or your application?
>
>  Thanks,
>
>  Yin
>
> On Sun, Jul 5, 2015 at 9:17 PM, Simeon Simeonov <si...@swoop.com> wrote:
>
>>   Yin,
>>
>>  With 512Mb PermGen, the process still hung and had to be kill -9ed.
>>
>>  At 1Gb the spark shell & associated processes stopped hanging and
>> started exiting with
>>
>>  scala> println(dfCount.first.getLong(0))
>> 15/07/06 00:10:07 INFO storage.MemoryStore: ensureFreeSpace(235040)
>> called with curMem=0, maxMem=2223023063
>> 15/07/06 00:10:07 INFO storage.MemoryStore: Block broadcast_2 stored as
>> values in memory (estimated size 229.5 KB, free 2.1 GB)
>> 15/07/06 00:10:08 INFO storage.MemoryStore: ensureFreeSpace(20184) called
>> with curMem=235040, maxMem=2223023063
>> 15/07/06 00:10:08 INFO storage.MemoryStore: Block broadcast_2_piece0
>> stored as bytes in memory (estimated size 19.7 KB, free 2.1 GB)
>> 15/07/06 00:10:08 INFO storage.BlockManagerInfo: Added broadcast_2_piece0
>> in memory on localhost:65464 (size: 19.7 KB, free: 2.1 GB)
>> 15/07/06 00:10:08 INFO spark.SparkContext: Created broadcast 2 from first
>> at <console>:30
>> java.lang.OutOfMemoryError: PermGen space
>> Stopping spark context.
>> Exception in thread "main"
>> Exception: java.lang.OutOfMemoryError thrown from the
>> UncaughtExceptionHandler in thread "main"
>> 15/07/06 00:10:14 INFO storage.BlockManagerInfo: Removed
>> broadcast_2_piece0 on localhost:65464 in memory (size: 19.7 KB, free: 2.1
>> GB)
>>
>>  That did not change up until 4Gb of PermGen space and 8Gb for driver &
>> executor each.
>>
>>  I stopped at this point because the exercise started looking silly. It
>> is clear that 1.4.0 is using memory in a substantially different manner.
>>
>>  I'd be happy to share the test file so you can reproduce this in your
>> own environment.
>>
>>  /Sim
>>
>>  Simeon Simeonov, Founder & CTO, Swoop <http://swoop.com/>
>> @simeons <http://twitter.com/simeons> | blog.simeonov.com | 617.299.6746
>>
>>
>>   From: Yin Huai <yh...@databricks.com>
>> Date: Sunday, July 5, 2015 at 11:04 PM
>> To: Denny Lee <de...@gmail.com>
>> Cc: Andy Huang <an...@servian.com.au>, Simeon Simeonov <
>> sim@swoop.com>, user <us...@spark.apache.org>
>> Subject: Re: 1.4.0 regression: out-of-memory errors on small data
>>
>>   Sim,
>>
>>  Can you increase the PermGen size? Please let me know what is your
>> setting when the problem disappears.
>>
>>  Thanks,
>>
>>  Yin
>>
>> On Sun, Jul 5, 2015 at 5:59 PM, Denny Lee <de...@gmail.com> wrote:
>>
>>>  I had run into the same problem where everything was working
>>> swimmingly with Spark 1.3.1.  When I switched to Spark 1.4, either by
>>> upgrading to Java8 (from Java7) or by knocking up the PermGenSize had
>>> solved my issue.  HTH!
>>>
>>>
>>>
>>>  On Mon, Jul 6, 2015 at 8:31 AM Andy Huang <an...@servian.com.au>
>>> wrote:
>>>
>>>> We have hit the same issue in spark shell when registering a temp
>>>> table. We observed it happening with those who had JDK 6. The problem went
>>>> away after installing jdk 8. This was only for the tutorial materials which
>>>> was about loading a parquet file.
>>>>
>>>>  Regards
>>>> Andy
>>>>
>>>> On Sat, Jul 4, 2015 at 2:54 AM, sim <si...@swoop.com> wrote:
>>>>
>>>>> @bipin, in my case the error happens immediately in a fresh shell in
>>>>> 1.4.0.
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/1-4-0-regression-out-of-memory-errors-on-small-data-tp23595p23614.html
>>>>>  Sent from the Apache Spark User List mailing list archive at
>>>>> Nabble.com.
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>>>> For additional commands, e-mail: user-help@spark.apache.org
>>>>>
>>>>>
>>>>
>>>>
>>>>  --
>>>>  Andy Huang | Managing Consultant | Servian Pty Ltd | t: 02 9376
>>>> 0700 | f: 02 9376 0730| m: 0433221979
>>>>
>>>
>>
>

Re: 1.4.0 regression: out-of-memory errors on small data

Posted by Simeon Simeonov <si...@swoop.com>.
The file is at https://www.dropbox.com/s/a00sd4x65448dl2/apache-spark-failure-data-part-00000.gz?dl=1

The command was included in the gist

SPARK_REPL_OPTS="-XX:MaxPermSize=256m" spark-1.4.0-bin-hadoop2.6/bin/spark-shell --packages com.databricks:spark-csv_2.10:1.0.3 --driver-memory 4g --executor-memory 4g

/Sim

Simeon Simeonov, Founder & CTO, Swoop<http://swoop.com/>
@simeons<http://twitter.com/simeons> | blog.simeonov.com<http://blog.simeonov.com/> | 617.299.6746


From: Yin Huai <yh...@databricks.com>
Date: Monday, July 6, 2015 at 12:59 AM
To: Simeon Simeonov <si...@swoop.com>
Cc: Denny Lee <de...@gmail.com>, Andy Huang <an...@servian.com.au>, user <us...@spark.apache.org>
Subject: Re: 1.4.0 regression: out-of-memory errors on small data

I have never seen issue like this. Setting PermGen size to 256m should solve the problem. Can you send me your test file and the command used to launch the spark shell or your application?

Thanks,

Yin

On Sun, Jul 5, 2015 at 9:17 PM, Simeon Simeonov <si...@swoop.com> wrote:
Yin,

With 512Mb PermGen, the process still hung and had to be kill -9ed.

At 1Gb the spark shell & associated processes stopped hanging and started exiting with

scala> println(dfCount.first.getLong(0))
15/07/06 00:10:07 INFO storage.MemoryStore: ensureFreeSpace(235040) called with curMem=0, maxMem=2223023063
15/07/06 00:10:07 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 229.5 KB, free 2.1 GB)
15/07/06 00:10:08 INFO storage.MemoryStore: ensureFreeSpace(20184) called with curMem=235040, maxMem=2223023063
15/07/06 00:10:08 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 19.7 KB, free 2.1 GB)
15/07/06 00:10:08 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:65464 (size: 19.7 KB, free: 2.1 GB)
15/07/06 00:10:08 INFO spark.SparkContext: Created broadcast 2 from first at <console>:30
java.lang.OutOfMemoryError: PermGen space
Stopping spark context.
Exception in thread "main"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "main"
15/07/06 00:10:14 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on localhost:65464 in memory (size: 19.7 KB, free: 2.1 GB)

That did not change up until 4Gb of PermGen space and 8Gb for driver & executor each.

I stopped at this point because the exercise started looking silly. It is clear that 1.4.0 is using memory in a substantially different manner.

I'd be happy to share the test file so you can reproduce this in your own environment.

/Sim

Simeon Simeonov, Founder & CTO, Swoop<http://swoop.com/>
@simeons<http://twitter.com/simeons> | blog.simeonov.com<http://blog.simeonov.com/> | 617.299.6746


From: Yin Huai <yh...@databricks.com>
Date: Sunday, July 5, 2015 at 11:04 PM
To: Denny Lee <de...@gmail.com>
Cc: Andy Huang <an...@servian.com.au>, Simeon Simeonov <si...@swoop.com>, user <us...@spark.apache.org>
Subject: Re: 1.4.0 regression: out-of-memory errors on small data

Sim,

Can you increase the PermGen size? Please let me know what is your setting when the problem disappears.

Thanks,

Yin

On Sun, Jul 5, 2015 at 5:59 PM, Denny Lee <de...@gmail.com> wrote:
I had run into the same problem where everything was working swimmingly with Spark 1.3.1.  When I switched to Spark 1.4, either by upgrading to Java8 (from Java7) or by knocking up the PermGenSize had solved my issue.  HTH!



On Mon, Jul 6, 2015 at 8:31 AM Andy Huang <an...@servian.com.au> wrote:
We have hit the same issue in spark shell when registering a temp table. We observed it happening with those who had JDK 6. The problem went away after installing jdk 8. This was only for the tutorial materials which was about loading a parquet file.

Regards
Andy

On Sat, Jul 4, 2015 at 2:54 AM, sim <si...@swoop.com> wrote:
@bipin, in my case the error happens immediately in a fresh shell in 1.4.0.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/1-4-0-regression-out-of-memory-errors-on-small-data-tp23595p23614.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org




--
Andy Huang | Managing Consultant | Servian Pty Ltd | t: 02 9376 0700 | f: 02 9376 0730| m: 0433221979



Re: 1.4.0 regression: out-of-memory errors on small data

Posted by Yin Huai <yh...@databricks.com>.
I have never seen an issue like this. Setting the PermGen size to 256m should
solve the problem. Can you send me your test file and the command you used to
launch the spark shell or your application?
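
If you want to confirm it really is the permanent generation filling up,
you can watch it while the query runs. A sketch with the standard JDK
tools (the PID is whatever jps reports for the spark-shell driver):

$ jps -lm            # find the driver (SparkSubmit / spark-shell) PID
$ jstat -gc <pid> 1s # on Java 7, the PC/PU columns are PermGen capacity/usage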

Thanks,

Yin

On Sun, Jul 5, 2015 at 9:17 PM, Simeon Simeonov <si...@swoop.com> wrote:

>   Yin,
>
>  With 512Mb PermGen, the process still hung and had to be kill -9ed.
>
>  At 1Gb the spark shell & associated processes stopped hanging and
> started exiting with
>
>  scala> println(dfCount.first.getLong(0))
> 15/07/06 00:10:07 INFO storage.MemoryStore: ensureFreeSpace(235040) called
> with curMem=0, maxMem=2223023063
> 15/07/06 00:10:07 INFO storage.MemoryStore: Block broadcast_2 stored as
> values in memory (estimated size 229.5 KB, free 2.1 GB)
> 15/07/06 00:10:08 INFO storage.MemoryStore: ensureFreeSpace(20184) called
> with curMem=235040, maxMem=2223023063
> 15/07/06 00:10:08 INFO storage.MemoryStore: Block broadcast_2_piece0
> stored as bytes in memory (estimated size 19.7 KB, free 2.1 GB)
> 15/07/06 00:10:08 INFO storage.BlockManagerInfo: Added broadcast_2_piece0
> in memory on localhost:65464 (size: 19.7 KB, free: 2.1 GB)
> 15/07/06 00:10:08 INFO spark.SparkContext: Created broadcast 2 from first
> at <console>:30
> java.lang.OutOfMemoryError: PermGen space
> Stopping spark context.
> Exception in thread "main"
> Exception: java.lang.OutOfMemoryError thrown from the
> UncaughtExceptionHandler in thread "main"
> 15/07/06 00:10:14 INFO storage.BlockManagerInfo: Removed
> broadcast_2_piece0 on localhost:65464 in memory (size: 19.7 KB, free: 2.1
> GB)
>
>  That did not change up until 4Gb of PermGen space and 8Gb for driver &
> executor each.
>
>  I stopped at this point because the exercise started looking silly. It
> is clear that 1.4.0 is using memory in a substantially different manner.
>
>  I'd be happy to share the test file so you can reproduce this in your
> own environment.
>
>  /Sim
>
>  Simeon Simeonov, Founder & CTO, Swoop <http://swoop.com/>
> @simeons <http://twitter.com/simeons> | blog.simeonov.com | 617.299.6746
>
>
>   From: Yin Huai <yh...@databricks.com>
> Date: Sunday, July 5, 2015 at 11:04 PM
> To: Denny Lee <de...@gmail.com>
> Cc: Andy Huang <an...@servian.com.au>, Simeon Simeonov <si...@swoop.com>,
> user <us...@spark.apache.org>
> Subject: Re: 1.4.0 regression: out-of-memory errors on small data
>
>   Sim,
>
>  Can you increase the PermGen size? Please let me know what is your
> setting when the problem disappears.
>
>  Thanks,
>
>  Yin
>
> On Sun, Jul 5, 2015 at 5:59 PM, Denny Lee <de...@gmail.com> wrote:
>
>>  I had run into the same problem where everything was working swimmingly
>> with Spark 1.3.1.  When I switched to Spark 1.4, either by upgrading to
>> Java8 (from Java7) or by knocking up the PermGenSize had solved my issue.
>> HTH!
>>
>>
>>
>>  On Mon, Jul 6, 2015 at 8:31 AM Andy Huang <an...@servian.com.au>
>> wrote:
>>
>>> We have hit the same issue in spark shell when registering a temp table.
>>> We observed it happening with those who had JDK 6. The problem went away
>>> after installing jdk 8. This was only for the tutorial materials which was
>>> about loading a parquet file.
>>>
>>>  Regards
>>> Andy
>>>
>>> On Sat, Jul 4, 2015 at 2:54 AM, sim <si...@swoop.com> wrote:
>>>
>>>> @bipin, in my case the error happens immediately in a fresh shell in
>>>> 1.4.0.
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/1-4-0-regression-out-of-memory-errors-on-small-data-tp23595p23614.html
>>>>  Sent from the Apache Spark User List mailing list archive at
>>>> Nabble.com.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>>> For additional commands, e-mail: user-help@spark.apache.org
>>>>
>>>>
>>>
>>>
>>>  --
>>>  Andy Huang | Managing Consultant | Servian Pty Ltd | t: 02 9376 0700 |
>>> f: 02 9376 0730| m: 0433221979
>>>
>>
>

Re: 1.4.0 regression: out-of-memory errors on small data

Posted by Simeon Simeonov <si...@swoop.com>.
Yin,

With 512Mb PermGen, the process still hung and had to be kill -9ed.

At 1Gb the spark shell & associated processes stopped hanging and started exiting with

scala> println(dfCount.first.getLong(0))
15/07/06 00:10:07 INFO storage.MemoryStore: ensureFreeSpace(235040) called with curMem=0, maxMem=2223023063
15/07/06 00:10:07 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 229.5 KB, free 2.1 GB)
15/07/06 00:10:08 INFO storage.MemoryStore: ensureFreeSpace(20184) called with curMem=235040, maxMem=2223023063
15/07/06 00:10:08 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 19.7 KB, free 2.1 GB)
15/07/06 00:10:08 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:65464 (size: 19.7 KB, free: 2.1 GB)
15/07/06 00:10:08 INFO spark.SparkContext: Created broadcast 2 from first at <console>:30
java.lang.OutOfMemoryError: PermGen space
Stopping spark context.
Exception in thread "main"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "main"
15/07/06 00:10:14 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on localhost:65464 in memory (size: 19.7 KB, free: 2.1 GB)

That did not change up until 4Gb of PermGen space and 8Gb for driver & executor each.

I stopped at this point because the exercise started looking silly. It is clear that 1.4.0 is using memory in a substantially different manner.

I'd be happy to share the test file so you can reproduce this in your own environment.

/Sim

Simeon Simeonov, Founder & CTO, Swoop<http://swoop.com/>
@simeons<http://twitter.com/simeons> | blog.simeonov.com<http://blog.simeonov.com/> | 617.299.6746


From: Yin Huai <yh...@databricks.com>
Date: Sunday, July 5, 2015 at 11:04 PM
To: Denny Lee <de...@gmail.com>
Cc: Andy Huang <an...@servian.com.au>, Simeon Simeonov <si...@swoop.com>, user <us...@spark.apache.org>
Subject: Re: 1.4.0 regression: out-of-memory errors on small data

Sim,

Can you increase the PermGen size? Please let me know what is your setting when the problem disappears.

Thanks,

Yin

On Sun, Jul 5, 2015 at 5:59 PM, Denny Lee <de...@gmail.com> wrote:
I had run into the same problem where everything was working swimmingly with Spark 1.3.1.  When I switched to Spark 1.4, either by upgrading to Java8 (from Java7) or by knocking up the PermGenSize had solved my issue.  HTH!



On Mon, Jul 6, 2015 at 8:31 AM Andy Huang <an...@servian.com.au> wrote:
We have hit the same issue in spark shell when registering a temp table. We observed it happening with those who had JDK 6. The problem went away after installing jdk 8. This was only for the tutorial materials which was about loading a parquet file.

Regards
Andy

On Sat, Jul 4, 2015 at 2:54 AM, sim <si...@swoop.com> wrote:
@bipin, in my case the error happens immediately in a fresh shell in 1.4.0.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/1-4-0-regression-out-of-memory-errors-on-small-data-tp23595p23614.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org




--
Andy Huang | Managing Consultant | Servian Pty Ltd | t: 02 9376 0700 | f: 02 9376 0730| m: 0433221979


Re: 1.4.0 regression: out-of-memory errors on small data

Posted by Yin Huai <yh...@databricks.com>.
Sim,

Can you increase the PermGen size? Please let me know what is your setting
when the problem disappears.

Thanks,

Yin

On Sun, Jul 5, 2015 at 5:59 PM, Denny Lee <de...@gmail.com> wrote:

> I had run into the same problem where everything was working swimmingly
> with Spark 1.3.1.  When I switched to Spark 1.4, either by upgrading to
> Java8 (from Java7) or by knocking up the PermGenSize had solved my issue.
> HTH!
>
>
>
> On Mon, Jul 6, 2015 at 8:31 AM Andy Huang <an...@servian.com.au>
> wrote:
>
>> We have hit the same issue in spark shell when registering a temp table.
>> We observed it happening with those who had JDK 6. The problem went away
>> after installing jdk 8. This was only for the tutorial materials which was
>> about loading a parquet file.
>>
>> Regards
>> Andy
>>
>> On Sat, Jul 4, 2015 at 2:54 AM, sim <si...@swoop.com> wrote:
>>
>>> @bipin, in my case the error happens immediately in a fresh shell in
>>> 1.4.0.
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/1-4-0-regression-out-of-memory-errors-on-small-data-tp23595p23614.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>> For additional commands, e-mail: user-help@spark.apache.org
>>>
>>>
>>
>>
>> --
>> Andy Huang | Managing Consultant | Servian Pty Ltd | t: 02 9376 0700 |
>> f: 02 9376 0730| m: 0433221979
>>
>

Re: 1.4.0 regression: out-of-memory errors on small data

Posted by Denny Lee <de...@gmail.com>.
I had run into the same problem, where everything was working swimmingly
with Spark 1.3.1.  When I switched to Spark 1.4, either upgrading to
Java 8 (from Java 7) or bumping up the PermGen size solved my issue.
HTH!



On Mon, Jul 6, 2015 at 8:31 AM Andy Huang <an...@servian.com.au> wrote:

> We have hit the same issue in spark shell when registering a temp table.
> We observed it happening with those who had JDK 6. The problem went away
> after installing jdk 8. This was only for the tutorial materials which was
> about loading a parquet file.
>
> Regards
> Andy
>
> On Sat, Jul 4, 2015 at 2:54 AM, sim <si...@swoop.com> wrote:
>
>> @bipin, in my case the error happens immediately in a fresh shell in
>> 1.4.0.
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/1-4-0-regression-out-of-memory-errors-on-small-data-tp23595p23614.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> For additional commands, e-mail: user-help@spark.apache.org
>>
>>
>
>
> --
> Andy Huang | Managing Consultant | Servian Pty Ltd | t: 02 9376 0700 |
> f: 02 9376 0730| m: 0433221979
>

Re: 1.4.0 regression: out-of-memory errors on small data

Posted by Andy Huang <an...@servian.com.au>.
We have hit the same issue in the spark shell when registering a temp table. We
observed it happening for those who had JDK 6, and the problem went away after
installing JDK 8. This was only for the tutorial materials, which were about
loading a parquet file.
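
For context, that kind of tutorial step is roughly the following (a sketch;
the path and table name are illustrative):

val df = sqlContext.parquetFile("/path/to/events.parquet")
df.registerTempTable("events")
sqlContext.sql("select count(*) from events").show()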

Regards
Andy

On Sat, Jul 4, 2015 at 2:54 AM, sim <si...@swoop.com> wrote:

> @bipin, in my case the error happens immediately in a fresh shell in 1.4.0.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/1-4-0-regression-out-of-memory-errors-on-small-data-tp23595p23614.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>


-- 
Andy Huang | Managing Consultant | Servian Pty Ltd | t: 02 9376 0700 |
f: 02 9376 0730| m: 0433221979

Re: 1.4.0 regression: out-of-memory errors on small data

Posted by sim <si...@swoop.com>.
@bipin, in my case the error happens immediately in a fresh shell in 1.4.0.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/1-4-0-regression-out-of-memory-errors-on-small-data-tp23595p23614.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: 1.4.0 regression: out-of-memory errors on small data

Posted by bipin <bi...@gmail.com>.
I have a hunch I want to share: I feel that data is not being deallocated
from memory (at least not the way it was in 1.3). Once it goes in-memory, it
just stays there.

Spark SQL works fine: the same query run in a fresh shell won't throw that
error, but run in a shell that has already been used for other queries, it
does.

Also, I read on the Spark blog that project Tungsten is making changes to
memory management, and the first changes would land in 1.4. Maybe it is
related to that.
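
One way to check that hunch is to list what the driver still considers
persisted between queries. A sketch (it uses the developer API on
SparkContext, and only shows explicitly cached data, so it cannot explain
PermGen growth by itself):

// Empty output means nothing is explicitly cached on the driver side.
sc.getPersistentRDDs.foreach { case (id, rdd) =>
  println(s"cached RDD $id: ${rdd.name} (${rdd.getStorageLevel})")
}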



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/1-4-0-regression-out-of-memory-errors-on-small-data-tp23595p23608.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: 1.4.0 regression: out-of-memory errors on small data

Posted by Simeon Simeonov <si...@swoop.com>.
Same error with the new code:

import org.apache.spark.sql.hive.HiveContext

val ctx = sqlContext.asInstanceOf[HiveContext]
import ctx.implicits._

val df = ctx.jsonFile("file:///Users/sim/dev/spx/data/view-clicks-training/2015/06/18/part-00000.gz")
df.registerTempTable("training")

val dfCount = ctx.sql("select count(*) as cnt from training")
println(dfCount.first.getLong(0))
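
As an aside, the equivalent count through the DataFrame API, using the df
defined above and skipping the temp table (a sketch):

println(df.count())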

/Sim

Simeon Simeonov, Founder & CTO, Swoop<http://swoop.com/>
@simeons<http://twitter.com/simeons> | blog.simeonov.com<http://blog.simeonov.com/> | 617.299.6746


From: Yin Huai <yh...@databricks.com>
Date: Thursday, July 2, 2015 at 4:34 PM
To: Simeon Simeonov <si...@swoop.com>
Cc: user <us...@spark.apache.org>
Subject: Re: 1.4.0 regression: out-of-memory errors on small data

Hi Sim,

Seems you already set the PermGen size to 256m, right? I notice that in your shell you created a HiveContext (which further increased the memory consumption on PermGen). But the spark shell has already created a HiveContext for you (sqlContext; you can use asInstanceOf to access HiveContext's methods). Can you just use the sqlContext created by the shell and try again?

Thanks,

Yin

On Thu, Jul 2, 2015 at 12:50 PM, Yin Huai <yh...@databricks.com> wrote:
Hi Sim,

Spark 1.4.0's memory consumption on PermGen is higher than in Spark 1.3 (explained in https://issues.apache.org/jira/browse/SPARK-8776). Can you add --conf "spark.driver.extraJavaOptions=-XX:MaxPermSize=256m" to the command you used to launch the Spark shell? This will increase the PermGen size from 128m (our default) to 256m.

Thanks,

Yin

On Thu, Jul 2, 2015 at 12:40 PM, sim <si...@swoop.com> wrote:
A very simple Spark SQL COUNT operation succeeds in spark-shell for 1.3.1 and
fails with a series of out-of-memory errors in 1.4.0.

This gist <https://gist.github.com/ssimeonov/a49b75dc086c3ac6f3c4>
includes the code and the full output from the 1.3.1 and 1.4.0 runs,
including the command line showing how spark-shell is started.

Should the 1.4.0 spark-shell be started with different options to avoid this
problem?

Thanks,
Sim




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/1-4-0-regression-out-of-memory-errors-on-small-data-tp23595.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org




Re: 1.4.0 regression: out-of-memory errors on small data

Posted by Yin Huai <yh...@databricks.com>.
Hi Sim,

Seems you already set the PermGen size to 256m, right? I notice that in
your shell you created a HiveContext (which further increased the memory
consumption on PermGen). But the spark shell has already created a
HiveContext for you (sqlContext; you can use asInstanceOf to access
HiveContext's methods). Can you just use the sqlContext created by the
shell and try again?
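
Concretely, that means something like:

import org.apache.spark.sql.hive.HiveContext
// Reuse the shell's existing context rather than constructing a second HiveContext.
val ctx = sqlContext.asInstanceOf[HiveContext]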

Thanks,

Yin

On Thu, Jul 2, 2015 at 12:50 PM, Yin Huai <yh...@databricks.com> wrote:

> Hi Sim,
>
> Spark 1.4.0's memory consumption on PermGen is higher then Spark 1.3
> (explained in https://issues.apache.org/jira/browse/SPARK-8776). Can you
> add --conf "spark.driver.extraJavaOptions=-XX:MaxPermSize=256m" in the
> command you used to launch Spark shell? This will increase the PermGen size
> from 128m (our default) to 256m.
>
> Thanks,
>
> Yin
>
> On Thu, Jul 2, 2015 at 12:40 PM, sim <si...@swoop.com> wrote:
>
>> A very simple Spark SQL COUNT operation succeeds in spark-shell for 1.3.1
>> and
>> fails with a series of out-of-memory errors in 1.4.0.
>>
>> This gist <https://gist.github.com/ssimeonov/a49b75dc086c3ac6f3c4>
>> includes the code and the full output from the 1.3.1 and 1.4.0 runs,
>> including the command line showing how spark-shell is started.
>>
>> Should the 1.4.0 spark-shell be started with different options to avoid
>> this
>> problem?
>>
>> Thanks,
>> Sim
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/1-4-0-regression-out-of-memory-errors-on-small-data-tp23595.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> For additional commands, e-mail: user-help@spark.apache.org
>>
>>
>

Re: 1.4.0 regression: out-of-memory errors on small data

Posted by Yin Huai <yh...@databricks.com>.
Hi Sim,

Spark 1.4.0's memory consumption on PermGen is higher than in Spark 1.3
(explained in https://issues.apache.org/jira/browse/SPARK-8776). Can you
add --conf "spark.driver.extraJavaOptions=-XX:MaxPermSize=256m" to the
command you used to launch the Spark shell? This will increase the PermGen
size from 128m (our default) to 256m.

Thanks,

Yin

On Thu, Jul 2, 2015 at 12:40 PM, sim <si...@swoop.com> wrote:

> A very simple Spark SQL COUNT operation succeeds in spark-shell for 1.3.1
> and
> fails with a series of out-of-memory errors in 1.4.0.
>
> This gist <https://gist.github.com/ssimeonov/a49b75dc086c3ac6f3c4>
> includes the code and the full output from the 1.3.1 and 1.4.0 runs,
> including the command line showing how spark-shell is started.
>
> Should the 1.4.0 spark-shell be started with different options to avoid
> this
> problem?
>
> Thanks,
> Sim
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/1-4-0-regression-out-of-memory-errors-on-small-data-tp23595.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>

Re: 1.4.0 regression: out-of-memory errors on small data

Posted by bipin <bi...@gmail.com>.
I will second this. I very rarely used to get out-of-memory errors in 1.3.
Now I get these errors all the time. I feel that I could work in a 1.3
spark-shell for long periods of time without Spark throwing that error,
whereas in 1.4 the shell needs to be restarted or gets killed frequently.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/1-4-0-regression-out-of-memory-errors-on-small-data-tp23595p23607.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org