Posted to user@spark.apache.org by Mohit Singh <mo...@gmail.com> on 2014/09/11 07:43:25 UTC

Setting up JVM in pyspark from shell

Hi,
  I am using the pyspark shell and am trying to create an RDD from a NumPy matrix:
rdd = sc.parallelize(matrix)
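
(For context, a minimal session of this shape reproduces the failure; the
matrix itself is not shown in the original, so the dimensions below are
illustrative:

    import numpy as np

    # A large dense matrix. In Spark 1.0.x, sc.parallelize() pickles the
    # data to a temp file on the driver, and readRDDFromFile then loads
    # those bytes back into the driver JVM heap in one shot -- the
    # allocation that fails in the traceback below.
    matrix = np.random.rand(100000, 1000)

    rdd = sc.parallelize(matrix)  # sc is the shell's SparkContext

The whole serialized matrix must fit in the driver heap at once, which is
why a large matrix blows up here rather than on the executors.)
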
I am getting the following error:
JVMDUMP039I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError" at 2014/09/10 22:41:44 - please wait.
JVMDUMP032I JVM requested Heap dump using '/global/u2/m/msingh/heapdump.20140910.224144.29660.0005.phd' in response to an event
JVMDUMP010I Heap dump written to /global/u2/m/msingh/heapdump.20140910.224144.29660.0005.phd
JVMDUMP032I JVM requested Java dump using '/global/u2/m/msingh/javacore.20140910.224144.29660.0006.txt' in response to an event
JVMDUMP010I Java dump written to /global/u2/m/msingh/javacore.20140910.224144.29660.0006.txt
JVMDUMP032I JVM requested Snap dump using '/global/u2/m/msingh/Snap.20140910.224144.29660.0007.trc' in response to an event
JVMDUMP010I Snap dump written to /global/u2/m/msingh/Snap.20140910.224144.29660.0007.trc
JVMDUMP013I Processed dump event "systhrow", detail "java/lang/OutOfMemoryError".
Exception AttributeError: "'SparkContext' object has no attribute '_jsc'" in <bound method SparkContext.__del__ of <pyspark.context.SparkContext object at 0x11f9450>> ignored
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/common/usg/spark/1.0.2/python/pyspark/context.py", line 271, in parallelize
    jrdd = readRDDFromFile(self._jsc, tempFile.name, numSlices)
  File "/usr/common/usg/spark/1.0.2/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 537, in __call__
  File "/usr/common/usg/spark/1.0.2/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.readRDDFromFile.
: java.lang.OutOfMemoryError: Java heap space
        at org.apache.spark.api.python.PythonRDD$.readRDDFromFile(PythonRDD.scala:279)
        at org.apache.spark.api.python.PythonRDD.readRDDFromFile(PythonRDD.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:88)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
        at java.lang.reflect.Method.invoke(Method.java:618)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:207)
        at java.lang.Thread.run(Thread.java:804)

I did try setSystemProperty:
sc.setSystemProperty("spark.executor.memory", "20g")
How do I increase the JVM heap from the shell?

-- 
Mohit

"When you want success as badly as you want the air, then you will get it.
There is no other secret of success."
-Socrates

Re: Setting up JVM in pyspark from shell

Posted by Davies Liu <da...@databricks.com>.
The heap size of the JVM cannot be changed dynamically, so you
need to configure it before launching pyspark. (The OutOfMemoryError
above is raised in the driver JVM, where readRDDFromFile runs, which
is why setting spark.executor.memory did not help.)

If you run it in local mode, configure spark.driver.memory
(available in 1.1 or master).

Or, you can pass --driver-memory 2G (this should work in 1.0+).
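
For example (a sketch; the 2g value is illustrative and should be sized
to the data):

    # at launch, from the command line (1.0+):
    pyspark --driver-memory 2g

    # or persistently, in conf/spark-defaults.conf (1.1+):
    spark.driver.memory   2g

Either way, the setting has to be in place before the shell's JVM
starts; once the SparkContext is up, the heap size is fixed.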

