Posted to user@spark.apache.org by neil90 <ne...@icloud.com> on 2016/09/07 17:52:07 UTC

Re: Spark Java Heap Error

If you're in local mode, just allocate all the memory you want to use to your
driver (which acts as the executor in local mode); don't even bother changing
the executor memory. So your new settings should look like this:

spark.driver.memory              16g 
spark.driver.maxResultSize       2g 
spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value 

You might need to raise your spark.driver.maxResultSize setting if you plan on
doing a collect on the entire RDD/DataFrame.
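
For reference, here is a minimal sketch of passing the same settings on the
command line instead of through spark-defaults.conf (the 16g/2g values are
just the numbers used in this thread, not recommendations; the driver heap has
to be fixed before the JVM starts, so command-line flags or spark-defaults.conf
are the reliable routes):

    pyspark --master "local[*]" \
            --driver-memory 16g \
            --conf spark.driver.maxResultSize=2g \
            --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -Dkey=value"

The same flags work with spark-submit if the job runs as a script.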





Re: Spark Java Heap Error

Posted by Baktaawar <tr...@gmail.com>.
These are the settings I have:

# Example:
# spark.master                     spark://master:7077
# spark.eventLog.enabled           true
# spark.eventLog.dir               hdfs://namenode:8021/directory
# spark.serializer                 org.apache.spark.serializer.KryoSerializer

spark.driver.memory              16g
spark.executor.memory            2g
spark.driver.maxResultSize       8g
spark.rdd.compress               false
spark.storage.memoryFraction     0.5


Same problem.

On Tue, Sep 13, 2016 at 10:27 AM, Manish Tripathi <tr...@gmail.com>
wrote:

> The data set is not big. It is 56K x 9K, though it does have long strings as
> column names.
>
> It fits very easily in Pandas, which is also an in-memory tool. So I am not
> sure memory is the issue here. If Pandas can fit it easily and work on it
> fast, shouldn't Spark be able to handle it too?
>
> On Tue, Sep 13, 2016 at 10:24 AM, neil90 [via Apache Spark User List] <
> ml-node+s1001560n27707h70@n3.nabble.com> wrote:
>
>> I'm assuming the dataset you're dealing with is big, hence why you wanted
>> to allocate your full 16 GB of RAM to it.
>>
>> I suggest running the Python Spark shell as "pyspark --driver-memory 16g".
>>
>> Also, if you cache your data and it doesn't fully fit in memory, you can
>> do df.persist(StorageLevel.MEMORY_AND_DISK).
>>





Re: Spark Java Heap Error

Posted by Baktaawar <tr...@gmail.com>.
The data set is not big. It is 56K x 9K, though it does have long strings as
column names.

It fits very easily in Pandas, which is also an in-memory tool. So I am not
sure memory is the issue here. If Pandas can fit it easily and work on it
fast, shouldn't Spark be able to handle it too?

On Tue, Sep 13, 2016 at 10:24 AM, neil90 [via Apache Spark User List] <
ml-node+s1001560n27707h70@n3.nabble.com> wrote:

> I'm assuming the dataset you're dealing with is big, hence why you wanted
> to allocate your full 16 GB of RAM to it.
>
> I suggest running the Python Spark shell as "pyspark --driver-memory 16g".
>
> Also, if you cache your data and it doesn't fully fit in memory, you can
> do df.persist(StorageLevel.MEMORY_AND_DISK).
>





Re: Spark Java Heap Error

Posted by neil90 <ne...@icloud.com>.
I'm assuming the dataset you're dealing with is big, hence why you wanted to
allocate your full 16 GB of RAM to it.

I suggest running the Python Spark shell as "pyspark --driver-memory 16g".

Also, if you cache your data and it doesn't fully fit in memory, you can do
df.persist(StorageLevel.MEMORY_AND_DISK).
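
Spelled out as a small sketch (illustrative only; the recommender_ct name is
reused from earlier in this thread, and the storage level goes through
persist(), since DataFrame.cache() itself takes no arguments):

    from pyspark import StorageLevel

    # Keep what fits in memory and spill the remaining partitions to local disk
    recommender_ct = recommender_ct.persist(StorageLevel.MEMORY_AND_DISK)

    # Materialize the cache with a cheap action before running the heavier queries
    recommender_ct.count()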





Re: Spark Java Heap Error

Posted by Baktaawar <tr...@gmail.com>.
I put driver memory at 6 GB instead of 8 (half of 16). But does 2 GB make
this much of a difference?

On Tuesday, September 13, 2016, neil90 [via Apache Spark User List] <
ml-node+s1001560n27704h77@n3.nabble.com> wrote:

> Double-check your driver memory in the Spark Web UI and make sure the
> driver memory is close to half of the 16 GB available.
>





Re: Spark Java Heap Error

Posted by neil90 <ne...@icloud.com>.
Double-check your driver memory in the Spark Web UI and make sure the driver
memory is close to half of the 16 GB available.
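
A quick way to sanity-check that from the shell itself (a small illustrative
snippet, assuming a running PySpark shell where sc is the SparkContext; the
Web UI normally lives at http://localhost:4040):

    # Read back the values the running context actually picked up
    print(sc.getConf().get("spark.driver.memory", "default (1g)"))
    print(sc.getConf().get("spark.driver.maxResultSize", "default (1g)"))

If spark.driver.memory still reports the default, the setting was applied too
late (after the driver JVM started) and needs to go on the pyspark command
line or into spark-defaults.conf.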





Re: Spark Java Heap Error

Posted by Baktaawar <tr...@gmail.com>.
Hi,

I even tried the dataframe.cache() call for the crosstab transformation.
However, I still get the same OOM error.


recommender_ct.cache()
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-15-6d6114a78785> in <module>()
----> 1 recommender_ct.cache()

/Users/i854319/spark/python/pyspark/sql/dataframe.pyc in cache(self)
    375         """
    376         self.is_cached = True
--> 377         self._jdf.cache()
    378         return self
    379 

/Users/i854319/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py in
__call__(self, *args)
    811         answer = self.gateway_client.send_command(command)
    812         return_value = get_return_value(
--> 813             answer, self.gateway_client, self.target_id, self.name)
    814 
    815         for temp_arg in temp_args:

/Users/i854319/spark/python/pyspark/sql/utils.pyc in deco(*a, **kw)
     43     def deco(*a, **kw):
     44         try:
---> 45             return f(*a, **kw)
     46         except py4j.protocol.Py4JJavaError as e:
     47             s = e.java_exception.toString()

/Users/i854319/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py in
get_return_value(answer, gateway_client, target_id, name)
    306                 raise Py4JJavaError(
    307                     "An error occurred while calling {0}{1}{2}.\n".
--> 308                     format(target_id, ".", name), value)
    309             else:
    310                 raise Py4JError(

Py4JJavaError: An error occurred while calling o40.cache.
: java.lang.OutOfMemoryError
	at
java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
	at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
	at
java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
	at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
	at
java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1877)
	at
java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1786)
	at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1189)
	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
	at
org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
	at
org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
	at
org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
	at
org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
	at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
	at org.apache.spark.SparkContext.clean(SparkContext.scala:2055)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:707)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:706)
	at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
	at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
	at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:706)
	at
org.apache.spark.sql.execution.ConvertToUnsafe.doExecute(rowFormatConverters.scala:38)
	at
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
	at
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
	at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
	at
org.apache.spark.sql.execution.columnar.InMemoryRelation.buildBuffers(InMemoryColumnarTableScan.scala:129)
	at
org.apache.spark.sql.execution.columnar.InMemoryRelation.<init>(InMemoryColumnarTableScan.scala:118)
	at
org.apache.spark.sql.execution.columnar.InMemoryRelation$.apply(InMemoryColumnarTableScan.scala:41)
	at
org.apache.spark.sql.execution.CacheManager$$anonfun$cacheQuery$1.apply(CacheManager.scala:93)
	at
org.apache.spark.sql.execution.CacheManager.writeLock(CacheManager.scala:60)
	at
org.apache.spark.sql.execution.CacheManager.cacheQuery(CacheManager.scala:84)
	at org.apache.spark.sql.DataFrame.persist(DataFrame.scala:1581)
	at org.apache.spark.sql.DataFrame.cache(DataFrame.scala:1590)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
	at py4j.Gateway.invoke(Gateway.java:259)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:209)
	at java.lang.Thread.run(Thread.java:745)






Re: Spark Java Heap Error

Posted by Baktaawar <tr...@gmail.com>.
Hi, thanks.

I tried that but got this error, again OOM. I am not sure what to do now. For
spark.driver.maxResultSize I kept 2g; the rest I did as mentioned above, 16 GB
for the driver and 2 GB for the executor. I have a 16 GB Mac. Please help. I am
very delayed on my work because of this and not able to move ahead. My dataset
is 56K rows and 8K columns, mostly sparse. The column names, though, are long
strings.

-----------------------------------------------------

Py4JJavaError                             Traceback (most recent call last)
<ipython-input-8-8c71bfcdfd03> in <module>()
----> 1 recommender_ct.show()

/Users/i854319/spark/python/pyspark/sql/dataframe.pyc in show(self, n, truncate)
    255         +---+-----+
    256         """
--> 257         print(self._jdf.showString(n, truncate))
    258
    259     def __repr__(self):

/Users/i854319/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py in
__call__(self, *args)
    811         answer = self.gateway_client.send_command(command)
    812         return_value = get_return_value(
--> 813             answer, self.gateway_client, self.target_id, self.name)
    814
    815         for temp_arg in temp_args:

/Users/i854319/spark/python/pyspark/sql/utils.pyc in deco(*a, **kw)
     43     def deco(*a, **kw):
     44         try:
---> 45             return f(*a, **kw)
     46         except py4j.protocol.Py4JJavaError as e:
     47             s = e.java_exception.toString()

/Users/i854319/spark/python/lib/py4j-0.9-src.zip/py4j/protocol.py in
get_return_value(answer, gateway_client, target_id, name)
    306                 raise Py4JJavaError(
    307                     "An error occurred while calling {0}{1}{2}.\n".
--> 308                     format(target_id, ".", name), value)
    309             else:
    310                 raise Py4JError(

Py4JJavaError: An error occurred while calling o40.showString.
: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421)
at java.lang.StringBuilder.append(StringBuilder.java:136)
at scala.StringContext.standardInterpolator(StringContext.scala:123)
at scala.StringContext.s(StringContext.scala:90)
at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:70)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:52)
at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2086)
at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$execute$1(DataFrame.scala:1498)
at org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$collect(DataFrame.scala:1505)
at org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1375)
at org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1374)
at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2099)
at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1374)
at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1456)
at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:170)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)

On Wed, Sep 7, 2016 at 10:52 AM, neil90 [via Apache Spark User List] <
ml-node+s1001560n27673h16@n3.nabble.com> wrote:

> If you're in local mode, just allocate all the memory you want to use to
> your driver (which acts as the executor in local mode); don't even bother
> changing the executor memory. So your new settings should look like this:
>
> spark.driver.memory              16g
> spark.driver.maxResultSize       2g
> spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value
>
> You might need to raise your spark.driver.maxResultSize setting if you plan
> on doing a collect on the entire RDD/DataFrame.
>



