Posted to users@zeppelin.apache.org by RUSHIKESH RAUT <ru...@gmail.com> on 2017/03/25 09:06:01 UTC

Zeppelin out of memory issue - (GC overhead limit exceeded)

Hi everyone,

I am trying to load some data from a Hive table into my notebook and then
convert this dataframe into an R dataframe using the spark.r interpreter. This
works perfectly for a small amount of data.
But when the data grows, it gives me the error

java.lang.OutOfMemoryError: GC overhead limit exceeded
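For reference, the notebook paragraph does essentially the following (table and variable names are illustrative; the sketch assumes the Spark 2.x SparkR API, on Spark 1.x sql() additionally takes the sqlContext):

    %spark.r
    df  <- sql("SELECT * FROM my_hive_table")  # Spark DataFrame, stays distributed
    rdf <- collect(df)                         # materializes everything as an R data.frame on the driver heap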

I have tried increasing ZEPPELIN_MEM and ZEPPELIN_INTP_MEM in
the zeppelin-env.cmd file, but I am still facing this issue. I have used the
following configuration:

set ZEPPELIN_MEM="-Xms4096m -Xmx4096m -XX:MaxPermSize=2048m"
set ZEPPELIN_INTP_MEM="-Xmx4096m -Xms4096m -XX:MaxPermSize=2048m"

I am sure that this much memory should be sufficient for my data, but I
am still getting the same error. Any guidance will be much appreciated.

Thanks,
Rushikesh Raut

Re: Zeppelin out of memory issue - (GC overhead limit exceeded)

Posted by "Jianfeng (Jeff) Zhang" <jz...@hortonworks.com>.
I verified this on the master branch and it works for me. Set it on the interpreter setting page as follows.


[inline screenshot of the interpreter setting page; image not preserved in the plain-text archive]


Best Regard,
Jeff Zhang


From: RUSHIKESH RAUT <ru...@gmail.com>
Reply-To: "users@zeppelin.apache.org" <us...@zeppelin.apache.org>
Date: Sunday, March 26, 2017 at 8:02 PM
To: "users@zeppelin.apache.org" <us...@zeppelin.apache.org>
Subject: Re: Zeppelin out of memory issue - (GC overhead limit exceeded)

Thanks Jianfeng,

But i am still not able to solve the issue. I have set it to 4g but still no luck.Can you please explain it to me how can I set SPARK_DRIVER_MEMORY  property.
Also as I have read that GC overhead limit exceeded error occurs when the heap memory is insufficient. So How can I increase the heap memory. Please correct me if I am wrong as I am still trying to learn these things.
Reagrds,
Rushikesh Raut

On Sun, Mar 26, 2017 at 4:25 PM, Jianfeng (Jeff) Zhang <jz...@hortonworks.com> wrote:

This is a bug of zeppelin. spark.driver.memory won't take effect. As for now it isn't passed to spark through -conf parameter. See https://issues.apache.org/jira/browse/ZEPPELIN-1263
The workaround is to specify SPARK_DRIVER_MEMORY in interpreter setting page.



Best Regard,
Jeff Zhang


From: RUSHIKESH RAUT <ru...@gmail.com>
Reply-To: "users@zeppelin.apache.org" <us...@zeppelin.apache.org>
Date: Sunday, March 26, 2017 at 5:03 PM
To: "users@zeppelin.apache.org" <us...@zeppelin.apache.org>
Subject: Re: Zeppelin out of memory issue - (GC overhead limit exceeded)

ZEPPELIN_INTP_JAVA_OPTS


Re: Zeppelin out of memory issue - (GC overhead limit exceeded)

Posted by RUSHIKESH RAUT <ru...@gmail.com>.
Thanks Jianfeng,

But I am still not able to solve the issue. I have set it to 4g but still
no luck. Can you please explain how I can set the SPARK_DRIVER_MEMORY
property?
Also, I have read that the GC overhead limit exceeded error occurs when
heap memory is insufficient, so how can I increase the heap memory? Please
correct me if I am wrong, as I am still trying to learn these things.
Regards,
Rushikesh Raut

On Sun, Mar 26, 2017 at 4:25 PM, Jianfeng (Jeff) Zhang <
jzhang@hortonworks.com> wrote:

>
> This is a bug of zeppelin. spark.driver.memory won’t take effect. As for
> now it isn’t passed to spark through —conf parameter. See
> https://issues.apache.org/jira/browse/ZEPPELIN-1263
> The workaround is to specify SPARK_DRIVER_MEMORY in interpreter setting
> page.
>
>
>
> Best Regard,
> Jeff Zhang
>
>
> From: RUSHIKESH RAUT <ru...@gmail.com>
> Reply-To: "users@zeppelin.apache.org" <us...@zeppelin.apache.org>
> Date: Sunday, March 26, 2017 at 5:03 PM
> To: "users@zeppelin.apache.org" <us...@zeppelin.apache.org>
> Subject: Re: Zeppelin out of memory issue - (GC overhead limit exceeded)
>
> ZEPPELIN_INTP_JAVA_OPTS
>

Re: Zeppelin out of memory issue - (GC overhead limit exceeded)

Posted by "Jianfeng (Jeff) Zhang" <jz...@hortonworks.com>.
This is a bug in Zeppelin: spark.driver.memory won't take effect, because for now it isn't passed to Spark through the --conf parameter. See https://issues.apache.org/jira/browse/ZEPPELIN-1263
The workaround is to specify SPARK_DRIVER_MEMORY on the interpreter setting page.
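A minimal sketch of the workaround (the 4g value is illustrative): on the Spark interpreter's setting page, add the property

    SPARK_DRIVER_MEMORY    4g

Exporting it from conf/zeppelin-env.sh (or set in conf/zeppelin-env.cmd on Windows) should also work, since spark-submit reads that environment variable:

    export SPARK_DRIVER_MEMORY=4g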



Best Regard,
Jeff Zhang


From: RUSHIKESH RAUT <ru...@gmail.com>
Reply-To: "users@zeppelin.apache.org" <us...@zeppelin.apache.org>
Date: Sunday, March 26, 2017 at 5:03 PM
To: "users@zeppelin.apache.org" <us...@zeppelin.apache.org>
Subject: Re: Zeppelin out of memory issue - (GC overhead limit exceeded)

ZEPPELIN_INTP_JAVA_OPTS

Re: Zeppelin out of memory issue - (GC overhead limit exceeded)

Posted by RUSHIKESH RAUT <ru...@gmail.com>.
I tried setting it as

spark.driver.memory 4g

But it still gives the same error, so I tried it with the -X flags. Now I have removed them.

But as per my understanding that is the Spark driver memory; I want to
increase the heap size used by the interpreter.
Because when I run
*ps aux | grep zeppelin*
on my machine I get

/usr/hdp/2.3.3.1-25/tez/lib/*:/usr/hdp/2.3.3.1-25/tez/conf/ *-Xmx1g*
-Dfile.encoding=UTF-8
-Dlog4j.configuration=file:///softwares/maxiq/zeppelin-0.7/zeppelin-0.7.0-bin-all/conf/log4j.properties
-Dzeppelin.log.file=/softwares/maxiq/zeppelin-0.7/zeppelin-0.7.0-bin-all/logs/zeppelin-interpreter-spark-maxiq-hn0-maxiqs.log
-*XX:MaxPermSize=256m* org.apache.spark.deploy.SparkSubmit

This is just a part of the output, but you can see that it is using -Xmx1g
and -XX:MaxPermSize=256m. I want to increase these. I have tried to debug it
in interpreter.sh and interpreter.cmd and found that it takes these
parameters from zeppelin-env.cmd, but even if I set

set ZEPPELIN_MEM="-Xms4096m -Xmx4096m -XX:MaxPermSize=2048m"
set ZEPPELIN_INTP_MEM="-Xmx4096m -Xms4096m -XX:MaxPermSize=2048m"
set ZEPPELIN_INTP_JAVA_OPTS="-Xmx4096m -Xms4096m -XX:MaxPermSize=2048"
set JAVA_INTP_OPTS="-Xmx4096m -Xms4096m -XX:MaxPermSize=2048"

it still shows me -Xmx1g and -XX:MaxPermSize=256m and runs out of
memory. What should I do?
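(For reference, this rough one-liner is how I pull just the heap flags out of that ps output, assuming the interpreter is launched via spark-submit as shown above:

    ps aux | grep SparkSubmit | tr ' ' '\n' | grep -E '^-Xmx|MaxPermSize'
)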

On Sun, Mar 26, 2017 at 2:17 PM, Eric Charles <er...@apache.org> wrote:

> You don't have to set spark.driver.memory with -X... but simply with
> memory size.
>
> Look at http://spark.apache.org/docs/latest/configuration.html
>
> spark.driver.memory     1g      Amount of memory to use for the driver
> process, i.e. where SparkContext is initialized. (e.g. 1g, 2g).
> Note: In client mode, this config must not be set through the SparkConf
> directly in your application, because the driver JVM has already started at
> that point. Instead, please set this through the --driver-memory command
> line option or in your default properties file.
>
>
>
>
> On 26/03/17 09:57, RUSHIKESH RAUT wrote:
>
>> What value should I set there?
>> Currently I have set it as
>>
>> spark.driver.memory  -Xms4096m -Xmx4096m -XX:MaxPermSize=2048m
>>
>> But still same error
>>
>> On Mar 26, 2017 1:19 PM, "Eric Charles" <eric@apache.org> wrote:
>>
>>     You also have to check the memory you give to the spark driver
>>     (spark.driver.memory property)
>>
>>     On 26/03/17 07:40, RUSHIKESH RAUT wrote:
>>
>>         Yes I know it inevitable if the data is large. I want to know
>>         how do I
>>         increase the interpreter memory to handle large data?
>>
>>         Thanks,
>>         Rushikesh Raut
>>
>>         On Mar 26, 2017 8:56 AM, "Jianfeng (Jeff) Zhang"
>>         <jzhang@hortonworks.com> wrote:
>>
>>
>>             How large is your data ? This problem is inevitable if your
>>         data is
>>             too large, you can try to use spark data frame if that works
>>         for you.
>>
>>
>>
>>
>>
>>             Best Regard,
>>             Jeff Zhang
>>
>>
>>             From: RUSHIKESH RAUT <rushikeshraut777@gmail.com>
>>             Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
>>             Date: Saturday, March 25, 2017 at 5:06 PM
>>             To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
>>             Subject: Zeppelin out of memory issue - (GC overhead limit exceeded)
>>
>>             Hi everyone,
>>
>>             I am trying to load some data from hive table into my
>>         notebook and
>>             then convert this dataframe into r dataframe using spark.r
>>             interpreter. This works perfectly for small amount of data.
>>             But if the data is increased then it gives me error
>>
>>             java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>             I have tried increasing the ZEPPELIN_MEM and
>>         ZEPPELIN_INTP_MEM in
>>             the zeppelin-env.cmd file but i am still facing this issue.
>>         I have
>>             used the following configuration
>>
>>             set ZEPPELIN_MEM="-Xms4096m -Xmx4096m -XX:MaxPermSize=2048m"
>>             set ZEPPELIN_INTP_MEM="-Xmx4096m -Xms4096m
>>         -XX:MaxPermSize=2048m"
>>
>>             I am sure that this much size should be sufficient for my
>>         data but
>>             still i am getting this same error. Any guidance will be much
>>             appreciated.
>>
>>             Thanks,
>>             Rushikesh Raut
>>
>>

Re: Zeppelin out of memory issue - (GC overhead limit exceeded)

Posted by Eric Charles <er...@apache.org>.
You don't have to set spark.driver.memory with -X... flags, but simply with
a memory size.

Look at http://spark.apache.org/docs/latest/configuration.html

spark.driver.memory (default: 1g) - Amount of memory to use for the driver
process, i.e. where SparkContext is initialized (e.g. 1g, 2g).
Note: In client mode, this config must not be set through the SparkConf
directly in your application, because the driver JVM has already started
at that point. Instead, please set this through the --driver-memory
command line option or in your default properties file.
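So, in short (4g is just an example value): either pass the driver memory on the spark-submit command line,

    --driver-memory 4g

or set it as a plain size in the properties file / interpreter setting,

    spark.driver.memory    4g

and not as -Xms/-Xmx/-XX JVM flags.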




On 26/03/17 09:57, RUSHIKESH RAUT wrote:
> What value should I set there?
> Currently I have set it as
>
> spark.driver.memory  -Xms4096m -Xmx4096m -XX:MaxPermSize=2048m
>
> But still same error
>
> On Mar 26, 2017 1:19 PM, "Eric Charles" <eric@apache.org> wrote:
>
>     You also have to check the memory you give to the spark driver
>     (spark.driver.memory property)
>
>     On 26/03/17 07:40, RUSHIKESH RAUT wrote:
>
>         Yes I know it inevitable if the data is large. I want to know
>         how do I
>         increase the interpreter memory to handle large data?
>
>         Thanks,
>         Rushikesh Raut
>
>         On Mar 26, 2017 8:56 AM, "Jianfeng (Jeff) Zhang"
>         <jzhang@hortonworks.com> wrote:
>
>
>             How large is your data ? This problem is inevitable if your
>         data is
>             too large, you can try to use spark data frame if that works
>         for you.
>
>
>
>
>
>             Best Regard,
>             Jeff Zhang
>
>
>             From: RUSHIKESH RAUT <rushikeshraut777@gmail.com>
>             Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
>             Date: Saturday, March 25, 2017 at 5:06 PM
>             To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
>             Subject: Zeppelin out of memory issue - (GC overhead limit exceeded)
>
>             Hi everyone,
>
>             I am trying to load some data from hive table into my
>         notebook and
>             then convert this dataframe into r dataframe using spark.r
>             interpreter. This works perfectly for small amount of data.
>             But if the data is increased then it gives me error
>
>             java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>             I have tried increasing the ZEPPELIN_MEM and
>         ZEPPELIN_INTP_MEM in
>             the zeppelin-env.cmd file but i am still facing this issue.
>         I have
>             used the following configuration
>
>             set ZEPPELIN_MEM="-Xms4096m -Xmx4096m -XX:MaxPermSize=2048m"
>             set ZEPPELIN_INTP_MEM="-Xmx4096m -Xms4096m
>         -XX:MaxPermSize=2048m"
>
>             I am sure that this much size should be sufficient for my
>         data but
>             still i am getting this same error. Any guidance will be much
>             appreciated.
>
>             Thanks,
>             Rushikesh Raut
>

Re: Zeppelin out of memory issue - (GC overhead limit exceeded)

Posted by RUSHIKESH RAUT <ru...@gmail.com>.
What value should I set there?
Currently I have set it as

spark.driver.memory  -Xms4096m -Xmx4096m -XX:MaxPermSize=2048m

But I still get the same error.

On Mar 26, 2017 1:19 PM, "Eric Charles" <er...@apache.org> wrote:

> You also have to check the memory you give to the spark driver
> (spark.driver.memory property)
>
> On 26/03/17 07:40, RUSHIKESH RAUT wrote:
>
>> Yes I know it inevitable if the data is large. I want to know how do I
>> increase the interpreter memory to handle large data?
>>
>> Thanks,
>> Rushikesh Raut
>>
>> On Mar 26, 2017 8:56 AM, "Jianfeng (Jeff) Zhang" <jzhang@hortonworks.com> wrote:
>>
>>
>>     How large is your data ? This problem is inevitable if your data is
>>     too large, you can try to use spark data frame if that works for you.
>>
>>
>>
>>
>>
>>     Best Regard,
>>     Jeff Zhang
>>
>>
>>     From: RUSHIKESH RAUT <rushikeshraut777@gmail.com>
>>     Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
>>     Date: Saturday, March 25, 2017 at 5:06 PM
>>     To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
>>     Subject: Zeppelin out of memory issue - (GC overhead limit exceeded)
>>
>>     Hi everyone,
>>
>>     I am trying to load some data from hive table into my notebook and
>>     then convert this dataframe into r dataframe using spark.r
>>     interpreter. This works perfectly for small amount of data.
>>     But if the data is increased then it gives me error
>>
>>     java.lang.OutOfMemoryError: GC overhead limit exceeded
>>
>>     I have tried increasing the ZEPPELIN_MEM and ZEPPELIN_INTP_MEM in
>>     the zeppelin-env.cmd file but i am still facing this issue. I have
>>     used the following configuration
>>
>>     set ZEPPELIN_MEM="-Xms4096m -Xmx4096m -XX:MaxPermSize=2048m"
>>     set ZEPPELIN_INTP_MEM="-Xmx4096m -Xms4096m -XX:MaxPermSize=2048m"
>>
>>     I am sure that this much size should be sufficient for my data but
>>     still i am getting this same error. Any guidance will be much
>>     appreciated.
>>
>>     Thanks,
>>     Rushikesh Raut
>>
>>

Re: Zeppelin out of memory issue - (GC overhead limit exceeded)

Posted by Eric Charles <er...@apache.org>.
You also have to check the memory you give to the spark driver 
(spark.driver.memory property)

On 26/03/17 07:40, RUSHIKESH RAUT wrote:
> Yes I know it inevitable if the data is large. I want to know how do I
> increase the interpreter memory to handle large data?
>
> Thanks,
> Rushikesh Raut
>
> On Mar 26, 2017 8:56 AM, "Jianfeng (Jeff) Zhang" <jzhang@hortonworks.com> wrote:
>
>
>     How large is your data ? This problem is inevitable if your data is
>     too large, you can try to use spark data frame if that works for you.
>
>
>
>
>
>     Best Regard,
>     Jeff Zhang
>
>
>     From: RUSHIKESH RAUT <rushikeshraut777@gmail.com>
>     Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
>     Date: Saturday, March 25, 2017 at 5:06 PM
>     To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
>     Subject: Zeppelin out of memory issue - (GC overhead limit exceeded)
>
>     Hi everyone,
>
>     I am trying to load some data from hive table into my notebook and
>     then convert this dataframe into r dataframe using spark.r
>     interpreter. This works perfectly for small amount of data.
>     But if the data is increased then it gives me error
>
>     java.lang.OutOfMemoryError: GC overhead limit exceeded
>
>     I have tried increasing the ZEPPELIN_MEM and ZEPPELIN_INTP_MEM in
>     the zeppelin-env.cmd file but i am still facing this issue. I have
>     used the following configuration
>
>     set ZEPPELIN_MEM="-Xms4096m -Xmx4096m -XX:MaxPermSize=2048m"
>     set ZEPPELIN_INTP_MEM="-Xmx4096m -Xms4096m -XX:MaxPermSize=2048m"
>
>     I am sure that this much size should be sufficient for my data but
>     still i am getting this same error. Any guidance will be much
>     appreciated.
>
>     Thanks,
>     Rushikesh Raut
>

Re: Zeppelin out of memory issue - (GC overhead limit exceeded)

Posted by RUSHIKESH RAUT <ru...@gmail.com>.
Yes, I know it is inevitable if the data is large. I want to know how I can
increase the interpreter memory to handle large data.

Thanks,
Rushikesh Raut

On Mar 26, 2017 8:56 AM, "Jianfeng (Jeff) Zhang" <jz...@hortonworks.com>
wrote:

>
> How large is your data ? This problem is inevitable if your data is too
> large, you can try to use spark data frame if that works for you.
>
>
>
>
>
> Best Regard,
> Jeff Zhang
>
>
> From: RUSHIKESH RAUT <ru...@gmail.com>
> Reply-To: "users@zeppelin.apache.org" <us...@zeppelin.apache.org>
> Date: Saturday, March 25, 2017 at 5:06 PM
> To: "users@zeppelin.apache.org" <us...@zeppelin.apache.org>
> Subject: Zeppelin out of memory issue - (GC overhead limit exceeded)
>
> Hi everyone,
>
> I am trying to load some data from hive table into my notebook and then
> convert this dataframe into r dataframe using spark.r interpreter. This
> works perfectly for small amount of data.
> But if the data is increased then it gives me error
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> I have tried increasing the ZEPPELIN_MEM and ZEPPELIN_INTP_MEM in
> the zeppelin-env.cmd file but i am still facing this issue. I have used the
> following configuration
>
> set ZEPPELIN_MEM="-Xms4096m -Xmx4096m -XX:MaxPermSize=2048m"
> set ZEPPELIN_INTP_MEM="-Xmx4096m -Xms4096m -XX:MaxPermSize=2048m"
>
> I am sure that this much size should be sufficient for my data but still i
> am getting this same error. Any guidance will be much appreciated.
>
> Thanks,
> Rushikesh Raut
>

Re: Zeppelin out of memory issue - (GC overhead limit exceeded)

Posted by "Jianfeng (Jeff) Zhang" <jz...@hortonworks.com>.
How large is your data? This problem is inevitable if your data is too large; you can try to keep it as a Spark DataFrame if that works for you.
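For example (illustrative table name, SparkR API), instead of collecting the whole table into R you can keep it as a Spark DataFrame and only bring a sample back to the driver:

    %spark.r
    df <- sql("SELECT * FROM my_hive_table")   # stays distributed across the executors
    head(df, 100)                              # only a small sample reaches the driver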





Best Regard,
Jeff Zhang


From: RUSHIKESH RAUT <ru...@gmail.com>
Reply-To: "users@zeppelin.apache.org" <us...@zeppelin.apache.org>
Date: Saturday, March 25, 2017 at 5:06 PM
To: "users@zeppelin.apache.org" <us...@zeppelin.apache.org>
Subject: Zeppelin out of memory issue - (GC overhead limit exceeded)

Hi everyone,

I am trying to load some data from hive table into my notebook and then convert this dataframe into r dataframe using spark.r interpreter. This works perfectly for small amount of data.
But if the data is increased then it gives me error

java.lang.OutOfMemoryError: GC overhead limit exceeded

I have tried increasing the ZEPPELIN_MEM and ZEPPELIN_INTP_MEM in the zeppelin-env.cmd file but i am still facing this issue. I have used the following configuration

set ZEPPELIN_MEM="-Xms4096m -Xmx4096m -XX:MaxPermSize=2048m"
set ZEPPELIN_INTP_MEM="-Xmx4096m -Xms4096m -XX:MaxPermSize=2048m"

I am sure that this much size should be sufficient for my data but still i am getting this same error. Any guidance will be much appreciated.

Thanks,
Rushikesh Raut