Posted to users@zeppelin.apache.org by Chintan Patel <ch...@qdata.io> on 2018/08/14 08:58:33 UTC

"GC overhead limit exceeded" while running multiple jobs

Hello,

I'm running Zeppelin in yarn-client mode. I'm using the SQL interpreter and
the pyspark interpreter to run queries and Python jobs in shared mode per
note. Sometimes when I run multiple jobs at the same time, CPU usage gets
very high. I looked into the problem and found that it happens because
Zeppelin creates a Spark driver for each notebook.


My questions are:
1. How can I tune Zeppelin to handle a large number of concurrent jobs and
avoid "GC overhead limit exceeded"?
2. How can I scale Zeppelin with the number of users?
3. If memory or CPU is not available, is there any way to backlog the jobs?

Thanks & Regards
Chintan

Re: "GC overhead limit exceeded" while running multiple jobs

Posted by Jongyoul Lee <jo...@gmail.com>.
Yes, that might solve most of the issues in general.

On Sat, Aug 18, 2018 at 5:43 PM, Jeff Zhang <zj...@gmail.com> wrote:

>
> you can increase driver memory by setting spark.driver.memory
>
>
> Jongyoul Lee <jo...@gmail.com>于2018年8月18日周六 下午2:35写道:
>
>> Hi,
>>
>> 1. AFAIK, it’s a problem of spark-shell. Zeppelin’s Spark interpreter uses
>> spark-shell internally, so it cannot be solved easily.
>>
>> 2. You could try ‘per user’ setting in your interpreter.
>>
>> 3. Currently, there’s no way to figure it out.
>>
>> On Tue, 14 Aug 2018 at 5:59 PM Chintan Patel <ch...@qdata.io>
>> wrote:
>>
>>> Hello,
>>>
>>> I'm running Zeppelin in yarn-client mode. I'm using the SQL interpreter and
>>> the pyspark interpreter to run queries and Python jobs in shared mode per
>>> note. Sometimes when I run multiple jobs at the same time, CPU usage gets
>>> very high. I looked into the problem and found that it happens because
>>> Zeppelin creates a Spark driver for each notebook.
>>>
>>>
>>> My questions are:
>>> 1. How can I tune Zeppelin to handle a large number of concurrent jobs and
>>> avoid "GC overhead limit exceeded"?
>>> 2. How can I scale Zeppelin with the number of users?
>>> 3. If memory or CPU is not available, is there any way to backlog the
>>> jobs?
>>>
>>> Thanks & Regards
>>> Chintan
>>>
>> --
>> 이종열, Jongyoul Lee, 李宗烈
>> http://madeng.net
>>
>


-- 
이종열, Jongyoul Lee, 李宗烈
http://madeng.net

Re: "GC overhead limit exceeded" while running multiple jobs

Posted by Jeff Zhang <zj...@gmail.com>.
You can increase driver memory by setting spark.driver.memory.
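
For example (the 4g value below is only illustrative; size it to your
workload), it can be set as a property on the Spark interpreter in the
interpreter settings page:

    spark.driver.memory    4g

or passed through conf/zeppelin-env.sh before restarting Zeppelin, assuming
the SPARK_SUBMIT_OPTIONS hook from zeppelin-env.sh.template:

    # illustrative value; adjust for your cluster, then restart the interpreter
    export SPARK_SUBMIT_OPTIONS="--driver-memory 4g"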


Jongyoul Lee <jo...@gmail.com>于2018年8月18日周六 下午2:35写道:

> Hi,
>
> 1. AFAIK, it’s a problem of spark-shell. Zeppelin’s Spark interpreter uses
> spark-shell internally, so it cannot be solved easily.
>
> 2. You could try ‘per user’ setting in your interpreter.
>
> 3. Currently, there’s no way to figure it out.
>
> On Tue, 14 Aug 2018 at 5:59 PM Chintan Patel <ch...@qdata.io>
> wrote:
>
>> Hello,
>>
>> I'm running Zeppelin in yarn-client mode. I'm using the SQL interpreter and
>> the pyspark interpreter to run queries and Python jobs in shared mode per
>> note. Sometimes when I run multiple jobs at the same time, CPU usage gets
>> very high. I looked into the problem and found that it happens because
>> Zeppelin creates a Spark driver for each notebook.
>>
>>
>> My questions are:
>> 1. How can I tune Zeppelin to handle a large number of concurrent jobs and
>> avoid "GC overhead limit exceeded"?
>> 2. How can I scale Zeppelin with the number of users?
>> 3. If memory or CPU is not available, is there any way to backlog the
>> jobs?
>>
>> Thanks & Regards
>> Chintan
>>
> --
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
>

Re: "GC overhead limit exceeded" while running multiple jobs

Posted by Jongyoul Lee <jo...@gmail.com>.
Hi,

1. AFAIK, it’s a problem of spark-shell. Zeppelin’s Spark interpreter uses
spark-shell internally, so it cannot be solved easily.

2. You could try the ‘per user’ setting in your interpreter (see the sketch
after these points).

3. Currently, there’s no way to figure it out.
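
Regarding point 2, a rough sketch of what that looks like: in the interpreter
settings page you instantiate the Spark interpreter "Per User" in "isolated"
process mode, which ends up stored in conf/interpreter.json roughly as below
(field names can differ slightly between Zeppelin versions):

    "option": {
      "perUser": "isolated",
      "perNote": "shared",
      ...
    }

Note that each isolated interpreter process is its own Spark driver, so this
trades extra memory for isolation between users.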

On Tue, 14 Aug 2018 at 5:59 PM Chintan Patel <ch...@qdata.io> wrote:

> Hello,
>
> I'm running Zeppelin in yarn-client mode. I'm using the SQL interpreter and
> the pyspark interpreter to run queries and Python jobs in shared mode per
> note. Sometimes when I run multiple jobs at the same time, CPU usage gets
> very high. I looked into the problem and found that it happens because
> Zeppelin creates a Spark driver for each notebook.
>
>
> My questions are:
> 1. How can I tune Zeppelin to handle a large number of concurrent jobs and
> avoid "GC overhead limit exceeded"?
> 2. How can I scale Zeppelin with the number of users?
> 3. If memory or CPU is not available, is there any way to backlog the jobs?
>
> Thanks & Regards
> Chintan
>
-- 
이종열, Jongyoul Lee, 李宗烈
http://madeng.net