Posted to user@spark.apache.org by Sea <26...@qq.com> on 2015/08/02 11:16:47 UTC

Re: About memory leak in spark 1.4.1

Hi, Barak
    It is OK with Spark 1.3.0; the problem is with Spark 1.4.1.
    I don't think spark.storage.memoryFraction will make any difference, because it only applies to heap memory.




------------------ Original Message ------------------
From: "Barak Gitsis" <ba...@similarweb.com>
Date: Sunday, August 2, 2015, 4:11 PM
To: "Sea" <26...@qq.com>; "user" <us...@spark.apache.org>
Cc: "rxin" <rx...@databricks.com>; "joshrosen" <jo...@databricks.com>; "davies" <da...@databricks.com>
Subject: Re: About memory leak in spark 1.4.1



Hi,
reducing spark.storage.memoryFraction did the trick for me. The heap doesn't get filled because that fraction is reserved.
My reasoning is:
I give the executor all the memory I can, so that sets the boundary.
From there I try to make the best use of the memory I have. storage.memoryFraction is, in a sense, user data space; the rest can be used by the system.
If you don't have so much data that you MUST keep it in memory for performance, it is better to give Spark more working space.
I ended up setting it to 0.3.
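
For concreteness, a minimal sketch of how this can be set (the 0.3 is just the
value that worked for my workload, not a general recommendation):

    # spark-defaults.conf
    spark.executor.memory        50g
    spark.storage.memoryFraction 0.3

    # or per application on the command line
    spark-submit --conf spark.storage.memoryFraction=0.3 <other options> <app jar>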


All that said, this is with Spark 1.3 on a cluster.

Hope that helps.


On Sat, Aug 1, 2015 at 5:43 PM Sea <26...@qq.com> wrote:

Hi, all
I upgraded Spark to 1.4.1 and many applications failed... I find the heap memory is not full, but the CoarseGrainedExecutorBackend process takes more memory than I expect, and it keeps increasing over time, until it finally exceeds the server's limit and the worker dies.....

Can anyone help?

Mode: standalone

spark.executor.memory 50g

25583 xiaoju    20   0 75.5g  55g  28m S 1729.3 88.1   2172:52 java

55g is more than the 50g I requested.
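
One quick way to confirm that the extra usage is off-heap (a sketch; 25583 is
the executor pid from the top line above, and both commands report sizes in KB
on Linux):

    # heap capacities and usage of the executor JVM
    jstat -gc 25583

    # resident set size of the same process
    ps -o rss= -p 25583

If the heap columns stay well below the 50g limit while the RSS keeps climbing,
the growth is happening outside the heap.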



-- 

-Barak

Re: About memory leak in spark 1.4.1

Posted by Barak Gitsis <ba...@similarweb.com>.
Sea, it exists, trust me. We run Spark in production under YARN.
If you want more control, use YARN if you can. At least it kills the
executor if it hogs memory.

I am explicitly setting
spark.yarn.executor.memoryOverhead to the same size as the heap for one of our
processes.

For example:
spark.executor.memory              4g
spark.yarn.executor.memoryOverhead 4000

Try the following config:
spark.executor.memory        25g
spark.storage.memoryFraction 0.2  (this is more for safety; hopefully you
will get a lot of GC and a plain Java OOM instead of memory overuse by
some off-heap magic)

Then check memory usage. It should give you a feel for the off-heap memory
consumption of your application.
If it still dies because the machine's memory gets completely filled up, perhaps
there is a memory leak in Spark 1.4.
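
If it is easier to test from the command line, the same settings can be passed
straight to spark-submit (a sketch only; the master, class and jar are
placeholders, and spark.yarn.executor.memoryOverhead only takes effect when
running on YARN):

    spark-submit \
      --master yarn \
      --conf spark.executor.memory=25g \
      --conf spark.storage.memoryFraction=0.2 \
      --conf spark.yarn.executor.memoryOverhead=4000 \
      --class com.example.YourJob your-job.jar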




-- 
-Barak

Re: About memory leak in spark 1.4.1

Posted by Igor Berman <ig...@gmail.com>.
In general, what is your configuration? Use --conf "spark.logConf=true".

We have 1.4.1 in a production standalone cluster and haven't experienced what
you are describing.
Can you verify in the web UI that Spark indeed got your 50g-per-executor limit?
I mean on the configuration page.

Might you be using off-heap storage (Tachyon)?
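
For example (a sketch for a standalone setup; the master URL and jar are
placeholders):

    spark-submit \
      --master spark://your-master:7077 \
      --executor-memory 50g \
      --conf spark.logConf=true \
      your-app.jar

With spark.logConf=true the effective SparkConf is logged at INFO level when
the SparkContext starts, and the same values appear under the Environment tab
of the application web UI (port 4040 by default).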



Re: About memory leak in spark 1.4.1

Posted by Sea <26...@qq.com>.
"spark uses a lot more than heap memory, it is the expected behavior."  It didn't exist in spark 1.3.x
What does "a lot more than" means?  It means that I lose control of it!
I try to  apply 31g, but it still grows to 55g and continues to grow!!! That is the point!
I have tried set memoryFraction to 0.2,but it didn't help.
I don't know whether it will still exist in the next release 1.5, I wish not.







Re: About memory leak in spark 1.4.1

Posted by Barak Gitsis <ba...@similarweb.com>.
Spark uses a lot more than heap memory; it is the expected behavior.
In 1.4, off-heap memory usage is supposed to grow in comparison to 1.3.

Better to use as little memory as you can for the heap, and since you are not
utilizing all of it already, it is safe for you to reduce it.
memoryFraction helps you optimize heap usage for your data/application
profile while keeping it tight.






--
-Barak

Re: About memory leak in spark 1.4.1

Posted by Sea <26...@qq.com>.
spark.storage.memoryFraction only applies to heap memory, but my situation is that the memory used is more than the heap memory!

Does anyone else use Spark 1.4.1 in production?





Re: About memory leak in spark 1.4.1

Posted by Ted Yu <yu...@gmail.com>.
http://spark.apache.org/docs/latest/tuning.html does mention
spark.storage.memoryFraction
in two places.
One is under the Cache Size Tuning section.
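
As a rough worked example (assuming the legacy 1.x memory model and its default
spark.storage.safetyFraction of 0.9, so treat the numbers as approximate): with
spark.executor.memory=50g and spark.storage.memoryFraction=0.2, the space
usable for cached blocks is about 50g * 0.2 * 0.9 ≈ 9g, and all of it sits
inside the JVM heap.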

FYI
