Posted to user@flink.apache.org by "Kruse, Sebastian" <Se...@hpi.de> on 2015/12/09 09:46:37 UTC

Taskmanager memory

Hi everyone,


I am currently looking into how Flink can coexist and interoperate with other frameworks in a cluster, such as plain single-machine processes or Spark. Tachyon seems to be a nice solution to exchange data between them.


However, I think it is a problem that Flink's taskmanagers allocate their managed memory upfront - in contrast to Spark, as far as I know. If I want a taskmanager to yield its main memory, so that another process can use that memory, is there any other option besides shutting that taskmanager down? Would it be beneficial to use YARN?

Thanks for your help!


Cheers,

Sebastian

Re: Taskmanager memory

Posted by Stephan Ewen <se...@apache.org>.
BTW, for 1.0, this is consolidated into one single mode...

On Wed, Dec 9, 2015 at 1:45 PM, Fabian Hueske <fh...@gmail.com> wrote:

> Yes, streaming mode supports batch jobs as well.
> The difference is that in streaming mode, managed memory is lazily
> allocated. This is because the streaming runtime does not use managed
> memory but only heap memory.

Re: Taskmanager memory

Posted by Fabian Hueske <fh...@gmail.com>.
Yes, streaming mode supports batch jobs as well.
The difference is that in streaming mode, managed memory is lazily
allocated. This is because the streaming runtime does not use managed
memory but only heap memory.

2015-12-09 11:55 GMT+01:00 Kruse, Sebastian <Se...@hpi.de>:

> Thanks for your answers. So the problem with on-heap memory would be that
> the JVM would not shrink its already allocated heap even if it is largely
> unused?
>
> Pertaining to the streaming-mode: If I run Flink in that mode, can I still
> submit batch jobs? Because that's what I want to do.
>
>
> Thanks,
>
> Sebastian

Re: Taskmanager memory

Posted by "Kruse, Sebastian" <Se...@hpi.de>.
Thanks for your answers. So the problem with on-heap memory would be that the JVM would not shrink its already allocated heap even if it is largely unused?

Pertaining to the streaming-mode: If I run Flink in that mode, can I still submit batch jobs? Because that's what I want to do.


Thanks,

Sebastian

________________________________
From: ewenstephan@gmail.com <ew...@gmail.com> on behalf of Stephan Ewen <se...@apache.org>
Sent: Wednesday, December 9, 2015 11:15
To: user@flink.apache.org
Subject: Re: Taskmanager memory

Off heap memory is freed when the memory consuming operators release the memory.

The Java process releases that memory then on the next GC, as far as I know.


Re: Taskmanager memory

Posted by Stephan Ewen <se...@apache.org>.
Off heap memory is freed when the memory consuming operators release the
memory.

The Java process releases that memory then on the next GC, as far as I know.

On Wed, Dec 9, 2015 at 11:01 AM, Fabian Hueske <fh...@gmail.com> wrote:

> Streaming mode with on-heap memory won't help because the JVM allocates
> all memory but doesn't convert it to managed memory internally, right?
>
> Is offheap memory actually freed after it has been allocated as managed
> memory? Does this happen after a job finishes?

Re: Taskmanager memory

Posted by Fabian Hueske <fh...@gmail.com>.
Streaming mode with on-heap memory won't help because the JVM allocates all
memory but doesn't convert it to managed memory internally, right?

Is offheap memory actually freed after it has been allocated as managed
memory? Does this happen after a job finishes?

2015-12-09 10:44 GMT+01:00 Stephan Ewen <se...@apache.org>:

> @Sebastian: Getting memory away from the JVM is always tricky, completely
> independent of pre-allocation of managed memory or lazy allocation.
>
> But here is something that may work:
>   - Start Flink in streaming mode - that will make it allocate managed
> memory lazily
>   - Set the memory to offheap memory. That way the JVM heap is small. The
> off-heap memory is returned once it is no longer used and has been
> deallocated - this releases memory much better than the JVM shrinking the heap.

Re: Taskmanager memory

Posted by Stephan Ewen <se...@apache.org>.
@Sebastian: Getting memory away from the JVM is always tricky, completely
independent of pre-allocation of managed memory or lazy allocation.

But here is something that may work:
  - Start Flink in streaming mode - that will make it allocate managed
memory lazily
  - Set the memory to offheap memory. That way the JVM heap is small. The
off-heap memory is returned once it is no longer used and has been
deallocated - this releases memory much better than the JVM shrinking the heap.
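
To illustrate the two points above (lazy allocation plus off-heap memory), here is a minimal sketch in plain Java. It is not Flink's actual memory manager; the class LazyOffHeapPool and its methods are made up for illustration. It only relies on the standard ByteBuffer.allocateDirect call, whose native memory is reclaimed once the buffer object itself has been garbage collected.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical illustration only, not Flink code: a pool that hands out
 * off-heap segments lazily instead of pre-allocating them at startup.
 */
public class LazyOffHeapPool {
    private final int segmentSize;   // bytes per segment
    private final int maxSegments;   // upper bound on segments handed out
    private int allocated = 0;       // segments currently handed out

    public LazyOffHeapPool(int segmentSize, int maxSegments) {
        this.segmentSize = segmentSize;
        this.maxSegments = maxSegments;
    }

    /** Allocates a direct (off-heap) buffer only when it is requested. */
    public ByteBuffer request() {
        if (allocated >= maxSegments) {
            throw new IllegalStateException("pool exhausted");
        }
        allocated++;
        return ByteBuffer.allocateDirect(segmentSize);
    }

    /**
     * Updates the pool's accounting. The native memory itself is reclaimed
     * only after the caller drops its reference and the buffer object has
     * been garbage collected.
     */
    public void release(ByteBuffer segment) {
        if (!segment.isDirect()) {
            throw new IllegalArgumentException("not an off-heap segment");
        }
        allocated--;
    }

    public int allocatedSegments() {
        return allocated;
    }

    public static void main(String[] args) {
        LazyOffHeapPool pool = new LazyOffHeapPool(32 * 1024, 4);
        ByteBuffer segment = pool.request();   // nothing was allocated before this call
        System.out.println("direct=" + segment.isDirect()
                + " allocated=" + pool.allocatedSegments());
        pool.release(segment);
        segment = null;                        // make the buffer eligible for GC
        System.out.println("allocated=" + pool.allocatedSegments());
    }
}
```

Because the segments live outside the JVM heap, the heap itself can stay small, and the memory goes back to the operating system without the JVM having to shrink its heap.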



On Wed, Dec 9, 2015 at 10:06 AM, Fabian Hueske <fh...@gmail.com> wrote:

> Hi Sebastian,
>
> There is no way to return memory from a Flink process except shutting the
> process down.
> I think YARN could help in your setup. In a YARN setup, you can flexibly
> start and stop Flink sessions with different configurations (memory, TMs,
> slots) or run a single job. When running a single job, Flink will allocate
> resources and free them after the job is done.
>
> Best, Fabian

Re: Taskmanager memory

Posted by Fabian Hueske <fh...@gmail.com>.
Hi Sebastian,

There is no way to return memory from a Flink process except shutting the
process down.
I think YARN could help in your setup. In a YARN setup, you can flexibly
start and stop Flink sessions with different configurations (memory, TMs,
slots) or run a single job. When running a single job, Flink will allocate
resources and free them after the job is done.

Best, Fabian

2015-12-09 9:46 GMT+01:00 Kruse, Sebastian <Se...@hpi.de>:

> Hi everyone,
>
>
> I am currently looking into how Flink can coexist and interoperate with
> other frameworks in a cluster, such as plain single-machine processes
> or Spark. Tachyon seems to be a nice solution to exchange data between
> them.
>
>
> However, I think it is a problem that Flink's taskmanagers allocate their managed
> memory upfront - in contrast to Spark, as far as I know. If I want a taskmanager
> to yield its main memory, so that another process can use that memory, is
> there any other option besides shutting that taskmanager down? Would it be
> beneficial to use YARN?
>
> Thanks for your help!
>
>
> Cheers,
>
> Sebastian
>