Posted to user@flink.apache.org by Francisco Gonzalez Barea <Fr...@piksel.com> on 2017/09/19 15:58:44 UTC

Problem in Flink 1.3.2 with Mesos task managers offers

Hello guys,

We have a Flink 1.3.2 session deployed to Mesos from a Marathon JSON definition, with some of the following parameters set as environment variables:


"flink_mesos.initial-tasks": "8",
"flink_mesos.resourcemanager.tasks.mem": "4096",

plus other environment variables for ZooKeeper, etc.

The Mesos cluster is shared by different applications (Kafka, ad-hoc jobs...), and the resources on the agents are fragmented. Our problem is that the Flink session takes every offer, even small ones. When there are not enough offers to satisfy that configuration, it takes all of them, leaving no free resources or offers for other applications.

So our question is: what is the right configuration in this case to keep a single Flink session from using up all the resources?

Thanks in advance.
Regards

This message is private and confidential. If you have received this message in error, please notify the sender or servicedesk@piksel.com and remove it from your system.

Piksel Inc is a company registered in the United States, 2100 Powers Ferry Road SE, Suite 400, Atlanta, GA 30339

Re: Problem in Flink 1.3.2 with Mesos task managers offers

Posted by Francisco Gonzalez Barea <Fr...@piksel.com>.
Hello Eron,

Thank you for your reply; we will take a look at this.

Regards


On 19 Sep 2017, at 22:37, Eron Wright <er...@gmail.com> wrote:

Hello, the current behavior is that Flink holds onto received offers for up to two minutes while it attempts to provision the TaskManagers (TMs). Flink can combine small offers to form a single TM, to combat the fragmentation that develops over time in a Mesos cluster. Are you saying that unused offers aren't being released after two minutes?

There's a log entry you should see in the JobManager (JM) log whenever an offer is released:
LOG.info(s"Declined offer ${lease.getId} from ${lease.hostname()} "
  + s"of ${lease.memoryMB()} MB, ${lease.cpuCores()} cpus.")

The timeout value isn't configurable at the moment, but if you're willing to experiment by building Flink from source, you can lower the two-minute timeout as follows: in the `MesosFlinkResourceManager` class, edit the `createOptimizer` method to call `withLeaseOfferExpirySecs` on the `TaskScheduler.Builder` object.

Let us know if that helps and we'll make the timeout configurable.
-Eron




Re: Problem in Flink 1.3.2 with Mesos task managers offers

Posted by Eron Wright <er...@gmail.com>.
Hello, the current behavior is that Flink holds onto received offers for up
to two minutes while it attempts to provision the TaskManagers (TMs). Flink
can combine small offers to form a single TM, to combat the fragmentation
that develops over time in a Mesos cluster. Are you saying that unused
offers aren't being released after two minutes?
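To make the idea of combining small offers concrete, here is a minimal Scala sketch. It is not Flink's or Fenzo's scheduling code: `Offer`, `TmRequest` and `placeTaskManager` are hypothetical names, and the check is a simplified first-fit rather than real bin-packing. It assumes only that offers can be pooled per agent (Mesos accepts several offers in one launch only when they come from the same agent), and asks whether any host's outstanding offers together cover one 4096 MB TaskManager.

```scala
// Hypothetical model of a resource offer from one Mesos agent.
final case class Offer(id: String, host: String, memMB: Double, cpus: Double)

// Hypothetical resource request for a single TaskManager.
final case class TmRequest(memMB: Double, cpus: Double)

object OfferCombining {
  /** Returns the first host (in name order) whose outstanding offers,
    * added together, cover `req`, along with the offers that were pooled. */
  def placeTaskManager(offers: Seq[Offer], req: TmRequest): Option[(String, Seq[Offer])] =
    offers
      .groupBy(_.host)   // offers can only be combined within one agent
      .toSeq
      .sortBy(_._1)      // deterministic host order for this sketch
      .collectFirst {
        case (host, hostOffers)
            if hostOffers.map(_.memMB).sum >= req.memMB &&
               hostOffers.map(_.cpus).sum >= req.cpus =>
          (host, hostOffers)
      }

  def main(args: Array[String]): Unit = {
    val offers = Seq(
      Offer("o1", "agent-a", 2048.0, 1.0),
      Offer("o2", "agent-a", 2048.0, 1.0),
      Offer("o3", "agent-b", 1024.0, 2.0)
    )
    // No single offer has 4096 MB, but agent-a's two offers together do.
    println(placeTaskManager(offers, TmRequest(4096.0, 1.0)).map(_._1))
  }
}
```

The flip side of pooling is visible in the same model: while the scheduler waits for enough fragments to accumulate on a host, it is holding those fragments, which is the behavior Francisco observed.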

There's a log entry you should see in the JobManager (JM) log whenever an
offer is released:
LOG.info(s"Declined offer ${lease.getId} from ${lease.hostname()} "
  + s"of ${lease.memoryMB()} MB, ${lease.cpuCores()} cpus.")
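One way to answer Eron's question (are unused offers actually released?) is to scan the JM log for that message. The following is a minimal Scala sketch, assuming the message text appears exactly as formatted above; `DeclinedOffers` and `declinedPerHost` are hypothetical names, and any prefix your logging setup adds before the message is simply ignored.

```scala
import scala.util.matching.Regex

object DeclinedOffers {
  // Matches the message text of the LOG.info call quoted above; whatever the
  // log layout puts before it (timestamp, level, logger) is not matched.
  private val Declined: Regex =
    """Declined offer (\S+) from (\S+) of ([\d.]+) MB, ([\d.]+) cpus\.""".r

  /** Counts declined offers and total declined memory (MB) per host. */
  def declinedPerHost(lines: Iterator[String]): Map[String, (Int, Double)] =
    lines
      .flatMap(line => Declined.findFirstMatchIn(line))
      .foldLeft(Map.empty[String, (Int, Double)]) { (acc, m) =>
        val host = m.group(2)
        val (count, mem) = acc.getOrElse(host, (0, 0.0))
        acc.updated(host, (count + 1, mem + m.group(3).toDouble))
      }

  def main(args: Array[String]): Unit = {
    // Usage: pass the path of the JobManager log as the only argument.
    val src = scala.io.Source.fromFile(args(0))
    try declinedPerHost(src.getLines()).foreach { case (host, (n, mem)) =>
      println(s"$host: $n declined offers, $mem MB")
    } finally src.close()
  }
}
```

If hosts with small fragments never show up in this summary, that would suggest the offers are being held rather than declined.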

The timeout value isn't configurable at the moment, but if you're willing
to experiment by building Flink from source, you can lower the two-minute
timeout as follows: in the `MesosFlinkResourceManager` class, edit the
`createOptimizer` method to call `withLeaseOfferExpirySecs` on the
`TaskScheduler.Builder` object.
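`TaskScheduler.Builder` comes from Netflix Fenzo, which Flink's Mesos integration uses for scheduling. To show what lowering the expiry changes, here is a minimal Scala model. It is a sketch under assumptions, not Fenzo's implementation: `HeldOffer`, `expireOffers` and the 30-second value are hypothetical, and it assumes expiry is measured from the moment each offer was received, so an offer held at least `expiry` is declined back to Mesos.

```scala
import scala.concurrent.duration._

// Hypothetical: an offer being held by the framework, with its arrival time.
final case class HeldOffer(id: String, host: String, receivedAt: FiniteDuration)

object OfferExpiry {
  /** Splits held offers into (kept, declined) as of `now`: offers held for
    * at least `expiry` are declined, the rest stay available for pooling. */
  def expireOffers(held: Seq[HeldOffer], now: FiniteDuration,
                   expiry: FiniteDuration): (Seq[HeldOffer], Seq[HeldOffer]) =
    held.partition(o => now - o.receivedAt < expiry)

  def main(args: Array[String]): Unit = {
    val held = Seq(
      HeldOffer("o1", "agent-a", 0.seconds),
      HeldOffer("o2", "agent-b", 50.seconds)
    )
    val now = 60.seconds
    // Compare the two-minute default with a shorter, hypothetical setting.
    for (expiry <- Seq(120.seconds, 30.seconds)) {
      val (_, declined) = expireOffers(held, now, expiry)
      println(s"expiry=$expiry declined=${declined.map(_.id).mkString(",")}")
    }
  }
}
```

The trade-off follows from the behavior Eron describes: a shorter expiry presumably returns unused fragments to other frameworks sooner, but leaves Flink less time to combine small offers into a full TM.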

Let us know if that helps and we'll make the timeout configurable.
-Eron
