Posted to user@flink.apache.org by Bo Yu <yu...@gmail.com> on 2017/10/02 12:49:25 UTC

Fwd: Consult about flink on mesos cluster

Hello all,
This is Bo. I ran into some problems when trying to use Flink on my Mesos
cluster (1 master, 2 slaves, each with a 32-core CPU).
I started mesos-appmaster.sh via Marathon, and the job manager started
without problems.

mesos-appmaster.sh -Djobmanager.heap.mb=1024 -Dtaskmanager.heap.mb=1024
-Dtaskmanager.numberOfTaskSlots=32

My problem is that the task managers are all located on one single slave.
1. (log1)
In "/usr/local/flink/conf/flink-conf.yaml" I set the initial tasks as
"mesos.initial-tasks: 2"
and also set "mesos.constraints.hard.hostattribute: rack:ak09-27",
which is the master node of the Mesos cluster.

2. (log2)
I tried many ways to distribute the tasks to all the available slaves,
without any success.
So I decided to try adding a GROUP_BY operator, which I referenced from
https://mesosphere.github.io/marathon/docs/constraints.html
"mesos.constraints.hard.hostattribute: rack:ak09-27,GROUP_BY:2"
According to the log, Flink keeps waiting for more offers and the tasks
are never launched.

Sorry, I am a newbie to Flink and also to Mesos. Please reply if my problem
is not clear; I would appreciate any hint on how to distribute tasks evenly
across the available resources.

Thank you in advance.

Best regards,

Bo

Re: Consult about flink on mesos cluster

Posted by Till Rohrmann <tr...@apache.org>.
Hi Bo,

I think that by writing mesos.constraints.hard.hostattribute:
rack:ak03-07,rack:ak16-10 you define two hard constraints: the attribute
rack must equal ak03-07 AND rack must equal ak16-10. Since a task offer
can never come from both racks at once, the task request will never be
fulfilled. So at the moment it is only possible to fix a given Mesos
attribute to a single value.
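
In other words, only the single-value form is currently supported in
flink-conf.yaml, along the lines of the following sketch (the rack value
here is an example, not a recommendation):

```yaml
# flink-conf.yaml -- minimal sketch; "rack:ak03-07" is a hypothetical value.
# Only one value per attribute is supported; listing several values
# (rack:a,rack:b) produces AND-ed constraints that no single offer can satisfy.
mesos.initial-tasks: 2
mesos.constraints.hard.hostattribute: rack:ak03-07
```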

Cheers,
Till


Re: Consult about flink on mesos cluster

Posted by Bo Yu <yu...@gmail.com>.
Thanks, Till

I tried to set hard host attribute constraints in "flink-conf.yaml" as
mesos.constraints.hard.hostattribute: rack:ak03-07,rack:ak16-10,rack:ak03-04
where "rack:akXX-XX" is the MESOS_attributes of each slave.

Then I get into a situation where the Mesos app master doesn't accept the
offers to start the task managers.
I keep getting log output like: flink.log

The task managers don't start properly even though there are sufficient
resources.

Thank you in advance; looking forward to your advice.

Best regards,

Bo


Re: Consult about flink on mesos cluster

Posted by Till Rohrmann <tr...@apache.org>.
Hi Bo,

you can still use Flink with Marathon, because Marathon will only schedule
the cluster entrypoint which is the MesosApplicationMasterRunner.
Everything else will be scheduled via Fenzo. Moreover, by using Marathon
you gain high availability because Marathon makes sure that the
ApplicationMaster is restarted in case of a failure.
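
The Marathon-managed entrypoint described above can be sketched as an app
definition like the following (the app id, paths, and resource numbers are
hypothetical and must be adapted to your installation):

```json
{
  "id": "/flink-appmaster",
  "cmd": "/usr/local/flink/bin/mesos-appmaster.sh -Djobmanager.heap.mb=1024 -Dtaskmanager.heap.mb=1024 -Dtaskmanager.numberOfTaskSlots=32",
  "cpus": 1.0,
  "mem": 1536,
  "instances": 1
}
```

With "instances": 1, Marathon keeps exactly one application master running
and restarts it on failure, which is the high-availability behavior
mentioned above.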

Cheers,
Till


Re: Consult about flink on mesos cluster

Posted by yubo <yu...@gmail.com>.
Thanks for your reply, Till.
We will use Flink without Marathon, and hope the PR is merged into the latest version soon.
 
Best regards,
Bo



Re: Fwd: Consult about flink on mesos cluster

Posted by Till Rohrmann <tr...@apache.org>.
Hi Bo,

Flink internally uses Fenzo to match tasks and offers. Fenzo does not
support the Marathon constraints syntax you are referring to. At the
moment, Flink only allows defining hard host attribute constraints, which
means you specify a host attribute that has to match exactly. Fenzo also
supports constraints that operate on a set of tasks [1], but this is not
yet exposed to the user. With those you should be able to evenly spread
your tasks across multiple machines.

There is actually a PR [2] trying to add this functionality. However, it is
not yet in shape to be merged.

[1]
https://github.com/Netflix/Fenzo/wiki/Constraints#constraints-that-operate-on-groups-of-tasks
[2] https://github.com/apache/flink/pull/4628
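
The distinction above — a single hard attribute match versus a group-style
spread over attribute values — can be illustrated with a toy sketch. This
is deliberately NOT Fenzo's actual API; it only models the two matching
behaviors being discussed:

```python
# Toy illustration (not Fenzo's API): a hard host-attribute constraint pins
# tasks to offers whose attribute equals one exact value, while a
# group-by-style constraint spreads tasks across distinct attribute values.

def matches_hard_constraint(offer_attrs, constraint):
    # A hard constraint like "rack:ak09-27" requires the offer's
    # attribute to equal the constrained value exactly.
    key, value = constraint.split(":", 1)
    return offer_attrs.get(key) == value

def assign_group_by(offers, num_tasks, key):
    # Group-by style: spread tasks round-robin over the distinct
    # attribute values seen in the offers.
    values = sorted({o[key] for o in offers})
    return [values[i % len(values)] for i in range(num_tasks)]

offers = [{"rack": "ak09-27"}, {"rack": "ak16-10"}]

# Hard constraint: only offers from rack ak09-27 qualify,
# so every task lands on that one rack.
qualifying = [o for o in offers if matches_hard_constraint(o, "rack:ak09-27")]
print(qualifying)  # [{'rack': 'ak09-27'}]

# Group-by: 4 tasks spread evenly over the two racks.
print(assign_group_by(offers, 4, "rack"))
# ['ak09-27', 'ak16-10', 'ak09-27', 'ak16-10']
```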

Cheers,
Till


Re: Fwd: Consult about flink on mesos cluster

Posted by "Tzu-Li (Gordon) Tai" <tz...@apache.org>.
Hi Bo,

I'm not familiar with Mesos deployments, but I'll forward this to Till or Eron (in CC) who perhaps could provide some help here.

Cheers,
Gordon

