Posted to user@mesos.apache.org by Benjamin Wulff <be...@ieee.org> on 2020/06/06 11:35:46 UTC

No offers are being made -- how to debug Mesos?

Hi all,

I’m in the process of setting up my first Mesos cluster with 1x master and 3x slaves on CentOS 8.

So far I have set up Zookeeper and Mesos-master on the master and Mesos-slave on one of the compute nodes. Mesos-master communicates with ZK and becomes leader. Then I started mesos-slave on the compute node and can see in the log that it registers at the master with the correct resources reported. The agent and its resources are also displayed in the web UI of the master. So is the framework that I want to use.

The crux is that no tasks I schedule in the framework are executed. And I suppose this is because the framework never receives an offer. I can see in the web UI that no offers are made and that all resources remain idle.

Now, I’m new to Mesos and I don’t really have an idea how to debug my setup at this point. 

There is a page called ‘Debugging with the new CLI’ in the documentation but it only explains how to configure the CLI command.

Any directions on how to debug my situation in general, or on how to use the CLI for debugging, would be highly welcome! :)

Thanks and best regards,
Ben


Re: No offers are being made -- how to debug Mesos?

Posted by Benjamin Wulff <be...@ieee.org>.
Hi Benjamin,

> 
> I can't quite tell from the log snippet you provided. Assuming this is the only scheduler registered, it should receive offers for all the agents for the scheduler's roles (in this case, should just be the '*' role).
> 

The framework I was talking about is the only framework in place. I can confirm that the role of the framework is ‘*’ (according to the framework’s page in the web UI).

> Some reasons offers might not be sent:
> 
> -Framework doesn't have capability required to be offered the agent (e.g. scheduler doesn't have GPU_RESOURCES when the agent has GPUs).

The ‘capabilities’ field in the framework information (http://master:5050/framework/<id>) is empty. I have attached the framework information JSON.

> -Framework suppressed its role(s) (doesn't seem to be the case from the log snippet)
> -The role has insufficient quota (e.g. if you have set a quota limit for that role, or if other roles have quota guarantees overcommitting the cluster)

I have not (knowingly) set any quotas.

> -The agent's resources are reserved to a role.

I have not (knowingly) made any resource reservations. 

Generally: what I have done is install Zookeeper and Mesos, do a basic configuration of master and agent, and start, in that order: Zookeeper, Mesos-Master, Mesos-Slave, and the framework. There was nothing more going on.

> 
> Can you show us the scheduler code? Can you give us complete logs, along with results of the agent and master /state endpoints?
> 
I have attached the two state info JSON files and the framework info JSON as well as the logs from master and agent.

Thanks and best regards,
Ben 



> 
> On Sat, Jun 6, 2020 at 8:00 AM Benjamin Wulff <benjamin.wulff.de@ieee.org> wrote:
> So with logging_level set to INFO (and master and slave restarted) I noticed in /var/log/mesos.INFO on the agent the following line:
> 
> I0606 13:46:41.393455 206117 slave.cpp:4222] Ignoring info update for framework 2777de92-bc91-4e48-9960-bbab05694665-0000 because it does not exist
> 
> That is indeed the ID of the framework I’d like to run my task. In the web UI the framework is listed. So why is the agent saying that it doesn’t exist? What does this message mean?
> 
> On the master in /var/log/mesos-master.INFO the last relevant log lines (after that come HTTP requests) are:
> 
> I0606 13:50:11.710996 52025 http.cpp:1115] HTTP POST for /master/api/v1/scheduler from 172.30.0.8:41378 with User-Agent='python-requests/2.23.0'
> I0606 13:50:11.720903 52025 master.cpp:2670] Received subscription request for HTTP framework 'Go-Docker'
> I0606 13:50:11.721153 52025 master.cpp:2742] Subscribing framework 'Go-Docker' with checkpointing disabled and capabilities [  ]
> I0606 13:50:11.721993 52025 master.cpp:10847] Adding framework 2777de92-bc91-4e48-9960-bbab05694665-0000 (Go-Docker) with roles {  } suppressed
> I0606 13:50:11.722084 52025 master.cpp:8300] Updating framework 2777de92-bc91-4e48-9960-bbab05694665-0000 (Go-Docker) with roles {  } suppressed
> I0606 13:50:11.722514 52030 hierarchical.cpp:605] Added framework 2777de92-bc91-4e48-9960-bbab05694665-0000
> I0606 13:50:11.722573 52030 hierarchical.cpp:711] Deactivated framework 2777de92-bc91-4e48-9960-bbab05694665-0000
> I0606 13:50:11.722625 52030 hierarchical.cpp:1552] Suppressed offers for roles {  } of framework 2777de92-bc91-4e48-9960-bbab05694665-0000
> I0606 13:50:11.722657 52030 hierarchical.cpp:1592] Unsuppressed offers and cleared filters for roles {  } of framework 2777de92-bc91-4e48-9960-bbab05694665-0000
> I0606 13:50:11.722703 52030 hierarchical.cpp:681] Activated framework 2777de92-bc91-4e48-9960-bbab05694665-0000
> 
> From my novice perspective it seems the framework is registered..
> 
> Thanks,
> Ben
> 
> 
> > On 6. Jun 2020, at 13:40, Marc Roos <M.Roos@f1-outsourcing.eu> wrote:
> > 
> > 
> > 
> > 
> > You already put these on debug?
> > 
> > [@ ]# cat /etc/mesos-master/logging_level
> > WARNING
> > [@ ]# cat /etc/mesos-slave/logging_level
> > WARNING
> 


Re: No offers are being made -- how to debug Mesos?

Posted by Benjamin Wulff <be...@ieee.org>.
Turns out I had to configure the framework I want to use to do exactly what the mesos-execute command did: add GPU_RESOURCES to the capability list. Now resources are offered to the framework and tasks are run. :)
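
For readers hitting the same problem with a scheduler that talks to the v1 HTTP API directly (as the python-requests User-Agent in the master log suggests this one does), here is a minimal sketch of a SUBSCRIBE call body that declares the capability. It is not the actual Go-Docker code; the helper name `build_subscribe_call` and the user `root` are assumptions made for illustration.

```python
import json


def build_subscribe_call(name, user, capabilities=("GPU_RESOURCES",)):
    """Build the JSON body of a v1 scheduler SUBSCRIBE call.

    Each capability is sent as an object with a "type" field inside
    framework_info. The helper itself is hypothetical.
    """
    return {
        "type": "SUBSCRIBE",
        "subscribe": {
            "framework_info": {
                "user": user,
                "name": name,
                "capabilities": [{"type": c} for c in capabilities],
            }
        },
    }


if __name__ == "__main__":
    # The body would be POSTed to the master's /api/v1/scheduler endpoint.
    print(json.dumps(build_subscribe_call("Go-Docker", "root"), indent=2))
```

With an empty capability list the master logs `capabilities [  ]` on subscription, so that log line is a quick way to check whether the setting took effect.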

Thanks,
Ben




Re: No offers are being made -- how to debug Mesos?

Posted by Benjamin Wulff <be...@ieee.org>.
Hi all,

a correction:

I saw the correct output of nvidia-smi in the stdout file in the task's work dir on the agent (that was the piece I didn’t get, reading helps!).
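
In case it helps others find that file: a small sketch that builds the usual sandbox path of a task's stdout on the agent. The work dir `/var/lib/mesos` is an assumption (it is whatever `--work_dir` the agent was started with), and using the task name as executor ID assumes a plain command task; the function name is made up for illustration.

```python
from pathlib import PurePosixPath


def sandbox_stdout(work_dir, agent_id, framework_id, executor_id):
    """Return the path of the stdout file in an executor's latest run sandbox."""
    return (
        PurePosixPath(work_dir)
        / "slaves" / agent_id
        / "frameworks" / framework_id
        / "executors" / executor_id
        / "runs" / "latest" / "stdout"
    )


if __name__ == "__main__":
    # IDs taken from the mesos-execute output in this thread; work dir assumed.
    print(sandbox_stdout(
        "/var/lib/mesos",
        "f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-S0",
        "f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005",
        "gpu-test",
    ))
```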

So I have to see why the framework doesn’t receive any offers.

Thanks,
Ben
 



Re: No offers are being made -- how to debug Mesos?

Posted by Benjamin Wulff <be...@ieee.org>.
Hi all,

I found the GPU support page in the docs (1) and tried the following command:

# mesos-execute --master=129.26.78.161:5050 --name=gpu-test --command="nvidia-smi" --framework_capabilities="GPU_RESOURCES" --resources="gpus:1"

..and that gave the following output:

I0607 14:57:41.897706 56361 scheduler.cpp:189] Version: 1.9.0
I0607 14:57:41.913520 56361 scheduler.cpp:342] Using default 'basic' HTTP authenticatee
I0607 14:57:41.913813 56367 scheduler.cpp:525] New master detected at master@129.26.78.161:5050
Subscribed with ID f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-0005
Submitted task 'gpu-test' to agent 'f2e21b9a-3bb6-4f40-bfe3-3ec4f8eda64a-S0'
Received status update TASK_STARTING for task 'gpu-test'
  source: SOURCE_EXECUTOR
Received status update TASK_RUNNING for task 'gpu-test'
  source: SOURCE_EXECUTOR
Received status update TASK_FINISHED for task 'gpu-test'
  message: 'Command exited with status 0'
  source: SOURCE_EXECUTOR

I did not see the output of nvidia-smi as I should have according to the documentation.

I have attached the logs of master and agent.

Thanks,
Ben





Re: No offers are being made -- how to debug Mesos?

Posted by Benjamin Mahler <bm...@apache.org>.
Don't worry about that "Ignoring" message on the agent. When the framework
information is updated, the master broadcasts it to the agents, and in this
case the agent doesn't know about the framework since it has no tasks for
it, and so it ignores the updated information.

I can't quite tell from the log snippet you provided. Assuming this is the
only scheduler registered, it should receive offers for all the agents for
the scheduler's roles (in this case, should just be the '*' role).

Some reasons offers might not be sent:

-Framework doesn't have capability required to be offered the agent (e.g.
scheduler doesn't have GPU_RESOURCES when the agent has GPUs).
-Framework suppressed its role(s) (doesn't seem to be the case from the log
snippet)
-The role has insufficient quota (e.g. if you have set a quota limit for
that role, or if other roles have quota guarantees overcommitting the
cluster)
-The agent's resources are reserved to a role.

Can you show us the scheduler code? Can you give us complete logs, along
with results of the agent and master /state endpoints?
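
Two of the reasons listed above (a missing GPU_RESOURCES capability and reserved agent resources) can be checked mechanically against the master's /state output. The following is only a rough sketch: the field names it reads (`frameworks[].name`, `frameworks[].capabilities`, `slaves[].resources.gpus`, `slaves[].reserved_resources`) are assumptions about the /state JSON layout and should be compared with what your Mesos version actually returns. Quota and suppression are not covered.

```python
import json
from urllib.request import urlopen


def fetch_state(master):
    """Fetch the master's /state JSON, e.g. master='129.26.78.161:5050'."""
    with urlopen(f"http://{master}/state") as resp:
        return json.load(resp)


def offer_blockers(state, framework_name):
    """List likely reasons why a framework is not offered the agents."""
    frameworks = [f for f in state.get("frameworks", [])
                  if f.get("name") == framework_name]
    if not frameworks:
        return [f"framework {framework_name!r} is not registered"]

    caps = set(frameworks[0].get("capabilities", []))
    reasons = []
    for agent in state.get("slaves", []):
        host = agent.get("hostname", agent.get("id"))
        # GPU agents are typically only offered to GPU-capable frameworks.
        has_gpus = agent.get("resources", {}).get("gpus", 0) > 0
        if has_gpus and "GPU_RESOURCES" not in caps:
            reasons.append(
                f"agent {host} has GPUs but framework lacks GPU_RESOURCES")
        # Resources reserved to a role are not offered to other roles.
        reserved = agent.get("reserved_resources", {})
        if reserved:
            reasons.append(
                f"agent {host} has resources reserved for roles {sorted(reserved)}")
    return reasons
```

Usage would be along the lines of `offer_blockers(fetch_state("129.26.78.161:5050"), "Go-Docker")`; an empty list means neither of the two checks fired.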



Re: No offers are being made -- how to debug Mesos?

Posted by Benjamin Wulff <be...@ieee.org>.
With logging_level set to INFO (and the master and slave restarted), I noticed the following line in /var/log/mesos.INFO on the agent:

I0606 13:46:41.393455 206117 slave.cpp:4222] Ignoring info update for framework 2777de92-bc91-4e48-9960-bbab05694665-0000 because it does not exist

That is indeed the ID of the framework I’d like to run my task on. The framework is listed in the web UI, so why does the agent say that it doesn’t exist? What does this message mean?

On the master, the last relevant log lines in /var/log/mesos-master.INFO (HTTP requests follow after that) are:

I0606 13:50:11.710996 52025 http.cpp:1115] HTTP POST for /master/api/v1/scheduler from 172.30.0.8:41378 with User-Agent='python-requests/2.23.0'
I0606 13:50:11.720903 52025 master.cpp:2670] Received subscription request for HTTP framework 'Go-Docker'
I0606 13:50:11.721153 52025 master.cpp:2742] Subscribing framework 'Go-Docker' with checkpointing disabled and capabilities [  ]
I0606 13:50:11.721993 52025 master.cpp:10847] Adding framework 2777de92-bc91-4e48-9960-bbab05694665-0000 (Go-Docker) with roles {  } suppressed
I0606 13:50:11.722084 52025 master.cpp:8300] Updating framework 2777de92-bc91-4e48-9960-bbab05694665-0000 (Go-Docker) with roles {  } suppressed
I0606 13:50:11.722514 52030 hierarchical.cpp:605] Added framework 2777de92-bc91-4e48-9960-bbab05694665-0000
I0606 13:50:11.722573 52030 hierarchical.cpp:711] Deactivated framework 2777de92-bc91-4e48-9960-bbab05694665-0000
I0606 13:50:11.722625 52030 hierarchical.cpp:1552] Suppressed offers for roles {  } of framework 2777de92-bc91-4e48-9960-bbab05694665-0000
I0606 13:50:11.722657 52030 hierarchical.cpp:1592] Unsuppressed offers and cleared filters for roles {  } of framework 2777de92-bc91-4e48-9960-bbab05694665-0000
I0606 13:50:11.722703 52030 hierarchical.cpp:681] Activated framework 2777de92-bc91-4e48-9960-bbab05694665-0000
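
[Editor's sketch] When the master log gets long, it can help to filter it down to the lines that mention one framework ID. The following is a minimal sketch, assuming the log uses the glog line layout seen above (severity letter, MMDD, time, thread ID, source file and line, message). The names parse_glog_line, framework_events and SAMPLE are hypothetical helpers made up for illustration; they are not part of Mesos.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Assumed glog layout: <severity><MMDD> <HH:MM:SS.micros> <thread> <file>:<line>] <message>
GLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<source>[^:\s]+):(?P<line>\d+)\] (?P<message>.*)$"
)

FRAMEWORK_ID = "2777de92-bc91-4e48-9960-bbab05694665-0000"


class LogEntry(NamedTuple):
    severity: str
    time: str
    source: str
    message: str


def parse_glog_line(line: str) -> Optional[LogEntry]:
    """Split one log line into its parts; return None if it does not match."""
    m = GLOG_LINE.match(line.strip())
    if m is None:
        return None
    return LogEntry(m["severity"], m["time"], m["source"], m["message"])


def framework_events(lines: Iterable[str], framework_id: str) -> Iterator[LogEntry]:
    """Yield only the parsed entries whose message mentions framework_id."""
    for line in lines:
        entry = parse_glog_line(line)
        if entry is not None and framework_id in entry.message:
            yield entry


# A few lines copied from the master log quoted in this message.
SAMPLE = [
    "I0606 13:50:11.710996 52025 http.cpp:1115] HTTP POST for /master/api/v1/scheduler "
    "from 172.30.0.8:41378 with User-Agent='python-requests/2.23.0'",
    "I0606 13:50:11.722514 52030 hierarchical.cpp:605] Added framework " + FRAMEWORK_ID,
    "I0606 13:50:11.722573 52030 hierarchical.cpp:711] Deactivated framework " + FRAMEWORK_ID,
    "I0606 13:50:11.722703 52030 hierarchical.cpp:681] Activated framework " + FRAMEWORK_ID,
]

if __name__ == "__main__":
    for e in framework_events(SAMPLE, FRAMEWORK_ID):
        print(e.time, e.source, e.message)
```

To run it against a real log, pass an open file instead of SAMPLE, for example framework_events(open("/var/log/mesos-master.INFO"), FRAMEWORK_ID).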

From my novice perspective it seems the framework is registered.
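
[Editor's sketch] One way to cross-check what the web UI shows is to read the master's state as JSON and list, per framework, whether it is active and how many offers are outstanding. This is a minimal sketch, assuming the master serves its state at /state on the default port 5050 and that each framework entry carries the keys id, name, active, roles (or role), capabilities and offers. Those key names are assumptions that should be verified against the actual output of your Mesos version; fetch_master_state and summarize_frameworks are hypothetical helper names.

```python
import json
from urllib.request import urlopen


def fetch_master_state(master_url: str) -> dict:
    """Fetch the master's state JSON, e.g. master_url='http://master-host:5050'."""
    with urlopen(f"{master_url}/state", timeout=10) as resp:
        return json.load(resp)


def summarize_frameworks(state: dict) -> list:
    """Reduce the state document to the fields relevant for offer debugging."""
    rows = []
    for fw in state.get("frameworks", []):
        # Assumed: newer versions report 'roles' (a list), older ones a single 'role'.
        roles = fw.get("roles") or ([fw["role"]] if "role" in fw else [])
        rows.append({
            "id": fw.get("id"),
            "name": fw.get("name"),
            "active": fw.get("active"),
            "roles": roles,
            "capabilities": fw.get("capabilities", []),
            "outstanding_offers": len(fw.get("offers", [])),
        })
    return rows


if __name__ == "__main__":
    import sys

    # Usage: python check_frameworks.py http://master-host:5050
    for row in summarize_frameworks(fetch_master_state(sys.argv[1])):
        print(row)
```

A framework that shows up as registered but with no roles, or as inactive, would be a first thing to look at when no offers arrive.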

Thanks,
Ben


> On 6. Jun 2020, at 13:40, Marc Roos <M....@f1-outsourcing.eu> wrote:
> 
> 
> 
> 
> You already put these on debug?
> 
> [@ ]# cat /etc/mesos-master/logging_level
> WARNING
> [@ ]# cat /etc/mesos-slave/logging_level
> WARNING


RE: No offers are being made -- how to debug Mesos?

Posted by Marc Roos <M....@f1-outsourcing.eu>.
Have you already set these to debug?

[@ ]# cat /etc/mesos-master/logging_level
WARNING
[@ ]# cat /etc/mesos-slave/logging_level
WARNING




-----Original Message-----
From: Benjamin Wulff [mailto:benjamin.wulff.de@ieee.org] 
Sent: Saturday, 6 June 2020 13:36
To: user@mesos.apache.org
Subject: No offers are being made -- how to debug Mesos?

Hi all,

I’m in the process of setting up my first Mesos cluster with 1x master 
and 3x slaves on CentOS 8.

So far I have set up ZooKeeper and Mesos-master on the master and Mesos-slave on 
one of the compute nodes. Mesos-master communicates with ZK and becomes 
leader. Then I started mesos-slave on the compute node and can see in 
the log that it registers at the master with the correct resources 
reported. The agent and its resources are also displayed in the web UI 
of the master. So is the framework that I want to use.

The crux is that no tasks I schedule in the framework are executed. And 
I suppose this is because the framework never receives an offer. I can 
see in the web UI that no offers are made and that all resources remain 
idle.

Now, I’m new to Mesos and I don’t really have an idea how to debug my 
setup at this point. 

There is a page called ‘Debugging with the new CLI’ in the 
documentation but it only explains how to configure  the CLI command. 

Any directions how to debug in my situation in general or on how to use 
the CLI for debugging would be highly welcome! :)

Thanks and best regards,
Ben