Posted to user@mesos.apache.org by Whitney Sorenson <ws...@hubspot.com> on 2013/11/07 19:07:25 UTC

Jenkins mesos plugin failing

Hi all!

I am trying to get the Jenkins Mesos plugin functioning. I was able to get
it installed on our Jenkins master.

However, it's unclear if there are any required steps for setting up the
slaves. When a framework task is launched, it fails instantly and there are
no logs in the runs folder.

Here's a gist with relevant logs from the slave:

https://gist.github.com/wsorenson/b3562c3e4a8992f9a46f/raw/ea5821c442d826456291330452208d8d7ac8418f/failing+jenkins+logs

Any help on how to debug? At first, I thought maybe we needed slave.jar or
something but it looks like it's trying to fetch that from the master using
the URIs. To clarify, I have done no special jenkins related setup (as per
readme.md) on any of the slaves.

-Whitney

Re: Jenkins mesos plugin failing

Posted by Ray Rodriguez <ra...@gmail.com>.
The logs that really helped me sort out what was happening were the
Jenkins logs, so you may want to check those first.  Also, when your slave is
trying to run the Jenkins job, you should check whether it's actually able
to start the slave.jar java process.  It looks something like this:

sh -c java -DHUDSON_HOME=jenkins -server -Xmx640m -Xms16m
-XX:+UseConcMarkSweepGC -Djava.net.preferIPv4Stack=true -jar slave.jar
 -jnlpUrl
http://ec2-67-123-38-123.compute-1.amazonaws.com:8080/computer/mesos-jenkins-fb1421f4-8a6b-490c-ae43-4ed7da01c02d/slave-agent.jnlp
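
A minimal sketch of that process check in Python, assuming a Linux-style `ps -eo pid,args` is available on the slave host. The function names are made up for illustration and are not part of Jenkins or the plugin:

```python
import subprocess


def find_slave_agents(ps_lines):
    """Return (pid, jnlp_url) pairs for command lines that run slave.jar.

    ps_lines: lines of the form "<pid> <command line>".
    jnlp_url is None when no -jnlpUrl argument is present.
    """
    agents = []
    for line in ps_lines:
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # header line or something we cannot parse
        pid, args = int(parts[0]), parts[1].split()
        if "slave.jar" not in args:
            continue
        url = None
        if "-jnlpUrl" in args:
            idx = args.index("-jnlpUrl")
            if idx + 1 < len(args):
                url = args[idx + 1]
        agents.append((pid, url))
    return agents


def running_slave_agents():
    """Scan the local process table (assumes `ps -eo pid,args` works here)."""
    out = subprocess.run(
        ["ps", "-eo", "pid,args"], capture_output=True, text=True, check=True
    ).stdout
    return find_slave_agents(out.splitlines())
```

If `running_slave_agents()` comes back empty while a job is waiting, the java process most likely never started on that host.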

Another thing to check is whether your slaves can communicate back to your
jenkins node.  Try to curl the URI from your slave node.
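
If curl is not handy, the same connectivity check can be sketched in Python. The `probe` helper and its messages are hypothetical. The 403 case is the one that comes up elsewhere in this thread, where the Jenkins master refuses an anonymous request for slave-agent.jnlp:

```python
import urllib.error
import urllib.request


def probe(url, timeout=10, opener=urllib.request.urlopen):
    """Fetch url and return a short diagnosis string.

    opener is injectable so the function can be exercised without a network.
    """
    try:
        with opener(url, timeout=timeout) as resp:
            return "ok: HTTP %d" % resp.status
    except urllib.error.HTTPError as err:
        # The host answered, so the network path is fine.
        if err.code == 403:
            return "reachable, but HTTP 403: Jenkins refused the request (check permissions)"
        return "reachable, but HTTP %d" % err.code
    except (urllib.error.URLError, OSError) as err:
        # DNS failure, refused connection, timeout, and similar.
        return "unreachable: %s" % err
```

Run it from the slave node against the slave.jar or slave-agent.jnlp URL that the task is trying to fetch.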



Re: Jenkins mesos plugin failing

Posted by Whitney Sorenson <ws...@hubspot.com>.
I was debugging a problem with $HOME not being set when the executor runs
the Jenkins build (and thus git config --global fails), so I was building a
new version of the plugin. I believe the problem started when I redeployed
the plugin, but I also switched Jenkins masters at some point. I didn't have
two versions of the plugin running concurrently against the same Mesos
master.

However, the problem seems to be occurring if I point to either of 2
separate mesos clusters.
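
To put a number on the register/disconnect loop in the master log quoted below, a small script can count frameworks that register and then disconnect. This is only a sketch: the regular expressions are based on the few master.cpp lines in this thread, and the function name is hypothetical.

```python
import re

_REGISTER = re.compile(r"Registering framework (\S+)")
_DISCONNECT = re.compile(r"Framework (\S+) disconnected")


def summarize_flapping(log_text):
    """Count frameworks that registered and were later reported disconnected."""
    # Collapse whitespace so entries wrapped across lines by email still match.
    text = " ".join(log_text.split())
    registered = _REGISTER.findall(text)
    disconnected = set(_DISCONNECT.findall(text))
    flapped = [fid for fid in registered if fid in disconnected]
    return {
        "registered": len(registered),
        "flapped": len(flapped),
        "framework_ids": flapped,
    }
```

A `flapped` count that keeps growing with a new framework ID each time would match what the Mesos UI shows here.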




On Fri, Nov 8, 2013 at 4:57 PM, Vinod Kone <vi...@gmail.com> wrote:

> In your earlier email you were able to launch jenkins slaves on the mesos
> cluster. What changed?
>
> Did the problem start happening when you tried to run two different
> instances of Jenkins masters each connect to the mesos master as a
> different framework?
>
>
> On Fri, Nov 8, 2013 at 1:45 PM, Whitney Sorenson <ws...@hubspot.com>wrote:
>
>> I added the following environment to Jenkins so I could capture the
>> framework logs:
>>
>>  GLOG_log_dir=/var/log/jenkins
>>  GLOG_logtostderr=0
>>  GLOG_v=3
>>
>> However, I'm not sure I've seen anything of value, shown here:
>>
>> https://gist.github.com/wsorenson/30bd131a70aa602105d1
>>
>> I've completely stopped/started the mesos cluster, and restarted jenkins
>> multiple times.
>>
>> Not sure what to try next - I've seen the exact same symptoms with a
>> different framework on a different cluster. In that situation, fully
>> stopping and bringing up the mesos cluster again seemed to resolve the
>> issue.
>>
>>
>>
>>
>>
>> On Thu, Nov 7, 2013 at 8:23 PM, Benjamin Mahler <
>> benjamin.mahler@gmail.com> wrote:
>>
>>> From the master's perspective, the framework disconnected immediately
>>> after registering.
>>>
>>> You can bump up the logging on the jenkins scheduler by ensuring that
>>> GLOG_v=3 is in your environment when our plugin is initialized.
>>>
>>> On Thu, Nov 7, 2013 at 3:17 PM, Whitney Sorenson <ws...@hubspot.com>wrote:
>>>
>>>> Sure (https://github.com/jenkinsci/mesos-plugin/issues/4) but I'm
>>>> actually running into another issue which I've seen before with other
>>>> frameworks:
>>>>
>>>> I added the plugin to a separate Jenkins cluster and the framework
>>>> doesn't seem to be able to maintain the connection successfully.
>>>>
>>>> The jenkins master log shows:
>>>>
>>>> Nov 7, 2013 10:12:38 PM org.jenkinsci.plugins.mesos.MesosCloud <init>
>>>> INFO: Mesos master changed, restarting the scheduler
>>>> Nov 7, 2013 10:12:38 PM org.jenkinsci.plugins.mesos.JenkinsScheduler
>>>> <init>
>>>> INFO: JenkinsScheduler instantiated with jenkins
>>>> http://jenkins-master/jenkins/ and mesos mesos-master:5050
>>>>
>>>> With nothing else (no confirmation that the framework registered.)
>>>>
>>>> In the mesos UI, I see that the framework is constantly failing /
>>>> registering. The logs show:
>>>>
>>>> I1107 22:53:06.791082 4283 master.cpp:1365] Framework failover timeout, removing framework 201310222354-1872141066-5050-4282-2992
>>>> I1107 22:53:06.791760 4283 master.cpp:2022] Removing framework 201310222354-1872141066-5050-4282-2992
>>>> I1107 22:53:06.792107 4283 hierarchical_allocator_process.hpp:352] Removed framework 201310222354-1872141066-5050-4282-2992
>>>> I1107 22:53:07.788573 4286 master.cpp:695] Registering framework 201310222354-1872141066-5050-4282-2993 at scheduler(1)@10.46.101.33:58478
>>>> I1107 22:53:07.788938 4286 hierarchical_allocator_process.hpp:321] Added framework 201310222354-1872141066-5050-4282-2993
>>>> I1107 22:53:07.790592 4286 master.cpp:1448] Sending 1 offers to framework 201310222354-1872141066-5050-4282-2993
>>>> I1107 22:53:07.790864 4284 master.cpp:489] Framework 201310222354-1872141066-5050-4282-2993 disconnected
>>>> I1107 22:53:07.791007 4284 master.cpp:516] Giving framework 201310222354-1872141066-5050-4282-2993 0ns to failover
>>>> I1107 22:53:07.791052 4285 hierarchical_allocator_process.hpp:397] Deactivated framework 201310222354-1872141066-5050-4282-2993
>>>>
>>>> This loop continues forever, happening several times per second.
>>>>
>>>> Any guidance on how to troubleshoot (I've already checked the network),
>>>> or a way to increase the logging threshold on the master?
>>>>
>>>>
>>>>
>>>> On Thu, Nov 7, 2013 at 5:10 PM, Benjamin Mahler <
>>>> benjamin.mahler@gmail.com> wrote:
>>>>
>>>>> We should fix that so that it reconnects with Mesos after a restart of
>>>>> Jenkins!
>>>>>
>>>>> Can you file an issue for this?
>>>>>
>>>>>
>>>>> On Thu, Nov 7, 2013 at 12:31 PM, Whitney Sorenson <
>>>>> wsorenson@hubspot.com> wrote:
>>>>>
>>>>>> I should also point out the scheduler didn't seem to survive a reboot
>>>>>> of Jenkins - I had to delete the mesos cloud and reenter the parameters.
>>>>>>
>>>>>>
>>>>>> On Thu, Nov 7, 2013 at 3:26 PM, Whitney Sorenson <
>>>>>> wsorenson@hubspot.com> wrote:
>>>>>>
>>>>>>> Looks like we're using authentication on our slaves. So you either
>>>>>>> need to pass
>>>>>>>
>>>>>>> -jnlpCredentials user:pass
>>>>>>>
>>>>>>> on the command line, or change around the permissions in Jenkins to
>>>>>>> allow anonymous users to connect/run jobs.
>>>>>>>
>>>>>>> I'm not sure if it would make sense or not to add the user/pass in
>>>>>>> the Jenkins plugin configuration screen or if it should be fetched another
>>>>>>> way.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Nov 7, 2013 at 2:52 PM, Vinod Kone <vi...@gmail.com>wrote:
>>>>>>>
>>>>>>>> Great. Let us know once you figure it out. Maybe I can add a FAQ to
>>>>>>>> the plugin's README to help others (or you can contribute too :)).
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Nov 7, 2013 at 11:40 AM, Whitney Sorenson <
>>>>>>>> wsorenson@hubspot.com> wrote:
>>>>>>>>
>>>>>>>>> I added the jenkins user on the slave - this was the missing
>>>>>>>>> piece. I'll add this to my PR for the readme. Got much further now; now I'm
>>>>>>>>> getting a 403 on the fetch:
>>>>>>>>>
>>>>>>>>> /jenkins/computer/mesos-jenkins-6f4719c8-1c61-4b28-b5ab-ba298e846840/slave-agent.jnlp: 403 Forbidden
>>>>>>>>>     at hudson.remoting.Launcher.parseJnlpArguments(Launcher.java:261)
>>>>>>>>>     at hudson.remoting.Launcher.run(Launcher.java:215)
>>>>>>>>>
>>>>>>>>> and corresponding log on jenkins master:
>>>>>>>>>
>>>>>>>>> Nov 7, 2013 2:38:39 PM winstone.Logger logInternal
>>>>>>>>> INFO: While serving http://localhost:8080/jenkins/computer/mesos-jenkins-6f4719c8-1c61-4b28-b5ab-ba298e846840/slave-agent.jnlp:
>>>>>>>>> hudson.security.AccessDeniedException2: anonymous is missing the Slave/Connect permission
>>>>>>>>>
>>>>>>>>> Going to look into what this means.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Nov 7, 2013 at 2:21 PM, Vinod Kone <vi...@gmail.com>wrote:
>>>>>>>>>
>>>>>>>>>> I looked at the code and it looks like there are a few places the
>>>>>>>>>> executor might fail before it fetches the URI. Most of them have to do with
>>>>>>>>>> incorrect permissions. The code was written to have any errors reported
>>>>>>>>>> in the slave log, the console, or the executor logs (there might be a bug here
>>>>>>>>>> if we are in fact swallowing errors). IIUC, the executor log directory is
>>>>>>>>>> empty in your case, which suggests the executor died before it could even
>>>>>>>>>> create "stdout" or "stderr" files in its sandbox (is this true?).
>>>>>>>>>>
>>>>>>>>>> A couple of questions:
>>>>>>>>>>
>>>>>>>>>> What user is Jenkins master running as? Is that user known to the
>>>>>>>>>> host on which mesos slave is running?
>>>>>>>>>>
>>>>>>>>>> How are you starting the mesos slave (e.g., cmd line flags)?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Nov 7, 2013 at 11:00 AM, Whitney Sorenson <
>>>>>>>>>> wsorenson@hubspot.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> The gist was compiled from that log. Here is the complete log
>>>>>>>>>>> from toggling the jenkins plugin on / off (you see the ping statements
>>>>>>>>>>> in between):
>>>>>>>>>>>
>>>>>>>>>>> https://gist.github.com/wsorenson/8bf64e44fd42da354fa0
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Nov 7, 2013 at 1:57 PM, Vinod Kone <vi...@gmail.com>wrote:
>>>>>>>>>>>
>>>>>>>>>>>> What does mesos-slave.err say?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Nov 7, 2013 at 10:49 AM, Whitney Sorenson <
>>>>>>>>>>>> wsorenson@hubspot.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Vinod,
>>>>>>>>>>>>>
>>>>>>>>>>>>> It's 0.14.0-rc4 in both.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I believe we have logging working:
>>>>>>>>>>>>>
>>>>>>>>>>>>> -rw-r--r-- 1 root root         0 Oct 22 23:48 mesos-slave.out
>>>>>>>>>>>>> lrwxrwxrwx 1 root root        63 Oct 22 23:48 mesos-slave.INFO -> mesos-slave.carousel.invalid-user.log.INFO.20131022-234823.5797
>>>>>>>>>>>>> lrwxrwxrwx 1 root root        66 Oct 22 23:49 mesos-slave.WARNING -> mesos-slave.carousel.invalid-user.log.WARNING.20131022-234954.5797
>>>>>>>>>>>>> drwxr-xr-x 2 root root      4096 Oct 22 23:49 .
>>>>>>>>>>>>> -rw-rw-r-- 1 root root      4827 Nov  1 20:34 mesos-slave.carousel.invalid-user.log.WARNING.20131022-234954.5797
>>>>>>>>>>>>> -rw-rw-r-- 1 root root  10408140 Nov  7 18:44 mesos-slave.carousel.invalid-user.log.INFO.20131022-234823.5797
>>>>>>>>>>>>> -rw-r--r-- 1 root root  53759705 Nov  7 18:45 mesos-slave.err
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is there something else to check? Is it possible the executor
>>>>>>>>>>>>> is failing before it even attempts to fetch URIs?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ray - Thanks - yeah I found the jenkins logs. I was able to
>>>>>>>>>>>>> wget the slave.jar, and even run it. The mesos-jenkins slaves are dead now,
>>>>>>>>>>>>> so I can't connect to their slave-agent - but the jar does run. Not sure if
>>>>>>>>>>>>> the window for trying to connect to one of the mesos launched slaves is
>>>>>>>>>>>>> long enough to try before it is terminated due to failures. Interestingly,
>>>>>>>>>>>>> when I try to connect to one of the existing slaves I get a 403.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Whitney
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Nov 7, 2013 at 1:34 PM, Vinod Kone <
>>>>>>>>>>>>> vinodkone@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hey Whitney,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What version of mesos are you using (both in the cluster and
>>>>>>>>>>>>>> the plugin)?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The slave should print stuff to console when it is launching
>>>>>>>>>>>>>> executor (e.g., "Fetching resources..."). I don't see that in the gist you
>>>>>>>>>>>>>> pasted. Are you capturing stdout/stderr of the slave?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Nov 7, 2013 at 10:30 AM, Whitney Sorenson <
>>>>>>>>>>>>>> wsorenson@hubspot.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks Ray.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have a very similar issue (empty executor directories), but I
>>>>>>>>>>>>>>> don't have any issues curling the slave.jar URI, and I don't have any
>>>>>>>>>>>>>>> existing JNLP process running. I don't have a jenkins user. Is that the
>>>>>>>>>>>>>>> only setup you did on the slave?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Whitney
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Nov 7, 2013 at 1:16 PM, Ray Rodriguez <
>>>>>>>>>>>>>>> rayrod2030@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Whitney, I would have a look at this GitHub issue where I
>>>>>>>>>>>>>>>> work through some of my Jenkins mesos-plugin issues with Vinod. They might be
>>>>>>>>>>>>>>>> some of the same issues you are seeing.
>>>>>>>>>>>>>>>> https://github.com/jenkinsci/mesos-plugin/issues/2
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Ray
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Nov 7, 2013 at 1:07 PM, Whitney Sorenson <
>>>>>>>>>>>>>>>> wsorenson@hubspot.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi all!
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am trying to get the Jenkins Mesos plugin functioning. I
>>>>>>>>>>>>>>>>> was able to get it installed on our Jenkins master.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> However, it's unclear if there are any required steps for
>>>>>>>>>>>>>>>>> setting up the slaves. When a framework task is launched, it fails
>>>>>>>>>>>>>>>>> instantly and there are no logs in the runs folder.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Here's a gist with relevant logs from the slave:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> https://gist.github.com/wsorenson/b3562c3e4a8992f9a46f/raw/ea5821c442d826456291330452208d8d7ac8418f/failing+jenkins+logs
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Any help on how to debug? At first, I thought maybe we
>>>>>>>>>>>>>>>>> needed slave.jar or something but it looks like it's trying to fetch that
>>>>>>>>>>>>>>>>> from the master using the URIs. To clarify, I have done no special jenkins
>>>>>>>>>>>>>>>>> related setup (as per readme.md) on any of the slaves.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -Whitney
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Jenkins mesos plugin failing

Posted by Vinod Kone <vi...@gmail.com>.
In your earlier email you were able to launch jenkins slaves on the mesos
cluster. What changed?

Did the problem start happening when you tried to run two different
instances of Jenkins masters each connect to the mesos master as a
different framework?



Re: Jenkins mesos plugin failing

Posted by Whitney Sorenson <ws...@hubspot.com>.
I added the following environment to Jenkins so I could capture the
framework logs:

 GLOG_log_dir=/var/log/jenkins
 GLOG_logtostderr=0
 GLOG_v=3

However, I'm not sure I've seen anything of value, shown here:

https://gist.github.com/wsorenson/30bd131a70aa602105d1
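
For reference, the three settings above are plain environment variables read by glog, so they need to be in the environment of the Jenkins process before the plugin is initialized. A minimal Python sketch of building such an environment follows. The helper name is hypothetical, and how the environment actually reaches Jenkins (init script, service wrapper, and so on) depends on the installation and is not shown.

```python
import os


def glog_environment(log_dir, verbosity=3, base=None):
    """Return a copy of the environment with verbose glog file logging enabled.

    base defaults to the current process environment; pass a dict to override.
    """
    env = dict(os.environ if base is None else base)
    env["GLOG_v"] = str(verbosity)   # verbose logging level
    env["GLOG_log_dir"] = log_dir    # directory for the glog log files
    env["GLOG_logtostderr"] = "0"    # write to files rather than stderr
    return env
```

The resulting dict can be passed as `env=` to `subprocess.Popen` when starting the process that hosts the scheduler.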

I've completely stopped/started the mesos cluster, and restarted jenkins
multiple times.

Not sure what to try next - I've seen the exact same symptoms with a
different framework on a different cluster. In that situation, fully
stopping and bringing up the mesos cluster again seemed to resolve the
issue.





On Thu, Nov 7, 2013 at 8:23 PM, Benjamin Mahler
<be...@gmail.com>wrote:

> From the master's perspective, the framework disconnected immediately
> after registering.
>
> You can bump up the logging on the jenkins scheduler by ensuring that
> GLOG_v=3 is in your environment when our plugin is initialized.
>
> On Thu, Nov 7, 2013 at 3:17 PM, Whitney Sorenson <ws...@hubspot.com>wrote:
>
>> Sure (https://github.com/jenkinsci/mesos-plugin/issues/4) but I'm
>> actually running into another issue which I've seen before with other
>> frameworks:
>>
>> I added the plugin to a separate Jenkins cluster and the framework
>> doesn't seem to be able to maintain the connection successfully.
>>
>> The jenkins master log shows:
>>
>> Nov 7, 2013 10:12:38 PM org.jenkinsci.plugins.mesos.MesosCloud <init>
>> INFO: Mesos master changed, restarting the scheduler
>> Nov 7, 2013 10:12:38 PM org.jenkinsci.plugins.mesos.JenkinsScheduler
>> <init>
>> INFO: JenkinsScheduler instantiated with jenkins
>> http://jenkins-master/jenkins/ and mesos mesos-master:5050
>>
>> With nothing else (no confirmation that the framework registered.)
>>
>> In the mesos UI, I see that the framework is constantly failing /
>> registering. The logs show:
>>
>> I1107 22:53:06.791082 4283 master.cpp:1365] Framework failover timeout, removing framework 201310222354-1872141066-5050-4282-2992
>> I1107 22:53:06.791760 4283 master.cpp:2022] Removing framework 201310222354-1872141066-5050-4282-2992
>> I1107 22:53:06.792107 4283 hierarchical_allocator_process.hpp:352] Removed framework 201310222354-1872141066-5050-4282-2992
>> I1107 22:53:07.788573 4286 master.cpp:695] Registering framework 201310222354-1872141066-5050-4282-2993 at scheduler(1)@10.46.101.33:58478
>> I1107 22:53:07.788938 4286 hierarchical_allocator_process.hpp:321] Added framework 201310222354-1872141066-5050-4282-2993
>> I1107 22:53:07.790592 4286 master.cpp:1448] Sending 1 offers to framework 201310222354-1872141066-5050-4282-2993
>> I1107 22:53:07.790864 4284 master.cpp:489] Framework 201310222354-1872141066-5050-4282-2993 disconnected
>> I1107 22:53:07.791007 4284 master.cpp:516] Giving framework 201310222354-1872141066-5050-4282-2993 0ns to failover
>> I1107 22:53:07.791052 4285 hierarchical_allocator_process.hpp:397] Deactivated framework 201310222354-1872141066-5050-4282-2993
>>
>> This loop continues forever, happening several times per second.
>>
>> Any guidance on how to troubleshoot (I've already checked the network),
>> or a way to increase the logging threshold on the master?
>>
>>
>>
>> On Thu, Nov 7, 2013 at 5:10 PM, Benjamin Mahler <
>> benjamin.mahler@gmail.com> wrote:
>>
>>> We should fix that so that it reconnects with Mesos after a restart of
>>> Jenkins!
>>>
>>> Can you file an issue for this?
>>>
>>>
>>> On Thu, Nov 7, 2013 at 12:31 PM, Whitney Sorenson <wsorenson@hubspot.com
>>> > wrote:
>>>
>>>> I should also point out the scheduler didn't seem to survive a reboot
>>>> of Jenkins - I had to delete the mesos cloud and reenter the parameters.
>>>>
>>>>
>>>> On Thu, Nov 7, 2013 at 3:26 PM, Whitney Sorenson <wsorenson@hubspot.com
>>>> > wrote:
>>>>
>>>>> Looks like we're using authentication on our slaves. So you either
>>>>> need to pass
>>>>>
>>>>> -jnlpCredentials user:pass
>>>>>
>>>>> on the command line, or change around the permissions in Jenkins to
>>>>> allow anonymous users to connect/run jobs.
>>>>>
>>>>> I'm not sure if it would make sense or not to add the user/pass in the
>>>>> Jenkins plugin configuration screen or if it should be fetched another way.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Nov 7, 2013 at 2:52 PM, Vinod Kone <vi...@gmail.com>wrote:
>>>>>
>>>>>> Great. Let us know once you figure it out. Maybe I can add a FAQ to
>>>>>> the plugin's README to help others (or you can contribute too :)).
>>>>>>
>>>>>>
>>>>>> On Thu, Nov 7, 2013 at 11:40 AM, Whitney Sorenson <
>>>>>> wsorenson@hubspot.com> wrote:
>>>>>>
>>>>>>> I added the jenkins user on the slave - this was the missing piece.
>>>>>>> I'll add this to my PR for the readme. Got much further now; now I'm
>>>>>>> getting a 403 on the fetch:
>>>>>>>
>>>>>>> /jenkins/computer/mesos-jenkins-6f4719c8-1c61-4b28-b5ab-ba298e846840/slave-agent.jnlp:
>>>>>>> 403 Forbidden at
>>>>>>> hudson.remoting.Launcher.parseJnlpArguments(Launcher.java:261) at
>>>>>>> hudson.remoting.Launcher.run(Launcher.java:215)
>>>>>>>
>>>>>>> and corresponding log on jenkins master:
>>>>>>>
>>>>>>> Nov 7, 2013 2:38:39 PM winstone.Logger logInternal INFO: While
>>>>>>> serving
>>>>>>> http://localhost:8080/jenkins/computer/mesos-jenkins-6f4719c8-1c61-4b28-b5ab-ba298e846840/slave-agent.jnlp:
>>>>>>> hudson.security.AccessDeniedException2: anonymous is missing the
>>>>>>> Slave/Connect permission
>>>>>>>
>>>>>>> Going to look into what this means.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Nov 7, 2013 at 2:21 PM, Vinod Kone <vi...@gmail.com>wrote:
>>>>>>>
>>>>>>>> I looked at the code and it looks there are few places the executor
>>>>>>>> might fail before it fetches the URI. Most of them have to do with
>>>>>>>> incorrect permissions. The code was written to have any errors reported
>>>>>>>> either in slave log or console or executor logs (there might be a bug here
>>>>>>>> if we are in fact swallowing errors). IIUC, the executor log directory is
>>>>>>>> empty in your case which suggests the executor died before it could even
>>>>>>>> create "stdout" or "stderr" files in its sandbox (Is this true?).
>>>>>>>>
>>>>>>>> Couple of questions:
>>>>>>>>
>>>>>>>> What user is Jenkins master running as? Is that user known to the
>>>>>>>> host on which mesos slave is running?
>>>>>>>>
>>>>>>>> How are you starting the mesos slave (e.g., cmd line flags)?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Nov 7, 2013 at 11:00 AM, Whitney Sorenson <
>>>>>>>> wsorenson@hubspot.com> wrote:
>>>>>>>>
>>>>>>>>> The gist was compiled from that log. Here is the complete log from
>>>>>>>>> toggling the jenkins plugin on / off (you see the ping statements
>>>>>>>>> inbetween):
>>>>>>>>>
>>>>>>>>> https://gist.github.com/wsorenson/8bf64e44fd42da354fa0
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Nov 7, 2013 at 1:57 PM, Vinod Kone <vi...@gmail.com>wrote:
>>>>>>>>>
>>>>>>>>>> What does mesos-slave.err say?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Nov 7, 2013 at 10:49 AM, Whitney Sorenson <
>>>>>>>>>> wsorenson@hubspot.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Vinod,
>>>>>>>>>>>
>>>>>>>>>>> It's 0.14.0-rc4 in both.
>>>>>>>>>>>
>>>>>>>>>>> I believe we have logging working:
>>>>>>>>>>>
>>>>>>>>>>> -rw-r--r-- 1 root root         0 Oct 22 23:48 mesos-slave.out
>>>>>>>>>>> lrwxrwxrwx 1 root root        63 Oct 22 23:48 mesos-slave.INFO
>>>>>>>>>>> -> mesos-slave.carousel.invalid-user.log.INFO.20131022-234823.5797
>>>>>>>>>>> lrwxrwxrwx 1 root root        66 Oct 22 23:49
>>>>>>>>>>> mesos-slave.WARNING ->
>>>>>>>>>>> mesos-slave.carousel.invalid-user.log.WARNING.20131022-234954.5797
>>>>>>>>>>> drwxr-xr-x 2 root root      4096 Oct 22 23:49 .
>>>>>>>>>>> -rw-rw-r-- 1 root root      4827 Nov  1 20:34
>>>>>>>>>>> mesos-slave.carousel.invalid-user.log.WARNING.20131022-234954.5797
>>>>>>>>>>> -rw-rw-r-- 1 root root  10408140 Nov  7 18:44
>>>>>>>>>>> mesos-slave.carousel.invalid-user.log.INFO.20131022-234823.5797
>>>>>>>>>>> -rw-r--r-- 1 root root  53759705 Nov  7 18:45 mesos-slave.err
>>>>>>>>>>>
>>>>>>>>>>> Is there something else to check? Is it possible the executor is
>>>>>>>>>>> failing before it even attempts to fetch URIs?
>>>>>>>>>>>
>>>>>>>>>>> Ray - Thanks - yeah I found the jenkins logs. I was able to wget
>>>>>>>>>>> the slave.jar, and even run it. The mesos-jenkins slaves are dead now, so I
>>>>>>>>>>> can't connect to their slave-agent - but the jar does run. Not sure if the
>>>>>>>>>>> window for trying to connect to one of the mesos launched slaves is long
>>>>>>>>>>> enough to try before it is terminated due to failures. Interestingly, when
>>>>>>>>>>> I try to connect to one of the existing slaves I get a 403.
>>>>>>>>>>>
>>>>>>>>>>> -Whitney
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Nov 7, 2013 at 1:34 PM, Vinod Kone <vi...@gmail.com>wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hey Whitney,
>>>>>>>>>>>>
>>>>>>>>>>>> What version of mesos are you using (both in the cluster and
>>>>>>>>>>>> the plugin)?
>>>>>>>>>>>>
>>>>>>>>>>>> The slave should print stuff to console when it is launching
>>>>>>>>>>>> executor (e.g., "Fetching resources..."). I don't see that in the gist you
>>>>>>>>>>>> pasted. Are you capturing stdout/stderr of the slave?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Nov 7, 2013 at 10:30 AM, Whitney Sorenson <
>>>>>>>>>>>> wsorenson@hubspot.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks Ray.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have very similar issue (empty executor directories) - but
>>>>>>>>>>>>> don't have any issues curling the slave.jar URI - and I don't have any
>>>>>>>>>>>>> existing JNLP process running. I don't have a jenkins user - is that the
>>>>>>>>>>>>> only setup you did on the slave?
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Whitney
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Nov 7, 2013 at 1:16 PM, Ray Rodriguez <
>>>>>>>>>>>>> rayrod2030@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Whitney I would have a look at this github issue where I
>>>>>>>>>>>>>> work through some of my jenkins mesos-plugin issues with Vinod.  Might be
>>>>>>>>>>>>>> some of the same issues you are seeing.
>>>>>>>>>>>>>> https://github.com/jenkinsci/mesos-plugin/issues/2
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ray
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Nov 7, 2013 at 1:07 PM, Whitney Sorenson <
>>>>>>>>>>>>>> wsorenson@hubspot.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi all!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am trying to get the Jenkins Mesos plugin functioning. I
>>>>>>>>>>>>>>> was able to get it installed on our Jenkins master.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> However, it's unclear if there are any required steps for
>>>>>>>>>>>>>>> setting up the slaves. When a framework task is launched, it fails
>>>>>>>>>>>>>>> instantly and there are no logs in the runs folder.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here's a gist with relevant logs from the slave:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> https://gist.github.com/wsorenson/b3562c3e4a8992f9a46f/raw/ea5821c442d826456291330452208d8d7ac8418f/failing+jenkins+logs
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any help on how to debug? At first, I thought maybe we
>>>>>>>>>>>>>>> needed slave.jar or something but it looks like it's trying to fetch that
>>>>>>>>>>>>>>> from the master using the URIs. To clarify, I have done no special jenkins
>>>>>>>>>>>>>>> related setup (as per readme.md) on any of the slaves.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Whitney
>>>>>>>>>>>>>>>

Re: Jenkins mesos plugin failing

Posted by Benjamin Mahler <be...@gmail.com>.
From the master's perspective, the framework disconnected immediately after
registering.

You can bump up the logging on the jenkins scheduler by ensuring that
GLOG_v=3 is in your environment when our plugin is initialized.
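
As a minimal sketch of the suggestion above: glog picks up GLOG_v from the
process environment, so the variable has to be set in whatever launches
Jenkins. The helper names below and the commented-out `java -jar jenkins.war`
command are assumptions for illustration only; they are not part of the plugin.

```python
import os
import subprocess
import sys


def env_with_glog_verbosity(level, base=None):
    """Return a copy of the environment with GLOG_v set to the given level."""
    env = dict(os.environ if base is None else base)
    env["GLOG_v"] = str(level)
    return env


def child_sees_glog_v(env):
    """Start a child process and return the GLOG_v value it inherited."""
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ.get('GLOG_v', ''))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Hypothetical launch; substitute however Jenkins is actually started:
# subprocess.run(["java", "-jar", "jenkins.war"], env=env_with_glog_verbosity(3))
```

The second helper is only a sanity check that a child process really inherits
the variable; if Jenkins is started by an init script or service manager, the
variable would need to be set there instead.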

On Thu, Nov 7, 2013 at 3:17 PM, Whitney Sorenson <ws...@hubspot.com> wrote:

> Sure (https://github.com/jenkinsci/mesos-plugin/issues/4) but I'm
> actually running into another issue which I've seen before with other
> frameworks:
>
> I added the plugin to a separate Jenkins cluster and the framework doesn't
> seem to be able to maintain the connection successfully.
>
> The jenkins master log shows:
>
> Nov 7, 2013 10:12:38 PM org.jenkinsci.plugins.mesos.MesosCloud <init>
> INFO: Mesos master changed, restarting the scheduler
> Nov 7, 2013 10:12:38 PM org.jenkinsci.plugins.mesos.JenkinsScheduler <init>
> INFO: JenkinsScheduler instantiated with jenkins
> http://jenkins-master/jenkins/ and mesos mesos-master:5050
>
> With nothing else (no confirmation that the framework registered.)
>
> In the mesos UI, I see that the framework is constantly failing /
> registering. The logs show:
>
> I1107 22:53:06.791082 4283 master.cpp:1365] Framework failover timeout, removing framework 201310222354-1872141066-5050-4282-2992
> I1107 22:53:06.791760 4283 master.cpp:2022] Removing framework 201310222354-1872141066-5050-4282-2992
> I1107 22:53:06.792107 4283 hierarchical_allocator_process.hpp:352] Removed framework 201310222354-1872141066-5050-4282-2992
> I1107 22:53:07.788573 4286 master.cpp:695] Registering framework 201310222354-1872141066-5050-4282-2993 at scheduler(1)@10.46.101.33:58478
> I1107 22:53:07.788938 4286 hierarchical_allocator_process.hpp:321] Added framework 201310222354-1872141066-5050-4282-2993
> I1107 22:53:07.790592 4286 master.cpp:1448] Sending 1 offers to framework 201310222354-1872141066-5050-4282-2993
> I1107 22:53:07.790864 4284 master.cpp:489] Framework 201310222354-1872141066-5050-4282-2993 disconnected
> I1107 22:53:07.791007 4284 master.cpp:516] Giving framework 201310222354-1872141066-5050-4282-2993 0ns to failover
> I1107 22:53:07.791052 4285 hierarchical_allocator_process.hpp:397] Deactivated framework 201310222354-1872141066-5050-4282-2993
>
> This loop continues forever, happening several times per second.
>
> Any guidance on how to troubleshoot (I've already checked into network) or
> way to increase logging threshold on master?
>
>
>
> On Thu, Nov 7, 2013 at 5:10 PM, Benjamin Mahler <benjamin.mahler@gmail.com
> > wrote:
>
>> We should fix that so that it reconnects with Mesos after a restart of
>> Jenkins!
>>
>> Can you file an issue for this?
>>
>>
>> On Thu, Nov 7, 2013 at 12:31 PM, Whitney Sorenson <ws...@hubspot.com>wrote:
>>
>>> I should also point out the scheduler didn't seem to survive a reboot of
>>> Jenkins - I had to delete the mesos cloud and reenter the parameters.
>>>
>>>
>>> On Thu, Nov 7, 2013 at 3:26 PM, Whitney Sorenson <ws...@hubspot.com>wrote:
>>>
>>>> Looks like we're using authentication on our slaves. So you either need
>>>> to pass
>>>>
>>>> -jnlpCredentials user:pass
>>>>
>>>> on the command line, or change around the permissions in Jenkins to
>>>> allow anonymous users to connect/run jobs.
>>>>
>>>> I'm not sure if it would make sense or not to add the user/pass in the
>>>> Jenkins plugin configuration screen or if it should be fetched another way.
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Nov 7, 2013 at 2:52 PM, Vinod Kone <vi...@gmail.com> wrote:
>>>>
>>>>> Great. Let us know once you figure it out. Maybe I can add a FAQ to
>>>>> the plugin's README to help others (or you can contribute too :)).
>>>>>
>>>>>
>>>>> On Thu, Nov 7, 2013 at 11:40 AM, Whitney Sorenson <
>>>>> wsorenson@hubspot.com> wrote:
>>>>>
>>>>>> I added the jenkins user on the slave - this was the missing piece.
>>>>>> I'll add this to my PR for the readme. Got much further now; now I'm
>>>>>> getting a 403 on the fetch:
>>>>>>
>>>>>> /jenkins/computer/mesos-jenkins-6f4719c8-1c61-4b28-b5ab-ba298e846840/slave-agent.jnlp:
>>>>>> 403 Forbidden at
>>>>>> hudson.remoting.Launcher.parseJnlpArguments(Launcher.java:261) at
>>>>>> hudson.remoting.Launcher.run(Launcher.java:215)
>>>>>>
>>>>>> and corresponding log on jenkins master:
>>>>>>
>>>>>> Nov 7, 2013 2:38:39 PM winstone.Logger logInternal INFO: While
>>>>>> serving
>>>>>> http://localhost:8080/jenkins/computer/mesos-jenkins-6f4719c8-1c61-4b28-b5ab-ba298e846840/slave-agent.jnlp:
>>>>>> hudson.security.AccessDeniedException2: anonymous is missing the
>>>>>> Slave/Connect permission
>>>>>>
>>>>>> Going to look into what this means.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Nov 7, 2013 at 2:21 PM, Vinod Kone <vi...@gmail.com>wrote:
>>>>>>
>>>>>>> I looked at the code and it looks there are few places the executor
>>>>>>> might fail before it fetches the URI. Most of them have to do with
>>>>>>> incorrect permissions. The code was written to have any errors reported
>>>>>>> either in slave log or console or executor logs (there might be a bug here
>>>>>>> if we are in fact swallowing errors). IIUC, the executor log directory is
>>>>>>> empty in your case which suggests the executor died before it could even
>>>>>>> create "stdout" or "stderr" files in its sandbox (Is this true?).
>>>>>>>
>>>>>>> Couple of questions:
>>>>>>>
>>>>>>> What user is Jenkins master running as? Is that user known to the
>>>>>>> host on which mesos slave is running?
>>>>>>>
>>>>>>> How are you starting the mesos slave (e.g., cmd line flags)?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Nov 7, 2013 at 11:00 AM, Whitney Sorenson <
>>>>>>> wsorenson@hubspot.com> wrote:
>>>>>>>
>>>>>>>> The gist was compiled from that log. Here is the complete log from
>>>>>>>> toggling the jenkins plugin on / off (you see the ping statements
>>>>>>>> inbetween):
>>>>>>>>
>>>>>>>> https://gist.github.com/wsorenson/8bf64e44fd42da354fa0
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Nov 7, 2013 at 1:57 PM, Vinod Kone <vi...@gmail.com>wrote:
>>>>>>>>
>>>>>>>>> What does mesos-slave.err say?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Nov 7, 2013 at 10:49 AM, Whitney Sorenson <
>>>>>>>>> wsorenson@hubspot.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Vinod,
>>>>>>>>>>
>>>>>>>>>> It's 0.14.0-rc4 in both.
>>>>>>>>>>
>>>>>>>>>> I believe we have logging working:
>>>>>>>>>>
>>>>>>>>>> -rw-r--r-- 1 root root         0 Oct 22 23:48 mesos-slave.out
>>>>>>>>>> lrwxrwxrwx 1 root root        63 Oct 22 23:48 mesos-slave.INFO ->
>>>>>>>>>> mesos-slave.carousel.invalid-user.log.INFO.20131022-234823.5797
>>>>>>>>>> lrwxrwxrwx 1 root root        66 Oct 22 23:49 mesos-slave.WARNING
>>>>>>>>>> -> mesos-slave.carousel.invalid-user.log.WARNING.20131022-234954.5797
>>>>>>>>>> drwxr-xr-x 2 root root      4096 Oct 22 23:49 .
>>>>>>>>>> -rw-rw-r-- 1 root root      4827 Nov  1 20:34
>>>>>>>>>> mesos-slave.carousel.invalid-user.log.WARNING.20131022-234954.5797
>>>>>>>>>> -rw-rw-r-- 1 root root  10408140 Nov  7 18:44
>>>>>>>>>> mesos-slave.carousel.invalid-user.log.INFO.20131022-234823.5797
>>>>>>>>>> -rw-r--r-- 1 root root  53759705 Nov  7 18:45 mesos-slave.err
>>>>>>>>>>
>>>>>>>>>> Is there something else to check? Is it possible the executor is
>>>>>>>>>> failing before it even attempts to fetch URIs?
>>>>>>>>>>
>>>>>>>>>> Ray - Thanks - yeah I found the jenkins logs. I was able to wget
>>>>>>>>>> the slave.jar, and even run it. The mesos-jenkins slaves are dead now, so I
>>>>>>>>>> can't connect to their slave-agent - but the jar does run. Not sure if the
>>>>>>>>>> window for trying to connect to one of the mesos launched slaves is long
>>>>>>>>>> enough to try before it is terminated due to failures. Interestingly, when
>>>>>>>>>> I try to connect to one of the existing slaves I get a 403.
>>>>>>>>>>
>>>>>>>>>> -Whitney
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Nov 7, 2013 at 1:34 PM, Vinod Kone <vi...@gmail.com>wrote:
>>>>>>>>>>
>>>>>>>>>>> Hey Whitney,
>>>>>>>>>>>
>>>>>>>>>>> What version of mesos are you using (both in the cluster and the
>>>>>>>>>>> plugin)?
>>>>>>>>>>>
>>>>>>>>>>> The slave should print stuff to console when it is launching
>>>>>>>>>>> executor (e.g., "Fetching resources..."). I don't see that in the gist you
>>>>>>>>>>> pasted. Are you capturing stdout/stderr of the slave?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Nov 7, 2013 at 10:30 AM, Whitney Sorenson <
>>>>>>>>>>> wsorenson@hubspot.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks Ray.
>>>>>>>>>>>>
>>>>>>>>>>>> I have very similar issue (empty executor directories) - but
>>>>>>>>>>>> don't have any issues curling the slave.jar URI - and I don't have any
>>>>>>>>>>>> existing JNLP process running. I don't have a jenkins user - is that the
>>>>>>>>>>>> only setup you did on the slave?
>>>>>>>>>>>>
>>>>>>>>>>>> -Whitney
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Nov 7, 2013 at 1:16 PM, Ray Rodriguez <
>>>>>>>>>>>> rayrod2030@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Whitney I would have a look at this github issue where I
>>>>>>>>>>>>> work through some of my jenkins mesos-plugin issues with Vinod.  Might be
>>>>>>>>>>>>> some of the same issues you are seeing.
>>>>>>>>>>>>> https://github.com/jenkinsci/mesos-plugin/issues/2
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ray
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Nov 7, 2013 at 1:07 PM, Whitney Sorenson <
>>>>>>>>>>>>> wsorenson@hubspot.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi all!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am trying to get the Jenkins Mesos plugin functioning. I
>>>>>>>>>>>>>> was able to get it installed on our Jenkins master.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> However, it's unclear if there are any required steps for
>>>>>>>>>>>>>> setting up the slaves. When a framework task is launched, it fails
>>>>>>>>>>>>>> instantly and there are no logs in the runs folder.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here's a gist with relevant logs from the slave:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://gist.github.com/wsorenson/b3562c3e4a8992f9a46f/raw/ea5821c442d826456291330452208d8d7ac8418f/failing+jenkins+logs
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any help on how to debug? At first, I thought maybe we needed
>>>>>>>>>>>>>> slave.jar or something but it looks like it's trying to fetch that from the
>>>>>>>>>>>>>> master using the URIs. To clarify, I have done no special jenkins related
>>>>>>>>>>>>>> setup (as per readme.md) on any of the slaves.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Whitney
>>>>>>>>>>>>>>

Re: Jenkins mesos plugin failing

Posted by Whitney Sorenson <ws...@hubspot.com>.
Sure (https://github.com/jenkinsci/mesos-plugin/issues/4) but I'm actually
running into another issue which I've seen before with other frameworks:

I added the plugin to a separate Jenkins cluster and the framework doesn't
seem to be able to maintain the connection successfully.

The jenkins master log shows:

Nov 7, 2013 10:12:38 PM org.jenkinsci.plugins.mesos.MesosCloud <init>
INFO: Mesos master changed, restarting the scheduler
Nov 7, 2013 10:12:38 PM org.jenkinsci.plugins.mesos.JenkinsScheduler <init>
INFO: JenkinsScheduler instantiated with jenkins
http://jenkins-master/jenkins/ and mesos mesos-master:5050

With nothing else (no confirmation that the framework registered).

In the Mesos UI, I see that the framework is constantly failing and
re-registering. The master logs show:

I1107 22:53:06.791082 4283 master.cpp:1365] Framework failover timeout, removing framework 201310222354-1872141066-5050-4282-2992
I1107 22:53:06.791760 4283 master.cpp:2022] Removing framework 201310222354-1872141066-5050-4282-2992
I1107 22:53:06.792107 4283 hierarchical_allocator_process.hpp:352] Removed framework 201310222354-1872141066-5050-4282-2992
I1107 22:53:07.788573 4286 master.cpp:695] Registering framework 201310222354-1872141066-5050-4282-2993 at scheduler(1)@10.46.101.33:58478
I1107 22:53:07.788938 4286 hierarchical_allocator_process.hpp:321] Added framework 201310222354-1872141066-5050-4282-2993
I1107 22:53:07.790592 4286 master.cpp:1448] Sending 1 offers to framework 201310222354-1872141066-5050-4282-2993
I1107 22:53:07.790864 4284 master.cpp:489] Framework 201310222354-1872141066-5050-4282-2993 disconnected
I1107 22:53:07.791007 4284 master.cpp:516] Giving framework 201310222354-1872141066-5050-4282-2993 0ns to failover
I1107 22:53:07.791052 4285 hierarchical_allocator_process.hpp:397] Deactivated framework 201310222354-1872141066-5050-4282-2993

This loop continues forever, happening several times per second.

Any guidance on how to troubleshoot this (I've already checked the network),
or a way to increase the logging threshold on the master?
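
For anyone chasing a similar loop, here is a rough sketch of how master log
output like the excerpt above could be split back into individual glog entries
(mail clients tend to wrap or fuse them) and used to measure how long each
framework stayed connected. The function names are made up for illustration,
and the regular expressions assume the header layout seen in this excerpt
(severity letter, MMDD, time with microseconds, thread id, file:line).

```python
import re
from datetime import datetime

# Start of a glog entry: severity letter, MMDD, time with microseconds, thread id.
ENTRY_START = re.compile(r"[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d{6} \d+ ")
# A whole entry: header, then "file:line] message".
ENTRY = re.compile(
    r"[IWEF]\d{4} (?P<time>\d{2}:\d{2}:\d{2}\.\d{6}) \d+ "
    r"(?P<source>\S+:\d+)\] (?P<message>.*)"
)

SAMPLE = """
I1107 22:53:07.788573 4286
master.cpp:695] Registering framework
201310222354-1872141066-5050-4282-2993 at
scheduler(1)@10.46.101.33:58478I1107 22:53:07.788938 4286
hierarchical_allocator_process.hpp:321] Added
framework 201310222354-1872141066-5050-4282-2993 I1107 22:53:07.790592 4286
master.cpp:1448] Sending 1 offers to framework
201310222354-1872141066-5050-4282-2993 I1107 22:53:07.790864 4284
master.cpp:489] Framework 201310222354-1872141066-5050-4282-2993
disconnected I1107 22:53:07.791007 4284 master.cpp:516] Giving framework
201310222354-1872141066-5050-4282-2993 0ns to failover
"""


def split_entries(text):
    """Split wrapped or run-together glog output into one string per entry."""
    flat = " ".join(text.split())  # undo line wrapping
    starts = [m.start() for m in ENTRY_START.finditer(flat)]
    ends = starts[1:] + [len(flat)]
    return [flat[a:b].strip() for a, b in zip(starts, ends)]


def connection_lifetimes(text):
    """Map framework id to seconds between 'Registering' and 'disconnected'.

    glog headers carry no year and only the time is parsed here, so a span
    that crosses midnight would come out wrong.
    """
    registered = {}
    lifetimes = {}
    for entry in split_entries(text):
        match = ENTRY.match(entry)
        if match is None:
            continue
        stamp = datetime.strptime(match.group("time"), "%H:%M:%S.%f")
        message = match.group("message")
        reg = re.match(r"Registering framework (\S+)", message)
        disc = re.match(r"Framework (\S+) disconnected", message)
        if reg:
            registered[reg.group(1)] = stamp
        elif disc and disc.group(1) in registered:
            lifetimes[disc.group(1)] = (stamp - registered[disc.group(1)]).total_seconds()
    return lifetimes


if __name__ == "__main__":
    for framework_id, seconds in connection_lifetimes(SAMPLE).items():
        print(f"{framework_id}: connected for {seconds * 1000:.3f} ms")
```

On the excerpt above this gives about 2.3 ms between registration and
disconnect for framework ...-2993, which matches the impression that the
scheduler drops the connection almost as soon as it registers.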



On Thu, Nov 7, 2013 at 5:10 PM, Benjamin Mahler
<be...@gmail.com> wrote:

> We should fix that so that it reconnects with Mesos after a restart of
> Jenkins!
>
> Can you file an issue for this?
>
>
> On Thu, Nov 7, 2013 at 12:31 PM, Whitney Sorenson <ws...@hubspot.com>wrote:
>
>> I should also point out the scheduler didn't seem to survive a reboot of
>> Jenkins - I had to delete the mesos cloud and reenter the parameters.
>>
>>
>> On Thu, Nov 7, 2013 at 3:26 PM, Whitney Sorenson <ws...@hubspot.com>wrote:
>>
>>> Looks like we're using authentication on our slaves. So you either need
>>> to pass
>>>
>>> -jnlpCredentials user:pass
>>>
>>> on the command line, or change around the permissions in Jenkins to
>>> allow anonymous users to connect/run jobs.
>>>
>>> I'm not sure if it would make sense or not to add the user/pass in the
>>> Jenkins plugin configuration screen or if it should be fetched another way.
>>>
>>>
>>>
>>>
>>> On Thu, Nov 7, 2013 at 2:52 PM, Vinod Kone <vi...@gmail.com> wrote:
>>>
>>>> Great. Let us know once you figure it out. Maybe I can add a FAQ to the
>>>> plugin's README to help others (or you can contribute too :)).
>>>>
>>>>
>>>> On Thu, Nov 7, 2013 at 11:40 AM, Whitney Sorenson <
>>>> wsorenson@hubspot.com> wrote:
>>>>
>>>>> I added the jenkins user on the slave - this was the missing piece.
>>>>> I'll add this to my PR for the readme. Got much further now; now I'm
>>>>> getting a 403 on the fetch:
>>>>>
>>>>> /jenkins/computer/mesos-jenkins-6f4719c8-1c61-4b28-b5ab-ba298e846840/slave-agent.jnlp:
>>>>> 403 Forbidden at
>>>>> hudson.remoting.Launcher.parseJnlpArguments(Launcher.java:261) at
>>>>> hudson.remoting.Launcher.run(Launcher.java:215)
>>>>>
>>>>> and corresponding log on jenkins master:
>>>>>
>>>>> Nov 7, 2013 2:38:39 PM winstone.Logger logInternal INFO: While serving
>>>>> http://localhost:8080/jenkins/computer/mesos-jenkins-6f4719c8-1c61-4b28-b5ab-ba298e846840/slave-agent.jnlp:
>>>>> hudson.security.AccessDeniedException2: anonymous is missing the
>>>>> Slave/Connect permission
>>>>>
>>>>> Going to look into what this means.
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Nov 7, 2013 at 2:21 PM, Vinod Kone <vi...@gmail.com>wrote:
>>>>>
>>>>>> I looked at the code and it looks there are few places the executor
>>>>>> might fail before it fetches the URI. Most of them have to do with
>>>>>> incorrect permissions. The code was written to have any errors reported
>>>>>> either in slave log or console or executor logs (there might be a bug here
>>>>>> if we are in fact swallowing errors). IIUC, the executor log directory is
>>>>>> empty in your case which suggests the executor died before it could even
>>>>>> create "stdout" or "stderr" files in its sandbox (Is this true?).
>>>>>>
>>>>>> Couple of questions:
>>>>>>
>>>>>> What user is Jenkins master running as? Is that user known to the
>>>>>> host on which mesos slave is running?
>>>>>>
>>>>>> How are you starting the mesos slave (e.g., cmd line flags)?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Nov 7, 2013 at 11:00 AM, Whitney Sorenson <
>>>>>> wsorenson@hubspot.com> wrote:
>>>>>>
>>>>>>> The gist was compiled from that log. Here is the complete log from
>>>>>>> toggling the jenkins plugin on / off (you see the ping statements
>>>>>>> inbetween):
>>>>>>>
>>>>>>> https://gist.github.com/wsorenson/8bf64e44fd42da354fa0
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Nov 7, 2013 at 1:57 PM, Vinod Kone <vi...@gmail.com>wrote:
>>>>>>>
>>>>>>>> What does mesos-slave.err say?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Nov 7, 2013 at 10:49 AM, Whitney Sorenson <
>>>>>>>> wsorenson@hubspot.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Vinod,
>>>>>>>>>
>>>>>>>>> It's 0.14.0-rc4 in both.
>>>>>>>>>
>>>>>>>>> I believe we have logging working:
>>>>>>>>>
>>>>>>>>> -rw-r--r-- 1 root root         0 Oct 22 23:48 mesos-slave.out
>>>>>>>>> lrwxrwxrwx 1 root root        63 Oct 22 23:48 mesos-slave.INFO ->
>>>>>>>>> mesos-slave.carousel.invalid-user.log.INFO.20131022-234823.5797
>>>>>>>>> lrwxrwxrwx 1 root root        66 Oct 22 23:49 mesos-slave.WARNING
>>>>>>>>> -> mesos-slave.carousel.invalid-user.log.WARNING.20131022-234954.5797
>>>>>>>>> drwxr-xr-x 2 root root      4096 Oct 22 23:49 .
>>>>>>>>> -rw-rw-r-- 1 root root      4827 Nov  1 20:34
>>>>>>>>> mesos-slave.carousel.invalid-user.log.WARNING.20131022-234954.5797
>>>>>>>>> -rw-rw-r-- 1 root root  10408140 Nov  7 18:44
>>>>>>>>> mesos-slave.carousel.invalid-user.log.INFO.20131022-234823.5797
>>>>>>>>> -rw-r--r-- 1 root root  53759705 Nov  7 18:45 mesos-slave.err
>>>>>>>>>
>>>>>>>>> Is there something else to check? Is it possible the executor is
>>>>>>>>> failing before it even attempts to fetch URIs?
>>>>>>>>>
>>>>>>>>> Ray - Thanks - yeah I found the jenkins logs. I was able to wget
>>>>>>>>> the slave.jar, and even run it. The mesos-jenkins slaves are dead now, so I
>>>>>>>>> can't connect to their slave-agent - but the jar does run. Not sure if the
>>>>>>>>> window for trying to connect to one of the mesos launched slaves is long
>>>>>>>>> enough to try before it is terminated due to failures. Interestingly, when
>>>>>>>>> I try to connect to one of the existing slaves I get a 403.
>>>>>>>>>
>>>>>>>>> -Whitney
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Nov 7, 2013 at 1:34 PM, Vinod Kone <vi...@gmail.com>wrote:
>>>>>>>>>
>>>>>>>>>> Hey Whitney,
>>>>>>>>>>
>>>>>>>>>> What version of mesos are you using (both in the cluster and the
>>>>>>>>>> plugin)?
>>>>>>>>>>
>>>>>>>>>> The slave should print stuff to console when it is launching
>>>>>>>>>> executor (e.g., "Fetching resources..."). I don't see that in the gist you
>>>>>>>>>> pasted. Are you capturing stdout/stderr of the slave?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Nov 7, 2013 at 10:30 AM, Whitney Sorenson <
>>>>>>>>>> wsorenson@hubspot.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Ray.
>>>>>>>>>>>
>>>>>>>>>>> I have very similar issue (empty executor directories) - but
>>>>>>>>>>> don't have any issues curling the slave.jar URI - and I don't have any
>>>>>>>>>>> existing JNLP process running. I don't have a jenkins user - is that the
>>>>>>>>>>> only setup you did on the slave?
>>>>>>>>>>>
>>>>>>>>>>> -Whitney
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Nov 7, 2013 at 1:16 PM, Ray Rodriguez <
>>>>>>>>>>> rayrod2030@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Whitney I would have a look at this github issue where I
>>>>>>>>>>>> work through some of my jenkins mesos-plugin issues with Vinod.  Might be
>>>>>>>>>>>> some of the same issues you are seeing.
>>>>>>>>>>>> https://github.com/jenkinsci/mesos-plugin/issues/2
>>>>>>>>>>>>
>>>>>>>>>>>> Ray
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Nov 7, 2013 at 1:07 PM, Whitney Sorenson <
>>>>>>>>>>>> wsorenson@hubspot.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi all!
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am trying to get the Jenkins Mesos plugin functioning. I was
>>>>>>>>>>>>> able to get it installed on our Jenkins master.
>>>>>>>>>>>>>
>>>>>>>>>>>>> However, it's unclear if there are any required steps for
>>>>>>>>>>>>> setting up the slaves. When a framework task is launched, it fails
>>>>>>>>>>>>> instantly and there are no logs in the runs folder.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here's a gist with relevant logs from the slave:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://gist.github.com/wsorenson/b3562c3e4a8992f9a46f/raw/ea5821c442d826456291330452208d8d7ac8418f/failing+jenkins+logs
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any help on how to debug? At first, I thought maybe we needed
>>>>>>>>>>>>> slave.jar or something but it looks like it's trying to fetch that from the
>>>>>>>>>>>>> master using the URIs. To clarify, I have done no special jenkins related
>>>>>>>>>>>>> setup (as per readme.md) on any of the slaves.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Whitney
>>>>>>>>>>>>>

Re: Jenkins mesos plugin failing

Posted by Benjamin Mahler <be...@gmail.com>.
We should fix that so that it reconnects with Mesos after a restart of
Jenkins!

Can you file an issue for this?


On Thu, Nov 7, 2013 at 12:31 PM, Whitney Sorenson <ws...@hubspot.com> wrote:

> I should also point out the scheduler didn't seem to survive a reboot of
> Jenkins - I had to delete the mesos cloud and reenter the parameters.
>
>
> On Thu, Nov 7, 2013 at 3:26 PM, Whitney Sorenson <ws...@hubspot.com>wrote:
>
>> Looks like we're using authentication on our slaves. So you either need
>> to pass
>>
>> -jnlpCredentials user:pass
>>
>> on the command line, or change around the permissions in Jenkins to allow
>> anonymous users to connect/run jobs.
>>
>> I'm not sure if it would make sense or not to add the user/pass in the
>> Jenkins plugin configuration screen or if it should be fetched another way.
>>
>>
>>
>>
>> On Thu, Nov 7, 2013 at 2:52 PM, Vinod Kone <vi...@gmail.com> wrote:
>>
>>> Great. Let us know once you figure it out. Maybe I can add a FAQ to the
>>> plugin's README to help others (or you can contribute too :)).
>>>
>>>
>>> On Thu, Nov 7, 2013 at 11:40 AM, Whitney Sorenson <wsorenson@hubspot.com
>>> > wrote:
>>>
>>>> I added the jenkins user on the slave - this was the missing piece.
>>>> I'll add this to my PR for the readme. Got much further now; now I'm
>>>> getting a 403 on the fetch:
>>>>
>>>> /jenkins/computer/mesos-jenkins-6f4719c8-1c61-4b28-b5ab-ba298e846840/slave-agent.jnlp:
>>>> 403 Forbidden at
>>>> hudson.remoting.Launcher.parseJnlpArguments(Launcher.java:261) at
>>>> hudson.remoting.Launcher.run(Launcher.java:215)
>>>>
>>>> and corresponding log on jenkins master:
>>>>
>>>> Nov 7, 2013 2:38:39 PM winstone.Logger logInternal INFO: While serving
>>>> http://localhost:8080/jenkins/computer/mesos-jenkins-6f4719c8-1c61-4b28-b5ab-ba298e846840/slave-agent.jnlp:
>>>> hudson.security.AccessDeniedException2: anonymous is missing the
>>>> Slave/Connect permission
>>>>
>>>> Going to look into what this means.
>>>>
>>>>
>>>>
>>>> On Thu, Nov 7, 2013 at 2:21 PM, Vinod Kone <vi...@gmail.com> wrote:
>>>>
>>>>> I looked at the code and it looks there are few places the executor
>>>>> might fail before it fetches the URI. Most of them have to do with
>>>>> incorrect permissions. The code was written to have any errors reported
>>>>> either in slave log or console or executor logs (there might be a bug here
>>>>> if we are in fact swallowing errors). IIUC, the executor log directory is
>>>>> empty in your case which suggests the executor died before it could even
>>>>> create "stdout" or "stderr" files in its sandbox (Is this true?).
>>>>>
>>>>> Couple of questions:
>>>>>
>>>>> What user is Jenkins master running as? Is that user known to the host
>>>>> on which mesos slave is running?
>>>>>
>>>>> How are you starting the mesos slave (e.g., cmd line flags)?
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Nov 7, 2013 at 11:00 AM, Whitney Sorenson <
>>>>> wsorenson@hubspot.com> wrote:
>>>>>
>>>>>> The gist was compiled from that log. Here is the complete log from
>>>>>> toggling the jenkins plugin on / off (you see the ping statements
>>>>>> inbetween):
>>>>>>
>>>>>> https://gist.github.com/wsorenson/8bf64e44fd42da354fa0
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Nov 7, 2013 at 1:57 PM, Vinod Kone <vi...@gmail.com>wrote:
>>>>>>
>>>>>>> What does mesos-slave.err say?
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Nov 7, 2013 at 10:49 AM, Whitney Sorenson <
>>>>>>> wsorenson@hubspot.com> wrote:
>>>>>>>
>>>>>>>> Hi Vinod,
>>>>>>>>
>>>>>>>> It's 0.14.0-rc4 in both.
>>>>>>>>
>>>>>>>> I believe we have logging working:
>>>>>>>>
>>>>>>>> -rw-r--r-- 1 root root         0 Oct 22 23:48 mesos-slave.out
>>>>>>>> lrwxrwxrwx 1 root root        63 Oct 22 23:48 mesos-slave.INFO ->
>>>>>>>> mesos-slave.carousel.invalid-user.log.INFO.20131022-234823.5797
>>>>>>>> lrwxrwxrwx 1 root root        66 Oct 22 23:49 mesos-slave.WARNING
>>>>>>>> -> mesos-slave.carousel.invalid-user.log.WARNING.20131022-234954.5797
>>>>>>>> drwxr-xr-x 2 root root      4096 Oct 22 23:49 .
>>>>>>>> -rw-rw-r-- 1 root root      4827 Nov  1 20:34
>>>>>>>> mesos-slave.carousel.invalid-user.log.WARNING.20131022-234954.5797
>>>>>>>> -rw-rw-r-- 1 root root  10408140 Nov  7 18:44
>>>>>>>> mesos-slave.carousel.invalid-user.log.INFO.20131022-234823.5797
>>>>>>>> -rw-r--r-- 1 root root  53759705 Nov  7 18:45 mesos-slave.err
>>>>>>>>
>>>>>>>> Is there something else to check? Is it possible the executor is
>>>>>>>> failing before it even attempts to fetch URIs?
>>>>>>>>
>>>>>>>> Ray - Thanks - yeah I found the jenkins logs. I was able to wget
>>>>>>>> the slave.jar, and even run it. The mesos-jenkins slaves are dead now, so I
>>>>>>>> can't connect to their slave-agent - but the jar does run. Not sure if the
>>>>>>>> window for trying to connect to one of the mesos launched slaves is long
>>>>>>>> enough to try before it is terminated due to failures. Interestingly, when
>>>>>>>> I try to connect to one of the existing slaves I get a 403.
>>>>>>>>
>>>>>>>> -Whitney
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Nov 7, 2013 at 1:34 PM, Vinod Kone <vi...@gmail.com>wrote:
>>>>>>>>
>>>>>>>>> Hey Whitney,
>>>>>>>>>
>>>>>>>>> What version of mesos are you using (both in the cluster and the
>>>>>>>>> plugin)?
>>>>>>>>>
>>>>>>>>> The slave should print stuff to console when it is launching
>>>>>>>>> executor (e.g., "Fetching resources..."). I don't see that in the gist you
>>>>>>>>> pasted. Are you capturing stdout/stderr of the slave?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Nov 7, 2013 at 10:30 AM, Whitney Sorenson <
>>>>>>>>> wsorenson@hubspot.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Ray.
>>>>>>>>>>
>>>>>>>>>> I have very similar issue (empty executor directories) - but
>>>>>>>>>> don't have any issues curling the slave.jar URI - and I don't have any
>>>>>>>>>> existing JNLP process running. I don't have a jenkins user - is that the
>>>>>>>>>> only setup you did on the slave?
>>>>>>>>>>
>>>>>>>>>> -Whitney
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Nov 7, 2013 at 1:16 PM, Ray Rodriguez <
>>>>>>>>>> rayrod2030@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Whitney I would have a look at this github issue where I work
>>>>>>>>>>> through some of my jenkins mesos-plugin issues with Vinod.  Might be some
>>>>>>>>>>> of the same issues you are seeing.
>>>>>>>>>>> https://github.com/jenkinsci/mesos-plugin/issues/2
>>>>>>>>>>>
>>>>>>>>>>> Ray
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Nov 7, 2013 at 1:07 PM, Whitney Sorenson <
>>>>>>>>>>> wsorenson@hubspot.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi all!
>>>>>>>>>>>>
>>>>>>>>>>>> I am trying to get the Jenkins Mesos plugin functioning. I was
>>>>>>>>>>>> able to get it installed on our Jenkins master.
>>>>>>>>>>>>
>>>>>>>>>>>> However, it's unclear if there are any required steps for
>>>>>>>>>>>> setting up the slaves. When a framework task is launched, it fails
>>>>>>>>>>>> instantly and there are no logs in the runs folder.
>>>>>>>>>>>>
>>>>>>>>>>>> Here's a gist with relevant logs from the slave:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> https://gist.github.com/wsorenson/b3562c3e4a8992f9a46f/raw/ea5821c442d826456291330452208d8d7ac8418f/failing+jenkins+logs
>>>>>>>>>>>>
>>>>>>>>>>>> Any help on how to debug? At first, I thought maybe we needed
>>>>>>>>>>>> slave.jar or something but it looks like it's trying to fetch that from the
>>>>>>>>>>>> master using the URIs. To clarify, I have done no special jenkins related
>>>>>>>>>>>> setup (as per readme.md) on any of the slaves.
>>>>>>>>>>>>
>>>>>>>>>>>> -Whitney
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Jenkins mesos plugin failing

Posted by Whitney Sorenson <ws...@hubspot.com>.
I should also point out that the scheduler didn't seem to survive a reboot of
Jenkins - I had to delete the Mesos cloud and re-enter the parameters.


On Thu, Nov 7, 2013 at 3:26 PM, Whitney Sorenson <ws...@hubspot.com>wrote:

> Looks like we're using authentication on our slaves. So you either need to
> pass
>
> -jnlpCredentials user:pass
>
> on the command line, or change around the permissions in Jenkins to allow
> anonymous users to connect/run jobs.
>
> I'm not sure if it would make sense or not to add the user/pass in the
> Jenkins plugin configuration screen or if it should be fetched another way.
>

Re: Jenkins mesos plugin failing

Posted by Whitney Sorenson <ws...@hubspot.com>.
Looks like our Jenkins requires authentication for slave connections, so you
either need to pass

-jnlpCredentials user:pass

on the slave.jar command line, or change the permissions in Jenkins to allow
anonymous users to connect/run jobs.

I'm not sure whether it would make sense to add the user/pass to the
Jenkins plugin configuration screen or whether it should be fetched another way.
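
For anyone following along, here is a rough sketch of the first option: the
slave launch command with the credentials appended. The -jnlpUrl and
-jnlpCredentials flags are the ones mentioned in this thread; the helper
name, the example host, and the user:pass value are made up for
illustration, and the code only builds the argument list rather than
launching anything.

```python
def build_slave_command(jnlp_url, credentials=None, jar="slave.jar"):
    """Return the argv list for launching a JNLP slave.

    credentials is a "user:pass" string, or None for anonymous access.
    """
    cmd = ["java", "-jar", jar, "-jnlpUrl", jnlp_url]
    if credentials:
        # Only needed when anonymous lacks the Slave/Connect permission.
        cmd += ["-jnlpCredentials", credentials]
    return cmd


if __name__ == "__main__":
    # Hypothetical master URL and slave name, for illustration only.
    url = ("http://jenkins.example.com:8080/jenkins/computer/"
           "mesos-jenkins-example/slave-agent.jnlp")
    print(" ".join(build_slave_command(url, "user:pass")))
```

Keeping the credentials optional means the same command works unchanged if
you instead grant anonymous the Slave/Connect permission.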




On Thu, Nov 7, 2013 at 2:52 PM, Vinod Kone <vi...@gmail.com> wrote:

> Great. Let us know once you figure it out. Maybe I can add a FAQ to the
> plugin's README to help others (or you can contribute too :)).

Re: Jenkins mesos plugin failing

Posted by Vinod Kone <vi...@gmail.com>.
Great. Let us know once you figure it out. Maybe I can add a FAQ to the
plugin's README to help others (or you can contribute too :)).


On Thu, Nov 7, 2013 at 11:40 AM, Whitney Sorenson <ws...@hubspot.com>wrote:

> I added the jenkins user on the slave - this was the missing piece. I'll
> add this to my PR for the readme. Got much further now; now I'm getting a
> 403 on the fetch:
>
> /jenkins/computer/mesos-jenkins-6f4719c8-1c61-4b28-b5ab-ba298e846840/slave-agent.jnlp:
> 403 Forbidden at
> hudson.remoting.Launcher.parseJnlpArguments(Launcher.java:261) at
> hudson.remoting.Launcher.run(Launcher.java:215)
>
> and corresponding log on jenkins master:
>
> Nov 7, 2013 2:38:39 PM winstone.Logger logInternal INFO: While serving
> http://localhost:8080/jenkins/computer/mesos-jenkins-6f4719c8-1c61-4b28-b5ab-ba298e846840/slave-agent.jnlp:
> hudson.security.AccessDeniedException2: anonymous is missing the
> Slave/Connect permission
>
> Going to look into what this means.

Re: Jenkins mesos plugin failing

Posted by Whitney Sorenson <ws...@hubspot.com>.
I added the jenkins user on the slave - this was the missing piece. I'll
add this to my PR for the readme. I got much further; now I'm getting a
403 on the fetch:

/jenkins/computer/mesos-jenkins-6f4719c8-1c61-4b28-b5ab-ba298e846840/slave-agent.jnlp: 403 Forbidden
    at hudson.remoting.Launcher.parseJnlpArguments(Launcher.java:261)
    at hudson.remoting.Launcher.run(Launcher.java:215)

and corresponding log on jenkins master:

Nov 7, 2013 2:38:39 PM winstone.Logger logInternal
INFO: While serving http://localhost:8080/jenkins/computer/mesos-jenkins-6f4719c8-1c61-4b28-b5ab-ba298e846840/slave-agent.jnlp:
hudson.security.AccessDeniedException2: anonymous is missing the Slave/Connect permission

Going to look into what this means.
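
In case it helps others hitting the same 403, here is a small sketch for
telling a permissions problem apart from a connectivity problem: request the
slave-agent.jnlp URL once anonymously and once with credentials, and compare
the status codes. It assumes the master accepts HTTP Basic authentication;
the function names are mine and the URL and credentials are whatever your
setup uses.

```python
import base64
import urllib.error
import urllib.request


def basic_auth_header(user, password):
    """Build the value of an HTTP Basic Authorization header."""
    raw = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def probe(url, user=None, password=None, timeout=10):
    """Return the HTTP status code the master gives for url."""
    request = urllib.request.Request(url)
    if user is not None:
        request.add_header("Authorization", basic_auth_header(user, password))
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; the code is what we want.
        return err.code
```

If probe(url) returns 403 while probe(url, "user", "pass") returns 200, the
slave can reach the master and the problem is the missing Slave/Connect
permission for anonymous.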



On Thu, Nov 7, 2013 at 2:21 PM, Vinod Kone <vi...@gmail.com> wrote:

> I looked at the code and it looks there are few places the executor might
> fail before it fetches the URI. Most of them have to do with incorrect
> permissions. The code was written to have any errors reported either in
> slave log or console or executor logs (there might be a bug here if we are
> in fact swallowing errors). IIUC, the executor log directory is empty in
> your case which suggests the executor died before it could even create
> "stdout" or "stderr" files in its sandbox (Is this true?).
>
> Couple of questions:
>
> What user is Jenkins master running as? Is that user known to the host on
> which mesos slave is running?
>
> How are you starting the mesos slave (e.g., cmd line flags)?

Re: Jenkins mesos plugin failing

Posted by Vinod Kone <vi...@gmail.com>.
I looked at the code and it looks like there are a few places where the
executor might fail before it fetches the URI. Most of them have to do with
incorrect permissions. The code was written to report any errors in the
slave log, the console, or the executor logs (there might be a bug here if we
are in fact swallowing errors). IIUC, the executor log directory is empty in
your case, which suggests the executor died before it could even create the
"stdout" or "stderr" files in its sandbox (is this true?).

A couple of questions:

What user is the Jenkins master running as? Is that user known to the host on
which the mesos slave is running?

How are you starting the mesos slave (e.g., command-line flags)?
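
The two checks implied above could be scripted on the slave host roughly as
follows. This is only a sketch: the user name "jenkins" is a guess at what
the master runs as, and the runs directory is whichever path your slave's
work directory uses for executor runs, so pass in your own values.

```python
import os
import pwd


def user_exists(name):
    """Return True if the local account database knows this user name."""
    try:
        pwd.getpwnam(name)
        return True
    except KeyError:
        return False


def runs_without_output(runs_dir):
    """Return run directories under runs_dir holding neither stdout nor stderr."""
    missing = []
    for entry in sorted(os.listdir(runs_dir)):
        run_path = os.path.join(runs_dir, entry)
        if not os.path.isdir(run_path):
            continue
        has_output = any(
            os.path.isfile(os.path.join(run_path, name))
            for name in ("stdout", "stderr")
        )
        if not has_output:
            missing.append(run_path)
    return missing


if __name__ == "__main__":
    # "jenkins" is an assumption; substitute the user the master runs as.
    print("jenkins known on this host:", user_exists("jenkins"))
```

An unknown user together with run directories that contain no stdout or
stderr would be consistent with the executor dying before it set up its
sandbox.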



On Thu, Nov 7, 2013 at 11:00 AM, Whitney Sorenson <ws...@hubspot.com>wrote:

> The gist was compiled from that log. Here is the complete log from
> toggling the jenkins plugin on / off (you see the ping statements
> inbetween):
>
> https://gist.github.com/wsorenson/8bf64e44fd42da354fa0
>
>
>
>
> On Thu, Nov 7, 2013 at 1:57 PM, Vinod Kone <vi...@gmail.com> wrote:
>
>> What does mesos-slave.err say?
>>
>>
>> On Thu, Nov 7, 2013 at 10:49 AM, Whitney Sorenson <ws...@hubspot.com>wrote:
>>
>>> Hi Vinod,
>>>
>>> It's 0.14.0-rc4 in both.
>>>
>>> I believe we have logging working:
>>>
>>> -rw-r--r-- 1 root root         0 Oct 22 23:48 mesos-slave.out
>>> lrwxrwxrwx 1 root root        63 Oct 22 23:48 mesos-slave.INFO ->
>>> mesos-slave.carousel.invalid-user.log.INFO.20131022-234823.5797
>>> lrwxrwxrwx 1 root root        66 Oct 22 23:49 mesos-slave.WARNING ->
>>> mesos-slave.carousel.invalid-user.log.WARNING.20131022-234954.5797
>>> drwxr-xr-x 2 root root      4096 Oct 22 23:49 .
>>> -rw-rw-r-- 1 root root      4827 Nov  1 20:34
>>> mesos-slave.carousel.invalid-user.log.WARNING.20131022-234954.5797
>>> -rw-rw-r-- 1 root root  10408140 Nov  7 18:44
>>> mesos-slave.carousel.invalid-user.log.INFO.20131022-234823.5797
>>> -rw-r--r-- 1 root root  53759705 Nov  7 18:45 mesos-slave.err
>>>
>>> Is there something else to check? Is it possible the executor is failing
>>> before it even attempts to fetch URIs?
>>>
>>> Ray - Thanks - yeah I found the jenkins logs. I was able to wget the
>>> slave.jar, and even run it. The mesos-jenkins slaves are dead now, so I
>>> can't connect to their slave-agent - but the jar does run. Not sure if the
>>> window for trying to connect to one of the mesos launched slaves is long
>>> enough to try before it is terminated due to failures. Interestingly, when
>>> I try to connect to one of the existing slaves I get a 403.
>>>
>>> -Whitney
>>>
>>>
>>>
>>> On Thu, Nov 7, 2013 at 1:34 PM, Vinod Kone <vi...@gmail.com> wrote:
>>>
>>>> Hey Whitney,
>>>>
>>>> What version of mesos are you using (both in the cluster and the
>>>> plugin)?
>>>>
>>>> The slave should print stuff to console when it is launching executor
>>>> (e.g., "Fetching resources..."). I don't see that in the gist you pasted.
>>>> Are you capturing stdout/stderr of the slave?
>>>>
>>>>
>>>> On Thu, Nov 7, 2013 at 10:30 AM, Whitney Sorenson <
>>>> wsorenson@hubspot.com> wrote:
>>>>
>>>>> Thanks Ray.
>>>>>
>>>>> I have very similar issue (empty executor directories) - but don't
>>>>> have any issues curling the slave.jar URI - and I don't have any existing
>>>>> JNLP process running. I don't have a jenkins user - is that the only setup
>>>>> you did on the slave?
>>>>>
>>>>> -Whitney
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Nov 7, 2013 at 1:16 PM, Ray Rodriguez <ra...@gmail.com>wrote:
>>>>>
>>>>>> Hi Whitney I would have a look at this github issue where I work
>>>>>> through some of my jenkins mesos-plugin issues with Vinod.  Might be some
>>>>>> of the same issues you are seeing.
>>>>>> https://github.com/jenkinsci/mesos-plugin/issues/2
>>>>>>
>>>>>> Ray
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Nov 7, 2013 at 1:07 PM, Whitney Sorenson <
>>>>>> wsorenson@hubspot.com> wrote:
>>>>>>
>>>>>>> Hi all!
>>>>>>>
>>>>>>> I am trying to get the Jenkins Mesos plugin functioning. I was able
>>>>>>> to get it installed on our Jenkins master.
>>>>>>>
>>>>>>> However, it's unclear if there are any required steps for setting up
>>>>>>> the slaves. When a framework task is launched, it fails instantly and there
>>>>>>> are no logs in the runs folder.
>>>>>>>
>>>>>>> Here's a gist with relevant logs from the slave:
>>>>>>>
>>>>>>>
>>>>>>> https://gist.github.com/wsorenson/b3562c3e4a8992f9a46f/raw/ea5821c442d826456291330452208d8d7ac8418f/failing+jenkins+logs
>>>>>>>
>>>>>>> Any help on how to debug? At first, I thought maybe we needed
>>>>>>> slave.jar or something but it looks like it's trying to fetch that from the
>>>>>>> master using the URIs. To clarify, I have done no special jenkins related
>>>>>>> setup (as per readme.md) on any of the slaves.
>>>>>>>
>>>>>>> -Whitney
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Jenkins mesos plugin failing

Posted by Whitney Sorenson <ws...@hubspot.com>.
The gist was compiled from that log. Here is the complete log from toggling
the Jenkins plugin on and off (you can see the ping statements in between):

https://gist.github.com/wsorenson/8bf64e44fd42da354fa0




On Thu, Nov 7, 2013 at 1:57 PM, Vinod Kone <vi...@gmail.com> wrote:

> What does mesos-slave.err say?
>
>
> On Thu, Nov 7, 2013 at 10:49 AM, Whitney Sorenson <ws...@hubspot.com>wrote:
>
>> Hi Vinod,
>>
>> It's 0.14.0-rc4 in both.
>>
>> I believe we have logging working:
>>
>> -rw-r--r-- 1 root root         0 Oct 22 23:48 mesos-slave.out
>> lrwxrwxrwx 1 root root        63 Oct 22 23:48 mesos-slave.INFO ->
>> mesos-slave.carousel.invalid-user.log.INFO.20131022-234823.5797
>> lrwxrwxrwx 1 root root        66 Oct 22 23:49 mesos-slave.WARNING ->
>> mesos-slave.carousel.invalid-user.log.WARNING.20131022-234954.5797
>> drwxr-xr-x 2 root root      4096 Oct 22 23:49 .
>> -rw-rw-r-- 1 root root      4827 Nov  1 20:34
>> mesos-slave.carousel.invalid-user.log.WARNING.20131022-234954.5797
>> -rw-rw-r-- 1 root root  10408140 Nov  7 18:44
>> mesos-slave.carousel.invalid-user.log.INFO.20131022-234823.5797
>> -rw-r--r-- 1 root root  53759705 Nov  7 18:45 mesos-slave.err
>>
>> Is there something else to check? Is it possible the executor is failing
>> before it even attempts to fetch URIs?
>>
>> Ray - Thanks - yeah I found the jenkins logs. I was able to wget the
>> slave.jar, and even run it. The mesos-jenkins slaves are dead now, so I
>> can't connect to their slave-agent - but the jar does run. Not sure if the
>> window for trying to connect to one of the mesos launched slaves is long
>> enough to try before it is terminated due to failures. Interestingly, when
>> I try to connect to one of the existing slaves I get a 403.
>>
>> -Whitney
>>
>>
>>
>> On Thu, Nov 7, 2013 at 1:34 PM, Vinod Kone <vi...@gmail.com> wrote:
>>
>>> Hey Whitney,
>>>
>>> What version of mesos are you using (both in the cluster and the plugin)?
>>>
>>> The slave should print stuff to console when it is launching executor
>>> (e.g., "Fetching resources..."). I don't see that in the gist you pasted.
>>> Are you capturing stdout/stderr of the slave?
>>>
>>>
>>> On Thu, Nov 7, 2013 at 10:30 AM, Whitney Sorenson <wsorenson@hubspot.com
>>> > wrote:
>>>
>>>> Thanks Ray.
>>>>
>>>> I have very similar issue (empty executor directories) - but don't have
>>>> any issues curling the slave.jar URI - and I don't have any existing JNLP
>>>> process running. I don't have a jenkins user - is that the only setup you
>>>> did on the slave?
>>>>
>>>> -Whitney
>>>>
>>>>
>>>>
>>>> On Thu, Nov 7, 2013 at 1:16 PM, Ray Rodriguez <ra...@gmail.com>wrote:
>>>>
>>>>> Hi Whitney I would have a look at this github issue where I work
>>>>> through some of my jenkins mesos-plugin issues with Vinod.  Might be some
>>>>> of the same issues you are seeing.
>>>>> https://github.com/jenkinsci/mesos-plugin/issues/2
>>>>>
>>>>> Ray
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Nov 7, 2013 at 1:07 PM, Whitney Sorenson <
>>>>> wsorenson@hubspot.com> wrote:
>>>>>
>>>>>> Hi all!
>>>>>>
>>>>>> I am trying to get the Jenkins Mesos plugin functioning. I was able
>>>>>> to get it installed on our Jenkins master.
>>>>>>
>>>>>> However, it's unclear if there are any required steps for setting up
>>>>>> the slaves. When a framework task is launched, it fails instantly and there
>>>>>> are no logs in the runs folder.
>>>>>>
>>>>>> Here's a gist with relevant logs from the slave:
>>>>>>
>>>>>>
>>>>>> https://gist.github.com/wsorenson/b3562c3e4a8992f9a46f/raw/ea5821c442d826456291330452208d8d7ac8418f/failing+jenkins+logs
>>>>>>
>>>>>> Any help on how to debug? At first, I thought maybe we needed
>>>>>> slave.jar or something but it looks like it's trying to fetch that from the
>>>>>> master using the URIs. To clarify, I have done no special jenkins related
>>>>>> setup (as per readme.md) on any of the slaves.
>>>>>>
>>>>>> -Whitney
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Jenkins mesos plugin failing

Posted by Vinod Kone <vi...@gmail.com>.
What does mesos-slave.err say?


On Thu, Nov 7, 2013 at 10:49 AM, Whitney Sorenson <ws...@hubspot.com> wrote:

> Hi Vinod,
>
> It's 0.14.0-rc4 in both.
>
> I believe we have logging working:
>
> -rw-r--r-- 1 root root         0 Oct 22 23:48 mesos-slave.out
> lrwxrwxrwx 1 root root        63 Oct 22 23:48 mesos-slave.INFO ->
> mesos-slave.carousel.invalid-user.log.INFO.20131022-234823.5797
> lrwxrwxrwx 1 root root        66 Oct 22 23:49 mesos-slave.WARNING ->
> mesos-slave.carousel.invalid-user.log.WARNING.20131022-234954.5797
> drwxr-xr-x 2 root root      4096 Oct 22 23:49 .
> -rw-rw-r-- 1 root root      4827 Nov  1 20:34
> mesos-slave.carousel.invalid-user.log.WARNING.20131022-234954.5797
> -rw-rw-r-- 1 root root  10408140 Nov  7 18:44
> mesos-slave.carousel.invalid-user.log.INFO.20131022-234823.5797
> -rw-r--r-- 1 root root  53759705 Nov  7 18:45 mesos-slave.err
>
> Is there something else to check? Is it possible the executor is failing
> before it even attempts to fetch URIs?
>
> Ray - Thanks - yeah I found the jenkins logs. I was able to wget the
> slave.jar, and even run it. The mesos-jenkins slaves are dead now, so I
> can't connect to their slave-agent - but the jar does run. Not sure if the
> window for trying to connect to one of the mesos launched slaves is long
> enough to try before it is terminated due to failures. Interestingly, when
> I try to connect to one of the existing slaves I get a 403.
>
> -Whitney
>
>
>
> On Thu, Nov 7, 2013 at 1:34 PM, Vinod Kone <vi...@gmail.com> wrote:
>
>> Hey Whitney,
>>
>> What version of mesos are you using (both in the cluster and the plugin)?
>>
>> The slave should print stuff to console when it is launching executor
>> (e.g., "Fetching resources..."). I don't see that in the gist you pasted.
>> Are you capturing stdout/stderr of the slave?
>>
>>
>> On Thu, Nov 7, 2013 at 10:30 AM, Whitney Sorenson <ws...@hubspot.com>wrote:
>>
>>> Thanks Ray.
>>>
>>> I have very similar issue (empty executor directories) - but don't have
>>> any issues curling the slave.jar URI - and I don't have any existing JNLP
>>> process running. I don't have a jenkins user - is that the only setup you
>>> did on the slave?
>>>
>>> -Whitney
>>>
>>>
>>>
>>> On Thu, Nov 7, 2013 at 1:16 PM, Ray Rodriguez <ra...@gmail.com>wrote:
>>>
>>>> Hi Whitney I would have a look at this github issue where I work
>>>> through some of my jenkins mesos-plugin issues with Vinod.  Might be some
>>>> of the same issues you are seeing.
>>>> https://github.com/jenkinsci/mesos-plugin/issues/2
>>>>
>>>> Ray
>>>>
>>>>
>>>>
>>>> On Thu, Nov 7, 2013 at 1:07 PM, Whitney Sorenson <wsorenson@hubspot.com
>>>> > wrote:
>>>>
>>>>> Hi all!
>>>>>
>>>>> I am trying to get the Jenkins Mesos plugin functioning. I was able to
>>>>> get it installed on our Jenkins master.
>>>>>
>>>>> However, it's unclear if there are any required steps for setting up
>>>>> the slaves. When a framework task is launched, it fails instantly and there
>>>>> are no logs in the runs folder.
>>>>>
>>>>> Here's a gist with relevant logs from the slave:
>>>>>
>>>>>
>>>>> https://gist.github.com/wsorenson/b3562c3e4a8992f9a46f/raw/ea5821c442d826456291330452208d8d7ac8418f/failing+jenkins+logs
>>>>>
>>>>> Any help on how to debug? At first, I thought maybe we needed
>>>>> slave.jar or something but it looks like it's trying to fetch that from the
>>>>> master using the URIs. To clarify, I have done no special jenkins related
>>>>> setup (as per readme.md) on any of the slaves.
>>>>>
>>>>> -Whitney
>>>>>
>>>>
>>>>
>>>
>>
>

Re: Jenkins mesos plugin failing

Posted by Whitney Sorenson <ws...@hubspot.com>.
Hi Vinod,

It's 0.14.0-rc4 in both.

I believe we have logging working:

-rw-r--r-- 1 root root         0 Oct 22 23:48 mesos-slave.out
lrwxrwxrwx 1 root root        63 Oct 22 23:48 mesos-slave.INFO -> mesos-slave.carousel.invalid-user.log.INFO.20131022-234823.5797
lrwxrwxrwx 1 root root        66 Oct 22 23:49 mesos-slave.WARNING -> mesos-slave.carousel.invalid-user.log.WARNING.20131022-234954.5797
drwxr-xr-x 2 root root      4096 Oct 22 23:49 .
-rw-rw-r-- 1 root root      4827 Nov  1 20:34 mesos-slave.carousel.invalid-user.log.WARNING.20131022-234954.5797
-rw-rw-r-- 1 root root  10408140 Nov  7 18:44 mesos-slave.carousel.invalid-user.log.INFO.20131022-234823.5797
-rw-r--r-- 1 root root  53759705 Nov  7 18:45 mesos-slave.err

Is there something else to check? Is it possible the executor is failing
before it even attempts to fetch URIs?

Ray, thanks. Yes, I found the Jenkins logs. I was able to wget slave.jar
and even run it. The mesos-jenkins slaves are dead now, so I can't connect
to their slave-agent, but the jar does run. I'm not sure whether the window
for connecting to one of the Mesos-launched slaves is long enough to try
before it is terminated due to failures. Interestingly, when I try to
connect to one of the existing slaves I get a 403.
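
The wget and 403 checks above can be repeated from a slave with a few lines of
Python. This is a hedged sketch, not part of the plugin: `probe` is a
hypothetical helper name, and the URL you pass in would be whichever
slave.jar or slave-agent.jnlp address appears in your own Jenkins logs.

```python
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 403.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar problems.
        return None
```

A result of 200 means the slave can reach the master, a 403 means it reached
the master but was refused, and None means the request never got an answer.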

-Whitney



On Thu, Nov 7, 2013 at 1:34 PM, Vinod Kone <vi...@gmail.com> wrote:

> Hey Whitney,
>
> What version of mesos are you using (both in the cluster and the plugin)?
>
> The slave should print stuff to console when it is launching executor
> (e.g., "Fetching resources..."). I don't see that in the gist you pasted.
> Are you capturing stdout/stderr of the slave?
>
>
> On Thu, Nov 7, 2013 at 10:30 AM, Whitney Sorenson <ws...@hubspot.com>wrote:
>
>> Thanks Ray.
>>
>> I have very similar issue (empty executor directories) - but don't have
>> any issues curling the slave.jar URI - and I don't have any existing JNLP
>> process running. I don't have a jenkins user - is that the only setup you
>> did on the slave?
>>
>> -Whitney
>>
>>
>>
>> On Thu, Nov 7, 2013 at 1:16 PM, Ray Rodriguez <ra...@gmail.com>wrote:
>>
>>> Hi Whitney I would have a look at this github issue where I work through
>>> some of my jenkins mesos-plugin issues with Vinod.  Might be some of the
>>> same issues you are seeing.
>>> https://github.com/jenkinsci/mesos-plugin/issues/2
>>>
>>> Ray
>>>
>>>
>>>
>>> On Thu, Nov 7, 2013 at 1:07 PM, Whitney Sorenson <ws...@hubspot.com>wrote:
>>>
>>>> Hi all!
>>>>
>>>> I am trying to get the Jenkins Mesos plugin functioning. I was able to
>>>> get it installed on our Jenkins master.
>>>>
>>>> However, it's unclear if there are any required steps for setting up
>>>> the slaves. When a framework task is launched, it fails instantly and there
>>>> are no logs in the runs folder.
>>>>
>>>> Here's a gist with relevant logs from the slave:
>>>>
>>>>
>>>> https://gist.github.com/wsorenson/b3562c3e4a8992f9a46f/raw/ea5821c442d826456291330452208d8d7ac8418f/failing+jenkins+logs
>>>>
>>>> Any help on how to debug? At first, I thought maybe we needed slave.jar
>>>> or something but it looks like it's trying to fetch that from the master
>>>> using the URIs. To clarify, I have done no special jenkins related setup
>>>> (as per readme.md) on any of the slaves.
>>>>
>>>> -Whitney
>>>>
>>>
>>>
>>
>

Re: Jenkins mesos plugin failing

Posted by Vinod Kone <vi...@gmail.com>.
Hey Whitney,

What version of mesos are you using (both in the cluster and the plugin)?

The slave should print stuff to console when it is launching an executor
(e.g., "Fetching resources..."). I don't see that in the gist you pasted.
Are you capturing stdout/stderr of the slave?
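
If the slave's console output is being captured to a file, searching it for
that message is straightforward. The following is a minimal Python sketch
under that assumption; `lines_containing` is a hypothetical helper, and the
path you pass in is wherever your setup redirects the slave's output.

```python
def lines_containing(log_path, needle):
    """Yield (line_number, line) for each line in log_path containing needle."""
    # errors="replace" keeps the scan going if the log has stray bytes.
    with open(log_path, "r", errors="replace") as log:
        for number, line in enumerate(log, start=1):
            if needle in line:
                yield number, line.rstrip("\n")
```

Calling `list(lines_containing(path, "Fetching resources"))` on the captured
output shows whether the slave ever got as far as fetching the executor's URIs.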


On Thu, Nov 7, 2013 at 10:30 AM, Whitney Sorenson <ws...@hubspot.com> wrote:

> Thanks Ray.
>
> I have very similar issue (empty executor directories) - but don't have
> any issues curling the slave.jar URI - and I don't have any existing JNLP
> process running. I don't have a jenkins user - is that the only setup you
> did on the slave?
>
> -Whitney
>
>
>
> On Thu, Nov 7, 2013 at 1:16 PM, Ray Rodriguez <ra...@gmail.com>wrote:
>
>> Hi Whitney I would have a look at this github issue where I work through
>> some of my jenkins mesos-plugin issues with Vinod.  Might be some of the
>> same issues you are seeing.
>> https://github.com/jenkinsci/mesos-plugin/issues/2
>>
>> Ray
>>
>>
>>
>> On Thu, Nov 7, 2013 at 1:07 PM, Whitney Sorenson <ws...@hubspot.com>wrote:
>>
>>> Hi all!
>>>
>>> I am trying to get the Jenkins Mesos plugin functioning. I was able to
>>> get it installed on our Jenkins master.
>>>
>>> However, it's unclear if there are any required steps for setting up the
>>> slaves. When a framework task is launched, it fails instantly and there are
>>> no logs in the runs folder.
>>>
>>> Here's a gist with relevant logs from the slave:
>>>
>>>
>>> https://gist.github.com/wsorenson/b3562c3e4a8992f9a46f/raw/ea5821c442d826456291330452208d8d7ac8418f/failing+jenkins+logs
>>>
>>> Any help on how to debug? At first, I thought maybe we needed slave.jar
>>> or something but it looks like it's trying to fetch that from the master
>>> using the URIs. To clarify, I have done no special jenkins related setup
>>> (as per readme.md) on any of the slaves.
>>>
>>> -Whitney
>>>
>>
>>
>

Re: Jenkins mesos plugin failing

Posted by Whitney Sorenson <ws...@hubspot.com>.
Thanks Ray.

I have a very similar issue (empty executor directories), but I don't have
any issues curling the slave.jar URI, and I don't have any existing JNLP
process running. I don't have a jenkins user; is that the only setup you
did on the slave?
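
For anyone checking for the same empty-directory symptom, a small Python
sketch like the one below reports which log files are missing from an
executor directory. The helper name `missing_sandbox_logs` is invented for
illustration, and the directory path is whatever run directory exists on your
slave; the default file names "stdout" and "stderr" are an assumption about
what a healthy executor directory would contain.

```python
import os


def missing_sandbox_logs(sandbox_dir, expected=("stdout", "stderr")):
    """Return the expected log file names that are absent from sandbox_dir."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(sandbox_dir, name))]
```

If both names come back for a run directory, the executor probably exited
before it could open its log files.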

-Whitney



On Thu, Nov 7, 2013 at 1:16 PM, Ray Rodriguez <ra...@gmail.com> wrote:

> Hi Whitney I would have a look at this github issue where I work through
> some of my jenkins mesos-plugin issues with Vinod.  Might be some of the
> same issues you are seeing.
> https://github.com/jenkinsci/mesos-plugin/issues/2
>
> Ray
>
>
>
> On Thu, Nov 7, 2013 at 1:07 PM, Whitney Sorenson <ws...@hubspot.com>wrote:
>
>> Hi all!
>>
>> I am trying to get the Jenkins Mesos plugin functioning. I was able to
>> get it installed on our Jenkins master.
>>
>> However, it's unclear if there are any required steps for setting up the
>> slaves. When a framework task is launched, it fails instantly and there are
>> no logs in the runs folder.
>>
>> Here's a gist with relevant logs from the slave:
>>
>>
>> https://gist.github.com/wsorenson/b3562c3e4a8992f9a46f/raw/ea5821c442d826456291330452208d8d7ac8418f/failing+jenkins+logs
>>
>> Any help on how to debug? At first, I thought maybe we needed slave.jar
>> or something but it looks like it's trying to fetch that from the master
>> using the URIs. To clarify, I have done no special jenkins related setup
>> (as per readme.md) on any of the slaves.
>>
>> -Whitney
>>
>
>

Re: Jenkins mesos plugin failing

Posted by Ray Rodriguez <ra...@gmail.com>.
Hi Whitney, I would have a look at this GitHub issue where I worked through
some of my Jenkins mesos-plugin issues with Vinod. It might cover some of the
same issues you are seeing.
https://github.com/jenkinsci/mesos-plugin/issues/2

Ray



On Thu, Nov 7, 2013 at 1:07 PM, Whitney Sorenson <ws...@hubspot.com> wrote:

> Hi all!
>
> I am trying to get the Jenkins Mesos plugin functioning. I was able to get
> it installed on our Jenkins master.
>
> However, it's unclear if there are any required steps for setting up the
> slaves. When a framework task is launched, it fails instantly and there are
> no logs in the runs folder.
>
> Here's a gist with relevant logs from the slave:
>
>
> https://gist.github.com/wsorenson/b3562c3e4a8992f9a46f/raw/ea5821c442d826456291330452208d8d7ac8418f/failing+jenkins+logs
>
> Any help on how to debug? At first, I thought maybe we needed slave.jar or
> something but it looks like it's trying to fetch that from the master using
> the URIs. To clarify, I have done no special jenkins related setup (as per
> readme.md) on any of the slaves.
>
> -Whitney
>