Posted to user@aurora.apache.org by Krish <kr...@gmail.com> on 2016/02/27 19:16:50 UTC

Re: Stacktrace when running Apache Aurora

I couldn't complete my PoC project earlier (got busy with other
work). Well, it is never too late, and here's my update and issue.

I have a 3 node zk (3.5.1 alpha), mesos-master (v0.24.1) & aurora (v0.11.0)
running.
I was stuck on a problem where, with mesos 0.25.0 & aurora 0.9.0, I
got a protobuf "field not set" error for the ExecutorInfo field.

I have a mesos agent running in docker container on coreos and it can
access the host docker just fine.
I have also put the docker login credentials file at the right location for
it to access the private docker registry.
I can manually trigger a docker pull and docker run without issues from the
slave (which is also reflected properly outside the slave container with
docker images and docker ps).

However, when I try to run an aurora job with hello-docker container, the
slave prints out the log that docker pull has failed; more specifically:
" failed to start: Failed to 'docker pull
private_repo.com:5000/krish/test:latest': exit status = exited with status
1 stderr = Error: image krish/test:latest not found"

My hunch is that when using docker run from aurora DSL, it does not read
the docker credentials file properly and hence fails. I can reproduce the
exact same error when I delete the credentials file from the slave and
trigger a pull.

Is the hunch right? If yes, is there a way to resolve this? Maybe source it
some way before the run command?



--
κρισhναν

On Tue, Oct 27, 2015 at 10:35 PM, Bill Farner <wf...@apache.org> wrote:

> (1) clusters.json is written by you, configuring the CLI client with
> instructions for what clusters are available and how to discover them.
>
> (2) That's expected - mesos only allows one active replica of a framework
> at a time, this signals which one is active.
>
> (3) The observer is essentially a web server that allows you to browse a
> task's sandbox directory and other information about it.  You will need to
> configure it to run on your worker/agent nodes for that functionality to
> work (it's linked from the scheduler web UI).
>
> (4) You could indeed implement that behavior externally.  There is a
> reason:
> https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L556-L559
>
> (5) That is correct.  The scheduler exposes a thrift API that you would
> use (a REST API is coming, but ground has not yet been broken).  If you go
> this route, I suggest you skip the DSL and use the JSON task description
> format that is shipped over the API.  There isn't good documentation on
> this, but we can help you through it and would be grateful for a writeup of
> your approach!
>
>
> On Mon, Oct 26, 2015 at 11:44 PM, Krish <kr...@gmail.com> wrote:
>
>> Hi Folks,
>> Firstly, thanks for all the help. Am happy to report that I have set up
>> zk, mesos & aurora, & can work further towards my idea of having an
>> auto-scaling cluster.
>> I have some further questions about the work done so far & things I plan
>> to do:
>>
>>    1. Is the /etc/aurora/clusters.json file created by the scheduler or
>>    does it need to be handcrafted? I had to manually edit the file to get my
>>    `aurora job ...` cli to work.
>>
>>    2. I am running a cluster of 3 coreOS VMs on vagrant with zk, mesos &
>>    aurora in a docker container. Only 1 of them outputs '1' when I look at the
>>    'framework_registered' field. Is this expected? How do I verify that they
>>    are working as a cluster?
>>
>>    3. From the documentation, I see that there is an observer that needs
>>    to be listening on port 1338. What is the observer socket & its purpose? I
>>    have aurora listening only on ports 8081 (http port) & 8083 (libprocess).
>>
>>    4. I read about the 'PENDING' field in aurora documentation, as Bill
>>    suggested, & realize that it just shows that a task is waiting for some
>>    reasons (for want of resources, in my case, as 0 slaves have registered). I
>>    was thinking of adding a hook to the pending state; say if a task is
>>    PENDING for 5 minutes for lack of resources in the cluster, then spin up a
>>    new machine. Is this the right approach to take? Does aurora provide
>>    reasons for why a task is in PENDING state?
>>
>>    => aurora job status testcluster/$USER/test/hello_world
>>     INFO] Checking status of testcluster/ubuntu/test/hello_world
>>    Active tasks (1):
>>           Task role: ubuntu, env: test, name: hello_world, instance: 0,
>>    status:
>>    PENDING on None
>>              cpus: 0.1, ram: 16 MB, disk: 16 MB
>>              events:
>>               2015-10-23 04:55:33 PENDING: None
>>    Inactive tasks (0):
>>
>>    5. Aurora defines jobs in a .aurora config file & if I decide to
>>    increase/decrease the number of instances in my cluster, then I need to
>>    create/overwrite the concerned .aurora file and trigger the `aurora update
>>    ...` command. Is this right?
>>    If yes, is there an HTTP API I can invoke remotely which triggers
>>    this update?
>>
>>
>>
>>
>> --
>> κρισhναν
>>
>> On Fri, Oct 23, 2015 at 8:09 AM, Joshua Cohen <jc...@twopensource.com>
>> wrote:
>>
>>> I suspect your error from `aurora job create ...` is due to the aurora
>>> config you're using referencing `/vagrant/hello_world.py` which does not
>>> exist (as you say: you're not even using Vagrant). Can you link the .aurora
>>> config you're using?
>>>
>>> Cheers,
>>>
>>> Joshua
>>>
>>> On Thu, Oct 22, 2015 at 3:22 PM, Krish <kr...@gmail.com>
>>> wrote:
>>>
>>>> Thanks, Zameer.
>>>>
>>>> I had to modify  /etc/aurora/clusters.json:
>>>> [
>>>>   {
>>>>     "auth_mechanism": "UNAUTHENTICATED",
>>>>     "name": "testcluster",
>>>>     "scheduler_zk_path": "/scheduler/aurora",
>>>>     "slave_root": "/var/lib/mesos",
>>>>     "slave_run_directory": "latest",
>>>>     "zk": "127.0.1.1"
>>>>   }
>>>> ]
>>>>
>>>> I have a hello_world.aurora in my home folder. However the following
>>>> command errors out:
>>>> ~$ aurora job create testcluster/testrole/test/hellojob
>>>> ./hello_world.aurora
>>>> Error loading configuration: [Errno 2] No such file or directory:
>>>> '/vagrant/hello_world.py'
>>>>
>>>> A job list does work:
>>>> ~$ aurora job list testcluster
>>>>  INFO] Retrieving jobs for role None
>>>>
>>>> I am not even using vagrant. I am using zk & mesos on the same
>>>> machine as aurora. How do I submit these job templates to aurora?
>>>>
>>>> Any pointers to documentation will be helpful.
>>>>
>>>>
>>>> --
>>>> κρισhναν
>>>>
>>>> On Wed, Oct 21, 2015 at 11:09 PM, Zameer Manji <zm...@apache.org>
>>>> wrote:
>>>>
>>>>> Version 0.9.0 does not have the gc executor. Version 0.9.0 uses Mesos' task
>>>>> reconciliation
>>>>> <http://mesos.apache.org/documentation/latest/reconciliation/> API
>>>>> instead.
>>>>>
>>>>> On Wed, Oct 21, 2015 at 9:28 AM, Krish <kr...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks Bill for the location to the debs. I was finally able to run
>>>>>> aurora. :)
>>>>>>
>>>>>> I did find thermos_executor.pex & thermos_observer after installing
>>>>>> aurora-executor. I still could not find gc_executor.pex on my system.
>>>>>> Is there a location from where I can download the binaries for *.pex
>>>>>> or build them from scratch?
>>>>>>
>>>>>> root@dev:/# find . -name "*.pex"
>>>>>> ./usr/share/aurora/bin/thermos_executor.pex
>>>>>> ./usr/share/aurora/bin/kaurora_admin.pex
>>>>>> ./usr/share/aurora/bin/kaurora.pex
>>>>>> ./usr/share/aurora/bin/thermos.pex
>>>>>> ./usr/share/aurora/bin/thermos_observer.pex
>>>>>> ./home/ubuntu/.pex
>>>>>> ./root/.pex
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> κρισhναν
>>>>>>
>>>>>> On Tue, Oct 20, 2015 at 11:46 PM, Bill Farner <wf...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Aurora currently requires an executor, so setting it to /dev/null
>>>>>>> will not work.  Happy to talk further about your thoughts around
>>>>>>> sidestepping the executor.
>>>>>>>
>>>>>>> As for working with the scheduler source code, it's a standard
>>>>>>> gradle project and we tend to use intellij.  Docs to help ramp on that:
>>>>>>> https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md
>>>>>>>
>>>>>>> As for builds - the .zip is a source distribution, so it won't have
>>>>>>> any pre-built binaries.  If you're on debian, we have official debs here:
>>>>>>> https://bintray.com/apache/aurora
>>>>>>> You can see how they're built (and build your own packages) here:
>>>>>>> https://github.com/apache/aurora-packaging
>>>>>>> We're close to having official RPMs, but none to speak of yet.
>>>>>>>
>>>>>>> On Tue, Oct 20, 2015 at 9:47 AM, Krish <kr...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Stephan,
>>>>>>>> I am trying to get started and run aurora without thermos executor
>>>>>>>> (setting it to /dev/null does not help) - on a local linux box for now &
>>>>>>>> planning to containerize/dockerize it later.
>>>>>>>>
>>>>>>>> Can you please point me to the right documentation (or a pointer to
>>>>>>>> the cli parsing source code) which can help me resolve this? Also, are
>>>>>>>> there any steps to import the source code into eclipse to browse &
>>>>>>>> analyze code for this?
>>>>>>>>
>>>>>>>> Also, where do I find all the *.pex files? They are not present in
>>>>>>>> the zip file nor anywhere in the built source code.
>>>>>>>>
>>>>>>>> I know I am asking too many queries on a single thread here, &
>>>>>>>> would appreciate the help.
>>>>>>>> I think at the end of this, I will put the steps I followed in a
>>>>>>>> gist/blog so others might find their way around, & not struggle as much.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> κρισhναν
>>>>>>>>
>>>>>>>> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan <
>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Krish,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> you don't have to set framework_authentication_file and
>>>>>>>>> zk_digest_credentials. The scheduler help text is misleading here as
>>>>>>>>> everything will work fine if you leave those empty.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> In addition, looks like you are misunderstanding the usage of the
>>>>>>>>> thermos_executor_path command line flag of the scheduler. It is
>>>>>>>>> supposed to point to the binary containing the generic Aurora executor
>>>>>>>>> (thermos_executor.pex).  You only need the hello_world.aurora
>>>>>>>>> once your scheduler is up and running. It serves as an example input for the
>>>>>>>>> aurora command line client which can be used to schedule jobs and services
>>>>>>>>> on an Aurora master.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Have you tried to use the vagrant box? Just type `vagrant up` in a
>>>>>>>>> checkout of the Aurora source code. It gives you a running scheduler to
>>>>>>>>> play with. Once you have understood how it works, you can start trying to
>>>>>>>>> install it on your own (by reverse-engineering the vagrant box).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hope this helps a little,
>>>>>>>>>
>>>>>>>>> Stephan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ------------------------------
>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>> *Sent:* Tuesday, October 20, 2015 11:39 AM
>>>>>>>>> *To:* Bill Farner
>>>>>>>>> *Cc:* user@aurora.apache.org; Erb, Stephan
>>>>>>>>>
>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>
>>>>>>>>> Bill/Stephan,
>>>>>>>>> I still get a stacktrace when running the aurora scheduler CLI.
>>>>>>>>>
>>>>>>>>> I do not know what to specify for  -framework_authentication_file
>>>>>>>>> & -zk_digest_credentials, and they are required arguments.
>>>>>>>>>
>>>>>>>>> I am not using any authentication on Mesos master, do I still need
>>>>>>>>> the framework_authentication_file parameter?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> rm -rf /db /backup_dir
>>>>>>>>> mesos-log initialize --path="/db"
>>>>>>>>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
>>>>>>>>> JAVA_OPTS="-Xmx1536m  -Xms256m"
>>>>>>>>> /usr/local/aurora-scheduler/bin/aurora-scheduler -backup_dir=/backup_dir
>>>>>>>>> -cluster_name=tc -mesos_master_address=zk://localhost:2181/mesos/master
>>>>>>>>> -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181
>>>>>>>>> -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false
>>>>>>>>> -native_log_file_path=/db
>>>>>>>>> -thermos_executor_path=/home/ubuntu/hello_world.aurora
>>>>>>>>> ...
>>>>>>>>> ...
>>>>>>>>> INFO: Binding org.apache.aurora.scheduler.http.Utilization to GuiceManagedComponentProvider with the scope "PerRequest"
>>>>>>>>> Oct 20, 2015 9:27:40 AM org.apache.aurora.scheduler.cron.quartz.CronModule provideTimeZone
>>>>>>>>> WARNING: Cron schedules are configured to fire according to timezone Greenwich Mean Time but system timezone is set to Coordinated Universal Time
>>>>>>>>> Oct 20, 2015 9:27:41 AM org.eclipse.jetty.server.AbstractConnector doStart
>>>>>>>>> INFO: Started SelectChannelConnector@0.0.0.0:43843
>>>>>>>>> E1020 09:27:41.290 THREAD1 org.apache.aurora.scheduler.SchedulerLifecycle$9.execute: Caught unchecked exception: com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>
>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>
>>>>>>>>> 1 error
>>>>>>>>> com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>
>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>
>>>>>>>>> 1 error
>>>>>>>>>         at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
>>>>>>>>>         at org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> κρισhναν
>>>>>>>>>
>>>>>>>>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner <wf...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> The typical flow is that you keep your .aurora file checked into
>>>>>>>>>> git, and commit every time you deploy/update.  When you change your file,
>>>>>>>>>> you will instruct Aurora to update the live job (have a look at aurora
>>>>>>>>>> update -h).  Aurora will perform a rolling upgrade of your job
>>>>>>>>>> to the new config.  You'll use this same flow for updating your job's
>>>>>>>>>> software as well as resizing the job.
>>>>>>>>>>
>>>>>>>>>> For (3), you could set up alerting for stats that the scheduler
>>>>>>>>>> exports.  Have a look here for monitoring background:
>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md
>>>>>>>>>>
>>>>>>>>>> You'll want to look at scheduler stats related to 'pending'.
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish <
>>>>>>>>>> krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks for the pointer. Now I notice that the aurora-scheduler
>>>>>>>>>>> script has the --thermos_executor_path as a mandatory requirement.
>>>>>>>>>>>
>>>>>>>>>>> I have a couple of questions on how the thermos_executor/.aurora
>>>>>>>>>>> config file functions:
>>>>>>>>>>> 1. Do we have to statically define the file beforehand?
>>>>>>>>>>> 2. What happens when we want to dynamically change the config,
>>>>>>>>>>> say increasing the number of instances of a service required? Does aurora
>>>>>>>>>>> require a reboot then?
>>>>>>>>>>> 3. How do I get notified about the message mesos sends when it
>>>>>>>>>>> cannot schedule tasks for lack of resources? Should I depend on aurora for
>>>>>>>>>>> this or try to look for a hook into mesos?
>>>>>>>>>>>
>>>>>>>>>>> I think a little bit of context would help here.
>>>>>>>>>>> What I plan to check is to run a very basic job/task inside a
>>>>>>>>>>> docker container with aurora & wait for a 'resource not available' message
>>>>>>>>>>> from mesos, and accordingly call an api to spin up a new node in my cluster.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> κρισhναν
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <
>>>>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I believe you are missing the thermos_executor options that
>>>>>>>>>>>> have to be passed to the scheduler command line.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> See
>>>>>>>>>>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
>>>>>>>>>>>> for an example
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>
>>>>>>>>>>>> Stephan
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>>>>> *Sent:* Monday, October 19, 2015 8:45 AM
>>>>>>>>>>>> *To:* user@aurora.apache.org
>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> I am a n00b with apache aurora & am trying to experiment with some
>>>>>>>>>>>> things on my local machine with zookeeper and mesos-master running locally.
>>>>>>>>>>>> They have initialized properly. When I try to run aurora with the required
>>>>>>>>>>>> options, I get the following error, & googling hasn't helped me much here.
>>>>>>>>>>>> Appreciate any help. Thanks in advance.
>>>>>>>>>>>>
>>>>>>>>>>>> ...
>>>>>>>>>>>> ...
>>>>>>>>>>>> WARNING: Method [public void org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)] is synthetic and is being intercepted by [com.twitter.common.inject.TimedInterceptor@604c5de8]. This could indicate a bug.  The method may be intercepted twice, or may not be intercepted at all.
>>>>>>>>>>>> Exception in thread "main" com.google.inject.CreationException: Guice creation errors:
>>>>>>>>>>>>
>>>>>>>>>>>> 1) An exception was caught and reported. Message: A value may only be retrieved from a variable that has a default or has been set.
>>>>>>>>>>>>   at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>
>>>>>>>>>>>> 2) Could not find a suitable constructor in org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have either one (and only one) constructor annotated with @Inject or a zero-argument constructor that is not private.
>>>>>>>>>>>>   at org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>>>>>>>>>>>>   at org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>>>>>>>>>>>>
>>>>>>>>>>>> 2 errors
>>>>>>>>>>>>         at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>>>>>>>>>>>>         at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>>>>>>>>>>>>         at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>>>>>>>>>>>>         at com.google.inject.Guice.createInjector(Guice.java:95)
>>>>>>>>>>>>         at com.google.inject.Guice.createInjector(Guice.java:83)
>>>>>>>>>>>>         at com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>>>>>>>>>>>>         at com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>>>>>>>>>>>>         at com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>>>>>>>>>>>>         at com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>>>>>>>>>>>>         at org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
>>>>>>>>>>>> Caused by: java.lang.IllegalStateException: A value may only be retrieved from a variable that has a default or has been set.
>>>>>>>>>>>>         at com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>>>>>>>>>>>>         at com.twitter.common.args.Arg.get(Arg.java:82)
>>>>>>>>>>>>         at org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>>>>>>>>>>>>         at com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>>>>>>>>>>>>         at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>         at com.google.inject.util.Modules$2.configure(Modules.java:114)
>>>>>>>>>>>>         at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>         at com.google.inject.spi.Elements.getElements(Elements.java:101)
>>>>>>>>>>>>         at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>         at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>>>>>>>>>>>>         ... 7 more
>>>>>>>>>>>>
>>>>>>>>>>>> Complete logs are available at http://pastebin.com/i72HvbYi
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Zameer Manji
>>>>>
>>>>
>>>>
>>>
>>
>

Re: Stacktrace when running Apache Aurora

Posted by Jake Farrell <jf...@apache.org>.
This can also be avoided by setting DOCKER_CONFIG as an OS environment
variable.

The issue occurs when docker images from a private registry are pulled on a
mesos agent: mesos versions < 0.26 only support v1 registries, which require
the .dockercfg config file, while Docker 1.8+ stores its config in
$HOME/.docker/config.json.
Mesos 0.26 fixed this issue in the universal containerizer puller. To work
around it on older versions, enable an environment file in the mesos-agent's
systemd service that sets DOCKER_CONFIG to, say, $HOME/.docker/ so that
config.json is picked up correctly.

MESOS-2969, MESOS-3031 caused by docker/docker#12009
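
For agents that only read the legacy .dockercfg, the newer config.json
layout can be converted rather than retyped. The following is only a minimal
sketch, assuming the two layouts are exactly as quoted further down in this
thread (registries nested under "auths" in config.json, and at the top level
in .dockercfg). The function name to_legacy_dockercfg is hypothetical and is
not part of docker, mesos or aurora.

```python
import json


def to_legacy_dockercfg(config_json_text):
    """Return legacy .dockercfg text for the given config.json text.

    Assumption: config.json nests registries under "auths", while
    .dockercfg lists the same entries at the top level.
    """
    config = json.loads(config_json_text)
    # Lift the registry entries out of "auths"; other keys are dropped.
    auths = config.get("auths", {})
    return json.dumps(auths, indent=2, sort_keys=True)


if __name__ == "__main__":
    sample = json.dumps({
        "auths": {
            "repo.example.com:5000": {"auth": "<snip>", "email": "<snip>"}
        }
    })
    print(to_legacy_dockercfg(sample))
```

The output would then be saved under the name .dockercfg in the directory
that the agent uses as HOME for the pull.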


-Jake



On Thu, Mar 3, 2016 at 11:30 AM, Krish <kr...@gmail.com> wrote:

> Used rbt for the first time and some weird thing happened to the console,
> and it got submitted!
> https://reviews.apache.org/r/44341/
>
> Will be sure to keep the list posted with any new info. Thanks.
>
>
>
> --
> κρισhναν
>
> On Thu, Mar 3, 2016 at 9:20 PM, Bill Farner <wf...@apache.org> wrote:
>
>> Likely in an existing page, preferably wherever you think would have
>> saved you the trial and error!
>>
>> I look forward to the blog post, be sure to shoot a link here once it's
>> up!
>>
>> Thanks!
>>
>>
>> On Thursday, March 3, 2016, Krish <kr...@gmail.com> wrote:
>>
>>> Can you guide me on how to do that? Should I start with a new page and then
>>> submit it, or would you like that as an entry in some existing doc?
>>> That will be the short term (couple of hours)  item on my checklist.
>>>
>>> Actually, as I said before, I have in mind to blog about my entire
>>> design and implementation process - the how and the why of docker
>>> configuration, private docker repo setup, coreos cluster setup, and zk,
>>> mesos master, aurora containerisation and setup, along with their
>>> monitoring (have decided on bosun.org with cAdvisor). And a short guide
>>> as to how to run both containerized and non containerized jobs in
>>> production.
>>> I had to refer to a dozen and more sites and blogs and manuals and
>>> source to get so far; and got help from engineers in various mailing lists.
>>> A unified guide should be helpful, imho.
>>>
>>>
>>> On Thursday 3 March 2016, Bill Farner <wf...@apache.org> wrote:
>>>
>>>> Wow!  I'm glad you got it working!  To help the next poor soul trying
>>>> to do this, would you be willing to put up a doc patch on our end?
>>>>
>>>> On Thursday, March 3, 2016, Krish <kr...@gmail.com> wrote:
>>>>
>>>>> TL;DR:
>>>>> Use only a file named .dockercfg for docker credentials in mesos
>>>>> tasks!
>>>>>
>>>>> Long story:
>>>>> ---------------
>>>>> Holy smokescreens!
>>>>> This is for reporting & documenting purposes only, so that others
>>>>> don't have to pull their hair like I did for the past few evenings!
>>>>>
>>>>> A little background:
>>>>> I am running Ubuntu 14.04 on my system and docker stores its
>>>>> credentials in the ~/.docker/config.json as
>>>>> cat ~/.docker/config.json
>>>>> {
>>>>>   "auths": {
>>>>>     "repo.example.com:5000": {
>>>>>       "auth": "<snip>",
>>>>>       "email": "<snip>"
>>>>>     }
>>>>>   }
>>>>> }
>>>>>
>>>>> And I am doing all these experiments on a coreOS system which stores
>>>>> the credentials  in ~/.dockercfg as
>>>>> core@aurora-1 ~ $ cat ~/.dockercfg
>>>>> {
>>>>>   "repo.example.com:5000": {
>>>>>     "auth": "<snip>",
>>>>>     "email": "<snip>"
>>>>>   }
>>>>> }
>>>>>
>>>>> Since my container was an Ubuntu 14.04 container (as was my local
>>>>> system), I used the Ubuntu credential file format and stored the
>>>>> credentials as ~/.docker/config.json, which is why the slave task could
>>>>> not read them.
>>>>> After digging through (a lot of finds, greps and regex matching)
>>>>> aurora, mesos, and thermos source code, I saw in
>>>>> mesos/src/docker/docker.cpp:
>>>>>
>>>>>   // Set HOME variable to pick up .dockercfg.
>>>>>   map<string, string> environment = os::environment();
>>>>>
>>>>>   environment["HOME"] = directory;
>>>>>
>>>>> Changed the filename and the json content, changed the
>>>>> thermos_executor_resources, and bam, docker pull works!
>>>>>
>>>>> Well, the mesos documentation does say "To run an image from a private
>>>>> repository, one can include the URI pointing to a .dockercfg that contains
>>>>> login information." and I would have read it a dozen times!
>>>>> But I never thought that they literally meant '.dockercfg' as the name
>>>>> of the file!
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> κρισhναν
>>>>>
>>>>> On Thu, Mar 3, 2016 at 1:45 PM, Krish <kr...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> I have got the docker config file copied into the sandbox using the
>>>>>> thermos_executor_resources flag; however docker is still not able to find
>>>>>> the credentials file for doing an appropriate pull of image from a private
>>>>>> repo.
>>>>>>
>>>>>> When I try to use the library/hello-world:latest image from public
>>>>>> docker repo to check if everything works fine without the credentials, I
>>>>>> encounter a different problem:
>>>>>> exec: "/bin/sh": stat /bin/sh: no such file or directory
>>>>>> Error response from daemon: Cannot start container
>>>>>> de93dc344d44b41bccccff49e508001a97ff23a8964e637d32a506a31fd4d946: [8]
>>>>>> System error: exec: "/bin/sh": stat /bin/sh: no such file or directory
>>>>>>
>>>>>> I was referring to this email for guidance on setting up a mesos
>>>>>> slave:
>>>>>> http://mail-archives.apache.org/mod_mbox/aurora-dev/201503.mbox/%3CCAKB1MkHR=+NOnesLLX9bUWtTDThsku46PW_wR4B+_Z9P59+XWQ@mail.gmail.com%3E
>>>>>>
>>>>>> So, I cannot get the credentials file to be used by docker, and if I
>>>>>> bypass authentication, I can do a docker pull, but encounter a weird error
>>>>>> in launching the hello-world image.
>>>>>>
>>>>>> Am I missing out on checking any log files generated? I currently
>>>>>> refer to mesos-slave stdout and the sandbox stderr file.
>>>>>> Any configuration parameter I am missing for this to happen?
>>>>>>
>>>>>> Any pointers will be really helpful. Thanks in advance.
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> κρισhναν
>>>>>>
>>>>>> On Sun, Feb 28, 2016 at 3:37 PM, Krish <kr...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Continuing my earlier chain of thought, I found this in the mesos
>>>>>>> bug list:
>>>>>>> MESOS-4242 - Allow Docker private registry credentials to be passed
>>>>>>> from framework.
>>>>>>> How does one pass credentials using the framework? As it seems the
>>>>>>> .docker/config.json is not read from the slave.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> κρισhναν
>>>>>>>
>>>>>>> On Sat, Feb 27, 2016 at 11:46 PM, Krish <kr...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I couldn't complete my PoC project earlier (got busy with
>>>>>>>> other work). Well, it is never too late, and here's my update and issue.
>>>>>>>>
>>>>>>>> I have a 3 node zk (3.5.1 alpha), mesos-master (v0.24.1) & aurora
>>>>>>>> (v0.11.0) running.
>>>>>>>> I was stuck on a problem where, with mesos 0.25.0 & aurora
>>>>>>>> 0.9.0, I got a protobuf "field not set" error for the ExecutorInfo field.
>>>>>>>>
>>>>>>>> I have a mesos agent running in docker container on coreos and it
>>>>>>>> can access the host docker just fine.
>>>>>>>> I have also put the docker login credentials file at the right
>>>>>>>> location for it to access the private docker registry.
>>>>>>>> I can manually trigger a docker pull and docker run without issues
>>>>>>>> from the slave (which is also reflected properly outside the slave
>>>>>>>> container with docker images and docker ps).
>>>>>>>>
>>>>>>>> However, when I try to run an aurora job with hello-docker
>>>>>>>> container, the slave prints out the log that docker pull has failed; more
>>>>>>>> specifically:
>>>>>>>> " failed to start: Failed to 'docker pull
>>>>>>>> private_repo.com:5000/krish/test:latest': exit status = exited
>>>>>>>> with status 1 stderr = Error: image krish/test:latest not found"
>>>>>>>>
>>>>>>>> My hunch is that when using docker run from aurora DSL, it does not
>>>>>>>> read the docker credentials file properly and hence fails. I can reproduce
>>>>>>>> the exact same error when I delete the credentials file from the slave and
>>>>>>>> trigger a pull.
>>>>>>>>
>>>>>>>> Is the hunch right? If yes, is there a way to resolve this? Maybe
>>>>>>>> source it some way before the run command?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> κρισhναν
>>>>>>>>
>>>>>>>> On Tue, Oct 27, 2015 at 10:35 PM, Bill Farner <wf...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> (1) clusters.json is written by you, configuring the CLI client
>>>>>>>>> with instructions for what clusters are available and how to discover them.
>>>>>>>>>
>>>>>>>>> (2) That's expected - mesos only allows one active replica of a
>>>>>>>>> framework at a time, this signals which one is active.
>>>>>>>>>
>>>>>>>>> (3) The observer is essentially a web server that allows you to
>>>>>>>>> browse a task's sandbox directory and other information about it.  You will
>>>>>>>>> need to configure it to run on your worker/agent nodes for that
>>>>>>>>> functionality to work (it's linked from the scheduler web UI).
>>>>>>>>>
>>>>>>>>> (4) You could indeed implement that behavior externally.  There is
>>>>>>>>> a reason:
>>>>>>>>> https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L556-L559
>>>>>>>>>
>>>>>>>>> (5) That is correct.  The scheduler exposes a thrift API that you
>>>>>>>>> would use (a REST API is coming, but ground has not yet been broken).  If
>>>>>>>>> you go this route, i suggest you skip the DSL and use the JSON task
>>>>>>>>> description format that is shipped over the API.  There's not good
>>>>>>>>> documentation on this, but we can help you through it and would be grateful
>>>>>>>>> for a writeup of your approach!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Oct 26, 2015 at 11:44 PM, Krish <krishnan.k.iyer@gmail.com
>>>>>>>>> > wrote:
>>>>>>>>>
>>>>>>>>>> Hi Folks,
>>>>>>>>>> Firstly, thanks for all the help. Am happy to report that I have
>>>>>>>>>> set up zk, mesos & aurora, & can work further towards my idea of having an
>>>>>>>>>> auto-scaling cluster.
>>>>>>>>>> I have some further questions about the work done so far & things
>>>>>>>>>> I plan to do:
>>>>>>>>>>
>>>>>>>>>>    1. Is the /etc/aurora/clusters.json file created by the
>>>>>>>>>>    scheduler or does it need to be handcrafted? I had to manually edit the
>>>>>>>>>>    file to get my `aurora job ...` cli to work.
>>>>>>>>>>
>>>>>>>>>>    2. I am running a cluster of 3 coreOS VMs on vagrant with zk,
>>>>>>>>>>    mesos & aurora in a docker container. Only 1 of them outputs '1' when I
>>>>>>>>>>    look at the 'framework_registered' field. Is this expected? How do I verify
>>>>>>>>>>    that they are working as a cluster?
>>>>>>>>>>
>>>>>>>>>>    3. From the documentation, I see that there is an observer
>>>>>>>>>>    that needs to be listening on port 1338. What is the observer socket & its
>>>>>>>>>>    purpose? I have aurora listening only on ports 8081 (http port) & 8083
>>>>>>>>>>    (libprocess).
>>>>>>>>>>
>>>>>>>>>>    4. I read about the 'PENDING' field in aurora documentation,
>>>>>>>>>>    as Bill suggested, & realize that it just shows that a task is waiting for
>>>>>>>>>>    some reason (for want of resources, in my case, as 0 slaves have
>>>>>>>>>>    registered). I was thinking of adding a hook to the pending state; say if a
>>>>>>>>>>    task is PENDING for 5 minutes for lack of resources in the cluster, then
>>>>>>>>>>    spin up a new machine. Is this the right approach to take? Does aurora
>>>>>>>>>>    provide reasons for why a task is in PENDING state?
>>>>>>>>>>
>>>>>>>>>>    => aurora job status testcluster/$USER/test/hello_world
>>>>>>>>>>     INFO] Checking status of testcluster/ubuntu/test/hello_world
>>>>>>>>>>    Active tasks (1):
>>>>>>>>>>           Task role: ubuntu, env: test, name: hello_world,
>>>>>>>>>>    instance: 0, status:
>>>>>>>>>>    PENDING on None
>>>>>>>>>>              cpus: 0.1, ram: 16 MB, disk: 16 MB
>>>>>>>>>>              events:
>>>>>>>>>>               2015-10-23 04:55:33 PENDING: None
>>>>>>>>>>    Inactive tasks (0):
>>>>>>>>>>
>>>>>>>>>>    5. Aurora defines jobs in a .aurora config file & if I
>>>>>>>>>>    decide to increase/decrease the number of instances in my cluster, then I
>>>>>>>>>>    need to create/overwrite the concerned .aurora file and trigger the `aurora
>>>>>>>>>>    update ...` command. Is this right?
>>>>>>>>>>    If yes, is there an HTTP API I can invoke remotely which
>>>>>>>>>>    triggers this update?
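
The "PENDING for 5 minutes, then spin up a new machine" idea in item 4 could be prototyped outside Aurora roughly as below. This is only a sketch under assumptions: the function names, the task-key format, and the map of PENDING timestamps are hypothetical, and how that map is filled (CLI output, scheduler API, exported stats) is left open. A task can also be PENDING for reasons other than missing resources, so a real autoscaler would need to check the reason as well.

```python
from datetime import datetime, timedelta

# Hypothetical threshold, taken from the "PENDING for 5 minutes" idea above.
PENDING_THRESHOLD = timedelta(minutes=5)


def tasks_stuck_pending(pending_since, now, threshold=PENDING_THRESHOLD):
    """Return the task keys that have been PENDING for at least `threshold`.

    `pending_since` maps a task key to the datetime at which the task
    entered PENDING. Collecting that data is outside this sketch.
    """
    return sorted(
        key for key, since in pending_since.items()
        if now - since >= threshold
    )


def should_add_node(pending_since, now, threshold=PENDING_THRESHOLD):
    """Decide whether an external autoscaler should request a new machine."""
    return bool(tasks_stuck_pending(pending_since, now, threshold))
```

The decision is kept as a pure function of timestamps so it can be tested without a running cluster; the call that actually provisions a machine would live in a separate, site-specific piece of code.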
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> κρισhναν
>>>>>>>>>>
>>>>>>>>>> On Fri, Oct 23, 2015 at 8:09 AM, Joshua Cohen <
>>>>>>>>>> jcohen@twopensource.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I suspect your error from `aurora job create ...` is due to the
>>>>>>>>>>> aurora config you're using referencing `/vagrant/hello_world.py` which does
>>>>>>>>>>> not exist (as you say: you're not even using Vagrant). Can you link the
>>>>>>>>>>> .aurora config you're using?
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>>
>>>>>>>>>>> Joshua
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Oct 22, 2015 at 3:22 PM, Krish <
>>>>>>>>>>> krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks, Zameer.
>>>>>>>>>>>>
>>>>>>>>>>>> I had to modify  /etc/aurora/clusters.json:
>>>>>>>>>>>> [
>>>>>>>>>>>>   {
>>>>>>>>>>>>     "auth_mechanism": "UNAUTHENTICATED",
>>>>>>>>>>>>     "name": "testcluster",
>>>>>>>>>>>>     "scheduler_zk_path": "/scheduler/aurora",
>>>>>>>>>>>>     "slave_root": "/var/lib/mesos",
>>>>>>>>>>>>     "slave_run_directory": "latest",
>>>>>>>>>>>>     "zk": "127.0.1.1"
>>>>>>>>>>>>   }
>>>>>>>>>>>> ]
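
Since clusters.json is written by hand, a quick sanity check before running the client can catch a misspelled cluster name or a dropped key. The sketch below assumes the file is a JSON list of objects as shown above; the set of expected keys is simply copied from that example and is not claimed to be the client's actual list of required fields. The helper names are hypothetical.

```python
import json

# Keys copied from the example above; whether the client requires
# all of them is an assumption, not something confirmed here.
EXPECTED_KEYS = {
    "auth_mechanism",
    "name",
    "scheduler_zk_path",
    "slave_root",
    "slave_run_directory",
    "zk",
}


def find_cluster(clusters_text, name):
    """Return the cluster entry called `name`, or None if it is absent."""
    for cluster in json.loads(clusters_text):
        if cluster.get("name") == name:
            return cluster
    return None


def missing_keys(cluster):
    """List the expected keys that a cluster entry does not define."""
    return sorted(EXPECTED_KEYS - set(cluster))
```

For the file above, `find_cluster(text, "testcluster")` returns the single entry and `missing_keys` on it returns an empty list.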
>>>>>>>>>>>>
>>>>>>>>>>>> I have a hello_world.aurora in my home folder. However the
>>>>>>>>>>>> following command errors out:
>>>>>>>>>>>> ~$ aurora job create testcluster/testrole/test/hellojob
>>>>>>>>>>>> ./hello_world.aurora
>>>>>>>>>>>> Error loading configuration: [Errno 2] No such file or
>>>>>>>>>>>> directory: '/vagrant/hello_world.py'
>>>>>>>>>>>>
>>>>>>>>>>>> A job list does work:
>>>>>>>>>>>> ~$ aurora job list testcluster
>>>>>>>>>>>>  INFO] Retrieving jobs for role None
>>>>>>>>>>>>
>>>>>>>>>>>> I am not even using the vagrant. I am using zk & mesos on the
>>>>>>>>>>>> same machine as aurora. How do I submit these job templates to aurora?
>>>>>>>>>>>>
>>>>>>>>>>>> Any pointers to documentation will be helpful.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 21, 2015 at 11:09 PM, Zameer Manji <
>>>>>>>>>>>> zmanji@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Version 0.9.0 does not have the gc executor. Version 0.9.0
>>>>>>>>>>>>> uses Mesos' task reconciliation
>>>>>>>>>>>>> <http://mesos.apache.org/documentation/latest/reconciliation/> API
>>>>>>>>>>>>> instead.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Oct 21, 2015 at 9:28 AM, Krish <
>>>>>>>>>>>>> krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks Bill for the location to the debs. I was finally able
>>>>>>>>>>>>>> to run aurora. :)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I did find thermos_executor.pex & thermos_observer after
>>>>>>>>>>>>>> installing aurora-executor. I still could not find gc_executor.pex on my
>>>>>>>>>>>>>> system.
>>>>>>>>>>>>>> Is there a location from where I can download the binaries
>>>>>>>>>>>>>> for *.pex or build them from scratch?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> root@dev:/# find . -name "*.pex"
>>>>>>>>>>>>>> ./usr/share/aurora/bin/thermos_executor.pex
>>>>>>>>>>>>>> ./usr/share/aurora/bin/kaurora_admin.pex
>>>>>>>>>>>>>> ./usr/share/aurora/bin/kaurora.pex
>>>>>>>>>>>>>> ./usr/share/aurora/bin/thermos.pex
>>>>>>>>>>>>>> ./usr/share/aurora/bin/thermos_observer.pex
>>>>>>>>>>>>>> ./home/ubuntu/.pex
>>>>>>>>>>>>>> ./root/.pex
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 11:46 PM, Bill Farner <
>>>>>>>>>>>>>> wfarner@apache.org> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Aurora currently requires an executor, so setting it to
>>>>>>>>>>>>>>> /dev/null will not work.  Happy to talk further about your thoughts around
>>>>>>>>>>>>>>> sidestepping the executor.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As for working with the scheduler source code, it's a
>>>>>>>>>>>>>>> standard gradle project and we tend to use intellij.  Docs to help ramp on
>>>>>>>>>>>>>>> that:
>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As for builds - the .zip is a source distribution, so it
>>>>>>>>>>>>>>> won't have any pre-built binaries.  If you're on debian, we have official
>>>>>>>>>>>>>>> debs here: https://bintray.com/apache/aurora
>>>>>>>>>>>>>>> You can see how they're built here (and can build your own)
>>>>>>>>>>>>>>> packages: https://github.com/apache/aurora-packaging
>>>>>>>>>>>>>>> We're close to having official RPMs, but none to speak of
>>>>>>>>>>>>>>> yet.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 9:47 AM, Krish <
>>>>>>>>>>>>>>> krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Stephen,
>>>>>>>>>>>>>>>> I am trying to get started and run aurora without thermos
>>>>>>>>>>>>>>>> executor (setting it to /dev/null does not help) - on a local linux box for
>>>>>>>>>>>>>>>> now & planning to containerize/dockerize it later.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Can you please point me to the right documentation (or a
>>>>>>>>>>>>>>>> pointer to the cli parsing source code) which can help me resolve this?
>>>>>>>>>>>>>>>> Also, are there any steps to import source code into eclipse to
>>>>>>>>>>>>>>>> browse & analyze code for this?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Also, where do i find all the *.pex files? They are not
>>>>>>>>>>>>>>>> present in the zip file nor anywhere in the built source code.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I know I am asking too many queries on a single thread
>>>>>>>>>>>>>>>> here, & would appreciate the help.
>>>>>>>>>>>>>>>> I think at the end of this, I will put the steps I followed
>>>>>>>>>>>>>>>> in a gist/blog so others might find their way around, & not struggle as
>>>>>>>>>>>>>>>> much.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan <
>>>>>>>>>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Krish,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> you don't have to set framework_authentication_file and
>>>>>>>>>>>>>>>>> zk_digest_credentials. The scheduler help text is misleading here as
>>>>>>>>>>>>>>>>> everything will work fine if you leave those empty.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In addition, it looks like you are misunderstanding the usage
>>>>>>>>>>>>>>>>> of the thermos_executor_path command line flag of the
>>>>>>>>>>>>>>>>> scheduler. It is supposed to point to the binary containing the generic
>>>>>>>>>>>>>>>>> Aurora executor (thermos_executor.pex).  You only need
>>>>>>>>>>>>>>>>> the hello_world.aurora once your scheduler is up and
>>>>>>>>>>>>>>>>> running. It serves as an example input for the aurora command line client
>>>>>>>>>>>>>>>>> which can be used to schedule jobs and services on an Aurora master.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Have you tried to use the vagrant box? Just type `vagrant
>>>>>>>>>>>>>>>>> up` in a checkout of the Aurora source code. It gives you a running
>>>>>>>>>>>>>>>>> scheduler to play with. Once you have understood how it works, you can
>>>>>>>>>>>>>>>>> start trying to install it on your own (by reverse-engineering the vagrant
>>>>>>>>>>>>>>>>> box).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hope this helps a little,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>>>>>>>>>> *Sent:* Tuesday, October 20, 2015 11:39 AM
>>>>>>>>>>>>>>>>> *To:* Bill Farner
>>>>>>>>>>>>>>>>> *Cc:* user@aurora.apache.org; Erb, Stephan
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Bill/Stephen,
>>>>>>>>>>>>>>>>> I still get a stacktrace when running the aurora scheduler
>>>>>>>>>>>>>>>>> CLI.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I do not know what to specify for
>>>>>>>>>>>>>>>>>  -framework_authentication_file & -zk_digest_credentials, and they are
>>>>>>>>>>>>>>>>> required arguments.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am not using any authentication on Mesos master, do I
>>>>>>>>>>>>>>>>> still need the framework_authentication_file parameter?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> rm -rf /db /backup_dir
>>>>>>>>>>>>>>>>> mesos-log initialize --path="/db"
>>>>>>>>>>>>>>>>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
>>>>>>>>>>>>>>>>> JAVA_OPTS="-Xmx1536m  -Xms256m"
>>>>>>>>>>>>>>>>> /usr/local/aurora-scheduler/bin/aurora-scheduler -backup_dir=/backup_dir
>>>>>>>>>>>>>>>>> -cluster_name=tc -mesos_master_address=zk://localhost:2181/mesos/master
>>>>>>>>>>>>>>>>> -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181
>>>>>>>>>>>>>>>>> -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false
>>>>>>>>>>>>>>>>> -native_log_file_path=/db
>>>>>>>>>>>>>>>>> -thermos_executor_path=/home/ubuntu/hello_world.aurora
>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>> INFO: Binding org.apache.aurora.scheduler.http.Utilization to GuiceManagedComponentProvider with the scope "PerRequest"
>>>>>>>>>>>>>>>>> Oct 20, 2015 9:27:40 AM org.apache.aurora.scheduler.cron.quartz.CronModule provideTimeZone
>>>>>>>>>>>>>>>>> WARNING: Cron schedules are configured to fire according to timezone Greenwich Mean Time but system timezone is set to Coordinated Universal Time
>>>>>>>>>>>>>>>>> Oct 20, 2015 9:27:41 AM org.eclipse.jetty.server.AbstractConnector doStart
>>>>>>>>>>>>>>>>> INFO: Started SelectChannelConnector@0.0.0.0:43843
>>>>>>>>>>>>>>>>> E1020 09:27:41.290 THREAD1 org.apache.aurora.scheduler.SchedulerLifecycle$9.execute: Caught unchecked exception: com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1 error
>>>>>>>>>>>>>>>>> com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1 error
>>>>>>>>>>>>>>>>>         at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
>>>>>>>>>>>>>>>>>         at org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner <
>>>>>>>>>>>>>>>>> wfarner@apache.org> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The typical flow is that you keep your .aurora file
>>>>>>>>>>>>>>>>>> checked into git, and commit every time you deploy/update.  When you change
>>>>>>>>>>>>>>>>>> your file, you will instruct Aurora to update the live job (have a look at aurora
>>>>>>>>>>>>>>>>>> update -h).  Aurora will perform a rolling upgrade of
>>>>>>>>>>>>>>>>>> your job to the new config.  You'll use this same flow for updating your
>>>>>>>>>>>>>>>>>> job's software as well as resizing the job.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> For (3), you could set up alerting for stats that the
>>>>>>>>>>>>>>>>>> scheduler exports.  Have a look here for monitoring background:
>>>>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> You'll want to look at scheduler stats related to
>>>>>>>>>>>>>>>>>> 'pending'.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish <
>>>>>>>>>>>>>>>>>> krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks for the pointer. Now I notice that the
>>>>>>>>>>>>>>>>>>> aurora-scheduler script has the --thermos_executor_path as a mandatory
>>>>>>>>>>>>>>>>>>> requirement.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I have a couple of questions on how the
>>>>>>>>>>>>>>>>>>> thermos_executor/.aurora config file functions:
>>>>>>>>>>>>>>>>>>> 1. Do we have to statically define the file beforehand?
>>>>>>>>>>>>>>>>>>> 2. What happens when we want to dynamically change the
>>>>>>>>>>>>>>>>>>> config, say increasing the number of instances of a service required? Does
>>>>>>>>>>>>>>>>>>> aurora require a reboot then?
>>>>>>>>>>>>>>>>>>> 3. How do I get notified about the message mesos sends
>>>>>>>>>>>>>>>>>>> when it cannot schedule tasks for lack of resources? Should I depend on
>>>>>>>>>>>>>>>>>>> aurora for this or try to look for a hook into mesos?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I think a little bit of context would help here.
>>>>>>>>>>>>>>>>>>> What I plan to check is to run a very basic job/task
>>>>>>>>>>>>>>>>>>> inside a docker container with aurora & wait for a 'resource not available'
>>>>>>>>>>>>>>>>>>> message from mesos, and accordingly call an api to spin up a new node in my
>>>>>>>>>>>>>>>>>>> cluster.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <
>>>>>>>>>>>>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I believe you are missing the thermos_executor options
>>>>>>>>>>>>>>>>>>>> that have to be passed to the scheduler command line.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> See
>>>>>>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
>>>>>>>>>>>>>>>>>>>> for an example
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>>>>>>>>>>>>> *Sent:* Monday, October 19, 2015 8:45 AM
>>>>>>>>>>>>>>>>>>>> *To:* user@aurora.apache.org
>>>>>>>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>> I am a n00b with apache aurora & trying to experiment
>>>>>>>>>>>>>>>>>>>> some things on my local machine with zookeeper and mesos-master running
>>>>>>>>>>>>>>>>>>>> locally. They have initialized properly. When I try to run aurora with the
>>>>>>>>>>>>>>>>>>>> required options, I get the following error, & googling hasn't helped me
>>>>>>>>>>>>>>>>>>>> much here.
>>>>>>>>>>>>>>>>>>>> Appreciate any help. Thanks in advance.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>>> WARNING: Method [public void
>>>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)]
>>>>>>>>>>>>>>>>>>>> is synthetic and is being intercepted by
>>>>>>>>>>>>>>>>>>>> [com.twitter.common.inject.TimedInterceptor@604c5de8].
>>>>>>>>>>>>>>>>>>>> This could indicate a bug.  The method
>>>>>>>>>>>>>>>>>>>>  may be intercepted twice, or may not be intercepted at
>>>>>>>>>>>>>>>>>>>> all.
>>>>>>>>>>>>>>>>>>>> Exception in thread "main"
>>>>>>>>>>>>>>>>>>>> com.google.inject.CreationException: Guice creation errors:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 1) An exception was caught and reported. Message: A
>>>>>>>>>>>>>>>>>>>> value may only be retrieved from a variable that has a default or has been
>>>>>>>>>>>>>>>>>>>> set.
>>>>>>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 2) Could not find a suitable constructor in
>>>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have
>>>>>>>>>>>>>>>>>>>> either one (and only one) constructor annotated with @Inject or a
>>>>>>>>>>>>>>>>>>>> zero-argument constructor that is not private.
>>>>>>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>>>>>>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 2 errors
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:95)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:83)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
>>>>>>>>>>>>>>>>>>>> Caused by: java.lang.IllegalStateException: A value may
>>>>>>>>>>>>>>>>>>>> only be retrieved from a variable that has a default or has been set.
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>>>>>>>>>>>>>>>>>>>>         at com.twitter.common.args.Arg.get(Arg.java:82)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.google.inject.util.Modules$2.configure(Modules.java:114)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.google.inject.spi.Elements.getElements(Elements.java:101)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>>>>>>>>>>>>>>>>>>>>         ... 7 more
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Complete logs are present @http://pastebin.com/i72HvbYi
>>>>>>>>>>>>>>>>>>>> .
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Zameer Manji
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>>> --
>>>
>>> Thumb typed mail
>>>
>>>
>

Re: Stacktrace when running Apache Aurora

Posted by Krish <kr...@gmail.com>.
Used rbt for the first time and some weird thing happened to the console,
and it got submitted!
https://reviews.apache.org/r/44341/

Will be sure to keep the list posted with any new info. Thanks.



--
κρισhναν

On Thu, Mar 3, 2016 at 9:20 PM, Bill Farner <wf...@apache.org> wrote:

> Likely in an existing page, preferably wherever you think would have saved
> you the trial and error!
>
> I look forward to the blog post, be sure to shoot a link here once it's up!
>
> Thanks!
>
>
> On Thursday, March 3, 2016, Krish <kr...@gmail.com> wrote:
>
>> Can you guide me on how to do that? Should I start with a new page and then
>> submit it, or would you like that as an entry in some existing doc?
>> That will be the short term (couple of hours) item on my checklist.
>>
>> Actually, as I said before, I have in mind to blog about my entire design
>> and implementation process - the how and the why of docker configuration,
>> private docker repo setup, coreos cluster setup, and zk, mesos master,
>> aurora containerisation and setup, along with their monitoring (have
>> decided on bosun.org with cAdvisor). And a short guide as to how to run
>> both containerized and non containerized jobs in production.
>> I had to refer to a dozen and more sites and blogs and manuals and source
>> to get so far; and got help from engineers in various mailing lists.
>> A unified guide should be helpful, imho.
>>
>>
>> On Thursday 3 March 2016, Bill Farner <wf...@apache.org> wrote:
>>
>>> Wow!  I'm glad you got it working!  To help the next poor soul trying to
>>> do this, would you be willing to put up a doc patch on our end?
>>>
>>> On Thursday, March 3, 2016, Krish <kr...@gmail.com> wrote:
>>>
>>>> TL;DR:
>>>> Use only a file named .dockercfg for docker credentials in mesos
>>>> tasks!
>>>>
>>>> Long story:
>>>> ---------------
>>>> Holy smokescreens!
>>>> This is for reporting & documenting purposes only, so that others don't
>>>> have to pull their hair like I did for the past few evenings!
>>>>
>>>> A little background:
>>>> I am running Ubuntu 14.04 on my system and docker stores its
>>>> credentials in the ~/.docker/config.json as
>>>> cat ~/.docker/config.json
>>>> {
>>>> "auths": {
>>>> "repo.example.com:5000": {
>>>> "auth": "<snip>",
>>>> "email": "<snip>"
>>>> }
>>>> }
>>>> }
>>>>
>>>> And I am doing all these experiments on a coreOS system which stores
>>>> the credentials  in ~/.dockercfg as
>>>> core@aurora-1 ~ $ cat ~/.dockercfg
>>>> {
>>>>   "repo.example.com:5000": {
>>>>     "auth": "<snip>",
>>>>     "email": "<snip>"
>>>>   }
>>>> }
>>>>
>>>> Since my container was an Ubuntu 14.04 container (as was my local
>>>> system), I used the ubuntu credential file format. As a result, the slave
>>>> task could not read the docker credentials, because I had stored them as
>>>> ~/.docker/config.json.
>>>> After parsing through (a lot of find's, grep's and regex matching)
>>>> aurora, mesos, and thermos source code, I saw in
>>>> mesos/src/docker/docker.cpp:
>>>>
>>>>   // Set HOME variable to pick up *.dockercfg*.
>>>>   map<string, string> environment = os::environment();
>>>>
>>>>   environment["HOME"] = directory;
>>>>
>>>> Changed the filename and the json content, changed the
>>>> thermos_executor_resources, and bam, docker pull works!
>>>>
>>>> Well, the mesos documentation does say "To run an image from a private
>>>> repository, one can include the URI pointing to a .dockercfg that contains
>>>> login information." and I would have read it a dozen times!
>>>> But I never thought that they literally meant '.dockercfg' as the name
>>>> of the file!
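
The "changed the filename and the json content" step amounts to unwrapping the "auths" object, going by the two files shown above. A minimal sketch of that conversion follows, assuming the newer file holds only the "auths" mapping that matters here; the function name is hypothetical, and writing the result to a file named .dockercfg in the right place is left to the caller.

```python
import json


def to_legacy_dockercfg(config_json_text):
    """Convert ~/.docker/config.json content to the flat .dockercfg layout.

    The newer format nests the registries under "auths"; the legacy
    file is that mapping on its own, keyed by registry address.
    """
    config = json.loads(config_json_text)
    return json.dumps(config.get("auths", {}), indent=2, sort_keys=True)
```

Feeding it the config.json shown earlier yields the same structure as the coreOS ~/.dockercfg, with the registry address as the top-level key.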
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> κρισhναν
>>>>
>>>> On Thu, Mar 3, 2016 at 1:45 PM, Krish <kr...@gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>> I have got the docker config file copied into the sandbox using the
>>>>> thermos_executor_resources flag; however docker is still not able to find
>>>>> the credentials file for doing an appropriate pull of image from a private
>>>>> repo.
>>>>>
>>>>> When I try to use the library/hello-world:latest image from public
>>>>> docker repo to check if everything works fine without the credentials, I
>>>>> encounter a different problem:
>>>>> exec: "/bin/sh": stat /bin/sh: no such file or directory
>>>>> Error response from daemon: Cannot start container
>>>>> de93dc344d44b41bccccff49e508001a97ff23a8964e637d32a506a31fd4d946: [8]
>>>>> System error: exec: "/bin/sh": stat /bin/sh: no such file or directory
>>>>>
>>>>> I was referring to this email for guidance on setting up a mesos
>>>>> slave:
>>>>> http://mail-archives.apache.org/mod_mbox/aurora-dev/201503.mbox/%3CCAKB1MkHR=+NOnesLLX9bUWtTDThsku46PW_wR4B+_Z9P59+XWQ@mail.gmail.com%3E
>>>>>
>>>>> So, I cannot get the credentials file to be used by docker, and if I
>>>>> bypass authentication, I can do a docker pull, but encounter a weird error
>>>>> in launching the hello-world image.
>>>>>
>>>>> Am I missing out on checking any log files generated? I currently
>>>>> refer to mesos-slave stdout and the sandbox stderr file.
>>>>> Any configuration parameter I am missing for this to happen?
>>>>>
>>>>> Any pointers will be really helpful. Thanks in advance.
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> κρισhναν
>>>>>
>>>>> On Sun, Feb 28, 2016 at 3:37 PM, Krish <kr...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Continuing my earlier chain of thought, I found this in the mesos bug
>>>>>> list:
>>>>>> MESOS-4242 - Allow Docker private registry credentials to be passed
>>>>>> from framework.
>>>>>> How does one pass credentials using the framework? As it seems the
>>>>>> .docker/config.json is not read from the slave.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> κρισhναν
>>>>>>
>>>>>> On Sat, Feb 27, 2016 at 11:46 PM, Krish <kr...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I couldn't complete my PoC project earlier (got busy with
>>>>>>> other work). Well, it is never too late and here's my update and issue.
>>>>>>>
>>>>>>> I have a 3 node zk (3.5.1 alpha), mesos-master (v0.24.1) & aurora
>>>>>>> (v0.11.0) running.
>>>>>>> I was stuck in a problem where I was using mesos 0.25.0 & aurora
>>>>>>> 0.9.0 & got a protobuf field not set error - ExecutorInfo field.
>>>>>>>
>>>>>>> I have a mesos agent running in docker container on coreos and it
>>>>>>> can access the host docker just fine.
>>>>>>> I have also put the docker login credentials file at the right
>>>>>>> location for it to access the private docker registry.
>>>>>>> I can manually trigger a docker pull and docker run without issues
>>>>>>> from the slave (which is also reflected properly outside the slave
>>>>>>> container with docker images and docker ps).
>>>>>>>
>>>>>>> However, when I try to run an aurora job with hello-docker
>>>>>>> container, the slave prints out the log that docker pull has failed; more
>>>>>>> specifically:
>>>>>>> " failed to start: Failed to 'docker pull
>>>>>>> private_repo.com:5000/krish/test:latest': exit status = exited with
>>>>>>> status 1 stderr = Error: image krish/test:latest not found"
>>>>>>>
>>>>>>> My hunch is that when using docker run from aurora DSL, it does not
>>>>>>> read the docker credentials file properly and hence fails. I can reproduce
>>>>>>> the exact same error when I delete the credentials file from the slave and
>>>>>>> trigger a pull.
>>>>>>>
>>>>>>> Is the hunch right? If yes, is there a way to resolve this? Maybe
>>>>>>> source it some way before the run command?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> κρισhναν
>>>>>>>
>>>>>>> On Tue, Oct 27, 2015 at 10:35 PM, Bill Farner <wf...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> (1) clusters.json is written by you, configuring the CLI client
>>>>>>>> with instructions for what clusters are available and how to discover them.
>>>>>>>>
>>>>>>>> (2) That's expected - mesos only allows one active replica of a
>>>>>>>> framework at a time, this signals which one is active.
>>>>>>>>
>>>>>>>> (3) The observer is essentially a web server that allows you to
>>>>>>>> browse a task's sandbox directory and other information about it.  You will
>>>>>>>> need to configure it to run on your worker/agent nodes for that
>>>>>>>> functionality to work (it's linked from the scheduler web UI).
>>>>>>>>
>>>>>>>> (4) You could indeed implement that behavior externally.  There is
>>>>>>>> a reason:
>>>>>>>> https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L556-L559
>>>>>>>>
>>>>>>>> (5) That is correct.  The scheduler exposes a thrift API that you
>>>>>>>> would use (a REST API is coming, but ground has not yet been broken).  If
>>>>>>>> you go this route, i suggest you skip the DSL and use the JSON task
>>>>>>>> description format that is shipped over the API.  There's not good
>>>>>>>> documentation on this, but we can help you through it and would be grateful
>>>>>>>> for a writeup of your approach!
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Oct 26, 2015 at 11:44 PM, Krish <kr...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Folks,
>>>>>>>>> Firstly, thanks for all the help. Am happy to report that I have
>>>>>>>>> set up zk, mesos & aurora, & can work further towards my idea of having an
>>>>>>>>> auto-scaling cluster.
>>>>>>>>> I have some further questions about the work done so far & things
>>>>>>>>> I plan to do:
>>>>>>>>>
>>>>>>>>>    1. Is the /etc/aurora/clusters.json file created by the
>>>>>>>>>    scheduler or does it need to be handcrafted? I had to manually edit the
>>>>>>>>>    file to get my `aurora job ...` cli to work.
>>>>>>>>>
>>>>>>>>>    2. I am running a cluster of 3 coreOS VMs on vagrant with zk,
>>>>>>>>>    mesos & aurora in a docker container. Only 1 of them outputs '1' when I
>>>>>>>>>    look at the 'framework_registered' field. Is this expected? How do I verify
>>>>>>>>>    that they are working as a cluster?
>>>>>>>>>
>>>>>>>>>    3. From the documentation, I see that there is an observer
>>>>>>>>>    that needs to be listening on port 1338. What is the observer socket & its
>>>>>>>>>    purpose? I have aurora listening only on ports 8081 (http port) & 8083
>>>>>>>>>    (libprocess).
>>>>>>>>>
>>>>>>>>>    4. I read about the 'PENDING' field in aurora documentation,
>>>>>>>>>    as Bill suggested, & realize that it just shows that a task is waiting for
>>>>>>>>>    some reasons (for want of resources, in my case, as 0 slaves have
>>>>>>>>>    registered). I was thinking of adding a hook to the pending state; say if a
>>>>>>>>>    task is PENDING for 5 minutes for lack of resources in the cluster, then
>>>>>>>>>    spin up a new machine. Is this the right approach to take? Does aurora
>>>>>>>>>    provide reasons for why a task is in the PENDING state?
>>>>>>>>>
>>>>>>>>>    => aurora job status testcluster/$USER/test/hello_world
>>>>>>>>>     INFO] Checking status of testcluster/ubuntu/test/hello_world
>>>>>>>>>    Active tasks (1):
>>>>>>>>>           Task role: ubuntu, env: test, name: hello_world,
>>>>>>>>>    instance: 0, status:
>>>>>>>>>    PENDING on None
>>>>>>>>>              cpus: 0.1, ram: 16 MB, disk: 16 MB
>>>>>>>>>              events:
>>>>>>>>>               2015-10-23 04:55:33 PENDING: None
>>>>>>>>>    Inactive tasks (0):
>>>>>>>>>
>>>>>>>>>    5. Aurora defines jobs in a .aurora config file & if I decide
>>>>>>>>>    to increase/decrease the number of instances in my cluster, then I need to
>>>>>>>>>    create/overwrite the concerned .aurora file and trigger the `aurora update
>>>>>>>>>    ...` command. Is this right?
>>>>>>>>>    If yes, is there an HTTP API I can invoke remotely which
>>>>>>>>>    triggers this update?
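
The PENDING-timeout idea from item 4 could be prototyped outside the scheduler. Below is a minimal Python sketch, assuming the event lines of `aurora job status` look like the sample quoted above ("2015-10-23 04:55:33 PENDING: None"). The function name `pending_too_long`, the 5-minute default, and the regular expression are illustrative assumptions, not part of Aurora; the output format may differ between client versions, and timezones are ignored.

```python
import re
from datetime import datetime, timedelta

# Matches event lines such as "2015-10-23 04:55:33 PENDING: None".
# The format is assumed from the sample output in this thread.
EVENT_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+):")


def pending_too_long(status_text, now, limit=timedelta(minutes=5)):
    """Return True if the latest event is PENDING and at least `limit` old."""
    events = EVENT_RE.findall(status_text)
    if not events:
        return False
    stamp, state = events[-1]  # last event listed is treated as the latest
    since = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    return state == "PENDING" and now - since >= limit


sample = """Active tasks (1):
          events:
           2015-10-23 04:55:33 PENDING: None
Inactive tasks (0):"""
```

A caller would feed it the captured CLI output plus the current time, and request a new machine only when it returns True.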
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> κρισhναν
>>>>>>>>>
>>>>>>>>> On Fri, Oct 23, 2015 at 8:09 AM, Joshua Cohen <
>>>>>>>>> jcohen@twopensource.com> wrote:
>>>>>>>>>
>>>>>>>>>> I suspect your error from `aurora job create ...` is due to the
>>>>>>>>>> aurora config you're using referencing `/vagrant/hello_world.py` which does
>>>>>>>>>> not exist (as you say: you're not even using Vagrant). Can you link the
>>>>>>>>>> .aurora config you're using?
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>>
>>>>>>>>>> Joshua
>>>>>>>>>>
>>>>>>>>>> On Thu, Oct 22, 2015 at 3:22 PM, Krish <krishnan.k.iyer@gmail.com
>>>>>>>>>> > wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks, Zameer.
>>>>>>>>>>>
>>>>>>>>>>> I had to modify  /etc/aurora/clusters.json:
>>>>>>>>>>> [
>>>>>>>>>>>   {
>>>>>>>>>>>     "auth_mechanism": "UNAUTHENTICATED",
>>>>>>>>>>>     "name": "testcluster",
>>>>>>>>>>>     "scheduler_zk_path": "/scheduler/aurora",
>>>>>>>>>>>     "slave_root": "/var/lib/mesos",
>>>>>>>>>>>     "slave_run_directory": "latest",
>>>>>>>>>>>     "zk": "127.0.1.1"
>>>>>>>>>>>   }
>>>>>>>>>>> ]
>>>>>>>>>>>
>>>>>>>>>>> I have a hello_world.aurora in my home folder. However the
>>>>>>>>>>> following command errors out:
>>>>>>>>>>> ~$ aurora job create testcluster/testrole/test/hellojob
>>>>>>>>>>> ./hello_world.aurora
>>>>>>>>>>> Error loading configuration: [Errno 2] No such file or
>>>>>>>>>>> directory: '/vagrant/hello_world.py'
>>>>>>>>>>>
>>>>>>>>>>> A job list does work:
>>>>>>>>>>> ~$ aurora job list testcluster
>>>>>>>>>>>  INFO] Retrieving jobs for role None
>>>>>>>>>>>
>>>>>>>>>>> I am not even using the vagrant. I am using zk & mesos on the
>>>>>>>>>>> same machine as aurora. How do I submit these job templates to aurora?
>>>>>>>>>>>
>>>>>>>>>>> Any pointers to documentation will be helpful.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> κρισhναν
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 21, 2015 at 11:09 PM, Zameer Manji <
>>>>>>>>>>> zmanji@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Version 0.9.0 does not have the gc executor. Version 0.9.0 uses
>>>>>>>>>>>> Mesos' task reconciliation
>>>>>>>>>>>> <http://mesos.apache.org/documentation/latest/reconciliation/> API
>>>>>>>>>>>> instead.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 21, 2015 at 9:28 AM, Krish <
>>>>>>>>>>>> krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks Bill for the location to the debs. I was finally able
>>>>>>>>>>>>> to run aurora. :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I did find thermos_executor.pex & thermos_observer after
>>>>>>>>>>>>> installing aurora-executor. I still could not find gc_executor.pex on my
>>>>>>>>>>>>> system.
>>>>>>>>>>>>> Is there a location from where I can download the binaries for
>>>>>>>>>>>>> *.pex or build them from scratch?
>>>>>>>>>>>>>
>>>>>>>>>>>>> root@dev:/# find . -name "*.pex"
>>>>>>>>>>>>> ./usr/share/aurora/bin/thermos_executor.pex
>>>>>>>>>>>>> ./usr/share/aurora/bin/kaurora_admin.pex
>>>>>>>>>>>>> ./usr/share/aurora/bin/kaurora.pex
>>>>>>>>>>>>> ./usr/share/aurora/bin/thermos.pex
>>>>>>>>>>>>> ./usr/share/aurora/bin/thermos_observer.pex
>>>>>>>>>>>>> ./home/ubuntu/.pex
>>>>>>>>>>>>> ./root/.pex
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 11:46 PM, Bill Farner <
>>>>>>>>>>>>> wfarner@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Aurora currently requires an executor, so setting it to
>>>>>>>>>>>>>> /dev/null will not work.  Happy to talk further about your thoughts around
>>>>>>>>>>>>>> sidestepping the executor.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As for working with the scheduler source code, it's a
>>>>>>>>>>>>>> standard gradle project and we tend to use intellij.  Docs to help ramp on
>>>>>>>>>>>>>> that:
>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As for builds - the .zip is a source distribution, so it
>>>>>>>>>>>>>> won't have any pre-built binaries.  If you're on debian, we have official
>>>>>>>>>>>>>> debs here: https://bintray.com/apache/aurora
>>>>>>>>>>>>>> You can see how they're built here (and can build your own)
>>>>>>>>>>>>>> packages: https://github.com/apache/aurora-packaging
>>>>>>>>>>>>>> We're close to having official RPMs, but none to speak of yet.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 9:47 AM, Krish <
>>>>>>>>>>>>>> krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Stephan,
>>>>>>>>>>>>>>> I am trying to get started and run aurora without thermos
>>>>>>>>>>>>>>> executor (setting it to /dev/null does not help) - on a local linux box for
>>>>>>>>>>>>>>> now & planning to containerize/dockerize it later.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can you please point me to the right documentation (or a
>>>>>>>>>>>>>>> pointer to the cli parsing source code) which can help me resolve this?
>>>>>>>>>>>>>>> Also, are there any steps to import source code into eclipse to
>>>>>>>>>>>>>>> browse & analyze code for this.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also, where do i find all the *.pex files? They are not
>>>>>>>>>>>>>>> present in the zip file nor anywhere in the built source code.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I know I am asking too many queries on a single thread here,
>>>>>>>>>>>>>>> & would appreciate the help.
>>>>>>>>>>>>>>> I think at the end of this, I will put the steps I followed
>>>>>>>>>>>>>>> in a gist/blog so others might find their way around, & not struggle as
>>>>>>>>>>>>>>> much.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan <
>>>>>>>>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Krish,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> you don't have to set framework_authentication_file and
>>>>>>>>>>>>>>>> zk_digest_credentials. The scheduler help text is misleading here as
>>>>>>>>>>>>>>>> everything will work fine if you leave those empty.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In addition, looks like you are misunderstanding the usage
>>>>>>>>>>>>>>>> of the thermos_executor_path command line flag of the
>>>>>>>>>>>>>>>> scheduler. It is supposed to point to the binary containing the generic
>>>>>>>>>>>>>>>> Aurora executor (thermos_executor.pex).  You only need the hello_world.aurora
>>>>>>>>>>>>>>>> once your scheduler is up and running. It serves as an example input for the
>>>>>>>>>>>>>>>> aurora command line client which can be used to schedule jobs and services
>>>>>>>>>>>>>>>> on an Aurora master.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Have you tried to use the vagrant box? Just type `vagrant
>>>>>>>>>>>>>>>> up` in a checkout of the Aurora source code. It gives you a running
>>>>>>>>>>>>>>>> scheduler to play with. Once you have understood how it works, you can
>>>>>>>>>>>>>>>> start trying to install it on your own (by reverse-engineering the vagrant
>>>>>>>>>>>>>>>> box).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hope this helps a little,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>>>>>>>>> *Sent:* Tuesday, October 20, 2015 11:39 AM
>>>>>>>>>>>>>>>> *To:* Bill Farner
>>>>>>>>>>>>>>>> *Cc:* user@aurora.apache.org; Erb, Stephan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Bill/Stephan,
>>>>>>>>>>>>>>>> I still get a stacktrace when running the aurora scheduler
>>>>>>>>>>>>>>>> CLI.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I do not know what to specify for
>>>>>>>>>>>>>>>>  -framework_authentication_file & -zk_digest_credentials, and they are
>>>>>>>>>>>>>>>> required arguments.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am not using any authentication on Mesos master, do I
>>>>>>>>>>>>>>>> still need the framework_authentication_file parameter?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> rm -rf /db /backup_dir
>>>>>>>>>>>>>>>> mesos-log initialize --path="/db"
>>>>>>>>>>>>>>>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
>>>>>>>>>>>>>>>> JAVA_OPTS="-Xmx1536m  -Xms256m"
>>>>>>>>>>>>>>>> /usr/local/aurora-scheduler/bin/aurora-scheduler -backup_dir=/backup_dir
>>>>>>>>>>>>>>>> -cluster_name=tc -mesos_master_address=zk://localhost:2181/mesos/master
>>>>>>>>>>>>>>>> -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181
>>>>>>>>>>>>>>>> -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false
>>>>>>>>>>>>>>>> -native_log_file_path=/db
>>>>>>>>>>>>>>>> -thermos_executor_path=/home/ubuntu/hello_world.aurora
>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>> INFO: Binding org.apache.aurora.scheduler.http.Utilization to GuiceManagedComponentProvider with the scope "PerRequest"
>>>>>>>>>>>>>>>> Oct 20, 2015 9:27:40 AM org.apache.aurora.scheduler.cron.quartz.CronModule provideTimeZone
>>>>>>>>>>>>>>>> WARNING: Cron schedules are configured to fire according to timezone Greenwich Mean Time but system timezone is set to Coordinated Universal Time
>>>>>>>>>>>>>>>> Oct 20, 2015 9:27:41 AM org.eclipse.jetty.server.AbstractConnector doStart
>>>>>>>>>>>>>>>> INFO: Started SelectChannelConnector@0.0.0.0:43843
>>>>>>>>>>>>>>>> E1020 09:27:41.290 THREAD1 org.apache.aurora.scheduler.SchedulerLifecycle$9.execute: Caught unchecked exception: com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1 error
>>>>>>>>>>>>>>>> com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1 error
>>>>>>>>>>>>>>>>         at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
>>>>>>>>>>>>>>>>         at org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner <
>>>>>>>>>>>>>>>> wfarner@apache.org> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The typical flow is that you keep your .aurora file
>>>>>>>>>>>>>>>>> checked into git, and commit every time you deploy/update.  When you change
>>>>>>>>>>>>>>>>> your file, you will instruct Aurora to update the live job (have a look at aurora
>>>>>>>>>>>>>>>>> update -h).  Aurora will perform a rolling upgrade of
>>>>>>>>>>>>>>>>> your job to the new config.  You'll use this same flow for updating your
>>>>>>>>>>>>>>>>> job's software as well as resizing the job.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> For (3), you could set up alerting for stats that the
>>>>>>>>>>>>>>>>> scheduler exports.  Have a look here for monitoring background:
>>>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You'll want to look at scheduler stats related to
>>>>>>>>>>>>>>>>> 'pending'.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish <
>>>>>>>>>>>>>>>>> krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks for the pointer. Now I notice that the
>>>>>>>>>>>>>>>>>> aurora-scheduler script has the --thermos_executor_path as a mandatory
>>>>>>>>>>>>>>>>>> requirement.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I have a couple of questions on how the
>>>>>>>>>>>>>>>>>> thermos_executor/.aurora config file functions:
>>>>>>>>>>>>>>>>>> 1. Do we have to statically define the file beforehand?
>>>>>>>>>>>>>>>>>> 2. What happens when we want to dynamically change the
>>>>>>>>>>>>>>>>>> config, say increasing the number of instances of a service required? Does
>>>>>>>>>>>>>>>>>> aurora require a reboot then?
>>>>>>>>>>>>>>>>>> 3. How do I get notified about the message mesos sends
>>>>>>>>>>>>>>>>>> when it cannot schedule tasks for lack of resources? Should I depend on
>>>>>>>>>>>>>>>>>> aurora for this or try to look for a hook into mesos?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I think a little bit of context would help here.
>>>>>>>>>>>>>>>>>> What I plan to check is to run a very basic job/task
>>>>>>>>>>>>>>>>>> inside a docker container with aurora & wait for a 'resource not available'
>>>>>>>>>>>>>>>>>> message from mesos, and accordingly call an api to spin up a new node in my
>>>>>>>>>>>>>>>>>> cluster.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <
>>>>>>>>>>>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I believe you are missing the thermos_executor options
>>>>>>>>>>>>>>>>>>> that have to be passed to the scheduler command line.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> See
>>>>>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
>>>>>>>>>>>>>>>>>>> for an example
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>>>>>>>>>>>> *Sent:* Monday, October 19, 2015 8:45 AM
>>>>>>>>>>>>>>>>>>> *To:* user@aurora.apache.org
>>>>>>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>> I am a n00b with apache aurora & trying to experiment
>>>>>>>>>>>>>>>>>>> some things on my local machine with zookeeper and mesos-master running
>>>>>>>>>>>>>>>>>>> locally. They have initialized properly. When I try to run aurora with the
>>>>>>>>>>>>>>>>>>> required options, I get the following error, & googling hasn't helped me
>>>>>>>>>>>>>>>>>>> much here.
>>>>>>>>>>>>>>>>>>> Appreciate any help. Thanks in advance.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>> WARNING: Method [public void
>>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)]
>>>>>>>>>>>>>>>>>>> is synthetic and is being intercepted by
>>>>>>>>>>>>>>>>>>> [com.twitter.common.inject.TimedInterceptor@604c5de8].
>>>>>>>>>>>>>>>>>>> This could indicate a bug.  The method
>>>>>>>>>>>>>>>>>>>  may be intercepted twice, or may not be intercepted at
>>>>>>>>>>>>>>>>>>> all.
>>>>>>>>>>>>>>>>>>> Exception in thread "main"
>>>>>>>>>>>>>>>>>>> com.google.inject.CreationException: Guice creation errors:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 1) An exception was caught and reported. Message: A
>>>>>>>>>>>>>>>>>>> value may only be retrieved from a variable that has a default or has been
>>>>>>>>>>>>>>>>>>> set.
>>>>>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2) Could not find a suitable constructor in
>>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have
>>>>>>>>>>>>>>>>>>> either one (and only one) constructor annotated with @Inject or a
>>>>>>>>>>>>>>>>>>> zero-argument constructor that is not private.
>>>>>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>>>>>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2 errors
>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>> com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:95)
>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:83)
>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
>>>>>>>>>>>>>>>>>>> Caused by: java.lang.IllegalStateException: A value may
>>>>>>>>>>>>>>>>>>> only be retrieved from a variable that has a default or has been set.
>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>> com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>>>>>>>>>>>>>>>>>>>         at com.twitter.common.args.Arg.get(Arg.java:82)
>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>> com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>> com.google.inject.util.Modules$2.configure(Modules.java:114)
>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>> com.google.inject.spi.Elements.getElements(Elements.java:101)
>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>>>>>>>>>>>>>>>>>>>         ... 7 more
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Complete logs are present @http://pastebin.com/i72HvbYi.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Zameer Manji
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>
>> --
>>
>> Thumb typed mail
>>
>>

Re: Stacktrace when running Apache Aurora

Posted by Bill Farner <wf...@apache.org>.
Likely in an existing page, preferably wherever you think would have saved
you the trial and error!

I look forward to the blog post, be sure to shoot a link here once it's up!

Thanks!

On Thursday, March 3, 2016, Krish <kr...@gmail.com> wrote:

> Can you guide me how to do that? Should I start with a new page and then
> submit it or would you like that as an entry in some existing doc?
> That will be the short term (couple of hours)  item on my checklist.
>
> Actually, as I said before, I have in mind to blog about my entire design
> and implementation process - the how and the why of docker configuration,
> private docker repo setup, coreos cluster setup, and zk, mesos master,
> aurora containerisation and setup, along with their monitoring (have
> decided on bosun.org with cAdvisor). And a short guide as to how to run
> both containerized and non containerized jobs in production.
> I had to refer to a dozen and more sites and blogs and manuals and source
> to get so far; and got help from engineers in various mailing lists.
> A unified guide should be helpful, imho.
>
>
> On Thursday 3 March 2016, Bill Farner <wfarner@apache.org> wrote:
>
>> Wow!  I'm glad you got it working!  To help the next poor soul trying to
>> do this, would you be willing to put up a doc patch on our end?
>>
>> On Thursday, March 3, 2016, Krish <kr...@gmail.com> wrote:
>>
>>> TL;DR:
>>> Use only a file named .dockercfg for docker credentials in mesos
>>> tasks!
>>>
>>> Long story:
>>> ---------------
>>> Holy smokescreens!
>>> This is for reporting & documenting purposes only, so that others don't
>>> have to pull their hair like I did for the past few evenings!
>>>
>>> A little background:
>>> I am running Ubuntu 14.04 on my system and docker stores its credentials
>>> in the ~/.docker/config.json as
>>> cat ~/.docker/config.json
>>> {
>>> "auths": {
>>> "repo.example.com:5000": {
>>> "auth": "<snip>",
>>> "email": "<snip>"
>>> }
>>> }
>>> }
>>>
>>> And I am doing all these experiments on a coreOS system which stores the
>>> credentials  in ~/.dockercfg as
>>> core@aurora-1 ~ $ cat ~/.dockercfg
>>> {
>>>   "repo.example.com:5000": {
>>>     "auth": "<snip>",
>>>     "email": "<snip>"
>>>   }
>>> }
>>>
>>> Since my container was an Ubuntu 14.04 container (as was my local
>>> system), I used the Ubuntu credential file format and stored the
>>> credentials as ~/.docker/config.json, which is why the slave task
>>> couldn't read them.
>>> After parsing through (a lot of find's, grep's and regex matching)
>>> aurora, mesos, and thermos source code, I saw in
>>> mesos/src/docker/docker.cpp:
>>>
>>>   // Set HOME variable to pick up .dockercfg.
>>>   map<string, string> environment = os::environment();
>>>
>>>   environment["HOME"] = directory;
>>>
>>>
>>> Changed the filename and the json content, changed the
>>> thermos_executor_resources, and bam, docker pull works!
>>>
>>> Well, the mesos documentation does say "To run an image from a private
>>> repository, one can include the URI pointing to a .dockercfg that contains
>>> login information." and I would have read it a dozen times!
>>> But I never thought that they literally meant '.dockercfg' as the name
>>> of the file!
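
The "changed the filename and the json content" step above amounts to unwrapping the "auths" object. Below is a minimal Python sketch of that conversion, assuming the two file layouts are exactly as quoted in this message. The helper name `to_dockercfg` is hypothetical, and the sketch assumes the credentials are stored inline under "auths" rather than in an external credential store.

```python
import json


def to_dockercfg(config_json_text):
    """Convert ~/.docker/config.json content to the older .dockercfg layout.

    The newer file nests registry entries under "auths"; the older file
    has the registry entries at the top level.
    """
    config = json.loads(config_json_text)
    return json.dumps(config.get("auths", {}), indent=2, sort_keys=True)


example = """{
  "auths": {
    "repo.example.com:5000": {"auth": "<snip>", "email": "<snip>"}
  }
}"""
legacy = to_dockercfg(example)
```

The resulting text would then be saved under the literal name .dockercfg and shipped into the sandbox, as described above.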
>>>
>>>
>>>
>>>
>>> --
>>> κρισhναν
>>>
>>> On Thu, Mar 3, 2016 at 1:45 PM, Krish <kr...@gmail.com> wrote:
>>>
>>>>
>>>> I have got the docker config file copied into the sandbox using the
>>>> thermos_executor_resources flag; however docker is still not able to find
>>>> the credentials file for doing an appropriate pull of image from a private
>>>> repo.
>>>>
>>>> When I try to use the library/hello-world:latest image from public
>>>> docker repo to check if everything works fine without the credentials, I
>>>> encounter a different problem:
>>>> exec: "/bin/sh": stat /bin/sh: no such file or directory
>>>> Error response from daemon: Cannot start container
>>>> de93dc344d44b41bccccff49e508001a97ff23a8964e637d32a506a31fd4d946: [8]
>>>> System error: exec: "/bin/sh": stat /bin/sh: no such file or directory
>>>>
>>>> I was referring to this email for guidance on setting up a mesos slave:
>>>> http://mail-archives.apache.org/mod_mbox/aurora-dev/201503.mbox/%3CCAKB1MkHR=+NOnesLLX9bUWtTDThsku46PW_wR4B+_Z9P59+XWQ@mail.gmail.com%3E
>>>>
>>>> So, I cannot get the credentials file to be used by docker, and if I
>>>> bypass authentication, I can do a docker pull, but encounter a weird error
>>>> in launching the hello-world image.
>>>>
>>>> Am I missing out on checking any log files generated? I currently refer
>>>> to mesos-slave stdout and the sandbox stderr file.
>>>> Any configuration parameter I am missing for this to happen?
>>>>
>>>> Any pointers will be really helpful. Thanks in advance.
>>>>
>>>>
>>>>
>>>> --
>>>> κρισhναν
>>>>
>>>> On Sun, Feb 28, 2016 at 3:37 PM, Krish <kr...@gmail.com>
>>>> wrote:
>>>>
>>>>> Continuing my earlier chain of thought, I found this in the mesos bug
>>>>> list:
>>>>> MESOS-4242 - Allow Docker private registry credentials to be passed
>>>>> from framework.
>>>>> How does one pass credentials using the framework? As it seems the
>>>>> .docker/config.json is not read from the slave.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> κρισhναν
>>>>>
>>>>> On Sat, Feb 27, 2016 at 11:46 PM, Krish <kr...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I couldn't complete my PoC project earlier (got busy with other
>>>>>> work). Well, it is never too late and here's my update and issue.
>>>>>>
>>>>>> I have a 3 node zk (3.5.1 alpha), mesos-master (v0.24.1) & aurora
>>>>>> (v0.11.0) running.
>>>>>> I was stuck in a problem where I was using mesos 0.25.0 & aurora
>>>>>> 0.9.0 & got a protobuf field not set error - ExecutorInfo field.
>>>>>>
>>>>>> I have a mesos agent running in docker container on coreos and it can
>>>>>> access the host docker just fine.
>>>>>> I have also put the docker login credentials file at the right
>>>>>> location for it to access the private docker registry.
>>>>>> I can manually trigger a docker pull and docker run without issues
>>>>>> from the slave (which is also reflected properly outside the slave
>>>>>> container with docker images and docker ps).
>>>>>>
>>>>>> However, when I try to run an aurora job with hello-docker container,
>>>>>> the slave prints out the log that docker pull has failed; more specifically:
>>>>>> " failed to start: Failed to 'docker pull
>>>>>> private_repo.com:5000/krish/test:latest': exit status = exited with
>>>>>> status 1 stderr = Error: image krish/test:latest not found"
>>>>>>
>>>>>> My hunch is that when using docker run from aurora DSL, it does not
>>>>>> read the docker credentials file properly and hence fails. I can reproduce
>>>>>> the exact same error when I delete the credentials file from the slave and
>>>>>> trigger a pull.
>>>>>>
>>>>>> Is the hunch right? If yes, is there a way to resolve this? Maybe
>>>>>> source it some way before the run command?
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> κρισhναν
>>>>>>
>>>>>> On Tue, Oct 27, 2015 at 10:35 PM, Bill Farner <wf...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> (1) clusters.json is written by you, configuring the CLI client with
>>>>>>> instructions for what clusters are available and how to discover them.
>>>>>>>
>>>>>>> (2) That's expected - mesos only allows one active replica of a
>>>>>>> framework at a time, this signals which one is active.
>>>>>>>
>>>>>>> (3) The observer is essentially a web server that allows you to
>>>>>>> browse a task's sandbox directory and other information about it.  You will
>>>>>>> need to configure it to run on your worker/agent nodes for that
>>>>>>> functionality to work (it's linked from the scheduler web UI).
>>>>>>>
>>>>>>> (4) You could indeed implement that behavior externally.  There is a
>>>>>>> reason:
>>>>>>> https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L556-L559
>>>>>>>
>>>>>>> (5) That is correct.  The scheduler exposes a thrift API that you
>>>>>>> would use (a REST API is coming, but ground has not yet been broken).  If
>>>>>>> you go this route, i suggest you skip the DSL and use the JSON task
>>>>>>> description format that is shipped over the API.  There's not good
>>>>>>> documentation on this, but we can help you through it and would be grateful
>>>>>>> for a writeup of your approach!
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Oct 26, 2015 at 11:44 PM, Krish <kr...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Folks,
>>>>>>>> Firstly, thanks for all the help. Am happy to report that I have
>>>>>>>> set up zk, mesos & aurora, & can work further towards my idea of having an
>>>>>>>> auto-scaling cluster.
>>>>>>>> I have some further questions about the work done so far & things I
>>>>>>>> plan to do:
>>>>>>>>
>>>>>>>>    1. Is the /etc/aurora/clusters.json file created by the
>>>>>>>>    scheduler or does it need to be handcrafted? I had to manually edit the
>>>>>>>>    file to get my `aurora job ...` cli to work.
>>>>>>>>
>>>>>>>>    2. I am running a cluster of 3 coreOS VMs on vagrant with zk,
>>>>>>>>    mesos & aurora in a docker container. Only 1 of them outputs '1' when I
>>>>>>>>    look at the 'framework_registered' field. Is this expected? How do I verify
>>>>>>>>    that they are working as a cluster?
>>>>>>>>
>>>>>>>>    3. From the documentation, I see that there is an observer that
>>>>>>>>    needs to be listening on port 1338. What is the observer socket & its
>>>>>>>>    purpose? I have aurora listening only on ports 8081 (http port) & 8083
>>>>>>>>    (libprocess).
>>>>>>>>
>>>>>>>>    4. I read about the 'PENDING' field in aurora documentation, as
>>>>>>>>    Bill suggested, & realize that it just shows that a task is waiting for
>>>>>>>>    some reason (for want of resources, in my case, as 0 slaves have
>>>>>>>>    registered). I was thinking of adding a hook to the pending state; say if a
>>>>>>>>    task is PENDING for 5 minutes for lack of resources in the cluster, then
>>>>>>>>    spin up a new machine. Is this the right approach to take? Does aurora
>>>>>>>>    provide reasons for why a task is in PENDING state?
>>>>>>>>
>>>>>>>>    => aurora job status testcluster/$USER/test/hello_world
>>>>>>>>     INFO] Checking status of testcluster/ubuntu/test/hello_world
>>>>>>>>    Active tasks (1):
>>>>>>>>           Task role: ubuntu, env: test, name: hello_world,
>>>>>>>>    instance: 0, status:
>>>>>>>>    PENDING on None
>>>>>>>>              cpus: 0.1, ram: 16 MB, disk: 16 MB
>>>>>>>>              events:
>>>>>>>>               2015-10-23 04:55:33 PENDING: None
>>>>>>>>    Inactive tasks (0):
>>>>>>>>
>>>>>>>>    5. Aurora defines jobs in a .aurora config file & if I decide
>>>>>>>>    to increase/decrease the number of instances in my cluster, then I need to
>>>>>>>>    create/overwrite the concerned .aurora file and trigger the `aurora update
>>>>>>>>    ...` command. Is this right?
>>>>>>>>    If yes, is there an HTTP API I can invoke remotely which
>>>>>>>>    triggers this update?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> κρισhναν
>>>>>>>>
>>>>>>>> On Fri, Oct 23, 2015 at 8:09 AM, Joshua Cohen <
>>>>>>>> jcohen@twopensource.com> wrote:
>>>>>>>>
>>>>>>>>> I suspect your error from `aurora job create ...` is due to the
>>>>>>>>> aurora config you're using referencing `/vagrant/hello_world.py` which does
>>>>>>>>> not exist (as you say: you're not even using Vagrant). Can you link the
>>>>>>>>> .aurora config you're using?
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> Joshua
>>>>>>>>>
>>>>>>>>> On Thu, Oct 22, 2015 at 3:22 PM, Krish <kr...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks, Zameer.
>>>>>>>>>>
>>>>>>>>>> I had to modify  /etc/aurora/clusters.json:
>>>>>>>>>> [
>>>>>>>>>>   {
>>>>>>>>>>     "auth_mechanism": "UNAUTHENTICATED",
>>>>>>>>>>     "name": "testcluster",
>>>>>>>>>>     "scheduler_zk_path": "/scheduler/aurora",
>>>>>>>>>>     "slave_root": "/var/lib/mesos",
>>>>>>>>>>     "slave_run_directory": "latest",
>>>>>>>>>>     "zk": "127.0.1.1"
>>>>>>>>>>   }
>>>>>>>>>> ]
>>>>>>>>>>
>>>>>>>>>> I have a hello_world.aurora in my home folder. However the
>>>>>>>>>> following command errors out:
>>>>>>>>>> ~$ aurora job create testcluster/testrole/test/hellojob
>>>>>>>>>> ./hello_world.aurora
>>>>>>>>>> Error loading configuration: [Errno 2] No such file or directory:
>>>>>>>>>> '/vagrant/hello_world.py'
>>>>>>>>>>
>>>>>>>>>> A job list does work:
>>>>>>>>>> ~$ aurora job list testcluster
>>>>>>>>>>  INFO] Retrieving jobs for role None
>>>>>>>>>>
>>>>>>>>>> I am not even using the vagrant. I am using zk & mesos on the
>>>>>>>>>> same machine as aurora. How do I submit these job templates to aurora?
>>>>>>>>>>
>>>>>>>>>> Any pointers to documentation will be helpful.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> κρισhναν
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 21, 2015 at 11:09 PM, Zameer Manji <zmanji@apache.org
>>>>>>>>>> > wrote:
>>>>>>>>>>
>>>>>>>>>>> Version 0.9.0 does not have the gc executor. Version 0.9.0 uses
>>>>>>>>>>> Mesos' task reconciliation
>>>>>>>>>>> <http://mesos.apache.org/documentation/latest/reconciliation/> API
>>>>>>>>>>> instead.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 21, 2015 at 9:28 AM, Krish <
>>>>>>>>>>> krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks Bill for the location to the debs. I was finally able to
>>>>>>>>>>>> run aurora. :)
>>>>>>>>>>>>
>>>>>>>>>>>> I did find thermos_executor.pex & thermos_observer after
>>>>>>>>>>>> installing aurora-executor. I still could not find gc_executor.pex on my
>>>>>>>>>>>> system.
>>>>>>>>>>>> Is there a location from where I can download the binaries for
>>>>>>>>>>>> *.pex or build them from scratch?
>>>>>>>>>>>>
>>>>>>>>>>>> root@dev:/# find . -name "*.pex"
>>>>>>>>>>>> ./usr/share/aurora/bin/thermos_executor.pex
>>>>>>>>>>>> ./usr/share/aurora/bin/kaurora_admin.pex
>>>>>>>>>>>> ./usr/share/aurora/bin/kaurora.pex
>>>>>>>>>>>> ./usr/share/aurora/bin/thermos.pex
>>>>>>>>>>>> ./usr/share/aurora/bin/thermos_observer.pex
>>>>>>>>>>>> ./home/ubuntu/.pex
>>>>>>>>>>>> ./root/.pex
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 20, 2015 at 11:46 PM, Bill Farner <
>>>>>>>>>>>> wfarner@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Aurora currently requires an executor, so setting it to
>>>>>>>>>>>>> /dev/null will not work.  Happy to talk further about your thoughts around
>>>>>>>>>>>>> sidestepping the executor.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As for working with the scheduler source code, it's a standard
>>>>>>>>>>>>> gradle project and we tend to use intellij.  Docs to help ramp on that:
>>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md
>>>>>>>>>>>>>
>>>>>>>>>>>>> As for builds - the .zip is a source distribution, so it won't
>>>>>>>>>>>>> have any pre-built binaries.  If you're on debian, we have official debs
>>>>>>>>>>>>> here: https://bintray.com/apache/aurora
>>>>>>>>>>>>> You can see how they're built (and can build your own
>>>>>>>>>>>>> packages) here: https://github.com/apache/aurora-packaging
>>>>>>>>>>>>> We're close to having official RPMs, but none to speak of yet.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 9:47 AM, Krish <
>>>>>>>>>>>>> krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Stephen,
>>>>>>>>>>>>>> I am trying to get started and run aurora without thermos
>>>>>>>>>>>>>> executor (setting it to /dev/null does not help) - on a local linux box for
>>>>>>>>>>>>>> now & planning to containerize/dockerize it later.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can you please point me to the right documentation (or a
>>>>>>>>>>>>>> pointer to the cli parsing source code) which can help me resolve this?
>>>>>>>>>>>>>> Also, are there any steps to import source code into eclipse to
>>>>>>>>>>>>>> browse & analyze code for this?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also, where do i find all the *.pex files? They are not
>>>>>>>>>>>>>> present in the zip file nor anywhere in the built source code.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I know I am asking too many queries on a single thread here,
>>>>>>>>>>>>>> & would appreciate the help.
>>>>>>>>>>>>>> I think at the end of this, I will put the steps I followed
>>>>>>>>>>>>>> in a gist/blog so others might find their way around, & not struggle as
>>>>>>>>>>>>>> much.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan <
>>>>>>>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Krish,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> you don't have to set framework_authentication_file and
>>>>>>>>>>>>>>> zk_digest_credentials. The scheduler help text is misleading here as
>>>>>>>>>>>>>>> everything will work fine if you leave those empty.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In addition, it looks like you are misunderstanding the usage
>>>>>>>>>>>>>>> of the thermos_executor_path command line flag of the
>>>>>>>>>>>>>>> scheduler. It is supposed to point to the binary containing the generic
>>>>>>>>>>>>>>> Aurora executor (thermos_executor.pex).  You only need the hello_world.aurora
>>>>>>>>>>>>>>> once your scheduler is up and running. It serves as an example input for the
>>>>>>>>>>>>>>> aurora command line client which can be used to schedule jobs and services
>>>>>>>>>>>>>>> on an Aurora master.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Have you tried to use the vagrant box? Just type `vagrant
>>>>>>>>>>>>>>> up` in a checkout of the Aurora source code. It gives you a running
>>>>>>>>>>>>>>> scheduler to play with. Once you have understood how it works, you can
>>>>>>>>>>>>>>> start trying to install it on your own (by reverse-engineering the vagrant
>>>>>>>>>>>>>>> box).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hope this helps a little,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>>>>>>>> *Sent:* Tuesday, October 20, 2015 11:39 AM
>>>>>>>>>>>>>>> *To:* Bill Farner
>>>>>>>>>>>>>>> *Cc:* user@aurora.apache.org; Erb, Stephan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Bill/Stephen,
>>>>>>>>>>>>>>> I still get a stacktrace when running the aurora scheduler
>>>>>>>>>>>>>>> CLI.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I do not know what to specify for
>>>>>>>>>>>>>>>  -framework_authentication_file & -zk_digest_credentials, and they are
>>>>>>>>>>>>>>> required arguments.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am not using any authentication on Mesos master, do I
>>>>>>>>>>>>>>> still need the framework_authentication_file parameter?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> rm -rf /db /backup_dir
>>>>>>>>>>>>>>> mesos-log initialize --path="/db"
>>>>>>>>>>>>>>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
>>>>>>>>>>>>>>> JAVA_OPTS="-Xmx1536m  -Xms256m"
>>>>>>>>>>>>>>> /usr/local/aurora-scheduler/bin/aurora-scheduler -backup_dir=/backup_dir
>>>>>>>>>>>>>>> -cluster_name=tc -mesos_master_address=zk://localhost:2181/mesos/master
>>>>>>>>>>>>>>> -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181
>>>>>>>>>>>>>>> -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false
>>>>>>>>>>>>>>> -native_log_file_path=/db
>>>>>>>>>>>>>>> -thermos_executor_path=/home/ubuntu/hello_world.aurora
>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>> INFO: Binding org.apache.aurora.scheduler.http.Utilization to GuiceManagedComponentProvider with the scope "PerRequest"
>>>>>>>>>>>>>>> Oct 20, 2015 9:27:40 AM org.apache.aurora.scheduler.cron.quartz.CronModule provideTimeZone
>>>>>>>>>>>>>>> WARNING: Cron schedules are configured to fire according to timezone Greenwich Mean Time but system timezone is set to Coordinated Universal Time
>>>>>>>>>>>>>>> Oct 20, 2015 9:27:41 AM org.eclipse.jetty.server.AbstractConnector doStart
>>>>>>>>>>>>>>> INFO: Started SelectChannelConnector@0.0.0.0:43843
>>>>>>>>>>>>>>> E1020 09:27:41.290 THREAD1 org.apache.aurora.scheduler.SchedulerLifecycle$9.execute: Caught unchecked exception: com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1 error
>>>>>>>>>>>>>>> com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1 error
>>>>>>>>>>>>>>>         at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
>>>>>>>>>>>>>>>         at org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner <
>>>>>>>>>>>>>>> wfarner@apache.org> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The typical flow is that you keep your .aurora file checked
>>>>>>>>>>>>>>>> into git, and commit every time you deploy/update.  When you change your
>>>>>>>>>>>>>>>> file, you will instruct Aurora to update the live job (have a look at aurora
>>>>>>>>>>>>>>>> update -h).  Aurora will perform a rolling upgrade of your
>>>>>>>>>>>>>>>> job to the new config.  You'll use this same flow for updating your job's
>>>>>>>>>>>>>>>> software as well as resizing the job.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For (3), you could set up alerting for stats that the
>>>>>>>>>>>>>>>> scheduler exports.  Have a look here for monitoring background:
>>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> You'll want to look at scheduler stats related to
>>>>>>>>>>>>>>>> 'pending'.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish <
>>>>>>>>>>>>>>>> krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks for the pointer. Now I notice that the
>>>>>>>>>>>>>>>>> aurora-scheduler script has the --thermos_executor_path as a mandatory
>>>>>>>>>>>>>>>>> requirement.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I have a couple of questions on how the
>>>>>>>>>>>>>>>>> thermos_executor/.aurora config file functions:
>>>>>>>>>>>>>>>>> 1. Do we have to statically define the file beforehand?
>>>>>>>>>>>>>>>>> 2. What happens when we want to dynamically change the
>>>>>>>>>>>>>>>>> config, say increasing the number of instances of a service required? Does
>>>>>>>>>>>>>>>>> aurora require a reboot then?
>>>>>>>>>>>>>>>>> 3. How do I get notified about the message mesos sends
>>>>>>>>>>>>>>>>> when it cannot schedule tasks for lack of resources? Should I depend on
>>>>>>>>>>>>>>>>> aurora for this or try to look for a hook into mesos?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think a little bit of context would help here.
>>>>>>>>>>>>>>>>> What I plan to check is to run a very basic job/task
>>>>>>>>>>>>>>>>> inside a docker container with aurora & wait for a 'resource not available'
>>>>>>>>>>>>>>>>> message from mesos, and accordingly call an api to spin up a new node in my
>>>>>>>>>>>>>>>>> cluster.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <
>>>>>>>>>>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I believe you are missing the thermos_executor options
>>>>>>>>>>>>>>>>>> that have to be passed to the scheduler command line.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> See
>>>>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
>>>>>>>>>>>>>>>>>> for an example
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>>>>>>>>>>> *Sent:* Monday, October 19, 2015 8:45 AM
>>>>>>>>>>>>>>>>>> *To:* user@aurora.apache.org
>>>>>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>> I am a n00b with apache aurora & trying to experiment
>>>>>>>>>>>>>>>>>> some things on my local machine with zookeeper and mesos-master running
>>>>>>>>>>>>>>>>>> locally. They have initialized properly. When I try to run aurora with the
>>>>>>>>>>>>>>>>>> required options, I get the following error, & googling hasn't helped me
>>>>>>>>>>>>>>>>>> much here.
>>>>>>>>>>>>>>>>>> Appreciate any help. Thanks in advance.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>> WARNING: Method [public void
>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)]
>>>>>>>>>>>>>>>>>> is synthetic and is being intercepted by
>>>>>>>>>>>>>>>>>> [com.twitter.common.inject.TimedInterceptor@604c5de8].
>>>>>>>>>>>>>>>>>> This could indicate a bug.  The method
>>>>>>>>>>>>>>>>>>  may be intercepted twice, or may not be intercepted at
>>>>>>>>>>>>>>>>>> all.
>>>>>>>>>>>>>>>>>> Exception in thread "main"
>>>>>>>>>>>>>>>>>> com.google.inject.CreationException: Guice creation errors:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1) An exception was caught and reported. Message: A value
>>>>>>>>>>>>>>>>>> may only be retrieved from a variable that has a default or has been
>>>>>>>>>>>>>>>>>> set.
>>>>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2) Could not find a suitable constructor in
>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have
>>>>>>>>>>>>>>>>>> either one (and only one) constructor annotated with @Inject or a
>>>>>>>>>>>>>>>>>> zero-argument constructor that is not private.
>>>>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>>>>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2 errors
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:95)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:83)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
>>>>>>>>>>>>>>>>>> Caused by: java.lang.IllegalStateException: A value may
>>>>>>>>>>>>>>>>>> only be retrieved from a variable that has a default or has been set.
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>>>>>>>>>>>>>>>>>>         at com.twitter.common.args.Arg.get(Arg.java:82)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.google.inject.util.Modules$2.configure(Modules.java:114)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.google.inject.spi.Elements.getElements(Elements.java:101)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>>>>>>>>>>>>>>>>>>         ... 7 more
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Complete logs are present @http://pastebin.com/i72HvbYi.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Zameer Manji
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>
> --
>
> Thumb typed mail
>
>

Re: Stacktrace when running Apache Aurora

Posted by Krish <kr...@gmail.com>.
Can you guide me on how to do that? Should I start with a new page and then
submit it, or would you like it as an entry in some existing doc?
That will be the short-term (couple of hours) item on my checklist.

Actually, as I said before, I plan to blog about my entire design
and implementation process: the how and the why of docker configuration,
private docker repo setup, coreos cluster setup, and zk, mesos master,
aurora containerisation and setup, along with their monitoring (I have
decided on bosun.org with cAdvisor), plus a short guide on how to run
both containerized and non-containerized jobs in production.
I had to refer to more than a dozen sites, blogs, manuals and source files
to get this far, and got help from engineers on various mailing lists.
A unified guide should be helpful, imho.
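
As a starting point for such a guide: the file change described in the
quoted mail below amounts to unwrapping the "auths" key of
~/.docker/config.json so that the registry entries sit at the top level,
which is the layout shown there for ~/.dockercfg. A minimal sketch in
Python, assuming only the two layouts quoted in that mail; the function
name to_dockercfg is made up for illustration and is not part of docker,
mesos or aurora:

```python
import json


def to_dockercfg(config_json_text):
    """Return legacy .dockercfg JSON text built from config.json text.

    Assumption: the input uses the layout quoted in the mail below,
    i.e. registry entries nested under a top-level "auths" key.
    """
    config = json.loads(config_json_text)
    # The legacy file is just the inner registry map, without the wrapper.
    registries = config.get("auths", {})
    return json.dumps(registries, indent=2, sort_keys=True)


# Placeholder credentials, mirroring the snipped values in the mail.
example = """{
  "auths": {
    "repo.example.com:5000": {"auth": "<snip>", "email": "<snip>"}
  }
}"""

print(to_dockercfg(example))
```

The printed text would then be saved in a file named exactly .dockercfg
and shipped into the task sandbox, for example through the
thermos_executor_resources flag mentioned in the quoted mail.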


On Thursday 3 March 2016, Bill Farner <wf...@apache.org> wrote:

> Wow!  I'm glad you got it working!  To help the next poor soul trying to
> do this, would you be willing to put up a doc patch on our end?
>
> On Thursday, March 3, 2016, Krish <krishnan.k.iyer@gmail.com> wrote:
>
>> TL;DR:
>> Use only a file named .dockercfg for docker credentials in mesos
>> tasks!
>>
>> Long story:
>> ---------------
>> Holy smokescreens!
>> This is for reporting & documenting purposes only, so that others don't
>> have to pull their hair like I did for the past few evenings!
>>
>> A little background:
>> I am running Ubuntu 14.04 on my system and docker stores its credentials
>> in the ~/.docker/config.json as
>> cat ~/.docker/config.json
>> {
>> "auths": {
>> "repo.example.com:5000": {
>> "auth": "<snip>",
>> "email": "<snip>"
>> }
>> }
>> }
>>
>> And I am doing all these experiments on a coreOS system which stores the
>> credentials  in ~/.dockercfg as
>> core@aurora-1 ~ $ cat ~/.dockercfg
>> {
>>   "repo.example.com:5000": {
>>     "auth": "<snip>",
>>     "email": "<snip>"
>>   }
>> }
>>
>> Since my container was an Ubuntu 14.04 container (as was my local
>> system), I used the Ubuntu credential file format. As a result, the
>> slave task could not read the docker credentials, because I had stored
>> them as ~/.docker/config.json.
>> After parsing through (a lot of find's, grep's and regex matching)
>> aurora, mesos, and thermos source code, I saw in
>> mesos/src/docker/docker.cpp:
>>
>>   // Set HOME variable to pick up .dockercfg.
>>   map<string, string> environment = os::environment();
>>
>>   environment["HOME"] = directory;
>>
>>
>> Changed the filename and the json content, changed the
>> thermos_executor_resources, and bam, docker pull works!
>>
>> Well, the mesos documentation does say "To run an image from a private
>> repository, one can include the URI pointing to a .dockercfg that contains
>> login information." and I would have read it a dozen times!
>> But I never thought that they literally meant '.dockercfg' as the name of
>> the file!
>>
>>
>>
>>
>> --
>> κρισhναν
>>
>> On Thu, Mar 3, 2016 at 1:45 PM, Krish <kr...@gmail.com> wrote:
>>
>>>
>>> I have got the docker config file copied into the sandbox using the
>>> thermos_executor_resources flag; however docker is still not able to find
>>> the credentials file for doing an appropriate pull of image from a private
>>> repo.
>>>
>>> When I try to use the library/hello-world:latest image from public
>>> docker repo to check if everything works fine without the credentials, I
>>> encounter a different problem:
>>> exec: "/bin/sh": stat /bin/sh: no such file or directory
>>> Error response from daemon: Cannot start container
>>> de93dc344d44b41bccccff49e508001a97ff23a8964e637d32a506a31fd4d946: [8]
>>> System error: exec: "/bin/sh": stat /bin/sh: no such file or directory
>>>
>>> I was referring to this email for guidance on setting up a mesos slave:
>>> http://mail-archives.apache.org/mod_mbox/aurora-dev/201503.mbox/%3CCAKB1MkHR=+NOnesLLX9bUWtTDThsku46PW_wR4B+_Z9P59+XWQ@mail.gmail.com%3E
>>>
>>> So, I cannot get the credentials file to be used by docker, and if I
>>> bypass authentication, I can do a docker pull, but encounter a weird error
>>> in launching the hello-world image.
>>>
>>> Am I missing out on checking any log files generated? I currently refer
>>> to mesos-slave stdout and the sandbox stderr file.
>>> Any configuration parameter I am missing for this to happen?
>>>
>>> Any pointers will be really helpful. Thanks in advance.
>>>
>>>
>>>
>>> --
>>> κρισhναν
>>>
>>> On Sun, Feb 28, 2016 at 3:37 PM, Krish <kr...@gmail.com>
>>> wrote:
>>>
>>>> Continuing my earlier chain of thought, I found this in the mesos bug
>>>> list:
>>>> MESOS-4242 - Allow Docker private registry credentials to be passed
>>>> from framework.
>>>> How does one pass credentials using the framework? As it seems the
>>>> .docker/config.json is not read from the slave.
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> κρισhναν
>>>>
>>>> On Sat, Feb 27, 2016 at 11:46 PM, Krish <kr...@gmail.com>
>>>> wrote:
>>>>
>>>>> I couldn't complete my PoC project earlier (got busy with other
>>>>> work). Well, it is never too late and here's my update and issue.
>>>>>
>>>>> I have a 3 node zk (3.5.1 alpha), mesos-master (v0.24.1) & aurora
>>>>> (v0.11.0) running.
>>>>> I was stuck in a problem where I was using mesos 0.25.0 & aurora 0.9.0
>>>>> & got a protobuf field not set error - ExecutorInfo field.
>>>>>
>>>>> I have a mesos agent running in docker container on coreos and it can
>>>>> access the host docker just fine.
>>>>> I have also put the docker login credentials file at the right
>>>>> location for it to access the private docker registry.
>>>>> I can manually trigger a docker pull and docker run without issues
>>>>> from the slave (which is also reflected properly outside the slave
>>>>> container with docker images and docker ps).
>>>>>
>>>>> However, when I try to run an aurora job with hello-docker container,
>>>>> the slave prints out the log that docker pull has failed; more specifically:
>>>>> " failed to start: Failed to 'docker pull
>>>>> private_repo.com:5000/krish/test:latest': exit status = exited with
>>>>> status 1 stderr = Error: image krish/test:latest not found"
>>>>>
>>>>> My hunch is that when using docker run from aurora DSL, it does not
>>>>> read the docker credentials file properly and hence fails. I can reproduce
>>>>> the exact same error when I delete the credentials file from the slave and
>>>>> trigger a pull.
>>>>>
>>>>> Is the hunch right? If yes, is there a way to resolve this? Maybe
>>>>> source it some way before the run command?
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> κρισhναν
>>>>>
>>>>> On Tue, Oct 27, 2015 at 10:35 PM, Bill Farner <wf...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> (1) clusters.json is written by you, configuring the CLI client with
>>>>>> instructions for what clusters are available and how to discover them.
>>>>>>
>>>>>> (2) That's expected - mesos only allows one active replica of a
>>>>>> framework at a time, this signals which one is active.
>>>>>>
>>>>>> (3) The observer is essentially a web server that allows you to
>>>>>> browse a task's sandbox directory and other information about it.  You will
>>>>>> need to configure it to run on your worker/agent nodes for that
>>>>>> functionality to work (it's linked from the scheduler web UI).
>>>>>>
>>>>>> (4) You could indeed implement that behavior externally.  There is a
>>>>>> reason:
>>>>>> https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L556-L559
>>>>>>
>>>>>> (5) That is correct.  The scheduler exposes a thrift API that you
>>>>>> would use (a REST API is coming, but ground has not yet been broken).  If
>>>>>> you go this route, i suggest you skip the DSL and use the JSON task
>>>>>> description format that is shipped over the API.  There's not good
>>>>>> documentation on this, but we can help you through it and would be grateful
>>>>>> for a writeup of your approach!
>>>>>>
>>>>>>
>>>>>> On Mon, Oct 26, 2015 at 11:44 PM, Krish <kr...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Folks,
>>>>>>> Firstly, thanks for all the help. Am happy to report that I have set
>>>>>>> up zk, mesos & aurora, & can work further towards my idea of having an
>>>>>>> auto-scaling cluster.
>>>>>>> I have some further questions about the work done so far & things I
>>>>>>> plan to do:
>>>>>>>
>>>>>>>    1. Is the /etc/aurora/clusters.json file created by the
>>>>>>>    scheduler or does it need to be handcrafted? I had to manually edit the
>>>>>>>    file to get my `aurora job ...` cli to work.
>>>>>>>
>>>>>>>    2. I am running a cluster of 3 coreOS VMs on vagrant with zk,
>>>>>>>    mesos & aurora in a docker container. Only 1 of them outputs '1' when I
>>>>>>>    look at the 'framework_registered' field. Is this expected? How do I verify
>>>>>>>    that they are working as a cluster?
>>>>>>>
>>>>>>>    3. From the documentation, I see that there is an observer that
>>>>>>>    needs to be listening on port 1338. What is the observer socket & its
>>>>>>>    purpose? I have aurora listening only on ports 8081 (http port) & 8083
>>>>>>>    (libprocess).
>>>>>>>
>>>>>>>    4. I read about the 'PENDING' field in aurora documentation, as
>>>>>>>    Bill suggested, & realize that it just shows that a task is waiting for
>>>>>>>    some reasons (for want of resources, in my case, as 0 slaves have
>>>>>>>    registered). I was thinking of adding a hook to the pending state; say if a
>>>>>>>    task is PENDING for 5 minutes for lack of resources in the cluster, then
>>>>>>>    spin up a new machine. Is this the right approach to take? Does aurora
>>>>>>>    provide reasons for why a task is in the PENDING state?
>>>>>>>
>>>>>>>    => aurora job status testcluster/$USER/test/hello_world
>>>>>>>     INFO] Checking status of testcluster/ubuntu/test/hello_world
>>>>>>>    Active tasks (1):
>>>>>>>           Task role: ubuntu, env: test, name: hello_world,
>>>>>>>    instance: 0, status:
>>>>>>>    PENDING on None
>>>>>>>              cpus: 0.1, ram: 16 MB, disk: 16 MB
>>>>>>>              events:
>>>>>>>               2015-10-23 04:55:33 PENDING: None
>>>>>>>    Inactive tasks (0):
>>>>>>>
>>>>>>>    5. Aurora defines job/s in a .aurora config file & if I decide
>>>>>>>    to increase/decrease the number of instances in my cluster, then I need to
>>>>>>>    create/overwrite the concerned .aurora file and trigger the `aurora update
>>>>>>>    ...` command. Is this right?
>>>>>>>    If yes, is there an HTTP API I can invoke remotely which
>>>>>>>    triggers this update?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> κρισhναν
>>>>>>>
>>>>>>> On Fri, Oct 23, 2015 at 8:09 AM, Joshua Cohen <
>>>>>>> jcohen@twopensource.com> wrote:
>>>>>>>
>>>>>>>> I suspect your error from `aurora job create ...` is due to the
>>>>>>>> aurora config you're using referencing `/vagrant/hello_world.py` which does
>>>>>>>> not exist (as you say: you're not even using Vagrant). Can you link the
>>>>>>>> .aurora config you're using?
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Joshua
>>>>>>>>
>>>>>>>> On Thu, Oct 22, 2015 at 3:22 PM, Krish <kr...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks, Zameer.
>>>>>>>>>
>>>>>>>>> I had to modify  /etc/aurora/clusters.json:
>>>>>>>>> [
>>>>>>>>>   {
>>>>>>>>>     "auth_mechanism": "UNAUTHENTICATED",
>>>>>>>>>     "name": "testcluster",
>>>>>>>>>     "scheduler_zk_path": "/scheduler/aurora",
>>>>>>>>>     "slave_root": "/var/lib/mesos",
>>>>>>>>>     "slave_run_directory": "latest",
>>>>>>>>>     "zk": "127.0.1.1"
>>>>>>>>>   }
>>>>>>>>> ]
>>>>>>>>>
>>>>>>>>> I have a hello_world.aurora in my home folder. However the
>>>>>>>>> following command errors out:
>>>>>>>>> ~$ aurora job create testcluster/testrole/test/hellojob
>>>>>>>>> ./hello_world.aurora
>>>>>>>>> Error loading configuration: [Errno 2] No such file or directory:
>>>>>>>>> '/vagrant/hello_world.py'
>>>>>>>>>
>>>>>>>>> A job list does work:
>>>>>>>>> ~$ aurora job list testcluster
>>>>>>>>>  INFO] Retrieving jobs for role None
>>>>>>>>>
>>>>>>>>> I am not even using the vagrant. I am using zk & mesos on the same
>>>>>>>>> machine as aurora. How do I submit these job templates to aurora?
>>>>>>>>>
>>>>>>>>> Any pointers to documentation will be helpful.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> κρισhναν
>>>>>>>>>
>>>>>>>>> On Wed, Oct 21, 2015 at 11:09 PM, Zameer Manji <zm...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Version 0.9.0 does not have the gc executor. Version 0.9.0 uses
>>>>>>>>>> Mesos' task reconciliation
>>>>>>>>>> <http://mesos.apache.org/documentation/latest/reconciliation/> API
>>>>>>>>>> instead.
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 21, 2015 at 9:28 AM, Krish <krishnan.k.iyer@gmail.com
>>>>>>>>>> > wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Bill for the location to the debs. I was finally able to
>>>>>>>>>>> run aurora. :)
>>>>>>>>>>>
>>>>>>>>>>> I did find thermos_executor.pex & thermos_observer after
>>>>>>>>>>> installing aurora-executor. I still could not find gc_executor.pex on my
>>>>>>>>>>> system.
>>>>>>>>>>> Is there a location from where I can download the binaries for
>>>>>>>>>>> *.pex or build them from scratch?
>>>>>>>>>>>
>>>>>>>>>>> root@dev:/# find . -name "*.pex"
>>>>>>>>>>> ./usr/share/aurora/bin/thermos_executor.pex
>>>>>>>>>>> ./usr/share/aurora/bin/kaurora_admin.pex
>>>>>>>>>>> ./usr/share/aurora/bin/kaurora.pex
>>>>>>>>>>> ./usr/share/aurora/bin/thermos.pex
>>>>>>>>>>> ./usr/share/aurora/bin/thermos_observer.pex
>>>>>>>>>>> ./home/ubuntu/.pex
>>>>>>>>>>> ./root/.pex
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> κρισhναν
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 20, 2015 at 11:46 PM, Bill Farner <
>>>>>>>>>>> wfarner@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Aurora currently requires an executor, so setting it to
>>>>>>>>>>>> /dev/null will not work.  Happy to talk further about your thoughts around
>>>>>>>>>>>> sidestepping the executor.
>>>>>>>>>>>>
>>>>>>>>>>>> As for working with the scheduler source code, it's a standard
>>>>>>>>>>>> gradle project and we tend to use intellij.  Docs to help ramp on that:
>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md
>>>>>>>>>>>>
>>>>>>>>>>>> As for builds - the .zip is a source distribution, so it won't
>>>>>>>>>>>> have any pre-built binaries.  If you're on debian, we have official debs
>>>>>>>>>>>> here: https://bintray.com/apache/aurora
>>>>>>>>>>>> You can see how they're built (and can build your own
>>>>>>>>>>>> packages) here: https://github.com/apache/aurora-packaging
>>>>>>>>>>>> We're close to having official RPMs, but none to speak of yet.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 20, 2015 at 9:47 AM, Krish <
>>>>>>>>>>>> krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Stephen,
>>>>>>>>>>>>> I am trying to get started and run aurora without thermos
>>>>>>>>>>>>> executor (setting it to /dev/null does not help) - on a local linux box for
>>>>>>>>>>>>> now & planning to containerize/dockerize it later.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you please point me to the right documentation (or a
>>>>>>>>>>>>> pointer to the cli parsing source code) which can help me resolve this?
>>>>>>>>>>>>> Also, are there any steps to import source code into eclipse to
>>>>>>>>>>>>> browse & analyze code for this.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also, where do i find all the *.pex files? They are not
>>>>>>>>>>>>> present in the zip file nor anywhere in the built source code.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I know I am asking too many queries on a single thread here, &
>>>>>>>>>>>>> would appreciate the help.
>>>>>>>>>>>>> I think at the end of this, I will put the steps I followed in
>>>>>>>>>>>>> a gist/blog so others might find their way around, & not struggle as much.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan <
>>>>>>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Krish,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> you don't have to set framework_authentication_file and
>>>>>>>>>>>>>> zk_digest_credentials. The scheduler help text is misleading here as
>>>>>>>>>>>>>> everything will work fine if you leave those empty.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In addition, looks like you are misunderstanding the usage of
>>>>>>>>>>>>>> the thermos_executor_path command line flag of the
>>>>>>>>>>>>>> scheduler. It is supposed to point to the binary containing the generic
>>>>>>>>>>>>>> Aurora executor (thermos_executor.pex).  You only need the hello_world.aurora
>>>>>>>>>>>>>> once your scheduler is up and running. It serves as an example input for the
>>>>>>>>>>>>>> aurora command line client which can be used to schedule jobs and services
>>>>>>>>>>>>>> on an Aurora master.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Have you tried to use the vagrant box? Just type `vagrant
>>>>>>>>>>>>>> up` in a checkout of the Aurora source code. It gives you a running
>>>>>>>>>>>>>> scheduler to play with. Once you have understood how it works, you can
>>>>>>>>>>>>>> start trying to install it on your own (by reverse-engineering the vagrant
>>>>>>>>>>>>>> box).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hope this helps a little,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>>>>>>> *Sent:* Tuesday, October 20, 2015 11:39 AM
>>>>>>>>>>>>>> *To:* Bill Farner
>>>>>>>>>>>>>> *Cc:* user@aurora.apache.org; Erb, Stephan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Bill/Stephen,
>>>>>>>>>>>>>> I still get a stacktrace when running the aurora scheduler
>>>>>>>>>>>>>> CLI.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I do not know what to specify for
>>>>>>>>>>>>>>  -framework_authentication_file & -zk_digest_credentials, and they are
>>>>>>>>>>>>>> required arguments.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am not using any authentication on Mesos master, do I still
>>>>>>>>>>>>>> need the framework_authentication_file parameter?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> rm -rf /db /backup_dir
>>>>>>>>>>>>>> mesos-log initialize --path="/db"
>>>>>>>>>>>>>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
>>>>>>>>>>>>>> JAVA_OPTS="-Xmx1536m  -Xms256m"
>>>>>>>>>>>>>> /usr/local/aurora-scheduler/bin/aurora-scheduler -backup_dir=/backup_dir
>>>>>>>>>>>>>> -cluster_name=tc -mesos_master_address=zk://localhost:2181/mesos/master
>>>>>>>>>>>>>> -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181
>>>>>>>>>>>>>> -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false
>>>>>>>>>>>>>> -native_log_file_path=/db
>>>>>>>>>>>>>> -thermos_executor_path=/home/ubuntu/hello_world.aurora
>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>> INFO: Binding org.apache.aurora.scheduler.http.Utilization to
>>>>>>>>>>>>>> GuiceManagedCompon
>>>>>>>>>>>>>> entProvider with the scope "PerRequest"
>>>>>>>>>>>>>> Oct 20, 2015 9:27:40 AM
>>>>>>>>>>>>>> org.apache.aurora.scheduler.cron.quartz.CronModule provi
>>>>>>>>>>>>>> deTimeZone
>>>>>>>>>>>>>> WARNING: Cron schedules are configured to fire according to
>>>>>>>>>>>>>> timezone Greenwich M
>>>>>>>>>>>>>> ean Time but system timezone is set to Coordinated Universal
>>>>>>>>>>>>>> Time
>>>>>>>>>>>>>> Oct 20, 2015 9:27:41 AM
>>>>>>>>>>>>>> org.eclipse.jetty.server.AbstractConnector doStart
>>>>>>>>>>>>>> INFO: Started SelectChannelConnector@0.0.0.0:43843
>>>>>>>>>>>>>> E1020 09:27:41.290 THREAD1
>>>>>>>>>>>>>> org.apache.aurora.scheduler.SchedulerLifecycle$9.exec
>>>>>>>>>>>>>> ute: Caught unchecked exception:
>>>>>>>>>>>>>> com.google.inject.ProvisionException: Guice pro
>>>>>>>>>>>>>> vision errors:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1) Error in custom provider,
>>>>>>>>>>>>>> java.lang.IllegalArgumentException: Path cannot be null at
>>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(Mesos
>>>>>>>>>>>>>> LogStreamModule.java:117)
>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(Mesos
>>>>>>>>>>>>>> LogStreamModule.java:117)
>>>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterf
>>>>>>>>>>>>>> ace(MesosLogStreamModule.java:152)
>>>>>>>>>>>>>>   while locating
>>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1 error
>>>>>>>>>>>>>> com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1) Error in custom provider,
>>>>>>>>>>>>>> java.lang.IllegalArgumentException: Path cannot be
>>>>>>>>>>>>>> null
>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(Mesos
>>>>>>>>>>>>>> LogStreamModule.java:117)
>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(Mesos
>>>>>>>>>>>>>> LogStreamModule.java:117)
>>>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterf
>>>>>>>>>>>>>> ace(MesosLogStreamModule.java:152)
>>>>>>>>>>>>>>   while locating
>>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1 error
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner <
>>>>>>>>>>>>>> wfarner@apache.org> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The typical flow is that you keep your .aurora file checked
>>>>>>>>>>>>>>> into git, and commit every time you deploy/update.  When you change your
>>>>>>>>>>>>>>> file, you will instruct Aurora to update the live job (have a look at aurora
>>>>>>>>>>>>>>> update -h).  Aurora will perform a rolling upgrade of your
>>>>>>>>>>>>>>> job to the new config.  You'll use this same flow for updating your job's
>>>>>>>>>>>>>>> software as well as resizing the job.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For (3), you could set up alerting for stats that the
>>>>>>>>>>>>>>> scheduler exports.  Have a look here for monitoring background:
>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You'll want to look at scheduler stats related to
>>>>>>>>>>>>>>> 'pending'.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish <
>>>>>>>>>>>>>>> krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for the pointer. Now I notice that the
>>>>>>>>>>>>>>>> aurora-scheduler script has the --thermos_executor_path as a mandatory
>>>>>>>>>>>>>>>> requirement.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I have a couple of questions on how the
>>>>>>>>>>>>>>>> thermos_executor/.aurora config file functions:
>>>>>>>>>>>>>>>> 1. Do we have to statically define the file beforehand?
>>>>>>>>>>>>>>>> 2. What happens when we want to dynamically change the
>>>>>>>>>>>>>>>> config, say increasing the number of instances of a service required? Does
>>>>>>>>>>>>>>>> aurora require a reboot then?
>>>>>>>>>>>>>>>> 3. How do I get notified about the message mesos sends when
>>>>>>>>>>>>>>>> it cannot schedule tasks for lack of resources? Should I depend on aurora
>>>>>>>>>>>>>>>> for this or try to look for a hook into mesos?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think a little bit of context would help here.
>>>>>>>>>>>>>>>> What I plan to check is to run a very basic job/task inside
>>>>>>>>>>>>>>>> a docker container with aurora & wait for a 'resource not available'
>>>>>>>>>>>>>>>> message from mesos, and accordingly call an api to spin up a new node in my
>>>>>>>>>>>>>>>> cluster.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <
>>>>>>>>>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I believe you are missing the thermos_executor options
>>>>>>>>>>>>>>>>> that have to be passed to the scheduler command line.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> See
>>>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
>>>>>>>>>>>>>>>>> for an example
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>>>>>>>>>> *Sent:* Monday, October 19, 2015 8:45 AM
>>>>>>>>>>>>>>>>> *To:* user@aurora.apache.org
>>>>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>> I am a n00b with apache aurora & trying to experiment with some
>>>>>>>>>>>>>>>>> things on my local machine with zookeeper and mesos-master running locally.
>>>>>>>>>>>>>>>>> They have initialized properly. When I try to run aurora with the required
>>>>>>>>>>>>>>>>> options, I get the following error, & googling hasn't helped me much here.
>>>>>>>>>>>>>>>>> Appreciate any help. Thanks in advance.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>> WARNING: Method [public void
>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)]
>>>>>>>>>>>>>>>>> is synthetic and is being intercepted by
>>>>>>>>>>>>>>>>> [com.twitter.common.inject.TimedInterceptor@604c5de8].
>>>>>>>>>>>>>>>>> This could indicate a bug.  The method
>>>>>>>>>>>>>>>>>  may be intercepted twice, or may not be intercepted at
>>>>>>>>>>>>>>>>> all.
>>>>>>>>>>>>>>>>> Exception in thread "main"
>>>>>>>>>>>>>>>>> com.google.inject.CreationException: Guice creation errors:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1) An exception was caught and reported. Message: A value
>>>>>>>>>>>>>>>>> may only be retrieved from a variable that has a default or has been
>>>>>>>>>>>>>>>>> set.
>>>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2) Could not find a suitable constructor in
>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have
>>>>>>>>>>>>>>>>> either one (a
>>>>>>>>>>>>>>>>> nd only one) constructor annotated with @Inject or a
>>>>>>>>>>>>>>>>> zero-argument constructor that is not private.
>>>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>>>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2 errors
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:95)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:83)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
>>>>>>>>>>>>>>>>> Caused by: java.lang.IllegalStateException: A value may
>>>>>>>>>>>>>>>>> only be retrieved from a variable that has a default or has been set.
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>>>>>>>>>>>>>>>>>         at com.twitter.common.args.Arg.get(Arg.java:82)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.google.inject.util.Modules$2.configure(Modules.java:114)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.google.inject.spi.Elements.getElements(Elements.java:101)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>>>>>>>>>>>>>>>>>         ... 7 more
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Complete logs are present @http://pastebin.com/i72HvbYi.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Zameer Manji
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>

-- 

Thumb typed mail

Re: Stacktrace when running Apache Aurora

Posted by Bill Farner <wf...@apache.org>.
Wow!  I'm glad you got it working!  To help the next poor soul trying to do
this, would you be willing to put up a doc patch on our end?
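
For anyone landing here with the same problem: going by the two credential layouts shown in the quoted message below, the fix amounts to lifting the registry entries out of the "auths" key of ~/.docker/config.json and saving them in a file literally named .dockercfg. Here is a minimal sketch of that conversion, assuming those two layouts are the only difference between the files; the function name to_dockercfg and the sample constant are made up for illustration and are not part of docker, mesos, or aurora.

```python
import json

# Sample input in the ~/.docker/config.json layout quoted in this thread.
SAMPLE_CONFIG_JSON = """
{
  "auths": {
    "repo.example.com:5000": {
      "auth": "<snip>",
      "email": "<snip>"
    }
  }
}
"""


def to_dockercfg(config_json_text):
    """Return .dockercfg-style JSON text for the given config.json text.

    Assumption: the legacy .dockercfg layout is simply the content of the
    "auths" object, with each registry as a top-level key.
    """
    config = json.loads(config_json_text)
    registries = config.get("auths", {})
    return json.dumps(registries, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Print the converted content; redirect it into a file named .dockercfg.
    print(to_dockercfg(SAMPLE_CONFIG_JSON))
```

The output would then be saved as .dockercfg and shipped into the sandbox the same way as before (the thread uses the thermos_executor_resources scheduler flag for that).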

On Thursday, March 3, 2016, Krish <kr...@gmail.com> wrote:

> TL;DR:
> Use only a file named .dockercfg for docker credentials in mesos
> tasks!
>
> Long story:
> ---------------
> Holy smokescreens!
> This is for reporting & documenting purposes only, so that others don't
> have to pull their hair out like I did for the past few evenings!
>
> A little background:
> I am running Ubuntu 14.04 on my system and docker stores its credentials
> in the ~/.docker/config.json as
> cat ~/.docker/config.json
> {
> "auths": {
> "repo.example.com:5000": {
> "auth": "<snip>",
> "email": "<snip>"
> }
> }
> }
>
> And I am doing all these experiments on a coreOS system which stores the
> credentials  in ~/.dockercfg as
> core@aurora-1 ~ $ cat ~/.dockercfg
> {
>   "repo.example.com:5000": {
>     "auth": "<snip>",
>     "email": "<snip>"
>   }
> }
>
> Since my container was an Ubuntu 14.04 container (as was my local system),
> I used the Ubuntu credential file format. As a result, the slave task could
> not read the docker credentials, because I had stored them as
> ~/.docker/config.json.
> After parsing through (a lot of find's, grep's and regex matching) aurora,
> mesos, and thermos source code, I saw in mesos/src/docker/docker.cpp:
>
>   // Set HOME variable to pick up .dockercfg.
>   map<string, string> environment = os::environment();
>
>   environment["HOME"] = directory;
>
> Changed the filename and the json content, changed the
> thermos_executor_resources, and bam, docker pull works!
>
> Well, the mesos documentation does say "To run an image from a private
> repository, one can include the URI pointing to a .dockercfg that contains
> login information." and I would have read it a dozen times!
> But I never thought that they literally meant '.dockercfg' as the name of
> the file!
>
>
>
>
> --
> κρισhναν
>
> On Thu, Mar 3, 2016 at 1:45 PM, Krish <krishnan.k.iyer@gmail.com> wrote:
>
>>
>> I have got the docker config file copied into the sandbox using the
>> thermos_executor_resources flag; however docker is still not able to find
>> the credentials file for doing an appropriate pull of image from a private
>> repo.
>>
>> When I try to use the library/hello-world:latest image from public docker
>> repo to check if everything works fine without the credentials, I encounter
>> a different problem:
>> exec: "/bin/sh": stat /bin/sh: no such file or directory
>> Error response from daemon: Cannot start container
>> de93dc344d44b41bccccff49e508001a97ff23a8964e637d32a506a31fd4d946: [8]
>> System error: exec: "/bin/sh": stat /bin/sh: no such file or directory
>>
>> I was referring to this email for guidance on setting up a mesos slave:
>> http://mail-archives.apache.org/mod_mbox/aurora-dev/201503.mbox/%3CCAKB1MkHR=+NOnesLLX9bUWtTDThsku46PW_wR4B+_Z9P59+XWQ@mail.gmail.com%3E
>>
>> So, I cannot get the credentials file to be used by docker, and if I
>> bypass authentication, I can do a docker pull, but encounter a weird error
>> in launching the hello-world image.
>>
>> Am I missing out on checking any log files generated? I currently refer
>> to mesos-slave stdout and the sandbox stderr file.
>> Any configuration parameter I am missing for this to happen?
>>
>> Any pointers will be really helpful. Thanks in advance.
>>
>>
>>
>> --
>> κρισhναν
>>
>> On Sun, Feb 28, 2016 at 3:37 PM, Krish <krishnan.k.iyer@gmail.com> wrote:
>>
>>> Continuing my earlier chain of thought, I found this in the mesos bug
>>> list:
>>> MESOS-4242 - Allow Docker private registry credentials to be passed from
>>> framework.
>>> How does one pass credentials using the framework? As it seems the
>>> .docker/config.json is not read from the slave.
>>>
>>>
>>>
>>>
>>> --
>>> κρισhναν
>>>
>>> On Sat, Feb 27, 2016 at 11:46 PM, Krish <krishnan.k.iyer@gmail.com> wrote:
>>>
>>>> I couldn't complete my PoC project before (got busy with other
>>>> work). Well, it is never too late and here's my update and issue.
>>>>
>>>> I have a 3 node zk (3.5.1 alpha), mesos-master (v0.24.1) & aurora
>>>> (v0.11.0) running.
>>>> I was stuck in a problem where I was using mesos 0.25.0 & aurora 0.9.0
>>>> & got a protobuf field not set error - ExecutorInfo field.
>>>>
>>>> I have a mesos agent running in docker container on coreos and it can
>>>> access the host docker just fine.
>>>> I have also put the docker login credentials file at the right location
>>>> for it to access the private docker registry.
>>>> I can manually trigger a docker pull and docker run without issues from
>>>> the slave (which is also reflected properly outside the slave container
>>>> with docker images and docker ps).
>>>>
>>>> However, when I try to run an aurora job with hello-docker container,
>>>> the slave prints out the log that docker pull has failed; more specifically:
>>>> " failed to start: Failed to 'docker pull
>>>> private_repo.com:5000/krish/test:latest': exit status = exited with
>>>> status 1 stderr = Error: image krish/test:latest not found"
>>>>
>>>> My hunch is that when using docker run from aurora DSL, it does not
>>>> read the docker credentials file properly and hence fails. I can reproduce
>>>> the exact same error when I delete the credentials file from the slave and
>>>> trigger a pull.
>>>>
>>>> Is the hunch right? If yes, is there a way to resolve this? Maybe
>>>> source it some way before the run command?
>>>>
>>>>
>>>>
>>>> --
>>>> κρισhναν
>>>>
>>>> On Tue, Oct 27, 2015 at 10:35 PM, Bill Farner <wfarner@apache.org> wrote:
>>>>
>>>>> (1) clusters.json is written by you, configuring the CLI client with
>>>>> instructions for what clusters are available and how to discover them.
>>>>>
>>>>> (2) That's expected - mesos only allows one active replica of a
>>>>> framework at a time, this signals which one is active.
>>>>>
>>>>> (3) The observer is essentially a web server that allows you to browse
>>>>> a task's sandbox directory and other information about it.  You will need
>>>>> to configure it to run on your worker/agent nodes for that functionality to
>>>>> work (it's linked from the scheduler web UI).
>>>>>
>>>>> (4) You could indeed implement that behavior externally.  There is a
>>>>> reason:
>>>>> https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L556-L559
>>>>>
>>>>> (5) That is correct.  The scheduler exposes a thrift API that you
>>>>> would use (a REST API is coming, but ground has not yet been broken).  If
>>>>> you go this route, i suggest you skip the DSL and use the JSON task
>>>>> description format that is shipped over the API.  There's not good
>>>>> documentation on this, but we can help you through it and would be grateful
>>>>> for a writeup of your approach!
>>>>>
>>>>>
>>>>> On Mon, Oct 26, 2015 at 11:44 PM, Krish <krishnan.k.iyer@gmail.com> wrote:
>>>>>
>>>>>> Hi Folks,
>>>>>> Firstly, thanks for all the help. Am happy to report that I have set
>>>>>> up zk, mesos & aurora, & can work further towards my idea of having an
>>>>>> auto-scaling cluster.
>>>>>> I have some further questions about the work done so far & things I
>>>>>> plan to do:
>>>>>>
>>>>>>    1. Is the /etc/aurora/clusters.json file created by the scheduler
>>>>>>    or does it need to be handcrafted? I had to manually edit the file to get
>>>>>>    my `aurora job ...` cli to work.
>>>>>>
>>>>>>    2. I am running a cluster of 3 coreOS VMs on vagrant with zk,
>>>>>>    mesos & aurora in a docker container. Only 1 of them outputs '1' when I
>>>>>>    look at the 'framework_registered' field. Is this expected? How do I verify
>>>>>>    that they are working as a cluster?
>>>>>>
>>>>>>    3. From the documentation, I see that there is an observer that
>>>>>>    needs to be listening on port 1338. What is the observer socket & its
>>>>>>    purpose? I have aurora listening only on ports 8081 (http port) & 8083
>>>>>>    (libprocess).
>>>>>>
>>>>>>    4. I read about the 'PENDING' field in aurora documentation, as
>>>>>>    Bill suggested, & realize that it just shows that a task is waiting for
>>>>>>    some reasons (for want of resources, in my case, as 0 slaves have
>>>>>>    registered). I was thinking of adding a hook to the pending state; say if a
>>>>>>    task is PENDING for 5 minutes for lack of resources in the cluster, then
>>>>>>    spin up a new machine. Is this the right approach to take? Does aurora
>>>>>>    provide reasons for why a task is in the PENDING state?
>>>>>>
>>>>>>    => aurora job status testcluster/$USER/test/hello_world
>>>>>>     INFO] Checking status of testcluster/ubuntu/test/hello_world
>>>>>>    Active tasks (1):
>>>>>>           Task role: ubuntu, env: test, name: hello_world, instance:
>>>>>>    0, status:
>>>>>>    PENDING on None
>>>>>>              cpus: 0.1, ram: 16 MB, disk: 16 MB
>>>>>>              events:
>>>>>>               2015-10-23 04:55:33 PENDING: None
>>>>>>    Inactive tasks (0):
>>>>>>
>>>>>>    5. Aurora defines job/s in a .aurora config file & if I decide to
>>>>>>    increase/decrease the number of instances in my cluster, then I need to
>>>>>>    create/overwrite the concerned .aurora file and trigger the `aurora update
>>>>>>    ...` command. Is this right?
>>>>>>    If yes, is there an HTTP API I can invoke remotely which triggers
>>>>>>    this update?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> κρισhναν
>>>>>>
>>>>>> On Fri, Oct 23, 2015 at 8:09 AM, Joshua Cohen <jcohen@twopensource.com> wrote:
>>>>>>
>>>>>>> I suspect your error from `aurora job create ...` is due to the
>>>>>>> aurora config you're using referencing `/vagrant/hello_world.py` which does
>>>>>>> not exist (as you say: you're not even using Vagrant). Can you link the
>>>>>>> .aurora config you're using?
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Joshua
>>>>>>>
>>>>>>> On Thu, Oct 22, 2015 at 3:22 PM, Krish <krishnan.k.iyer@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks, Zameer.
>>>>>>>>
>>>>>>>> I had to modify  /etc/aurora/clusters.json:
>>>>>>>> [
>>>>>>>>   {
>>>>>>>>     "auth_mechanism": "UNAUTHENTICATED",
>>>>>>>>     "name": "testcluster",
>>>>>>>>     "scheduler_zk_path": "/scheduler/aurora",
>>>>>>>>     "slave_root": "/var/lib/mesos",
>>>>>>>>     "slave_run_directory": "latest",
>>>>>>>>     "zk": "127.0.1.1"
>>>>>>>>   }
>>>>>>>> ]
>>>>>>>>
>>>>>>>> I have a hello_world.aurora in my home folder. However the
>>>>>>>> following command errors out:
>>>>>>>> ~$ aurora job create testcluster/testrole/test/hellojob
>>>>>>>> ./hello_world.aurora
>>>>>>>> Error loading configuration: [Errno 2] No such file or directory:
>>>>>>>> '/vagrant/hello_world.py'
>>>>>>>>
>>>>>>>> A job list does work:
>>>>>>>> ~$ aurora job list testcluster
>>>>>>>>  INFO] Retrieving jobs for role None
>>>>>>>>
>>>>>>>> I am not even using the vagrant. I am using zk & mesos on the same
>>>>>>>> machine as aurora. How do I submit these job templates to aurora?
>>>>>>>>
>>>>>>>> Any pointers to documentation will be helpful.
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> κρισhναν
>>>>>>>>
>>>>>>>> On Wed, Oct 21, 2015 at 11:09 PM, Zameer Manji <zmanji@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Version 0.9.0 does not have the gc executor. Version 0.9.0 uses
>>>>>>>>> Mesos' task reconciliation
>>>>>>>>> <http://mesos.apache.org/documentation/latest/reconciliation/> API
>>>>>>>>> instead.
>>>>>>>>>
>>>>>>>>> On Wed, Oct 21, 2015 at 9:28 AM, Krish <krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Bill for the location to the debs. I was finally able to
>>>>>>>>>> run aurora. :)
>>>>>>>>>>
>>>>>>>>>> I did find thermos_executor.pex & thermos_observer after
>>>>>>>>>> installing aurora-executor. I still could not find gc_executor.pex on my
>>>>>>>>>> system.
>>>>>>>>>> Is there a location from where I can download the binaries for
>>>>>>>>>> *.pex or build them from scratch?
>>>>>>>>>>
>>>>>>>>>> root@dev:/# find . -name "*.pex"
>>>>>>>>>> ./usr/share/aurora/bin/thermos_executor.pex
>>>>>>>>>> ./usr/share/aurora/bin/kaurora_admin.pex
>>>>>>>>>> ./usr/share/aurora/bin/kaurora.pex
>>>>>>>>>> ./usr/share/aurora/bin/thermos.pex
>>>>>>>>>> ./usr/share/aurora/bin/thermos_observer.pex
>>>>>>>>>> ./home/ubuntu/.pex
>>>>>>>>>> ./root/.pex
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> κρισhναν
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 20, 2015 at 11:46 PM, Bill Farner <wfarner@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Aurora currently requires an executor, so setting it to
>>>>>>>>>>> /dev/null will not work.  Happy to talk further about your thoughts around
>>>>>>>>>>> sidestepping the executor.
>>>>>>>>>>>
>>>>>>>>>>> As for working with the scheduler source code, it's a standard
>>>>>>>>>>> gradle project and we tend to use intellij.  Docs to help ramp on that:
>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md
>>>>>>>>>>>
>>>>>>>>>>> As for builds - the .zip is a source distribution, so it won't
>>>>>>>>>>> have any pre-built binaries.  If you're on debian, we have official debs
>>>>>>>>>>> here: https://bintray.com/apache/aurora
>>>>>>>>>>> You can see how they're built here (and can build your own)
>>>>>>>>>>> packages: https://github.com/apache/aurora-packaging
>>>>>>>>>>> We're close to having official RPMs, but none to speak of yet.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 20, 2015 at 9:47 AM, Krish <krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Stephen,
>>>>>>>>>>>> I am trying to get started and run aurora without thermos
>>>>>>>>>>>> executor (setting it to /dev/null does not help) - on a local linux box for
>>>>>>>>>>>> now & planning to containerize/dockerize it later.
>>>>>>>>>>>>
>>>>>>>>>>>> Can you please point me to the right documentation (or a
>>>>>>>>>>>> pointer to the cli parsing source code) which can help me resolve this?
>>>>>>>>>>>> Also, are there any steps to import source code into eclipse to
>>>>>>>>>>>> browse & analyze code for this?
>>>>>>>>>>>>
>>>>>>>>>>>> Also, where do i find all the *.pex files? They are not present
>>>>>>>>>>>> in the zip file nor anywhere in the built source code.
>>>>>>>>>>>>
>>>>>>>>>>>> I know I am asking too many queries on a single thread here, &
>>>>>>>>>>>> would appreciate the help.
>>>>>>>>>>>> I think at the end of this, I will put the steps I followed in
>>>>>>>>>>>> a gist/blog so others might find their way around, & not struggle as much.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan <Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Krish,
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> you don't have to set framework_authentication_file and
>>>>>>>>>>>>> zk_digest_credentials. The scheduler help text is misleading here as
>>>>>>>>>>>>> everything will work fine if you leave those empty.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> In addition, looks like you are misunderstanding the usage of
>>>>>>>>>>>>> the thermos_executor_path command line flag of the scheduler.
>>>>>>>>>>>>> It is supposed to point to the binary containing the generic
>>>>>>>>>>>>> Aurora executor (thermos_executor.pex).  You only need the hello_world.aurora
>>>>>>>>>>>>> once your scheduler is up and running. It serves as an example input for the
>>>>>>>>>>>>> aurora command line client which can be used to schedule jobs and services
>>>>>>>>>>>>> on an Aurora master.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Have you tried to use the vagrant box? Just type `vagrant up`
>>>>>>>>>>>>> in a checkout of the Aurora source code. It gives you a running
>>>>>>>>>>>>> scheduler to play with. Once you have understood how it works, you can
>>>>>>>>>>>>> start trying to install it on your own (by reverse-engineering the vagrant
>>>>>>>>>>>>> box).
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hope this helps a little,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>> *From:* Krish <krishnan.k.iyer@gmail.com>
>>>>>>>>>>>>> *Sent:* Tuesday, October 20, 2015 11:39 AM
>>>>>>>>>>>>> *To:* Bill Farner
>>>>>>>>>>>>> *Cc:* user@aurora.apache.org; Erb, Stephan
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>
>>>>>>>>>>>>> Bill/Stephen,
>>>>>>>>>>>>> I still get a stacktrace when running the aurora scheduler CLI.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I do not know what to specify for
>>>>>>>>>>>>>  -framework_authentication_file & -zk_digest_credentials, and they are
>>>>>>>>>>>>> required arguments.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am not using any authentication on Mesos master, do I still
>>>>>>>>>>>>> need the framework_authentication_file parameter?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> rm -rf /db /backup_dir
>>>>>>>>>>>>> mesos-log initialize --path="/db"
>>>>>>>>>>>>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
>>>>>>>>>>>>> JAVA_OPTS="-Xmx1536m  -Xms256m"
>>>>>>>>>>>>> /usr/local/aurora-scheduler/bin/aurora-scheduler -backup_dir=/backup_dir
>>>>>>>>>>>>> -cluster_name=tc -mesos_master_address=zk://localhost:2181/mesos/master
>>>>>>>>>>>>> -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181
>>>>>>>>>>>>> -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false
>>>>>>>>>>>>> -native_log_file_path=/db
>>>>>>>>>>>>> -thermos_executor_path=/home/ubuntu/hello_world.aurora
>>>>>>>>>>>>> ...
>>>>>>>>>>>>> ...
>>>>>>>>>>>>> INFO: Binding org.apache.aurora.scheduler.http.Utilization to GuiceManagedComponentProvider with the scope "PerRequest"
>>>>>>>>>>>>> Oct 20, 2015 9:27:40 AM org.apache.aurora.scheduler.cron.quartz.CronModule provideTimeZone
>>>>>>>>>>>>> WARNING: Cron schedules are configured to fire according to timezone Greenwich Mean Time but system timezone is set to Coordinated Universal Time
>>>>>>>>>>>>> Oct 20, 2015 9:27:41 AM org.eclipse.jetty.server.AbstractConnector doStart
>>>>>>>>>>>>> INFO: Started SelectChannelConnector@0.0.0.0:43843
>>>>>>>>>>>>> E1020 09:27:41.290 THREAD1 org.apache.aurora.scheduler.SchedulerLifecycle$9.execute: Caught unchecked exception: com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1 error
>>>>>>>>>>>>> com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1 error
>>>>>>>>>>>>>         at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
>>>>>>>>>>>>>         at org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner <wfarner@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The typical flow is that you keep your .aurora file checked
>>>>>>>>>>>>>> into git, and commit every time you deploy/update.  When you change your
>>>>>>>>>>>>>> file, you will instruct Aurora to update the live job (have a look at aurora
>>>>>>>>>>>>>> update -h).  Aurora will perform a rolling upgrade of your
>>>>>>>>>>>>>> job to the new config.  You'll use this same flow for updating your job's
>>>>>>>>>>>>>> software as well as resizing the job.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For (3), you could set up alerting for stats that the
>>>>>>>>>>>>>> scheduler exports.  Have a look here for monitoring background:
>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You'll want to look at scheduler stats related to
>>>>>>>>>>>>>> 'pending'.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish <krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for the pointer. Now I notice that the
>>>>>>>>>>>>>>> aurora-scheduler script has the --thermos_executor_path as a mandatory
>>>>>>>>>>>>>>> requirement.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have a couple of questions on how the
>>>>>>>>>>>>>>> thermos_executor/.aurora config file functions:
>>>>>>>>>>>>>>> 1. Do we have to statically define the file beforehand?
>>>>>>>>>>>>>>> 2. What happens when we want to dynamically change the
>>>>>>>>>>>>>>> config, say increasing the number of instances of a service required? Does
>>>>>>>>>>>>>>> aurora require a reboot then?
>>>>>>>>>>>>>>> 3. How do I get notified about the message mesos sends when
>>>>>>>>>>>>>>> it cannot schedule tasks for lack of resources? Should I depend on aurora
>>>>>>>>>>>>>>> for this or try to look for a hook into mesos?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think a little bit of context would help here.
>>>>>>>>>>>>>>> What I plan to check is to run a very basic job/task inside
>>>>>>>>>>>>>>> a docker container with aurora & wait for a 'resource not available'
>>>>>>>>>>>>>>> message from mesos, and accordingly call an api to spin up a new node in my
>>>>>>>>>>>>>>> cluster.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I believe you are missing the thermos_executor options that
>>>>>>>>>>>>>>>> have to be passed to the scheduler command line.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> See
>>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
>>>>>>>>>>>>>>>> for an example
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>>> *From:* Krish <krishnan.k.iyer@gmail.com>
>>>>>>>>>>>>>>>> *Sent:* Monday, October 19, 2015 8:45 AM
>>>>>>>>>>>>>>>> *To:* user@aurora.apache.org
>>>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>> I am a n00b with apache aurora & trying to experiment with some
>>>>>>>>>>>>>>>> things on my local machine with zookeeper and mesos-master running locally.
>>>>>>>>>>>>>>>> They have initialized properly. When I try to run aurora with the required
>>>>>>>>>>>>>>>> options, I get the following error, & googling hasn't helped me much here.
>>>>>>>>>>>>>>>> Appreciate any help. Thanks in advance.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>> WARNING: Method [public void org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)] is synthetic and is being intercepted by [com.twitter.common.inject.TimedInterceptor@604c5de8]. This could indicate a bug.  The method may be intercepted twice, or may not be intercepted at all.
>>>>>>>>>>>>>>>> Exception in thread "main" com.google.inject.CreationException: Guice creation errors:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1) An exception was caught and reported. Message: A value may only be retrieved from a variable that has a default or has been set.
>>>>>>>>>>>>>>>>   at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2) Could not find a suitable constructor in org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have either one (and only one) constructor annotated with @Inject or a zero-argument constructor that is not private.
>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2 errors
>>>>>>>>>>>>>>>>         at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>>>>>>>>>>>>>>>>         at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>>>>>>>>>>>>>>>>         at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>>>>>>>>>>>>>>>>         at com.google.inject.Guice.createInjector(Guice.java:95)
>>>>>>>>>>>>>>>>         at com.google.inject.Guice.createInjector(Guice.java:83)
>>>>>>>>>>>>>>>>         at com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>>>>>>>>>>>>>>>>         at com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>>>>>>>>>>>>>>>>         at com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>>>>>>>>>>>>>>>>         at com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>>>>>>>>>>>>>>>>         at org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
>>>>>>>>>>>>>>>> Caused by: java.lang.IllegalStateException: A value may only be retrieved from a variable that has a default or has been set.
>>>>>>>>>>>>>>>>         at com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>>>>>>>>>>>>>>>>         at com.twitter.common.args.Arg.get(Arg.java:82)
>>>>>>>>>>>>>>>>         at org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>>>>>>>>>>>>>>>>         at com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>>>>>>>>>>>>>>>>         at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>>         at com.google.inject.util.Modules$2.configure(Modules.java:114)
>>>>>>>>>>>>>>>>         at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>>         at com.google.inject.spi.Elements.getElements(Elements.java:101)
>>>>>>>>>>>>>>>>         at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>>         at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>>>>>>>>>>>>>>>>         ... 7 more
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Complete logs are present @http://pastebin.com/i72HvbYi.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Zameer Manji
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Stacktrace when running Apache Aurora

Posted by Krish <kr...@gmail.com>.
TL;DR:
Use only a file named .dockercfg for docker credentials in mesos
tasks!

Long story:
---------------
Holy smokescreens!
This is for reporting & documenting purposes only, so that others don't
have to pull their hair out like I did for the past few evenings!

A little background:
I am running Ubuntu 14.04 on my system, where docker stores its credentials
in ~/.docker/config.json as
cat ~/.docker/config.json
{
  "auths": {
    "repo.example.com:5000": {
      "auth": "<snip>",
      "email": "<snip>"
    }
  }
}

And I am doing all these experiments on a CoreOS system, which stores the
credentials in ~/.dockercfg as
core@aurora-1 ~ $ cat ~/.dockercfg
{
  "repo.example.com:5000": {
    "auth": "<snip>",
    "email": "<snip>"
  }
}

Since my container was an Ubuntu 14.04 container (as was my local system),
I used the Ubuntu credential file format, i.e. ~/.docker/config.json, and
the slave task could not read the docker credentials stored that way.
After digging through the aurora, mesos, and thermos source code (a lot of
find, grep, and regex matching), I saw in mesos/src/docker/docker.cpp:

  // Set HOME variable to pick up .dockercfg.
  map<string, string> environment = os::environment();

  environment["HOME"] = directory;
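
A minimal Python sketch of the same idea (an illustration only, not Mesos
code): copy the current environment, point HOME at the task's sandbox
directory, and work out where a client that reads $HOME/.dockercfg would
then look. The function names and the example sandbox path are hypothetical.

```python
import os


def docker_environment(sandbox_directory):
    """Return a copy of the current environment with HOME set to the sandbox."""
    environment = dict(os.environ)  # copy, so the parent process is unaffected
    environment["HOME"] = sandbox_directory
    return environment


def expected_dockercfg_path(environment):
    """Path where a client that reads $HOME/.dockercfg would look."""
    return os.path.join(environment["HOME"], ".dockercfg")


if __name__ == "__main__":
    env = docker_environment("/tmp/example-sandbox")  # hypothetical sandbox path
    print(expected_dockercfg_path(env))
```

With HOME overridden like this, a credentials file that lives anywhere else
(or under any other name) is simply never seen by the pull.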

I changed the filename to .dockercfg and the JSON content to the matching
format, changed thermos_executor_resources accordingly, and bam, docker
pull works!
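
For anyone making the same change, the content conversion can be sketched
in a few lines of Python. This assumes the only difference between the two
files is the "auths" wrapper, as in the two examples earlier in this mail;
the function name is made up for illustration.

```python
import json


def to_dockercfg(config_json_text):
    """Convert ~/.docker/config.json content to the flat .dockercfg layout.

    Assumes the only difference is the "auths" wrapper around the
    per-registry entries.
    """
    config = json.loads(config_json_text)
    if "auths" not in config:
        raise ValueError('no "auths" key; is this already a .dockercfg?')
    return json.dumps(config["auths"], indent=2, sort_keys=True)


if __name__ == "__main__":
    sample = """
    {
      "auths": {
        "repo.example.com:5000": {"auth": "<snip>", "email": "<snip>"}
      }
    }
    """
    print(to_dockercfg(sample))
```

The output is what would be saved as .dockercfg before shipping it to the
sandbox.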

Well, the mesos documentation does say "To run an image from a private
repository, one can include the URI pointing to a .dockercfg that contains
login information." and I must have read it a dozen times!
But I never thought that they literally meant '.dockercfg' as the name of
the file!
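
A quick sanity check along these lines might have saved me the evenings. It
is only a sketch under the assumptions in this mail (the file must be named
exactly .dockercfg and must not use the "auths" wrapper); the helper name is
hypothetical.

```python
import json
import os


def check_credentials_file(path, content):
    """Return a list of problems; an empty list means the file looks usable."""
    problems = []
    name = os.path.basename(path)
    if name != ".dockercfg":
        problems.append("file must be named .dockercfg, got %r" % name)
    data = json.loads(content)
    if "auths" in data:
        problems.append('content uses the config.json layout ("auths" wrapper)')
    return problems


if __name__ == "__main__":
    for problem in check_credentials_file("/root/.docker/config.json", '{"auths": {}}'):
        print(problem)
```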




--
κρισhναν

On Thu, Mar 3, 2016 at 1:45 PM, Krish <kr...@gmail.com> wrote:

>
> I have got the docker config file copied into the sandbox using the
> thermos_executor_resources flag; however docker is still not able to find
> the credentials file for doing an appropriate pull of image from a private
> repo.
>
> When I try to use the library/hello-world:latest image from public docker
> repo to check if everything works fine without the credentials, I encounter
> a different problem:
> exec: "/bin/sh": stat /bin/sh: no such file or directory
> Error response from daemon: Cannot start container
> de93dc344d44b41bccccff49e508001a97ff23a8964e637d32a506a31fd4d946: [8]
> System error: exec: "/bin/sh": stat /bin/sh: no such file or directory
>
> I was referring to this email for guidance on setting up a mesos slave:
> http://mail-archives.apache.org/mod_mbox/aurora-dev/201503.mbox/%3CCAKB1MkHR=+NOnesLLX9bUWtTDThsku46PW_wR4B+_Z9P59+XWQ@mail.gmail.com%3E
>
> So, I cannot get the credentials file to be used by docker, and if I
> bypass authentication, I can do a docker pull, but encounter a weird error
> in launching the hello-world image.
>
> Am I missing out on checking any log files generated? I currently refer to
> mesos-slave stdout and the sandbox stderr file.
> Any configuration parameter I am missing for this to happen?
>
> Any pointers will be really helpful. Thanks in advance.
>
>
>
> --
> κρισhναν
>
> On Sun, Feb 28, 2016 at 3:37 PM, Krish <kr...@gmail.com> wrote:
>
>> Continuing my earlier chain of thought, I found this in the mesos bug
>> list:
>> MESOS-4242 - Allow Docker private registry credentials to be passed from
>> framework.
>> How does one pass credentials using the framework? As it seems the
>> .docker/config.json is not read from the slave.
>>
>>
>>
>>
>> --
>> κρισhναν
>>
>> On Sat, Feb 27, 2016 at 11:46 PM, Krish <kr...@gmail.com>
>> wrote:
>>
>>> I couldn't complete my PoC project before (got busy with other
>>> work). Well, it is never too late and here's my update and issue.
>>>
>>> I have a 3 node zk (3.5.1 alpha), mesos-master (v0.24.1) & aurora
>>> (v0.11.0) running.
>>> I was stuck in a problem where I was using mesos 0.25.0 & aurora 0.9.0 &
>>> got a protobuf field not set error - ExecutorInfo field.
>>>
>>> I have a mesos agent running in docker container on coreos and it can
>>> access the host docker just fine.
>>> I have also put the docker login credentials file at the right location
>>> for it to access the private docker registry.
>>> I can manually trigger a docker pull and docker run without issues from
>>> the slave (which is also reflected properly outside the slave container
>>> with docker images and docker ps).
>>>
>>> However, when I try to run an aurora job with hello-docker container,
>>> the slave prints out the log that docker pull has failed; more specifically:
>>> " failed to start: Failed to 'docker pull
>>> private_repo.com:5000/krish/test:latest': exit status = exited with
>>> status 1 stderr = Error: image krish/test:latest not found"
>>>
>>> My hunch is that when using docker run from aurora DSL, it does not read
>>> the docker credentials file properly and hence fails. I can reproduce the
>>> exact same error when I delete the credentials file from the slave and
>>> trigger a pull.
>>>
>>> Is the hunch right? If yes, is there a way to resolve this? Maybe source
>>> it some way before the run command?
>>>
>>>
>>>
>>> --
>>> κρισhναν
>>>
>>> On Tue, Oct 27, 2015 at 10:35 PM, Bill Farner <wf...@apache.org>
>>> wrote:
>>>
>>>> (1) clusters.json is written by you, configuring the CLI client with
>>>> instructions for what clusters are available and how to discover them.
>>>>
>>>> (2) That's expected - mesos only allows one active replica of a
>>>> framework at a time, this signals which one is active.
>>>>
>>>> (3) The observer is essentially a web server that allows you to browse
>>>> a task's sandbox directory and other information about it.  You will need
>>>> to configure it to run on your worker/agent nodes for that functionality to
>>>> work (it's linked from the scheduler web UI).
>>>>
>>>> (4) You could indeed implement that behavior externally.  There is a
>>>> reason:
>>>> https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L556-L559
>>>>
>>>> (5) That is correct.  The scheduler exposes a thrift API that you would
>>>> use (a REST API is coming, but ground has not yet been broken).  If you go
>>>> this route, I suggest you skip the DSL and use the JSON task description
>>>> format that is shipped over the API.  There's not good documentation on
>>>> this, but we can help you through it and would be grateful for a writeup of
>>>> your approach!
>>>>
>>>>
>>>> On Mon, Oct 26, 2015 at 11:44 PM, Krish <kr...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Folks,
>>>>> Firstly, thanks for all the help. Am happy to report that I have set
>>>>> up zk, mesos & aurora, & can work further towards my idea of having an
>>>>> auto-scaling cluster.
>>>>> I have some further questions about the work done so far & things I
>>>>> plan to do:
>>>>>
>>>>>    1. Is the /etc/aurora/clusters.json file created by the scheduler
>>>>>    or does it need to be handcrafted? I had to manually edit the file to get
>>>>>    my `aurora job ...` cli to work.
>>>>>
>>>>>    2. I am running a cluster of 3 coreOS VMs on vagrant with zk,
>>>>>    mesos & aurora in a docker container. Only 1 of them outputs '1' when I
>>>>>    look at the 'framework_registered' field. Is this expected? How do I verify
>>>>>    that they are working as a cluster?
>>>>>
>>>>>    3. From the documentation, I see that there is an observer that
>>>>>    needs to be listening on port 1338. What is the observer socket & its
>>>>>    purpose? I have aurora listening only on ports 8081 (http port) & 8083
>>>>>    (libprocess).
>>>>>
>>>>>    4. I read about the 'PENDING' field in aurora documentation, as
>>>>>    Bill suggested, & realize that it just shows that a task is waiting for
>>>>>    some reasons (for want of resources, in my case, as 0 slaves have
>>>>>    registered). I was thinking of adding a hook to the pending state; say if a
>>>>>    task is PENDING for 5 minutes for lack of resources in the cluster, then
>>>>>    spin up a new machine. Is this the right approach to take? Does aurora
>>>>>    provide reasons for why a task is in the PENDING state?
>>>>>
>>>>>    => aurora job status testcluster/$USER/test/hello_world
>>>>>     INFO] Checking status of testcluster/ubuntu/test/hello_world
>>>>>    Active tasks (1):
>>>>>           Task role: ubuntu, env: test, name: hello_world, instance:
>>>>>    0, status:
>>>>>    PENDING on None
>>>>>              cpus: 0.1, ram: 16 MB, disk: 16 MB
>>>>>              events:
>>>>>               2015-10-23 04:55:33 PENDING: None
>>>>>    Inactive tasks (0):
>>>>>
>>>>>    5. Aurora defines jobs in a .aurora config file & if I decide to
>>>>>    increase/decrease the number of instances in my cluster, then I need to
>>>>>    create/overwrite the relevant .aurora file and trigger the `aurora update
>>>>>    ...` command. Is this right?
>>>>>    If yes, is there an HTTP API I can invoke remotely which triggers
>>>>>    this update?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> κρισhναν
>>>>>
>>>>> On Fri, Oct 23, 2015 at 8:09 AM, Joshua Cohen <jcohen@twopensource.com> wrote:
>>>>>
>>>>>> I suspect your error from `aurora job create ...` is due to the
>>>>>> aurora config you're using referencing `/vagrant/hello_world.py` which does
>>>>>> not exist (as you say: you're not even using Vagrant). Can you link the
>>>>>> .aurora config you're using?
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Joshua
>>>>>>
>>>>>> On Thu, Oct 22, 2015 at 3:22 PM, Krish <kr...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks, Zameer.
>>>>>>>
>>>>>>> I had to modify  /etc/aurora/clusters.json:
>>>>>>> [
>>>>>>>   {
>>>>>>>     "auth_mechanism": "UNAUTHENTICATED",
>>>>>>>     "name": "testcluster",
>>>>>>>     "scheduler_zk_path": "/scheduler/aurora",
>>>>>>>     "slave_root": "/var/lib/mesos",
>>>>>>>     "slave_run_directory": "latest",
>>>>>>>     "zk": "127.0.1.1"
>>>>>>>   }
>>>>>>> ]
>>>>>>>
>>>>>>> I have a hello_world.aurora in my home folder. However the following
>>>>>>> command errors out:
>>>>>>> ~$ aurora job create testcluster/testrole/test/hellojob
>>>>>>> ./hello_world.aurora
>>>>>>> Error loading configuration: [Errno 2] No such file or directory:
>>>>>>> '/vagrant/hello_world.py'
>>>>>>>
>>>>>>> A job list does work:
>>>>>>> ~$ aurora job list testcluster
>>>>>>>  INFO] Retrieving jobs for role None
>>>>>>>
>>>>>>> I am not even using the vagrant. I am using zk & mesos on the same
>>>>>>> machine as aurora. How do I submit these job templates to aurora?
>>>>>>>
>>>>>>> Any pointers to documentation will be helpful.
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> κρισhναν
>>>>>>>
>>>>>>> On Wed, Oct 21, 2015 at 11:09 PM, Zameer Manji <zm...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Version 0.9.0 does not have the gc executor. Version 0.9.0 uses
>>>>>>>> Mesos' task reconciliation
>>>>>>>> <http://mesos.apache.org/documentation/latest/reconciliation/> API
>>>>>>>> instead.
>>>>>>>>
>>>>>>>> On Wed, Oct 21, 2015 at 9:28 AM, Krish <kr...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks Bill for the location to the debs. I was finally able to
>>>>>>>>> run aurora. :)
>>>>>>>>>
>>>>>>>>> I did find thermos_executor.pex & thermos_observer after
>>>>>>>>> installing aurora-executor. I still could not find gc_executor.pex on my
>>>>>>>>> system.
>>>>>>>>> Is there a location from where I can download the binaries for
>>>>>>>>> *.pex or build them from scratch?
>>>>>>>>>
>>>>>>>>> root@dev:/# find . -name "*.pex"
>>>>>>>>> ./usr/share/aurora/bin/thermos_executor.pex
>>>>>>>>> ./usr/share/aurora/bin/kaurora_admin.pex
>>>>>>>>> ./usr/share/aurora/bin/kaurora.pex
>>>>>>>>> ./usr/share/aurora/bin/thermos.pex
>>>>>>>>> ./usr/share/aurora/bin/thermos_observer.pex
>>>>>>>>> ./home/ubuntu/.pex
>>>>>>>>> ./root/.pex
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> κρισhναν
>>>>>>>>>
>>>>>>>>> On Tue, Oct 20, 2015 at 11:46 PM, Bill Farner <wf...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Aurora currently requires an executor, so setting it to /dev/null
>>>>>>>>>> will not work.  Happy to talk further about your thoughts around
>>>>>>>>>> sidestepping the executor.
>>>>>>>>>>
>>>>>>>>>> As for working with the scheduler source code, it's a standard
>>>>>>>>>> gradle project and we tend to use intellij.  Docs to help ramp on that:
>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md
>>>>>>>>>>
>>>>>>>>>> As for builds - the .zip is a source distribution, so it won't
>>>>>>>>>> have any pre-built binaries.  If you're on debian, we have official debs
>>>>>>>>>> here: https://bintray.com/apache/aurora
>>>>>>>>>> You can see how they're built (and build your own packages)
>>>>>>>>>> here: https://github.com/apache/aurora-packaging
>>>>>>>>>> We're close to having official RPMs, but none to speak of yet.
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 20, 2015 at 9:47 AM, Krish <krishnan.k.iyer@gmail.com
>>>>>>>>>> > wrote:
>>>>>>>>>>
>>>>>>>>>>> Stephen,
>>>>>>>>>>> I am trying to get started and run aurora without thermos
>>>>>>>>>>> executor (setting it to /dev/null does not help) - on a local linux box for
>>>>>>>>>>> now & planning to containerize/dockerize it later.
>>>>>>>>>>>
>>>>>>>>>>> Can you please point me to the right documentation (or a pointer
>>>>>>>>>>> to the cli parsing source code) which can help me resolve this? Also, are
>>>>>>>>>>> there any steps to import source code into eclipse to browse &
>>>>>>>>>>> analyze code for this?
>>>>>>>>>>>
>>>>>>>>>>> Also, where do i find all the *.pex files? They are not present
>>>>>>>>>>> in the zip file nor anywhere in the built source code.
>>>>>>>>>>>
>>>>>>>>>>> I know I am asking too many queries on a single thread here, &
>>>>>>>>>>> would appreciate the help.
>>>>>>>>>>> I think at the end of this, I will put the steps I followed in a
>>>>>>>>>>> gist/blog so others might find their way around, & not struggle as much.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> κρισhναν
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan <
>>>>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Krish,
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> you don't have to set framework_authentication_file and
>>>>>>>>>>>> zk_digest_credentials. The scheduler help text is misleading here as
>>>>>>>>>>>> everything will work fine if you leave those empty.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> In addition, looks like you are misunderstanding the usage of
>>>>>>>>>>>> the thermos_executor_path command line flag of the scheduler.
>>>>>>>>>>>> It is supposed to point to the binary containing the generic
>>>>>>>>>>>> Aurora executor (thermos_executor.pex).  You only need the hello_world.aurora
>>>>>>>>>>>> once your scheduler is up and running. It serves as an example input for the
>>>>>>>>>>>> aurora command line client, which can be used to schedule jobs and services
>>>>>>>>>>>> on an Aurora master.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Have you tried to use the vagrant box? Just type 'vagrant up' in
>>>>>>>>>>>> a checkout of the Aurora source code. It gives you a running scheduler to
>>>>>>>>>>>> play with. Once you have understood how it works, you can start trying to
>>>>>>>>>>>> install it on your own (by reverse-engineering the vagrant box).
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hope this helps a little,
>>>>>>>>>>>>
>>>>>>>>>>>> Stephan
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>>>>> *Sent:* Tuesday, October 20, 2015 11:39 AM
>>>>>>>>>>>> *To:* Bill Farner
>>>>>>>>>>>> *Cc:* user@aurora.apache.org; Erb, Stephan
>>>>>>>>>>>>
>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>
>>>>>>>>>>>> Bill/Stephen,
>>>>>>>>>>>> I still get a stacktrace when running the aurora scheduler CLI.
>>>>>>>>>>>>
>>>>>>>>>>>> I do not know what to specify for
>>>>>>>>>>>>  -framework_authentication_file & -zk_digest_credentials, and they are
>>>>>>>>>>>> required arguments.
>>>>>>>>>>>>
>>>>>>>>>>>> I am not using any authentication on Mesos master, do I still
>>>>>>>>>>>> need the framework_authentication_file parameter?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> rm -rf /db /backup_dir
>>>>>>>>>>>> mesos-log initialize --path="/db"
>>>>>>>>>>>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
>>>>>>>>>>>> JAVA_OPTS="-Xmx1536m  -Xms256m"
>>>>>>>>>>>> /usr/local/aurora-scheduler/bin/aurora-scheduler -backup_dir=/backup_dir
>>>>>>>>>>>> -cluster_name=tc -mesos_master_address=zk://localhost:2181/mesos/master
>>>>>>>>>>>> -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181
>>>>>>>>>>>> -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false
>>>>>>>>>>>> -native_log_file_path=/db
>>>>>>>>>>>> -thermos_executor_path=/home/ubuntu/hello_world.aurora
>>>>>>>>>>>> ...
>>>>>>>>>>>> ...
>>>>>>>>>>>> INFO: Binding org.apache.aurora.scheduler.http.Utilization to GuiceManagedComponentProvider with the scope "PerRequest"
>>>>>>>>>>>> Oct 20, 2015 9:27:40 AM org.apache.aurora.scheduler.cron.quartz.CronModule provideTimeZone
>>>>>>>>>>>> WARNING: Cron schedules are configured to fire according to timezone Greenwich Mean Time but system timezone is set to Coordinated Universal Time
>>>>>>>>>>>> Oct 20, 2015 9:27:41 AM org.eclipse.jetty.server.AbstractConnector doStart
>>>>>>>>>>>> INFO: Started SelectChannelConnector@0.0.0.0:43843
>>>>>>>>>>>> E1020 09:27:41.290 THREAD1 org.apache.aurora.scheduler.SchedulerLifecycle$9.execute: Caught unchecked exception: com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>
>>>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>
>>>>>>>>>>>> 1 error
>>>>>>>>>>>> com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>
>>>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>
>>>>>>>>>>>> 1 error
>>>>>>>>>>>>         at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
>>>>>>>>>>>>         at org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner <
>>>>>>>>>>>> wfarner@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> The typical flow is that you keep your .aurora file checked
>>>>>>>>>>>>> into git, and commit every time you deploy/update.  When you change your
>>>>>>>>>>>>> file, you will instruct Aurora to update the live job (have a look at aurora
>>>>>>>>>>>>> update -h).  Aurora will perform a rolling upgrade of your
>>>>>>>>>>>>> job to the new config.  You'll use this same flow for updating your job's
>>>>>>>>>>>>> software as well as resizing the job.
>>>>>>>>>>>>>
>>>>>>>>>>>>> For (3), you could set up alerting for stats that the
>>>>>>>>>>>>> scheduler exports.  Have a look here for monitoring background:
>>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md
>>>>>>>>>>>>>
>>>>>>>>>>>>> You'll want to look at scheduler stats related to
>>>>>>>>>>>>> 'pending'.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish <
>>>>>>>>>>>>> krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for the pointer. Now I notice that the
>>>>>>>>>>>>>> aurora-scheduler script has the --thermos_executor_path as a mandatory
>>>>>>>>>>>>>> requirement.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have a couple of questions on how the
>>>>>>>>>>>>>> thermos_executor/.aurora config file functions:
>>>>>>>>>>>>>> 1. Do we have to statically define the file beforehand?
>>>>>>>>>>>>>> 2. What happens when we want to dynamically change the
>>>>>>>>>>>>>> config, say increasing the number of instances of a service required? Does
>>>>>>>>>>>>>> aurora require a reboot then?
>>>>>>>>>>>>>> 3. How do I get notified about the message mesos sends when
>>>>>>>>>>>>>> it cannot schedule tasks for lack of resources? Should I depend on aurora
>>>>>>>>>>>>>> for this or try to look for a hook into mesos?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think a little bit of context would help here.
>>>>>>>>>>>>>> What I plan to check is to run a very basic job/task inside a
>>>>>>>>>>>>>> docker container with aurora & wait for a 'resource not available' message
>>>>>>>>>>>>>> from mesos, and accordingly call an api to spin up a new node in my cluster.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <
>>>>>>>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I believe you are missing the thermos_executor options that
>>>>>>>>>>>>>>> have to be passed to the scheduler command line.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> See
>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
>>>>>>>>>>>>>>> for an example
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>>>>>>>> *Sent:* Monday, October 19, 2015 8:45 AM
>>>>>>>>>>>>>>> *To:* user@aurora.apache.org
>>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>> I am a n00b with apache aurora & trying to experiment with some
>>>>>>>>>>>>>>> things on my local machine with zookeeper and mesos-master running locally.
>>>>>>>>>>>>>>> They have initialized properly. When I try to run aurora with the required
>>>>>>>>>>>>>>> options, I get the following error, & googling hasn't helped me much here.
>>>>>>>>>>>>>>> Appreciate any help. Thanks in advance.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>> WARNING: Method [public void
>>>>>>>>>>>>>>> org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)]
>>>>>>>>>>>>>>> is synthetic and is being intercepted by
>>>>>>>>>>>>>>> [com.twitter.common.inject.TimedInterceptor@604c5de8]. This
>>>>>>>>>>>>>>> could indicate a bug.  The method
>>>>>>>>>>>>>>>  may be intercepted twice, or may not be intercepted at all.
>>>>>>>>>>>>>>> Exception in thread "main"
>>>>>>>>>>>>>>> com.google.inject.CreationException: Guice creation errors:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1) An exception was caught and reported. Message: A value
>>>>>>>>>>>>>>> may only be retrieved from a variable that has a default or has been
>>>>>>>>>>>>>>> set.
>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2) Could not find a suitable constructor in
>>>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have
>>>>>>>>>>>>>>> either one (and only one) constructor annotated with @Inject or a
>>>>>>>>>>>>>>> zero-argument constructor that is not private.
>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2 errors
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:95)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:83)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
>>>>>>>>>>>>>>> Caused by: java.lang.IllegalStateException: A value may only
>>>>>>>>>>>>>>> be retrieved from a variable that has a default or has been set.
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>>>>>>>>>>>>>>>         at com.twitter.common.args.Arg.get(Arg.java:82)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.google.inject.util.Modules$2.configure(Modules.java:114)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.google.inject.spi.Elements.getElements(Elements.java:101)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>>>>>>>>>>>>>>>         ... 7 more
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Complete logs are present @http://pastebin.com/i72HvbYi.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Zameer Manji
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Stacktrace when running Apache Aurora

Posted by Krish <kr...@gmail.com>.
I got the docker config file copied into the sandbox using the
thermos_executor_resources flag; however, docker still cannot find the
credentials file when pulling the image from the private repo.
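
One way to narrow this down is to check whether the copy of the config file
in the sandbox actually carries an entry for the private registry. The sketch
below assumes the newer config.json layout with a top-level "auths" map keyed
by registry host or URL (the older .dockercfg layout is not handled). The
function name registry_has_credentials is made up for illustration.

```python
import json
from pathlib import Path


def registry_has_credentials(config_path, registry):
    """Return True if the docker config file has an auth entry for registry.

    Assumes the config.json layout: {"auths": {"<host or URL>": {...}}}.
    """
    path = Path(config_path)
    if not path.is_file():
        return False
    try:
        config = json.loads(path.read_text())
    except json.JSONDecodeError:
        return False
    if not isinstance(config, dict):
        return False
    auths = config.get("auths", {})
    if not isinstance(auths, dict):
        return False
    # Entries may be keyed by bare host or by URL, so compare on the host part.
    for key in auths:
        host = key.split("://", 1)[-1].split("/", 1)[0]
        if host == registry:
            return True
    return False
```

Called with the sandbox copy of the file and "private_repo.com:5000", this
only tells you that an entry exists. It does not show that the credentials are
valid or that docker is reading this particular file.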

When I try the library/hello-world:latest image from the public docker
repo to check whether everything works without the credentials, I encounter
a different problem:
exec: "/bin/sh": stat /bin/sh: no such file or directory
Error response from daemon: Cannot start container
de93dc344d44b41bccccff49e508001a97ff23a8964e637d32a506a31fd4d946: [8]
System error: exec: "/bin/sh": stat /bin/sh: no such file or directory
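
The two failures in this thread leave different signatures in the slave log
and the sandbox stderr, so they can be told apart mechanically. A minimal
sketch, assuming the message texts stay as quoted in this thread; the
function name and the labels it returns are hypothetical:

```python
def classify_docker_failure(log_text):
    """Map a slave log or sandbox stderr excerpt to a coarse failure label."""
    # Pull failure, as in: Failed to 'docker pull <image>': ... not found
    if "Failed to 'docker pull" in log_text:
        return "pull-failed"
    # The container was created, but the image has no /bin/sh to exec.
    if 'exec: "/bin/sh"' in log_text and "no such file or directory" in log_text:
        return "no-shell-in-image"
    return "unknown"
```

A "no-shell-in-image" result points at the image contents rather than at
credentials, so testing with a public image that ships a shell would separate
the two problems.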

I was referring to this email for guidance on setting up a mesos slave:
http://mail-archives.apache.org/mod_mbox/aurora-dev/201503.mbox/%3CCAKB1MkHR=+NOnesLLX9bUWtTDThsku46PW_wR4B+_Z9P59+XWQ@mail.gmail.com%3E

So, I cannot get docker to use the credentials file, and if I bypass
authentication, the docker pull works, but I hit a weird error when
launching the hello-world image.

Am I missing any log files that I should be checking? I currently refer to
the mesos-slave stdout and the sandbox stderr file.
Is there any configuration parameter I am missing to make this work?

Any pointers will be really helpful. Thanks in advance.



--
κρισhναν

On Sun, Feb 28, 2016 at 3:37 PM, Krish <kr...@gmail.com> wrote:

> Continuing my earlier chain of thought, I found this in the mesos bug list:
> MESOS-4242 - Allow Docker private registry credentials to be passed from
> framework.
> How does one pass credentials using the framework? As it seems the
> .docker/config.json is not read from the slave.
>
>
>
>
> --
> κρισhναν
>
> On Sat, Feb 27, 2016 at 11:46 PM, Krish <kr...@gmail.com> wrote:
>
>> I couldn't complete my PoC project before (got busy with other
>> work). Well, it is never too late and here's my update and issue.
>>
>> I have a 3 node zk (3.5.1 alpha), mesos-master (v0.24.1) & aurora
>> (v0.11.0) running.
>> I was stuck in a problem where I was using mesos 0.25.0 & aurora 0.9.0 &
>> got a protobuf field not set error - ExecutorInfo field.
>>
>> I have a mesos agent running in docker container on coreos and it can
>> access the host docker just fine.
>> I have also put the docker login credentials file at the right location
>> for it to access the private docker registry.
>> I can manually trigger a docker pull and docker run without issues from
>> the slave (which is also reflected properly outside the slave container
>> with docker images and docker ps).
>>
>> However, when I try to run an aurora job with hello-docker container, the
>> slave prints out the log that docker pull has failed; more specifically:
>> " failed to start: Failed to 'docker pull
>> private_repo.com:5000/krish/test:latest': exit status = exited with
>> status 1 stderr = Error: image krish/test:latest not found"
>>
>> My hunch is that when using docker run from aurora DSL, it does not read
>> the docker credentials file properly and hence fails. I can reproduce the
>> exact same error when I delete the credentials file from the slave and
>> trigger a pull.
>>
>> Is the hunch right? If yes, is there a way to resolve this? Maybe source
>> it some way before the run command?
>>
>>
>>
>> --
>> κρισhναν
>>
>> On Tue, Oct 27, 2015 at 10:35 PM, Bill Farner <wf...@apache.org> wrote:
>>
>>> (1) clusters.json is written by you, configuring the CLI client with
>>> instructions for what clusters are available and how to discover them.
>>>
>>> (2) That's expected - mesos only allows one active replica of a
>>> framework at a time, this signals which one is active.
>>>
>>> (3) The observer is essentially a web server that allows you to browse a
>>> task's sandbox directory and other information about it.  You will need to
>>> configure it to run on your worker/agent nodes for that functionality to
>>> work (it's linked from the scheduler web UI).
>>>
>>> (4) You could indeed implement that behavior externally.  There is a
>>> reason:
>>> https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L556-L559
>>>
>>> (5) That is correct.  The scheduler exposes a thrift API that you would
>>> use (a REST API is coming, but ground has not yet been broken).  If you go
>>> this route, I suggest you skip the DSL and use the JSON task description
>>> format that is shipped over the API.  There's no good documentation on
>>> this, but we can help you through it and would be grateful for a writeup of
>>> your approach!
>>>
>>>
>>> On Mon, Oct 26, 2015 at 11:44 PM, Krish <kr...@gmail.com>
>>> wrote:
>>>
>>>> Hi Folks,
>>>> Firstly, thanks for all the help. Am happy to report that I have set up
>>>> zk, mesos & aurora, & can work further towards my idea of having an
>>>> auto-scaling cluster.
>>>> I have some further questions about the work done so far & things I
>>>> plan to do:
>>>>
>>>>    1. Is the /etc/aurora/clusters.json file created by the scheduler
>>>>    or does it need to be handcrafted? I had to manually edit the file to get
>>>>    my `aurora job ...` cli to work.
>>>>
>>>>    2. I am running a cluster of 3 coreOS VMs on vagrant with zk, mesos
>>>>    & aurora in a docker container. Only 1 of them outputs '1' when I look at
>>>>    the 'framework_registered' field. Is this expected? How do I verify that
>>>>    they are working as a cluster?
>>>>
>>>>    3. From the documentation, I see that there is an observer that
>>>>    needs to be listening on port 1338. What is the observer socket & its
>>>>    purpose? I have aurora listening only on ports 8081 (http port) & 8083
>>>>    (libprocess).
>>>>
>>>>    4. I read about the 'PENDING' field in aurora documentation, as
>>>>    Bill suggested, & realize that it just shows that a task is waiting for
>>>>    some reason (for want of resources, in my case, as 0 slaves have
>>>>    registered). I was thinking of adding a hook to the pending state; say if a
>>>>    task is PENDING for 5 minutes for lack of resources in the cluster, then
>>>>    spin up a new machine. Is this the right approach to take? Does aurora
>>>>    provide reasons for why a task is in the PENDING state?
>>>>
>>>>    => aurora job status testcluster/$USER/test/hello_world
>>>>     INFO] Checking status of testcluster/ubuntu/test/hello_world
>>>>    Active tasks (1):
>>>>           Task role: ubuntu, env: test, name: hello_world, instance:
>>>>    0, status:
>>>>    PENDING on None
>>>>              cpus: 0.1, ram: 16 MB, disk: 16 MB
>>>>              events:
>>>>               2015-10-23 04:55:33 PENDING: None
>>>>    Inactive tasks (0):
>>>>
>>>>    5. Aurora defines jobs in a .aurora config file, & if I decide to
>>>>    increase/decrease the number of instances in my cluster, then I need to
>>>>    create/overwrite the relevant .aurora file and trigger the `aurora update
>>>>    ...` command. Is this right?
>>>>    If yes, is there an HTTP API I can invoke remotely which triggers
>>>>    this update?
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> κρισhναν
>>>>
>>>> On Fri, Oct 23, 2015 at 8:09 AM, Joshua Cohen <jc...@twopensource.com>
>>>> wrote:
>>>>
>>>>> I suspect your error from `aurora job create ...` is due to the aurora
>>>>> config you're using referencing `/vagrant/hello_world.py` which does not
>>>>> exist (as you say: you're not even using Vagrant). Can you link the .aurora
>>>>> config you're using?
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Joshua
>>>>>
>>>>> On Thu, Oct 22, 2015 at 3:22 PM, Krish <kr...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks, Zameer.
>>>>>>
>>>>>> I had to modify  /etc/aurora/clusters.json:
>>>>>> [
>>>>>>   {
>>>>>>     "auth_mechanism": "UNAUTHENTICATED",
>>>>>>     "name": "testcluster",
>>>>>>     "scheduler_zk_path": "/scheduler/aurora",
>>>>>>     "slave_root": "/var/lib/mesos",
>>>>>>     "slave_run_directory": "latest",
>>>>>>     "zk": "127.0.1.1"
>>>>>>   }
>>>>>> ]
>>>>>>
>>>>>> I have a hello_world.aurora in my home folder. However the following
>>>>>> command errors out:
>>>>>> ~$ aurora job create testcluster/testrole/test/hellojob
>>>>>> ./hello_world.aurora
>>>>>> Error loading configuration: [Errno 2] No such file or directory:
>>>>>> '/vagrant/hello_world.py'
>>>>>>
>>>>>> A job list does work:
>>>>>> ~$ aurora job list testcluster
>>>>>>  INFO] Retrieving jobs for role None
>>>>>>
>>>>>> I am not even using Vagrant. I am using zk & mesos on the same
>>>>>> machine as aurora. How do I submit these job templates to aurora?
>>>>>>
>>>>>> Any pointers to documentation will be helpful.
>>>>>>
>>>>>>
>>>>>> --
>>>>>> κρισhναν
>>>>>>
>>>>>> On Wed, Oct 21, 2015 at 11:09 PM, Zameer Manji <zm...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Version 0.9.0 does not have the gc executor. Version 0.9.0 uses
>>>>>>> Mesos' task reconciliation
>>>>>>> <http://mesos.apache.org/documentation/latest/reconciliation/> API
>>>>>>> instead.
>>>>>>>
>>>>>>> On Wed, Oct 21, 2015 at 9:28 AM, Krish <kr...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks Bill for the location to the debs. I was finally able to run
>>>>>>>> aurora. :)
>>>>>>>>
>>>>>>>> I did find thermos_executor.pex & thermos_observer after installing
>>>>>>>> aurora-executor. I still could not find gc_executor.pex on my system.
>>>>>>>> Is there a location from where I can download the binaries for
>>>>>>>> *.pex or build them from scratch?
>>>>>>>>
>>>>>>>> root@dev:/# find . -name "*.pex"
>>>>>>>> ./usr/share/aurora/bin/thermos_executor.pex
>>>>>>>> ./usr/share/aurora/bin/kaurora_admin.pex
>>>>>>>> ./usr/share/aurora/bin/kaurora.pex
>>>>>>>> ./usr/share/aurora/bin/thermos.pex
>>>>>>>> ./usr/share/aurora/bin/thermos_observer.pex
>>>>>>>> ./home/ubuntu/.pex
>>>>>>>> ./root/.pex
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> κρισhναν
>>>>>>>>
>>>>>>>> On Tue, Oct 20, 2015 at 11:46 PM, Bill Farner <wf...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Aurora currently requires an executor, so setting it to /dev/null
>>>>>>>>> will not work.  Happy to talk further about your thoughts around
>>>>>>>>> sidestepping the executor.
>>>>>>>>>
>>>>>>>>> As for working with the scheduler source code, it's a standard
>>>>>>>>> gradle project and we tend to use intellij.  Docs to help ramp on that:
>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md
>>>>>>>>>
>>>>>>>>> As for builds - the .zip is a source distribution, so it won't
>>>>>>>>> have any pre-built binaries.  If you're on debian, we have official debs
>>>>>>>>> here: https://bintray.com/apache/aurora
>>>>>>>>> You can see how they're built (and build your own packages)
>>>>>>>>> here: https://github.com/apache/aurora-packaging
>>>>>>>>> We're close to having official RPMs, but none to speak of yet.
>>>>>>>>>
>>>>>>>>> On Tue, Oct 20, 2015 at 9:47 AM, Krish <kr...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Stephen,
>>>>>>>>>> I am trying to get started and run aurora without thermos
>>>>>>>>>> executor (setting it to /dev/null does not help) - on a local linux box for
>>>>>>>>>> now & planning to containerize/dockerize it later.
>>>>>>>>>>
>>>>>>>>>> Can you please point me to the right documentation (or a pointer
>>>>>>>>>> to the cli parsing source code) which can help me resolve this? Also, are
>>>>>>>>>> there any steps to import source code into eclipse to browse &
>>>>>>>>>> analyze code for this?
>>>>>>>>>>
>>>>>>>>>> Also, where do i find all the *.pex files? They are not present
>>>>>>>>>> in the zip file nor anywhere in the built source code.
>>>>>>>>>>
>>>>>>>>>> I know I am asking too many queries on a single thread here, &
>>>>>>>>>> would appreciate the help.
>>>>>>>>>> I think at the end of this, I will put the steps I followed in a
>>>>>>>>>> gist/blog so others might find their way around, & not struggle as much.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> κρισhναν
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan <
>>>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Krish,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> you don't have to set framework_authentication_file and
>>>>>>>>>>> zk_digest_credentials. The scheduler help text is misleading here as
>>>>>>>>>>> everything will work fine if you leave those empty.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> In addition, looks like you are misunderstanding the usage of
>>>>>>>>>>> the thermos_executor_path command line flag of the scheduler.
>>>>>>>>>>> It is supposed to point to the binary containing the generic
>>>>>>>>>>> Aurora executor (thermos_executor.pex).  You only need the hello_world.aurora
>>>>>>>>>>> once your scheduler is up and running. It serves as an example input for the
>>>>>>>>>>> aurora command line client, which can be used to schedule jobs and services
>>>>>>>>>>> on an Aurora master.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Have you tried to use the vagrant box? Just type 'vagrant up' in
>>>>>>>>>>> a checkout of the Aurora source code. It gives you a running scheduler to
>>>>>>>>>>> play with. Once you have understood how it works, you can start trying to
>>>>>>>>>>> install it on your own (by reverse-engineering the vagrant box).
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hope this helps a little,
>>>>>>>>>>>
>>>>>>>>>>> Stephan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ------------------------------
>>>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>>>> *Sent:* Tuesday, October 20, 2015 11:39 AM
>>>>>>>>>>> *To:* Bill Farner
>>>>>>>>>>> *Cc:* user@aurora.apache.org; Erb, Stephan
>>>>>>>>>>>
>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>
>>>>>>>>>>> Bill/Stephen,
>>>>>>>>>>> I still get a stacktrace when running the aurora scheduler CLI.
>>>>>>>>>>>
>>>>>>>>>>> I do not know what to specify for
>>>>>>>>>>>  -framework_authentication_file & -zk_digest_credentials, and they are
>>>>>>>>>>> required arguments.
>>>>>>>>>>>
>>>>>>>>>>> I am not using any authentication on Mesos master, do I still
>>>>>>>>>>> need the framework_authentication_file parameter?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> rm -rf /db /backup_dir
>>>>>>>>>>> mesos-log initialize --path="/db"
>>>>>>>>>>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
>>>>>>>>>>> JAVA_OPTS="-Xmx1536m  -Xms256m"
>>>>>>>>>>> /usr/local/aurora-scheduler/bin/aurora-scheduler -backup_dir=/backup_dir
>>>>>>>>>>> -cluster_name=tc -mesos_master_address=zk://localhost:2181/mesos/master
>>>>>>>>>>> -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181
>>>>>>>>>>> -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false
>>>>>>>>>>> -native_log_file_path=/db
>>>>>>>>>>> -thermos_executor_path=/home/ubuntu/hello_world.aurora
>>>>>>>>>>> ...
>>>>>>>>>>> ...
>>>>>>>>>>> INFO: Binding org.apache.aurora.scheduler.http.Utilization to GuiceManagedComponentProvider with the scope "PerRequest"
>>>>>>>>>>> Oct 20, 2015 9:27:40 AM org.apache.aurora.scheduler.cron.quartz.CronModule provideTimeZone
>>>>>>>>>>> WARNING: Cron schedules are configured to fire according to timezone Greenwich Mean Time but system timezone is set to Coordinated Universal Time
>>>>>>>>>>> Oct 20, 2015 9:27:41 AM org.eclipse.jetty.server.AbstractConnector doStart
>>>>>>>>>>> INFO: Started SelectChannelConnector@0.0.0.0:43843
>>>>>>>>>>> E1020 09:27:41.290 THREAD1 org.apache.aurora.scheduler.SchedulerLifecycle$9.execute: Caught unchecked exception: com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>
>>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>
>>>>>>>>>>> 1 error
>>>>>>>>>>> com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>
>>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>
>>>>>>>>>>> 1 error
>>>>>>>>>>>         at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
>>>>>>>>>>>         at org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> κρισhναν
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner <wfarner@apache.org
>>>>>>>>>>> > wrote:
>>>>>>>>>>>
>>>>>>>>>>>> The typical flow is that you keep your .aurora file checked
>>>>>>>>>>>> into git, and commit every time you deploy/update.  When you change your
>>>>>>>>>>>> file, you will instruct Aurora to update the live job (have a look at aurora
>>>>>>>>>>>> update -h).  Aurora will perform a rolling upgrade of your job
>>>>>>>>>>>> to the new config.  You'll use this same flow for updating your job's
>>>>>>>>>>>> software as well as resizing the job.
>>>>>>>>>>>>
>>>>>>>>>>>> For (3), you could set up alerting for stats that the scheduler
>>>>>>>>>>>> exports.  Have a look here for monitoring background:
>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md
>>>>>>>>>>>>
>>>>>>>>>>>> You'll want to look at scheduler stats related to
>>>>>>>>>>>> 'pending'.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish <
>>>>>>>>>>>> krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for the pointer. Now I notice that the aurora-scheduler
>>>>>>>>>>>>> script has the --thermos_executor_path as a mandatory requirement.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have a couple of questions on how the
>>>>>>>>>>>>> thermos_executor/.aurora config file functions:
>>>>>>>>>>>>> 1. Do we have to statically define the file beforehand?
>>>>>>>>>>>>> 2. What happens when we want to dynamically change the config,
>>>>>>>>>>>>> say increasing the number of instances of a service required? Does aurora
>>>>>>>>>>>>> require a reboot then?
>>>>>>>>>>>>> 3. How do I get notified about the message mesos sends when it
>>>>>>>>>>>>> cannot schedule tasks for lack of resources? Should I depend on aurora for
>>>>>>>>>>>>> this or try to look for a hook into mesos?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think a little bit of context would help here.
>>>>>>>>>>>>> What I plan to check is to run a very basic job/task inside a
>>>>>>>>>>>>> docker container with aurora & wait for a 'resource not available' message
>>>>>>>>>>>>> from mesos, and accordingly call an api to spin up a new node in my cluster.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <
>>>>>>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I believe you are missing the thermos_executor options that
>>>>>>>>>>>>>> have to be passed to the scheduler command line.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> See
>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
>>>>>>>>>>>>>> for an example
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>>>>>>> *Sent:* Monday, October 19, 2015 8:45 AM
>>>>>>>>>>>>>> *To:* user@aurora.apache.org
>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> I am a n00b with apache aurora & trying to experiment with some
>>>>>>>>>>>>>> things on my local machine with zookeeper and mesos-master running locally.
>>>>>>>>>>>>>> They have initialized properly. When I try to run aurora with the required
>>>>>>>>>>>>>> options, I get the following error, & googling hasn't helped me much here.
>>>>>>>>>>>>>> Appreciate any help. Thanks in advance.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>> WARNING: Method [public void
>>>>>>>>>>>>>> org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)]
>>>>>>>>>>>>>> is synthetic and is being intercepted by
>>>>>>>>>>>>>> [com.twitter.common.inject.TimedInterceptor@604c5de8]. This
>>>>>>>>>>>>>> could indicate a bug.  The method
>>>>>>>>>>>>>>  may be intercepted twice, or may not be intercepted at all.
>>>>>>>>>>>>>> Exception in thread "main"
>>>>>>>>>>>>>> com.google.inject.CreationException: Guice creation errors:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1) An exception was caught and reported. Message: A value may
>>>>>>>>>>>>>> only be retrieved from a variable that has a default or has been
>>>>>>>>>>>>>> set.
>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2) Could not find a suitable constructor in
>>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have
>>>>>>>>>>>>>> either one (and only one) constructor annotated with @Inject or a
>>>>>>>>>>>>>> zero-argument constructor that is not private.
>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2 errors
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:95)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:83)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
>>>>>>>>>>>>>> Caused by: java.lang.IllegalStateException: A value may only
>>>>>>>>>>>>>> be retrieved from a variable that has a default or has been set.
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>>>>>>>>>>>>>>         at com.twitter.common.args.Arg.get(Arg.java:82)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.google.inject.util.Modules$2.configure(Modules.java:114)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.google.inject.spi.Elements.getElements(Elements.java:101)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>>>>>>>>>>>>>>         ... 7 more
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Complete logs are present @http://pastebin.com/i72HvbYi.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Zameer Manji
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Stacktrace when running Apache Aurora

Posted by Krish <kr...@gmail.com>.
Continuing my earlier chain of thought, I found this in the Mesos bug list:
MESOS-4242 - Allow Docker private registry credentials to be passed from
framework.
How does one pass credentials using the framework? It seems that
.docker/config.json is not being read on the slave.
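
To narrow this down, it may help to confirm that the config.json visible to
the slave process really carries an entry for the registry. The docker CLI
looks for config.json under $HOME/.docker unless DOCKER_CONFIG points
elsewhere, so the check should run as the same user and with the same
environment as the slave. Below is a minimal sketch of such a check; the
helper name registry_has_credentials is made up for illustration, and the
registry name is the one from my error message above. It only inspects the
file and does not contact the registry.

```python
import base64
import json
import os


def registry_has_credentials(config_path, registry):
    """Return True if config_path holds a decodable auth entry for registry.

    Hypothetical helper for illustration; it reads the "auths" section of a
    docker config.json and checks that the entry decodes to "user:password".
    """
    try:
        with open(config_path) as handle:
            config = json.load(handle)
    except (OSError, ValueError):
        # Missing, unreadable, or malformed file.
        return False
    if not isinstance(config, dict):
        return False
    auths = config.get("auths")
    if not isinstance(auths, dict):
        return False
    entry = auths.get(registry)
    if not isinstance(entry, dict):
        return False
    token = entry.get("auth", "")
    if not isinstance(token, str):
        return False
    try:
        decoded = base64.b64decode(token).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return False
    return ":" in decoded


if __name__ == "__main__":
    # Resolve the directory the way the docker CLI does: DOCKER_CONFIG if set,
    # otherwise $HOME/.docker.
    config_dir = os.environ.get(
        "DOCKER_CONFIG", os.path.join(os.path.expanduser("~"), ".docker"))
    path = os.path.join(config_dir, "config.json")
    found = registry_has_credentials(path, "private_repo.com:5000")
    print("%s: credentials %s" % (path, "found" if found else "NOT found"))
```

If this reports the credentials as missing when run with the slave's user and
environment, the pull is most likely looking at a different HOME than the one
where I placed the file.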




--
κρισhναν

On Sat, Feb 27, 2016 at 11:46 PM, Krish <kr...@gmail.com> wrote:

> I couldn't complete my PoC project earlier (got busy with other
> work). Well, it is never too late, and here's my update and issue.
>
> I have a 3 node zk (3.5.1 alpha), mesos-master (v0.24.1) & aurora
> (v0.11.0) running.
> I was stuck in a problem where I was using mesos 0.25.0 & aurora 0.9.0 &
> got a protobuf field not set error - ExecutorInfo field.
>
> I have a mesos agent running in docker container on coreos and it can
> access the host docker just fine.
> I have also put the docker login credentials file at the right location
> for it to access the private docker registry.
> I can manually trigger a docker pull and docker run without issues from
> the slave (which is also reflected properly outside the slave container
> with docker images and docker ps).
>
> However, when I try to run an aurora job with hello-docker container, the
> slave prints out the log that docker pull has failed; more specifically:
> " failed to start: Failed to 'docker pull
> private_repo.com:5000/krish/test:latest': exit status = exited with
> status 1 stderr = Error: image krish/test:latest not found"
>
> My hunch is that when using docker run from aurora DSL, it does not read
> the docker credentials file properly and hence fails. I can reproduce the
> exact same error when I delete the credentials file from the slave and
> trigger a pull.
>
> Is the hunch right? If yes, is there a way to resolve this? Maybe source
> it some way before the run command?
>
>
>
> --
> κρισhναν
>
> On Tue, Oct 27, 2015 at 10:35 PM, Bill Farner <wf...@apache.org> wrote:
>
>> (1) clusters.json is written by you, configuring the CLI client with
>> instructions for what clusters are available and how to discover them.
>>
>> (2) That's expected - mesos only allows one active replica of a framework
>> at a time, this signals which one is active.
>>
>> (3) The observer is essentially a web server that allows you to browse a
>> task's sandbox directory and other information about it.  You will need to
>> configure it to run on your worker/agent nodes for that functionality to
>> work (it's linked from the scheduler web UI).
>>
>> (4) You could indeed implement that behavior externally.  There is a
>> reason:
>> https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L556-L559
>>
>> (5) That is correct.  The scheduler exposes a thrift API that you would
>> use (a REST API is coming, but ground has not yet been broken).  If you go
>> this route, I suggest you skip the DSL and use the JSON task description
>> format that is shipped over the API.  There isn't good documentation on
>> this, but we can help you through it and would be grateful for a writeup of
>> your approach!
>>
>>
>> On Mon, Oct 26, 2015 at 11:44 PM, Krish <kr...@gmail.com>
>> wrote:
>>
>>> Hi Folks,
>>> Firstly, thanks for all the help. Am happy to report that I have set up
>>> zk, mesos & aurora, & can work further towards my idea of having an
>>> auto-scaling cluster.
>>> I have some further questions about the work done so far & things I plan
>>> to do:
>>>
>>>    1. Is the /etc/aurora/clusters.json file created by the scheduler or
>>>    does it need to be handcrafted? I had to manually edit the file to get my
>>>    `aurora job ...` cli to work.
>>>
>>>    2. I am running a cluster of 3 coreOS VMs on vagrant with zk, mesos
>>>    & aurora in a docker container. Only 1 of them outputs '1' when I look at
>>>    the 'framework_registered' field. Is this expected? How do I verify that
>>>    they are working as a cluster?
>>>
>>>    3. From the documentation, I see that there is an observer that
>>>    needs to be listening on port 1338. What is the observer socket & its
>>>    purpose? I have aurora listening only on ports 8081 (http port) & 8083
>>>    (libprocess).
>>>
>>>    4. I read about the 'PENDING' field in aurora documentation, as Bill
>>>    suggested, & realize that it just shows that a task is waiting for some
>>>    reason (for want of resources, in my case, as 0 slaves have registered). I
>>>    was thinking of adding a hook to the pending state; say if a task is
>>>    PENDING for 5 minutes for lack of resources in the cluster, then spin up a
>>>    new machine. Is this the right approach to take? Does aurora provide
>>>    reasons for why a task is in PENDING state?
>>>
>>>    => aurora job status testcluster/$USER/test/hello_world
>>>     INFO] Checking status of testcluster/ubuntu/test/hello_world
>>>    Active tasks (1):
>>>           Task role: ubuntu, env: test, name: hello_world, instance: 0,
>>>    status:
>>>    PENDING on None
>>>              cpus: 0.1, ram: 16 MB, disk: 16 MB
>>>              events:
>>>               2015-10-23 04:55:33 PENDING: None
>>>    Inactive tasks (0):
>>>
>>>    5. Aurora defines jobs in a .aurora config file & if I decide to
>>>    increase/decrease the number of instances in my cluster, then I need to
>>>    create/overwrite the concerned .aurora file and trigger the `aurora update
>>>    ...` command. Is this right?
>>>    If yes, is there an HTTP API I can invoke remotely which triggers
>>>    this update?
>>>
>>>
>>>
>>>
>>> --
>>> κρισhναν
>>>
>>> On Fri, Oct 23, 2015 at 8:09 AM, Joshua Cohen <jc...@twopensource.com>
>>> wrote:
>>>
>>>> I suspect your error from `aurora job create ...` is due to the aurora
>>>> config you're using referencing `/vagrant/hello_world.py` which does not
>>>> exist (as you say: you're not even using Vagrant). Can you link the .aurora
>>>> config you're using?
>>>>
>>>> Cheers,
>>>>
>>>> Joshua
>>>>
>>>> On Thu, Oct 22, 2015 at 3:22 PM, Krish <kr...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks, Zameer.
>>>>>
>>>>> I had to modify  /etc/aurora/clusters.json:
>>>>> [
>>>>>   {
>>>>>     "auth_mechanism": "UNAUTHENTICATED",
>>>>>     "name": "testcluster",
>>>>>     "scheduler_zk_path": "/scheduler/aurora",
>>>>>     "slave_root": "/var/lib/mesos",
>>>>>     "slave_run_directory": "latest",
>>>>>     "zk": "127.0.1.1"
>>>>>   }
>>>>> ]
>>>>>
>>>>> I have a hello_world.aurora in my home folder. However the following
>>>>> command errors out:
>>>>> ~$ aurora job create testcluster/testrole/test/hellojob
>>>>> ./hello_world.aurora
>>>>> Error loading configuration: [Errno 2] No such file or directory:
>>>>> '/vagrant/hello_world.py'
>>>>>
>>>>> A job list does work:
>>>>> ~$ aurora job list testcluster
>>>>>  INFO] Retrieving jobs for role None
>>>>>
>>>>> I am not even using the vagrant. I am using zk & mesos on the same
>>>>> machine as aurora. How do I submit these job templates to aurora?
>>>>>
>>>>> Any pointers to documentation will be helpful.
>>>>>
>>>>>
>>>>> --
>>>>> κρισhναν
>>>>>
>>>>> On Wed, Oct 21, 2015 at 11:09 PM, Zameer Manji <zm...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Version 0.9.0 does not have the gc executor. Version 0.9.0 uses
>>>>>> Mesos' task reconciliation
>>>>>> <http://mesos.apache.org/documentation/latest/reconciliation/> API
>>>>>> instead.
>>>>>>
>>>>>> On Wed, Oct 21, 2015 at 9:28 AM, Krish <kr...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks Bill for the location to the debs. I was finally able to run
>>>>>>> aurora. :)
>>>>>>>
>>>>>>> I did find thermos_executor.pex & thermos_observer after installing
>>>>>>> aurora-executor. I still could not find gc_executor.pex on my system.
>>>>>>> Is there a location from where I can download the binaries for *.pex
>>>>>>> or build them from scratch?
>>>>>>>
>>>>>>> root@dev:/# find . -name "*.pex"
>>>>>>> ./usr/share/aurora/bin/thermos_executor.pex
>>>>>>> ./usr/share/aurora/bin/kaurora_admin.pex
>>>>>>> ./usr/share/aurora/bin/kaurora.pex
>>>>>>> ./usr/share/aurora/bin/thermos.pex
>>>>>>> ./usr/share/aurora/bin/thermos_observer.pex
>>>>>>> ./home/ubuntu/.pex
>>>>>>> ./root/.pex
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> κρισhναν
>>>>>>>
>>>>>>> On Tue, Oct 20, 2015 at 11:46 PM, Bill Farner <wf...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Aurora currently requires an executor, so setting it to /dev/null
>>>>>>>> will not work.  Happy to talk further about your thoughts around
>>>>>>>> sidestepping the executor.
>>>>>>>>
>>>>>>>> As for working with the scheduler source code, it's a standard
>>>>>>>> gradle project and we tend to use intellij.  Docs to help ramp on that:
>>>>>>>> https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md
>>>>>>>>
>>>>>>>> As for builds - the .zip is a source distribution, so it won't have
>>>>>>>> any pre-built binaries.  If you're on debian, we have official debs here:
>>>>>>>> https://bintray.com/apache/aurora
>>>>>>>> You can see how they're built here (and can build your own)
>>>>>>>> packages: https://github.com/apache/aurora-packaging
>>>>>>>> We're close to having official RPMs, but none to speak of yet.
>>>>>>>>
>>>>>>>> On Tue, Oct 20, 2015 at 9:47 AM, Krish <kr...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Stephen,
>>>>>>>>> I am trying to get started and run aurora without thermos executor
>>>>>>>>> (setting it to /dev/null does not help) - on a local linux box for now &
>>>>>>>>> planning to containerize/dockerize it later.
>>>>>>>>>
>>>>>>>>> Can you please point me to the right documentation (or a pointer
>>>>>>>>> to the cli parsing source code) which can help me resolve this? Also, are
>>>>>>>>> there any steps to import source code into eclipse to browse &
>>>>>>>>> analyze code for this?
>>>>>>>>>
>>>>>>>>> Also, where do i find all the *.pex files? They are not present in
>>>>>>>>> the zip file nor anywhere in the built source code.
>>>>>>>>>
>>>>>>>>> I know I am asking too many queries on a single thread here, &
>>>>>>>>> would appreciate the help.
>>>>>>>>> I think at the end of this, I will put the steps I followed in a
>>>>>>>>> gist/blog so others might find their way around, & not struggle as much.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> κρισhναν
>>>>>>>>>
>>>>>>>>> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan <
>>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Krish,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> you don't have to set framework_authentication_file and
>>>>>>>>>> zk_digest_credentials. The scheduler help text is misleading here as
>>>>>>>>>> everything will work fine if you leave those empty.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> In addition, it looks like you are misunderstanding the usage of the
>>>>>>>>>> thermos_executor_path command line flag of the scheduler. It is
>>>>>>>>>> supposed to point to the binary containing the generic Aurora executor
>>>>>>>>>> (thermos_executor.pex).  You only need the hello_world.aurora
>>>>>>>>>> once your scheduler is up and running. It serves as an example input for the
>>>>>>>>>> aurora command line client which can be used to schedule jobs and services
>>>>>>>>>> on an Aurora master.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Have you tried to use the vagrant box? Just type `vagrant up` in a
>>>>>>>>>> checkout of the Aurora source code. It gives you a running scheduler to
>>>>>>>>>> play with. Once you have understood how it works, you can start trying to
>>>>>>>>>> install it on your own (by reverse-engineering the vagrant box).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hope this helps a little,
>>>>>>>>>>
>>>>>>>>>> Stephan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ------------------------------
>>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>>> *Sent:* Tuesday, October 20, 2015 11:39 AM
>>>>>>>>>> *To:* Bill Farner
>>>>>>>>>> *Cc:* user@aurora.apache.org; Erb, Stephan
>>>>>>>>>>
>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>
>>>>>>>>>> Bill/Stephen,
>>>>>>>>>> I still get a stacktrace when running the aurora scheduler CLI.
>>>>>>>>>>
>>>>>>>>>> I do not know what to specify for  -framework_authentication_file
>>>>>>>>>> & -zk_digest_credentials, and they are required arguments.
>>>>>>>>>>
>>>>>>>>>> I am not using any authentication on Mesos master, do I still
>>>>>>>>>> need the framework_authentication_file parameter?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> rm -rf /db /backup_dir
>>>>>>>>>> mesos-log initialize --path="/db"
>>>>>>>>>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
>>>>>>>>>> JAVA_OPTS="-Xmx1536m  -Xms256m"
>>>>>>>>>> /usr/local/aurora-scheduler/bin/aurora-scheduler -backup_dir=/backup_dir
>>>>>>>>>> -cluster_name=tc -mesos_master_address=zk://localhost:2181/mesos/master
>>>>>>>>>> -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181
>>>>>>>>>> -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false
>>>>>>>>>> -native_log_file_path=/db
>>>>>>>>>> -thermos_executor_path=/home/ubuntu/hello_world.aurora
>>>>>>>>>> ...
>>>>>>>>>> ...
>>>>>>>>>> INFO: Binding org.apache.aurora.scheduler.http.Utilization to GuiceManagedComponentProvider with the scope "PerRequest"
>>>>>>>>>> Oct 20, 2015 9:27:40 AM org.apache.aurora.scheduler.cron.quartz.CronModule provideTimeZone
>>>>>>>>>> WARNING: Cron schedules are configured to fire according to timezone Greenwich Mean Time but system timezone is set to Coordinated Universal Time
>>>>>>>>>> Oct 20, 2015 9:27:41 AM org.eclipse.jetty.server.AbstractConnector doStart
>>>>>>>>>> INFO: Started SelectChannelConnector@0.0.0.0:43843
>>>>>>>>>> E1020 09:27:41.290 THREAD1 org.apache.aurora.scheduler.SchedulerLifecycle$9.execute: Caught unchecked exception: com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>
>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>
>>>>>>>>>> 1 error
>>>>>>>>>> com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>
>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>
>>>>>>>>>> 1 error
>>>>>>>>>>         at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
>>>>>>>>>>         at org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> κρισhναν
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner <wf...@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> The typical flow is that you keep your .aurora file checked into
>>>>>>>>>>> git, and commit every time you deploy/update.  When you change your file,
>>>>>>>>>>> you will instruct Aurora to update the live job (have a look at aurora
>>>>>>>>>>> update -h).  Aurora will perform a rolling upgrade of your job
>>>>>>>>>>> to the new config.  You'll use this same flow for updating your job's
>>>>>>>>>>> software as well as resizing the job.
>>>>>>>>>>>
>>>>>>>>>>> For (3), you could set up alerting for stats that the scheduler
>>>>>>>>>>> exports.  Have a look here for monitoring background:
>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md
>>>>>>>>>>>
>>>>>>>>>>> You'll want to look at scheduler stats related to 'pending'.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish <
>>>>>>>>>>> krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the pointer. Now I notice that the aurora-scheduler
>>>>>>>>>>>> script has the --thermos_executor_path as a mandatory requirement.
>>>>>>>>>>>>
>>>>>>>>>>>> I have a couple of questions on how the
>>>>>>>>>>>> thermos_executor/.aurora config file functions:
>>>>>>>>>>>> 1. Do we have to statically define the file beforehand?
>>>>>>>>>>>> 2. What happens when we want to dynamically change the config,
>>>>>>>>>>>> say increasing the number of instances of a service required? Does aurora
>>>>>>>>>>>> require a reboot then?
>>>>>>>>>>>> 3. How do I get notified about the message mesos sends when it
>>>>>>>>>>>> cannot schedule tasks for lack of resources? Should I depend on aurora for
>>>>>>>>>>>> this or try to look for a hook into mesos?
>>>>>>>>>>>>
>>>>>>>>>>>> I think a little bit of context would help here.
>>>>>>>>>>>> What I plan to check is to run a very basic job/task inside a
>>>>>>>>>>>> docker container with aurora & wait for a 'resource not available' message
>>>>>>>>>>>> from mesos, and accordingly call an api to spin up a new node in my cluster.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <
>>>>>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I believe you are missing the thermos_executor options that
>>>>>>>>>>>>> have to be passed to the scheduler command line.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> See
>>>>>>>>>>>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
>>>>>>>>>>>>> for an example
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>>>>>> *Sent:* Monday, October 19, 2015 8:45 AM
>>>>>>>>>>>>> *To:* user@aurora.apache.org
>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> I am a n00b with apache aurora & trying to experiment with some
>>>>>>>>>>>>> things on my local machine with zookeeper and mesos-master running locally.
>>>>>>>>>>>>> They have initialized properly. When I try to run aurora with the required
>>>>>>>>>>>>> options, I get the following error, & googling hasn't helped me much here.
>>>>>>>>>>>>> Appreciate any help. Thanks in advance.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ...
>>>>>>>>>>>>> ...
>>>>>>>>>>>>> WARNING: Method [public void
>>>>>>>>>>>>> org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)]
>>>>>>>>>>>>> is synthetic and is being intercepted by
>>>>>>>>>>>>> [com.twitter.common.inject.TimedInterceptor@604c5de8]. This
>>>>>>>>>>>>> could indicate a bug.  The method
>>>>>>>>>>>>>  may be intercepted twice, or may not be intercepted at all.
>>>>>>>>>>>>> Exception in thread "main"
>>>>>>>>>>>>> com.google.inject.CreationException: Guice creation errors:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) An exception was caught and reported. Message: A value may
>>>>>>>>>>>>> only be retrieved from a variable that has a default or has been
>>>>>>>>>>>>> set.
>>>>>>>>>>>>>   at
>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2) Could not find a suitable constructor in
>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have
>>>>>>>>>>>>> either one (and only one) constructor annotated with @Inject or a
>>>>>>>>>>>>> zero-argument constructor that is not private.
>>>>>>>>>>>>>   at
>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>>>>>>>>>>>>>   at
>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2 errors
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:95)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:83)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
>>>>>>>>>>>>> Caused by: java.lang.IllegalStateException: A value may only
>>>>>>>>>>>>> be retrieved from a variable that has a default or has been set.
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>>>>>>>>>>>>>         at com.twitter.common.args.Arg.get(Arg.java:82)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.google.inject.util.Modules$2.configure(Modules.java:114)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.google.inject.spi.Elements.getElements(Elements.java:101)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>>>>>>>>>>>>>         ... 7 more
>>>>>>>>>>>>>
>>>>>>>>>>>>> Complete logs are present @http://pastebin.com/i72HvbYi.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Zameer Manji
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>