
Posted to user@aurora.apache.org by Krish <kr...@gmail.com> on 2015/10/19 08:45:25 UTC

Re: Stacktrace when running Apache Aurora

Hi,
I am a n00b with Apache Aurora & am trying to experiment with some things on
my local machine, with ZooKeeper and mesos-master running locally. They have
initialized properly. When I try to run Aurora with the required options, I
get the following error, & googling hasn't helped me much here.
I'd appreciate any help. Thanks in advance.

...
...
WARNING: Method [public void
org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)]
is synthetic and is being intercepted by
[com.twitter.common.inject.TimedInterceptor@604c5de8]. This could indicate
a bug.  The method
 may be intercepted twice, or may not be intercepted at all.
Exception in thread "main" com.google.inject.CreationException: Guice
creation errors:

1) An exception was caught and reported. Message: A value may only be
retrieved from a variable that has a default or has been
set.
  at
com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)

2) Could not find a suitable constructor in
org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have
either one (and only one) constructor annotated with @Inject or a
zero-argument constructor that is not private.
  at
org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
  at
org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)

2 errors
        at
com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
        at
com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
        at
com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
        at com.google.inject.Guice.createInjector(Guice.java:95)
        at com.google.inject.Guice.createInjector(Guice.java:83)
        at
com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
        at
com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
        at
com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
        at
com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
        at
org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
Caused by: java.lang.IllegalStateException: A value may only be retrieved
from a variable that has a default or has been set.
        at
com.google.common.base.Preconditions.checkState(Preconditions.java:176)
        at com.twitter.common.args.Arg.get(Arg.java:82)
        at
org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
        at
com.google.inject.AbstractModule.configure(AbstractModule.java:59)
        at
com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
        at com.google.inject.util.Modules$2.configure(Modules.java:114)
        at
com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
        at com.google.inject.spi.Elements.getElements(Elements.java:101)
        at
com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
        at
com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
        ... 7 more

Complete logs are at http://pastebin.com/i72HvbYi.



> --
> κρισhναν
>

Re: Stacktrace when running Apache Aurora

Posted by Jake Farrell <jf...@apache.org>.

This can also be avoided by setting DOCKER_CONFIG as an OS environment
variable.

The issue occurs when Docker images from a private registry are pulled on
a Mesos agent: Mesos versions < 0.26 only support v1 registries, which
require the .dockercfg config file, whereas Docker 1.8+ stores its config
in $HOME/.docker/config.json.
Mesos 0.26 has fixed this issue in the universal containerizer puller. To
work around it on older versions, enable an environment file in the
mesos-agent's systemd service that sets DOCKER_CONFIG to, say,
$HOME/.docker/ so that config.json is picked up correctly.

MESOS-2969, MESOS-3031 caused by docker/docker#12009


-Jake
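
For reference, here is a minimal sketch of the lookup order described above. It assumes the Docker 1.8+ client looks for config.json in the directory named by DOCKER_CONFIG (defaulting to $HOME/.docker) and falls back to the legacy $HOME/.dockercfg. The function name is made up for illustration, and this is an approximation rather than Docker's actual code.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def docker_credentials_file(env: Mapping[str, str]) -> Optional[Path]:
    """Return the first credentials file that exists for this environment.

    Assumed order: $DOCKER_CONFIG/config.json (or $HOME/.docker/config.json
    when DOCKER_CONFIG is unset), then the legacy $HOME/.dockercfg.
    """
    home = Path(env.get("HOME", "/"))
    if env.get("DOCKER_CONFIG"):
        config_dir = Path(env["DOCKER_CONFIG"])
    else:
        config_dir = home / ".docker"

    for candidate in (config_dir / "config.json", home / ".dockercfg"):
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Shows which file the current environment would resolve to, if any.
    print(docker_credentials_file(os.environ))
```

Running this with the same environment as the agent (for example with HOME pointed at a task sandbox) gives a quick hint as to whether a credentials file is visible at all.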



On Thu, Mar 3, 2016 at 11:30 AM, Krish <kr...@gmail.com> wrote:

> Used rbt for the first time and some weird thing happened to the console,
> and it got submitted!
> https://reviews.apache.org/r/44341/
>
> Will sure keep the list posted with any new info. Thanks.
>
>
>
> --
> κρισhναν
>
> On Thu, Mar 3, 2016 at 9:20 PM, Bill Farner <wf...@apache.org> wrote:
>
>> Likely in an existing page, preferably wherever you think would have
>> saved you the trial and error!
>>
>> I look forward to the blog post, be sure to shoot a link here once it's
>> up!
>>
>> Thanks!
>>
>>
>> On Thursday, March 3, 2016, Krish <kr...@gmail.com> wrote:
>>
>>> Can you guide me how to do that? Should I start with a new page and then
>>> submit it or would you like that as an entry in some existing doc?
>>> That will be the short term (couple of hours)  item on my checklist.
>>>
>>> Actually, as I said before, I have in mind to blog about my entire
>>> design and implementation process - the how and the why of docker
>>> configuration, private docker repo setup, coreos cluster setup, and zk,
>>> mesos master, aurora containerisation and setup, along with their
>>> monitoring (have decided on bosun.org with cAdvisor). And a short guide
>>> as to how to run both containerized and non containerized jobs in
>>> production.
>>> I had to refer to a dozen and more sites and blogs and manuals and
>>> source to get so far; and got help from engineers in various mailing lists.
>>> A unified guide should be helpful, imho.
>>>
>>>
>>> On Thursday 3 March 2016, Bill Farner <wf...@apache.org> wrote:
>>>
>>>> Wow!  I'm glad you got it working!  To help the next poor soul trying
>>>> to do this, would you be willing to put up a doc patch on our end?
>>>>
>>>> On Thursday, March 3, 2016, Krish <kr...@gmail.com> wrote:
>>>>
>>>>> TL;DR:
>>>>> Use only a file named .dockercfg for docker credentials in mesos
>>>>> tasks!
>>>>>
>>>>> Long story:
>>>>> ---------------
>>>>> Holy smokescreens!
>>>>> This is for reporting & documenting purposes only, so that others
>>>>> don't have to pull their hair like I did for the past few evenings!
>>>>>
>>>>> A little background:
>>>>> I am running Ubuntu 14.04 on my system and docker stores its
>>>>> credentials in the ~/.docker/config.json as
>>>>> cat ~/.docker/config.json
>>>>> {
>>>>> "auths": {
>>>>> "repo.example.com:5000": {
>>>>> "auth": "<snip>",
>>>>> "email": "<snip>"
>>>>> }
>>>>> }
>>>>> }
>>>>>
>>>>> And I am doing all these experiments on a coreOS system which stores
>>>>> the credentials  in ~/.dockercfg as
>>>>> core@aurora-1 ~ $ cat ~/.dockercfg
>>>>> {
>>>>>   "repo.example.com:5000": {
>>>>>     "auth": "<snip>",
>>>>>     "email": "<snip>"
>>>>>   }
>>>>> }
>>>>>
>>>>> Since my container was an Ubuntu 14.04 container (as was my local
>>>>> system), I used the Ubuntu credential file format. As a result, I couldn't
>>>>> get the slave task to read the docker credentials, as I had stored them in
>>>>> ~/.docker/config.json.
>>>>> After parsing through (a lot of find's, grep's and regex matching)
>>>>> aurora, mesos, and thermos source code, I saw in
>>>>> mesos/src/docker/docker.cpp:
>>>>>
>>>>>   // Set HOME variable to pick up .dockercfg.
>>>>>   map<string, string> environment = os::environment();
>>>>>
>>>>>   environment["HOME"] = directory;
>>>>>
>>>>>
>>>>> Changed the filename and the json content, changed the
>>>>> thermos_executor_resources, and bam, docker pull works!
>>>>>
>>>>> Well, the mesos documentation does say "To run an image from a private
>>>>> repository, one can include the URI pointing to a .dockercfg that contains
>>>>> login information." and I must have read it a dozen times!
>>>>> But I never thought that they literally meant '.dockercfg' as the name
>>>>> of the file!
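
The two credential formats shown above differ only in the "auths" wrapper, so the conversion can be sketched in a few lines. This is a hedged illustration based solely on the two files quoted in this message; the function name is hypothetical, and other keys that may appear in config.json are simply dropped.

```python
import json


def to_dockercfg(config_json_text: str) -> str:
    """Convert ~/.docker/config.json content to the legacy .dockercfg layout.

    Assumes the only difference is the top-level "auths" object, as in the
    two example files quoted in this thread.
    """
    config = json.loads(config_json_text)
    auths = config.get("auths")
    if not isinstance(auths, dict):
        raise ValueError('no "auths" object found in config.json content')
    return json.dumps(auths, indent=2)


if __name__ == "__main__":
    sample = """
    {
      "auths": {
        "repo.example.com:5000": {"auth": "<snip>", "email": "<snip>"}
      }
    }
    """
    # Prints the registry map without the "auths" wrapper.
    print(to_dockercfg(sample))
```

The output would then be saved under the literal name .dockercfg and shipped into the sandbox, for example via thermos_executor_resources as described above.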
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> κρισhναν
>>>>>
>>>>> On Thu, Mar 3, 2016 at 1:45 PM, Krish <kr...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> I have got the docker config file copied into the sandbox using the
>>>>>> thermos_executor_resources flag; however docker is still not able to find
>>>>>> the credentials file for doing an appropriate pull of image from a private
>>>>>> repo.
>>>>>>
>>>>>> When I try to use the library/hello-world:latest image from public
>>>>>> docker repo to check if everything works fine without the credentials, I
>>>>>> encounter a different problem:
>>>>>> exec: "/bin/sh": stat /bin/sh: no such file or directory
>>>>>> Error response from daemon: Cannot start container
>>>>>> de93dc344d44b41bccccff49e508001a97ff23a8964e637d32a506a31fd4d946: [8]
>>>>>> System error: exec: "/bin/sh": stat /bin/sh: no such file or directory
>>>>>>
>>>>>> I was referring to this email for guidance on setting up a mesos
>>>>>> slave:
>>>>>> http://mail-archives.apache.org/mod_mbox/aurora-dev/201503.mbox/%3CCAKB1MkHR=+NOnesLLX9bUWtTDThsku46PW_wR4B+_Z9P59+XWQ@mail.gmail.com%3E
>>>>>>
>>>>>> So, I cannot get the credentials file to be used by docker, and if I
>>>>>> bypass authentication, I can do a docker pull, but encounter a weird error
>>>>>> in launching the hello-world image.
>>>>>>
>>>>>> Am I missing out on checking any log files generated? I currently
>>>>>> refer to mesos-slave stdout and the sandbox stderr file.
>>>>>> Any configuration parameter I am missing for this to happen?
>>>>>>
>>>>>> Any pointers will be really helpful. Thanks in advance.
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> κρισhναν
>>>>>>
>>>>>> On Sun, Feb 28, 2016 at 3:37 PM, Krish <kr...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Continuing my earlier chain of thought, I found this in the mesos
>>>>>>> bug list:
>>>>>>> MESOS-4242 - Allow Docker private registry credentials to be passed
>>>>>>> from framework.
>>>>>>> How does one pass credentials using the framework? As it seems the
>>>>>>> .docker/config.json is not read from the slave.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> κρισhναν
>>>>>>>
>>>>>>> On Sat, Feb 27, 2016 at 11:46 PM, Krish <kr...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I couldn't complete my PoC project earlier (got busy with
>>>>>>>> other work). Well, it is never too late and here's my update and issue.
>>>>>>>>
>>>>>>>> I have a 3 node zk (3.5.1 alpha), mesos-master (v0.24.1) & aurora
>>>>>>>> (v0.11.0) running.
>>>>>>>> I was stuck in a problem where I was using mesos 0.25.0 & aurora
>>>>>>>> 0.9.0 & got a protobuf field not set error - ExecutorInfo field.
>>>>>>>>
>>>>>>>> I have a mesos agent running in docker container on coreos and it
>>>>>>>> can access the host docker just fine.
>>>>>>>> I have also put the docker login credentials file at the right
>>>>>>>> location for it to access the private docker registry.
>>>>>>>> I can manually trigger a docker pull and docker run without issues
>>>>>>>> from the slave (which is also reflected properly outside the slave
>>>>>>>> container with docker images and docker ps).
>>>>>>>>
>>>>>>>> However, when I try to run an aurora job with hello-docker
>>>>>>>> container, the slave prints out the log that docker pull has failed; more
>>>>>>>> specifically:
>>>>>>>> " failed to start: Failed to 'docker pull
>>>>>>>> private_repo.com:5000/krish/test:latest': exit status = exited
>>>>>>>> with status 1 stderr = Error: image krish/test:latest not found"
>>>>>>>>
>>>>>>>> My hunch is that when using docker run from aurora DSL, it does not
>>>>>>>> read the docker credentials file properly and hence fails. I can reproduce
>>>>>>>> the exact same error when I delete the credentials file from the slave and
>>>>>>>> trigger a pull.
>>>>>>>>
>>>>>>>> Is the hunch right? If yes, is there a way to resolve this? Maybe
>>>>>>>> source it some way before the run command?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> κρισhναν
>>>>>>>>
>>>>>>>> On Tue, Oct 27, 2015 at 10:35 PM, Bill Farner <wf...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> (1) clusters.json is written by you, configuring the CLI client
>>>>>>>>> with instructions for what clusters are available and how to discover them.
>>>>>>>>>
>>>>>>>>> (2) That's expected - mesos only allows one active replica of a
>>>>>>>>> framework at a time, this signals which one is active.
>>>>>>>>>
>>>>>>>>> (3) The observer is essentially a web server that allows you to
>>>>>>>>> browse a task's sandbox directory and other information about it.  You will
>>>>>>>>> need to configure it to run on your worker/agent nodes for that
>>>>>>>>> functionality to work (it's linked from the scheduler web UI).
>>>>>>>>>
>>>>>>>>> (4) You could indeed implement that behavior externally.  There is
>>>>>>>>> a reason:
>>>>>>>>> https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L556-L559
>>>>>>>>>
>>>>>>>>> (5) That is correct.  The scheduler exposes a thrift API that you
>>>>>>>>> would use (a REST API is coming, but ground has not yet been broken).  If
>>>>>>>>> you go this route, i suggest you skip the DSL and use the JSON task
>>>>>>>>> description format that is shipped over the API.  There's not good
>>>>>>>>> documentation on this, but we can help you through it and would be grateful
>>>>>>>>> for a writeup of your approach!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Oct 26, 2015 at 11:44 PM, Krish <krishnan.k.iyer@gmail.com
>>>>>>>>> > wrote:
>>>>>>>>>
>>>>>>>>>> Hi Folks,
>>>>>>>>>> Firstly, thanks for all the help. Am happy to report that I have
>>>>>>>>>> set up zk, mesos & aurora, & can work further towards my idea of having an
>>>>>>>>>> auto-scaling cluster.
>>>>>>>>>> I have some further questions about the work done so far & things
>>>>>>>>>> I plan to do:
>>>>>>>>>>
>>>>>>>>>>    1. Is the /etc/aurora/clusters.json file created by the
>>>>>>>>>>    scheduler or does it need to be handcrafted? I had to manually edit the
>>>>>>>>>>    file to get my `aurora job ...` cli to work.
>>>>>>>>>>
>>>>>>>>>>    2. I am running a cluster of 3 coreOS VMs on vagrant with zk,
>>>>>>>>>>    mesos & aurora in a docker container. Only 1 of them outputs '1' when I
>>>>>>>>>>    look at the 'framework_registered' field. Is this expected? How do I verify
>>>>>>>>>>    that they are working as a cluster?
>>>>>>>>>>
>>>>>>>>>>    3. From the documentation, I see that there is an observer
>>>>>>>>>>    that needs to be listening on port 1338. What is the observer socket & its
>>>>>>>>>>    purpose? I have aurora listening only on ports 8081 (http port) & 8083
>>>>>>>>>>    (libprocess).
>>>>>>>>>>
>>>>>>>>>>    4. I read about the 'PENDING' field in aurora documentation,
>>>>>>>>>>    as Bill suggested, & realize that it just shows that a task is waiting for
>>>>>>>>>>    some reason (for want of resources, in my case, as 0 slaves have
>>>>>>>>>>    registered). I was thinking of adding a hook to the pending state; say if a
>>>>>>>>>>    task is PENDING for 5 minutes for lack of resources in the cluster, then
>>>>>>>>>>    spin up a new machine. Is this the right approach to take? Does aurora
>>>>>>>>>>    provide reasons for why a task is in the PENDING state?
>>>>>>>>>>
>>>>>>>>>>    => aurora job status testcluster/$USER/test/hello_world
>>>>>>>>>>     INFO] Checking status of testcluster/ubuntu/test/hello_world
>>>>>>>>>>    Active tasks (1):
>>>>>>>>>>           Task role: ubuntu, env: test, name: hello_world,
>>>>>>>>>>    instance: 0, status:
>>>>>>>>>>    PENDING on None
>>>>>>>>>>              cpus: 0.1, ram: 16 MB, disk: 16 MB
>>>>>>>>>>              events:
>>>>>>>>>>               2015-10-23 04:55:33 PENDING: None
>>>>>>>>>>    Inactive tasks (0):
>>>>>>>>>>
>>>>>>>>>>    5. Aurora defines jobs in a .aurora config file & if I
>>>>>>>>>>    decide to increase/decrease the number of instances in my cluster, then I
>>>>>>>>>>    need to create/overwrite the concerned .aurora file and trigger the `aurora
>>>>>>>>>>    update ...` command. Is this right?
>>>>>>>>>>    If yes, is there an HTTP API I can invoke remotely which
>>>>>>>>>>    triggers this update?
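
The "PENDING for 5 minutes" idea in item 4 can be sketched against the `aurora job status` output quoted there. This is only an illustration: it assumes event lines keep the `YYYY-MM-DD HH:MM:SS STATE: message` shape shown above, that `now` is in the same timezone as those timestamps, and the function names are hypothetical. The actual scale-up call is left out.

```python
import re
from datetime import datetime, timedelta
from typing import Optional, Tuple

# Matches event lines such as "2015-10-23 04:55:33 PENDING: None".
EVENT_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ([A-Z_]+):")

SAMPLE = """\
 INFO] Checking status of testcluster/ubuntu/test/hello_world
Active tasks (1):
       Task role: ubuntu, env: test, name: hello_world,
instance: 0, status:
PENDING on None
          cpus: 0.1, ram: 16 MB, disk: 16 MB
          events:
           2015-10-23 04:55:33 PENDING: None
Inactive tasks (0):
"""


def last_event(status_output: str) -> Optional[Tuple[datetime, str]]:
    """Return (timestamp, state) of the last event line, or None."""
    matches = EVENT_RE.findall(status_output)
    if not matches:
        return None
    stamp, state = matches[-1]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), state


def should_scale_up(status_output: str, now: datetime,
                    threshold: timedelta = timedelta(minutes=5)) -> bool:
    """True if the latest event is PENDING and at least `threshold` old."""
    event = last_event(status_output)
    if event is None:
        return False
    when, state = event
    return state == "PENDING" and now - when >= threshold


if __name__ == "__main__":
    # Six minutes after the PENDING event in the sample output.
    print(should_scale_up(SAMPLE, datetime(2015, 10, 23, 5, 1, 33)))
```

Polling the CLI like this is the simplest option; the scheduler stats and the API mentioned elsewhere in this thread are probably a sturdier signal for production use.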
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> κρισhναν
>>>>>>>>>>
>>>>>>>>>> On Fri, Oct 23, 2015 at 8:09 AM, Joshua Cohen <
>>>>>>>>>> jcohen@twopensource.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I suspect your error from `aurora job create ...` is due to the
>>>>>>>>>>> aurora config you're using referencing `/vagrant/hello_world.py` which does
>>>>>>>>>>> not exist (as you say: you're not even using Vagrant). Can you link the
>>>>>>>>>>> .aurora config you're using?
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>>
>>>>>>>>>>> Joshua
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Oct 22, 2015 at 3:22 PM, Krish <
>>>>>>>>>>> krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks, Zameer.
>>>>>>>>>>>>
>>>>>>>>>>>> I had to modify  /etc/aurora/clusters.json:
>>>>>>>>>>>> [
>>>>>>>>>>>>   {
>>>>>>>>>>>>     "auth_mechanism": "UNAUTHENTICATED",
>>>>>>>>>>>>     "name": "testcluster",
>>>>>>>>>>>>     "scheduler_zk_path": "/scheduler/aurora",
>>>>>>>>>>>>     "slave_root": "/var/lib/mesos",
>>>>>>>>>>>>     "slave_run_directory": "latest",
>>>>>>>>>>>>     "zk": "127.0.1.1"
>>>>>>>>>>>>   }
>>>>>>>>>>>> ]
>>>>>>>>>>>>
>>>>>>>>>>>> I have a hello_world.aurora in my home folder. However the
>>>>>>>>>>>> following command errors out:
>>>>>>>>>>>> ~$ aurora job create testcluster/testrole/test/hellojob
>>>>>>>>>>>> ./hello_world.aurora
>>>>>>>>>>>> Error loading configuration: [Errno 2] No such file or
>>>>>>>>>>>> directory: '/vagrant/hello_world.py'
>>>>>>>>>>>>
>>>>>>>>>>>> A job list does work:
>>>>>>>>>>>> ~$ aurora job list testcluster
>>>>>>>>>>>>  INFO] Retrieving jobs for role None
>>>>>>>>>>>>
>>>>>>>>>>>> I am not even using the vagrant. I am using zk & mesos on the
>>>>>>>>>>>> same machine as aurora. How do I submit these job templates to aurora?
>>>>>>>>>>>>
>>>>>>>>>>>> Any pointers to documentation will be helpful.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 21, 2015 at 11:09 PM, Zameer Manji <
>>>>>>>>>>>> zmanji@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Version 0.9.0 does not have the gc executor. Version 0.9.0
>>>>>>>>>>>>> uses Mesos' task reconciliation
>>>>>>>>>>>>> <http://mesos.apache.org/documentation/latest/reconciliation/> API
>>>>>>>>>>>>> instead.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Oct 21, 2015 at 9:28 AM, Krish <
>>>>>>>>>>>>> krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks Bill for the location to the debs. I was finally able
>>>>>>>>>>>>>> to run aurora. :)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I did find thermos_executor.pex & thermos_observer after
>>>>>>>>>>>>>> installing aurora-executor. I still could not find gc_executor.pex on my
>>>>>>>>>>>>>> system.
>>>>>>>>>>>>>> Is there a location from where I can download the binaries
>>>>>>>>>>>>>> for *.pex or build them from scratch?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> root@dev:/# find . -name "*.pex"
>>>>>>>>>>>>>> ./usr/share/aurora/bin/thermos_executor.pex
>>>>>>>>>>>>>> ./usr/share/aurora/bin/kaurora_admin.pex
>>>>>>>>>>>>>> ./usr/share/aurora/bin/kaurora.pex
>>>>>>>>>>>>>> ./usr/share/aurora/bin/thermos.pex
>>>>>>>>>>>>>> ./usr/share/aurora/bin/thermos_observer.pex
>>>>>>>>>>>>>> ./home/ubuntu/.pex
>>>>>>>>>>>>>> ./root/.pex
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 11:46 PM, Bill Farner <
>>>>>>>>>>>>>> wfarner@apache.org> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Aurora currently requires an executor, so setting it to
>>>>>>>>>>>>>>> /dev/null will not work.  Happy to talk further about your thoughts around
>>>>>>>>>>>>>>> sidestepping the executor.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As for working with the scheduler source code, it's a
>>>>>>>>>>>>>>> standard gradle project and we tend to use intellij.  Docs to help ramp on
>>>>>>>>>>>>>>> that:
>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As for builds - the .zip is a source distribution, so it
>>>>>>>>>>>>>>> won't have any pre-built binaries.  If you're on debian, we have official
>>>>>>>>>>>>>>> debs here: https://bintray.com/apache/aurora
>>>>>>>>>>>>>>> You can see how they're built (and can build your own packages)
>>>>>>>>>>>>>>> here: https://github.com/apache/aurora-packaging
>>>>>>>>>>>>>>> We're close to having official RPMs, but none to speak of
>>>>>>>>>>>>>>> yet.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 9:47 AM, Krish <
>>>>>>>>>>>>>>> krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Stephen,
>>>>>>>>>>>>>>>> I am trying to get started and run aurora without thermos
>>>>>>>>>>>>>>>> executor (setting it to /dev/null does not help) - on a local linux box for
>>>>>>>>>>>>>>>> now & planning to containerize/dockerize it later.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Can you please point me to the right documentation (or a
>>>>>>>>>>>>>>>> pointer to the cli parsing source code) which can help me resolve this?
>>>>>>>>>>>>>>>> Also, are there any steps to import source code into eclipse to
>>>>>>>>>>>>>>>> browse & analyze code for this?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Also, where do i find all the *.pex files? They are not
>>>>>>>>>>>>>>>> present in the zip file nor anywhere in the built source code.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I know I am asking too many queries on a single thread
>>>>>>>>>>>>>>>> here, & would appreciate the help.
>>>>>>>>>>>>>>>> I think at the end of this, I will put the steps I followed
>>>>>>>>>>>>>>>> in a gist/blog so others might find their way around, & not struggle as
>>>>>>>>>>>>>>>> much.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan <
>>>>>>>>>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Krish,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> you don't have to set framework_authentication_file and
>>>>>>>>>>>>>>>>> zk_digest_credentials. The scheduler help text is misleading here as
>>>>>>>>>>>>>>>>> everything will work fine if you leave those empty.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In addition, it looks like you are misunderstanding the usage
>>>>>>>>>>>>>>>>> of the thermos_executor_path command line flag of the
>>>>>>>>>>>>>>>>> scheduler. It is supposed to point to the binary containing the generic
>>>>>>>>>>>>>>>>> Aurora executor (thermos_executor.pex).  You only need
>>>>>>>>>>>>>>>>> the hello_world.aurora once your scheduler is up and
>>>>>>>>>>>>>>>>> running. It serves as an example input for the aurora command line client
>>>>>>>>>>>>>>>>> which can be used to schedule jobs and services on an Aurora master.
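
A small pre-flight check can catch the mix-up described above before the scheduler is started. This is a hypothetical helper, not part of Aurora; it only encodes what this message states, namely that thermos_executor_path should name the executor binary (thermos_executor.pex) and not a job config such as hello_world.aurora.

```python
import os
from typing import List


def check_executor_path(path: str) -> List[str]:
    """Return a list of problems found with a -thermos_executor_path value."""
    problems = []
    if path.endswith(".aurora"):
        # .aurora files are job configs for the aurora client, not executors.
        problems.append("points to a .aurora job config, not the executor binary")
    if not os.path.isfile(path):
        problems.append("file does not exist")
    return problems


if __name__ == "__main__":
    # The value used in the failing command line quoted in this thread.
    for problem in check_executor_path("/home/ubuntu/hello_world.aurora"):
        print("thermos_executor_path:", problem)
```

An empty list means nothing obvious is wrong; it does not prove that the file is a working executor.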
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Have you tried to use the vagrant box? Just type `vagrant
>>>>>>>>>>>>>>>>> up` in a checkout of the Aurora source code. It gives you a running
>>>>>>>>>>>>>>>>> scheduler to play with. Once you have understood how it works, you can
>>>>>>>>>>>>>>>>> start trying to install it on your own (by reverse-engineering the vagrant
>>>>>>>>>>>>>>>>> box).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hope this helps a little,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>>>>>>>>>> *Sent:* Tuesday, October 20, 2015 11:39 AM
>>>>>>>>>>>>>>>>> *To:* Bill Farner
>>>>>>>>>>>>>>>>> *Cc:* user@aurora.apache.org; Erb, Stephan
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Bill/Stephen,
>>>>>>>>>>>>>>>>> I still get a stacktrace when running the aurora scheduler
>>>>>>>>>>>>>>>>> CLI.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I do not know what to specify for
>>>>>>>>>>>>>>>>>  -framework_authentication_file & -zk_digest_credentials, and they are
>>>>>>>>>>>>>>>>> required arguments.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am not using any authentication on Mesos master, do I
>>>>>>>>>>>>>>>>> still need the framework_authentication_file parameter?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> rm -rf /db /backup_dir
>>>>>>>>>>>>>>>>> mesos-log initialize --path="/db"
>>>>>>>>>>>>>>>>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
>>>>>>>>>>>>>>>>> JAVA_OPTS="-Xmx1536m  -Xms256m"
>>>>>>>>>>>>>>>>> /usr/local/aurora-scheduler/bin/aurora-scheduler -backup_dir=/backup_dir
>>>>>>>>>>>>>>>>> -cluster_name=tc -mesos_master_address=zk://localhost:2181/mesos/master
>>>>>>>>>>>>>>>>> -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181
>>>>>>>>>>>>>>>>> -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false
>>>>>>>>>>>>>>>>> -native_log_file_path=/db
>>>>>>>>>>>>>>>>> -thermos_executor_path=/home/ubuntu/hello_world.aurora
>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>> INFO: Binding org.apache.aurora.scheduler.http.Utilization to GuiceManagedComponentProvider with the scope "PerRequest"
>>>>>>>>>>>>>>>>> Oct 20, 2015 9:27:40 AM org.apache.aurora.scheduler.cron.quartz.CronModule provideTimeZone
>>>>>>>>>>>>>>>>> WARNING: Cron schedules are configured to fire according to timezone Greenwich Mean Time but system timezone is set to Coordinated Universal Time
>>>>>>>>>>>>>>>>> Oct 20, 2015 9:27:41 AM org.eclipse.jetty.server.AbstractConnector doStart
>>>>>>>>>>>>>>>>> INFO: Started SelectChannelConnector@0.0.0.0:43843
>>>>>>>>>>>>>>>>> E1020 09:27:41.290 THREAD1 org.apache.aurora.scheduler.SchedulerLifecycle$9.execute: Caught unchecked exception: com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1 error
>>>>>>>>>>>>>>>>> com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1 error
>>>>>>>>>>>>>>>>>         at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
>>>>>>>>>>>>>>>>>         at org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner <
>>>>>>>>>>>>>>>>> wfarner@apache.org> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The typical flow is that you keep your .aurora file
>>>>>>>>>>>>>>>>>> checked into git, and commit every time you deploy/update.  When you change
>>>>>>>>>>>>>>>>>> your file, you will instruct Aurora to update the live job (have a look at aurora
>>>>>>>>>>>>>>>>>> update -h).  Aurora will perform a rolling upgrade of
>>>>>>>>>>>>>>>>>> your job to the new config.  You'll use this same flow for updating your
>>>>>>>>>>>>>>>>>> job's software as well as resizing the job.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> For (3), you could set up alerting for stats that the
>>>>>>>>>>>>>>>>>> scheduler exports.  Have a look here for monitoring background:
>>>>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> You'll want to look at scheduler stats related to
>>>>>>>>>>>>>>>>>> 'pending'.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish <
>>>>>>>>>>>>>>>>>> krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks for the pointer. Now I notice that the
>>>>>>>>>>>>>>>>>>> aurora-scheduler script has the --thermos_executor_path as a mandatory
>>>>>>>>>>>>>>>>>>> requirement.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I have a couple of questions on how the
>>>>>>>>>>>>>>>>>>> thermos_executor/.aurora config file functions:
>>>>>>>>>>>>>>>>>>> 1. Do we have to statically define the file beforehand?
>>>>>>>>>>>>>>>>>>> 2. What happens when we want to dynamically change the
>>>>>>>>>>>>>>>>>>> config, say increasing the number of instances of a service required? Does
>>>>>>>>>>>>>>>>>>> aurora require a reboot then?
>>>>>>>>>>>>>>>>>>> 3. How do I get notified about the message mesos sends
>>>>>>>>>>>>>>>>>>> when it cannot schedule tasks for lack of resources? Should I depend on
>>>>>>>>>>>>>>>>>>> aurora for this or try to look for a hook into mesos?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I think a little bit of context would help here.
>>>>>>>>>>>>>>>>>>> What I plan to check is to run a very basic job/task
>>>>>>>>>>>>>>>>>>> inside a docker container with aurora & wait for a 'resource not available'
>>>>>>>>>>>>>>>>>>> message from mesos, and accordingly call an api to spin up a new node in my
>>>>>>>>>>>>>>>>>>> cluster.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <
>>>>>>>>>>>>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I believe you are missing the thermos_executor options
>>>>>>>>>>>>>>>>>>>> that have to be passed to the scheduler command line.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> See
>>>>>>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
>>>>>>>>>>>>>>>>>>>> for an example
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>>>>>>>>>>>>> *Sent:* Monday, October 19, 2015 8:45 AM
>>>>>>>>>>>>>>>>>>>> *To:* user@aurora.apache.org
>>>>>>>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>> I am a n00b with apache aurora & trying to experiment
>>>>>>>>>>>>>>>>>>>> some things on my local machine with zookeeper and mesos-master running
>>>>>>>>>>>>>>>>>>>> locally. They have initialized properly. When I try to run aurora with the
>>>>>>>>>>>>>>>>>>>> required options, I get the following error, & googling hasn't helped me
>>>>>>>>>>>>>>>>>>>> much here.
>>>>>>>>>>>>>>>>>>>> Appreciate any help. Thanks in advance.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>>> WARNING: Method [public void
>>>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)]
>>>>>>>>>>>>>>>>>>>> is synthetic and is being intercepted by
>>>>>>>>>>>>>>>>>>>> [com.twitter.common.inject.TimedInterceptor@604c5de8].
>>>>>>>>>>>>>>>>>>>> This could indicate a bug.  The method
>>>>>>>>>>>>>>>>>>>>  may be intercepted twice, or may not be intercepted at
>>>>>>>>>>>>>>>>>>>> all.
>>>>>>>>>>>>>>>>>>>> Exception in thread "main"
>>>>>>>>>>>>>>>>>>>> com.google.inject.CreationException: Guice creation errors:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 1) An exception was caught and reported. Message: A
>>>>>>>>>>>>>>>>>>>> value may only be retrieved from a variable that has a default or has been
>>>>>>>>>>>>>>>>>>>> set.
>>>>>>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 2) Could not find a suitable constructor in
>>>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have
>>>>>>>>>>>>>>>>>>>> either one (and only one) constructor annotated with @Inject or a
>>>>>>>>>>>>>>>>>>>> zero-argument constructor that is not private.
>>>>>>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>>>>>>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 2 errors
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:95)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:83)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
>>>>>>>>>>>>>>>>>>>> Caused by: java.lang.IllegalStateException: A value may
>>>>>>>>>>>>>>>>>>>> only be retrieved from a variable that has a default or has been set.
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>>>>>>>>>>>>>>>>>>>>         at com.twitter.common.args.Arg.get(Arg.java:82)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.google.inject.util.Modules$2.configure(Modules.java:114)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.google.inject.spi.Elements.getElements(Elements.java:101)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>>>>>>>>>>>>>>>>>>>>         ... 7 more
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Complete logs are present @http://pastebin.com/i72HvbYi
>>>>>>>>>>>>>>>>>>>> .
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Zameer Manji
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>>> --
>>>
>>> Thumb typed mail
>>>
>>>
>

Re: Stacktrace when running Apache Aurora

Posted by Krish <kr...@gmail.com>.

I used rbt for the first time and something weird happened to the console,
and it got submitted!
https://reviews.apache.org/r/44341/

I will be sure to keep the list posted with any new info. Thanks.



--
κρισhναν

On Thu, Mar 3, 2016 at 9:20 PM, Bill Farner <wf...@apache.org> wrote:

> Likely in an existing page, preferably wherever you think would have saved
> you the trial and error!
>
> I look forward to the blog post, be sure to shoot a link here once it's up!
>
> Thanks!
>
>
> On Thursday, March 3, 2016, Krish <kr...@gmail.com> wrote:
>
>> Can you guide me how to do that? Should I start with a new page and then
>> submit it or would you like that as an entry in some existing doc?
>> That will be the short term (couple of hours)  item on my checklist.
>>
>> Actually, as I said before, I have in mind to blog about my entire design
>> and implementation process - the how and the why of docker configuration,
>> private docker repo setup, coreos cluster setup, and zk, mesos master,
>> aurora containerisation and setup, along with their monitoring (have
>> decided on bosun.org with cAdvisor). And a short guide as to how to run
>> both containerized and non containerized jobs in production.
>> I had to refer to a dozen and more sites and blogs and manuals and source
>> to get so far; and got help from engineers in various mailing lists.
>> A unified guide should be helpful, imho.
>>
>>
>> On Thursday 3 March 2016, Bill Farner <wf...@apache.org> wrote:
>>
>>> Wow!  I'm glad you got it working!  To help the next poor soul trying to
>>> do this, would you be willing to put up a doc patch on our end?
>>>
>>> On Thursday, March 3, 2016, Krish <kr...@gmail.com> wrote:
>>>
>>>> TL;DR:
>>>> Use only a file named .dockercfg for docker credentials in mesos
>>>> tasks!
>>>>
>>>> Long story:
>>>> ---------------
>>>> Holy smokescreens!
>>>> This is for reporting & documenting purposes only, so that others don't
>>>> have to pull their hair out like I did for the past few evenings!
>>>>
>>>> A little background:
>>>> I am running Ubuntu 14.04 on my system and docker stores its
>>>> credentials in the ~/.docker/config.json as
>>>> cat ~/.docker/config.json
>>>> {
>>>>   "auths": {
>>>>     "repo.example.com:5000": {
>>>>       "auth": "<snip>",
>>>>       "email": "<snip>"
>>>>     }
>>>>   }
>>>> }
>>>>
>>>> And I am doing all these experiments on a coreOS system which stores
>>>> the credentials  in ~/.dockercfg as
>>>> core@aurora-1 ~ $ cat ~/.dockercfg
>>>> {
>>>>   "repo.example.com:5000": {
>>>>     "auth": "<snip>",
>>>>     "email": "<snip>"
>>>>   }
>>>> }
>>>>
>>>> Since my container was an Ubuntu 14.04 container (as was my local
>>>> system), I used the Ubuntu credential file format. As a result, the slave
>>>> task could not read the docker credentials, because I had stored them as
>>>> ~/.docker/config.json.
>>>> After digging through the aurora, mesos, and thermos source code (with a
>>>> lot of find, grep and regex matching), I saw this in
>>>> mesos/src/docker/docker.cpp:
>>>>
>>>>   // Set HOME variable to pick up .dockercfg.
>>>>   map<string, string> environment = os::environment();
>>>>
>>>>   environment["HOME"] = directory;
>>>>
>>>> Changed the filename and the json content, changed the
>>>> thermos_executor_resources, and bam, docker pull works!
>>>>
>>>> Well, the mesos documentation does say "To run an image from a private
>>>> repository, one can include the URI pointing to a .dockercfg that contains
>>>> login information." and I must have read it a dozen times!
>>>> But I never thought that they literally meant '.dockercfg' as the name
>>>> of the file!
>>>>
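The two listings above differ only in the "auths" wrapper, so the conversion can be sketched in a few lines of Python. This is a minimal sketch, assuming that wrapper really is the only difference between the two files, as it is in the listings; the function name to_legacy_dockercfg is made up for illustration and is not part of docker, mesos or aurora.

```python
import json


def to_legacy_dockercfg(config_json_text):
    """Convert ~/.docker/config.json content to the flat .dockercfg layout.

    Assumption: the newer file nests the registry entries under "auths" and
    the legacy file uses those same entries as the top-level object.
    """
    config = json.loads(config_json_text)
    auths = config.get("auths")
    if auths is None:
        raise ValueError('no "auths" key; input may already be in .dockercfg format')
    return json.dumps(auths, indent=2, sort_keys=True)


if __name__ == "__main__":
    sample = (
        '{"auths": {"repo.example.com:5000": '
        '{"auth": "<snip>", "email": "<snip>"}}}'
    )
    # Prints the registry entry without the "auths" wrapper.
    print(to_legacy_dockercfg(sample))
```

The output would then be saved under the literal name .dockercfg before being handed to the task sandbox.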
>>>>
>>>>
>>>>
>>>> --
>>>> κρισhναν
>>>>
>>>> On Thu, Mar 3, 2016 at 1:45 PM, Krish <kr...@gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>> I have got the docker config file copied into the sandbox using the
>>>>> thermos_executor_resources flag; however docker is still not able to find
>>>>> the credentials file for doing an appropriate pull of image from a private
>>>>> repo.
>>>>>
>>>>> When I try to use the library/hello-world:latest image from public
>>>>> docker repo to check if everything works fine without the credentials, I
>>>>> encounter a different problem:
>>>>> exec: "/bin/sh": stat /bin/sh: no such file or directory
>>>>> Error response from daemon: Cannot start container
>>>>> de93dc344d44b41bccccff49e508001a97ff23a8964e637d32a506a31fd4d946: [8]
>>>>> System error: exec: "/bin/sh": stat /bin/sh: no such file or directory
>>>>>
>>>>> I was referring to this email for guidance on setting up a mesos
>>>>> slave:
>>>>> http://mail-archives.apache.org/mod_mbox/aurora-dev/201503.mbox/%3CCAKB1MkHR=+NOnesLLX9bUWtTDThsku46PW_wR4B+_Z9P59+XWQ@mail.gmail.com%3E
>>>>>
>>>>> So, I cannot get the credentials file to be used by docker, and if I
>>>>> bypass authentication, I can do a docker pull, but encounter a weird error
>>>>> in launching the hello-world image.
>>>>>
>>>>> Am I missing out on checking any log files generated? I currently
>>>>> refer to mesos-slave stdout and the sandbox stderr file.
>>>>> Any configuration parameter I am missing for this to happen?
>>>>>
>>>>> Any pointers will be really helpful. Thanks in advance.
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> κρισhναν
>>>>>
>>>>> On Sun, Feb 28, 2016 at 3:37 PM, Krish <kr...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Continuing my earlier chain of thought, I found this in the mesos bug
>>>>>> list:
>>>>>> MESOS-4242 - Allow Docker private registry credentials to be passed
>>>>>> from framework.
>>>>>> How does one pass credentials using the framework? As it seems the
>>>>>> .docker/config.json is not read from the slave.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> κρισhναν
>>>>>>
>>>>>> On Sat, Feb 27, 2016 at 11:46 PM, Krish <kr...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I couldn't complete my PoC project before (got busy with
>>>>>>> other work). Well, it is never too late and here's my update and issue.
>>>>>>>
>>>>>>> I have a 3 node zk (3.5.1 alpha), mesos-master (v0.24.1) & aurora
>>>>>>> (v0.11.0) running.
>>>>>>> I was stuck in a problem where I was using mesos 0.25.0 & aurora
>>>>>>> 0.9.0 & got a protobuf field not set error - ExecutorInfo field.
>>>>>>>
>>>>>>> I have a mesos agent running in docker container on coreos and it
>>>>>>> can access the host docker just fine.
>>>>>>> I have also put the docker login credentials file at the right
>>>>>>> location for it to access the private docker registry.
>>>>>>> I can manually trigger a docker pull and docker run without issues
>>>>>>> from the slave (which is also reflected properly outside the slave
>>>>>>> container with docker images and docker ps).
>>>>>>>
>>>>>>> However, when I try to run an aurora job with hello-docker
>>>>>>> container, the slave prints out the log that docker pull has failed; more
>>>>>>> specifically:
>>>>>>> " failed to start: Failed to 'docker pull
>>>>>>> private_repo.com:5000/krish/test:latest': exit status = exited with
>>>>>>> status 1 stderr = Error: image krish/test:latest not found"
>>>>>>>
>>>>>>> My hunch is that when using docker run from aurora DSL, it does not
>>>>>>> read the docker credentials file properly and hence fails. I can reproduce
>>>>>>> the exact same error when I delete the credentials file from the slave and
>>>>>>> trigger a pull.
>>>>>>>
>>>>>>> Is the hunch right? If yes, is there a way to resolve this? Maybe
>>>>>>> source it some way before the run command?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> κρισhναν
>>>>>>>
>>>>>>> On Tue, Oct 27, 2015 at 10:35 PM, Bill Farner <wf...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> (1) clusters.json is written by you, configuring the CLI client
>>>>>>>> with instructions for what clusters are available and how to discover them.
>>>>>>>>
>>>>>>>> (2) That's expected - mesos only allows one active replica of a
>>>>>>>> framework at a time, this signals which one is active.
>>>>>>>>
>>>>>>>> (3) The observer is essentially a web server that allows you to
>>>>>>>> browse a task's sandbox directory and other information about it.  You will
>>>>>>>> need to configure it to run on your worker/agent nodes for that
>>>>>>>> functionality to work (it's linked from the scheduler web UI).
>>>>>>>>
>>>>>>>> (4) You could indeed implement that behavior externally.  There is
>>>>>>>> a reason:
>>>>>>>> https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L556-L559
>>>>>>>>
>>>>>>>> (5) That is correct.  The scheduler exposes a thrift API that you
>>>>>>>> would use (a REST API is coming, but ground has not yet been broken).  If
>>>>>>>> you go this route, i suggest you skip the DSL and use the JSON task
>>>>>>>> description format that is shipped over the API.  There's not good
>>>>>>>> documentation on this, but we can help you through it and would be grateful
>>>>>>>> for a writeup of your approach!
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Oct 26, 2015 at 11:44 PM, Krish <kr...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Folks,
>>>>>>>>> Firstly, thanks for all the help. Am happy to report that I have
>>>>>>>>> set up zk, mesos & aurora, & can work further towards my idea of having an
>>>>>>>>> auto-scaling cluster.
>>>>>>>>> I have some further questions about the work done so far & things
>>>>>>>>> I plan to do:
>>>>>>>>>
>>>>>>>>>    1. Is the /etc/aurora/clusters.json file created by the
>>>>>>>>>    scheduler or does it need to be handcrafted? I had to manually edit the
>>>>>>>>>    file to get my `aurora job ...` cli to work.
>>>>>>>>>
>>>>>>>>>    2. I am running a cluster of 3 coreOS VMs on vagrant with zk,
>>>>>>>>>    mesos & aurora in a docker container. Only 1 of them outputs '1' when I
>>>>>>>>>    look at the 'framework_registered' field. Is this expected? How do I verify
>>>>>>>>>    that they are working as a cluster?
>>>>>>>>>
>>>>>>>>>    3. From the documentation, I see that there is an observer
>>>>>>>>>    that needs to be listening on port 1338. What is the observer socket & its
>>>>>>>>>    purpose? I have aurora listening only on ports 8081 (http port) & 8083
>>>>>>>>>    (libprocess).
>>>>>>>>>
>>>>>>>>>    4. I read about the 'PENDING' field in aurora documentation,
>>>>>>>>>    as Bill suggested, & realize that it just shows that a task is waiting for
>>>>>>>>>    some reason (for want of resources, in my case, as 0 slaves have
>>>>>>>>>    registered). I was thinking of adding a hook to the pending state; say if a
>>>>>>>>>    task is PENDING for 5 minutes for lack of resources in the cluster, then
>>>>>>>>>    spin up a new machine. Is this the right approach to take? Does aurora
>>>>>>>>>    provide the reason why a task is in the PENDING state?
>>>>>>>>>
>>>>>>>>>    => aurora job status testcluster/$USER/test/hello_world
>>>>>>>>>     INFO] Checking status of testcluster/ubuntu/test/hello_world
>>>>>>>>>    Active tasks (1):
>>>>>>>>>           Task role: ubuntu, env: test, name: hello_world, instance: 0, status: PENDING on None
>>>>>>>>>              cpus: 0.1, ram: 16 MB, disk: 16 MB
>>>>>>>>>              events:
>>>>>>>>>               2015-10-23 04:55:33 PENDING: None
>>>>>>>>>    Inactive tasks (0):
>>>>>>>>>
>>>>>>>>>    5. Aurora defines jobs in a .aurora config file & if I decide
>>>>>>>>>    to increase/decrease the number of instances in my cluster, then I need to
>>>>>>>>>    create/overwrite the relevant .aurora file and trigger the `aurora update
>>>>>>>>>    ...` command. Is this right?
>>>>>>>>>    If yes, is there an HTTP API I can invoke remotely which
>>>>>>>>>    triggers this update?
>>>>>>>>>
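The "PENDING for 5 minutes" idea from question 4 can be prototyped outside the scheduler by parsing the text that `aurora job status` prints. The following is only a sketch under several assumptions: the event lines look like the sample output above (timestamp, status, colon), events within a task are listed oldest first, and the timestamps share a timezone with the `now` value passed in. The function name stale_pending_tasks is hypothetical.

```python
import re
from datetime import datetime, timedelta

# Matches event lines such as "2015-10-23 04:55:33 PENDING: None".
EVENT_RE = re.compile(r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+):")


def stale_pending_tasks(status_text, now, threshold=timedelta(minutes=5)):
    """Return the timestamps of tasks whose latest event is an old PENDING."""
    last_events = []  # latest (timestamp, status) seen for each task
    for line in status_text.splitlines():
        if line.strip() == "events:":
            # A new "events:" header starts the event list of the next task.
            last_events.append(None)
            continue
        match = EVENT_RE.match(line)
        if match and last_events:
            when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            last_events[-1] = (when, match.group(2))
    return [
        when
        for event in last_events
        if event is not None
        for when, status in [event]
        if status == "PENDING" and now - when >= threshold
    ]


if __name__ == "__main__":
    sample = "          events:\n           2015-10-23 04:55:33 PENDING: None\n"
    print(stale_pending_tasks(sample, now=datetime(2015, 10, 23, 5, 1, 0)))
```

A non-empty result would be the trigger for calling whatever API spins up a new node; scraping CLI output is fragile, so this is better seen as a prototype than as a production hook.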
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> κρισhναν
>>>>>>>>>
>>>>>>>>> On Fri, Oct 23, 2015 at 8:09 AM, Joshua Cohen <
>>>>>>>>> jcohen@twopensource.com> wrote:
>>>>>>>>>
>>>>>>>>>> I suspect your error from `aurora job create ...` is due to the
>>>>>>>>>> aurora config you're using referencing `/vagrant/hello_world.py` which does
>>>>>>>>>> not exist (as you say: you're not even using Vagrant). Can you link the
>>>>>>>>>> .aurora config you're using?
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>>
>>>>>>>>>> Joshua
>>>>>>>>>>
>>>>>>>>>> On Thu, Oct 22, 2015 at 3:22 PM, Krish <krishnan.k.iyer@gmail.com
>>>>>>>>>> > wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks, Zameer.
>>>>>>>>>>>
>>>>>>>>>>> I had to modify  /etc/aurora/clusters.json:
>>>>>>>>>>> [
>>>>>>>>>>>   {
>>>>>>>>>>>     "auth_mechanism": "UNAUTHENTICATED",
>>>>>>>>>>>     "name": "testcluster",
>>>>>>>>>>>     "scheduler_zk_path": "/scheduler/aurora",
>>>>>>>>>>>     "slave_root": "/var/lib/mesos",
>>>>>>>>>>>     "slave_run_directory": "latest",
>>>>>>>>>>>     "zk": "127.0.1.1"
>>>>>>>>>>>   }
>>>>>>>>>>> ]
>>>>>>>>>>>
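A quick way to sanity-check a hand-written clusters.json is to confirm that the cluster part of a job key matches one of its "name" entries. The sketch below assumes that the first component of a job key such as testcluster/testrole/test/hellojob is the cluster name, as the commands in this thread suggest; find_cluster is a hypothetical helper, not an Aurora API.

```python
import json


def find_cluster(clusters_json_text, job_key):
    """Return the clusters.json entry named by the job key, or None."""
    # Assumption: job keys have the form cluster/role/env/name.
    cluster_name = job_key.split("/")[0]
    for cluster in json.loads(clusters_json_text):
        if cluster.get("name") == cluster_name:
            return cluster
    return None


if __name__ == "__main__":
    clusters = """
    [
      {
        "auth_mechanism": "UNAUTHENTICATED",
        "name": "testcluster",
        "scheduler_zk_path": "/scheduler/aurora",
        "slave_root": "/var/lib/mesos",
        "slave_run_directory": "latest",
        "zk": "127.0.1.1"
      }
    ]
    """
    match = find_cluster(clusters, "testcluster/testrole/test/hellojob")
    print(match["zk"] if match else "cluster not found in clusters.json")
```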
>>>>>>>>>>> I have a hello_world.aurora in my home folder. However the
>>>>>>>>>>> following command errors out:
>>>>>>>>>>> ~$ aurora job create testcluster/testrole/test/hellojob
>>>>>>>>>>> ./hello_world.aurora
>>>>>>>>>>> Error loading configuration: [Errno 2] No such file or
>>>>>>>>>>> directory: '/vagrant/hello_world.py'
>>>>>>>>>>>
>>>>>>>>>>> A job list does work:
>>>>>>>>>>> ~$ aurora job list testcluster
>>>>>>>>>>>  INFO] Retrieving jobs for role None
>>>>>>>>>>>
>>>>>>>>>>> I am not even using the vagrant. I am using zk & mesos on the
>>>>>>>>>>> same machine as aurora. How do I submit these job templates to aurora?
>>>>>>>>>>>
>>>>>>>>>>> Any pointers to documentation will be helpful.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> κρισhναν
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 21, 2015 at 11:09 PM, Zameer Manji <
>>>>>>>>>>> zmanji@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Version 0.9.0 does not have the gc executor. Version 0.9.0 uses
>>>>>>>>>>>> Mesos' task reconciliation
>>>>>>>>>>>> <http://mesos.apache.org/documentation/latest/reconciliation/> API
>>>>>>>>>>>> instead.
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Oct 21, 2015 at 9:28 AM, Krish <
>>>>>>>>>>>> krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks Bill for the location to the debs. I was finally able
>>>>>>>>>>>>> to run aurora. :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I did find thermos_executor.pex & thermos_observer after
>>>>>>>>>>>>> installing aurora-executor. I still could not find gc_executor.pex on my
>>>>>>>>>>>>> system.
>>>>>>>>>>>>> Is there a location from where I can download the binaries for
>>>>>>>>>>>>> *.pex or build them from scratch?
>>>>>>>>>>>>>
>>>>>>>>>>>>> root@dev:/# find . -name "*.pex"
>>>>>>>>>>>>> ./usr/share/aurora/bin/thermos_executor.pex
>>>>>>>>>>>>> ./usr/share/aurora/bin/kaurora_admin.pex
>>>>>>>>>>>>> ./usr/share/aurora/bin/kaurora.pex
>>>>>>>>>>>>> ./usr/share/aurora/bin/thermos.pex
>>>>>>>>>>>>> ./usr/share/aurora/bin/thermos_observer.pex
>>>>>>>>>>>>> ./home/ubuntu/.pex
>>>>>>>>>>>>> ./root/.pex
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 11:46 PM, Bill Farner <
>>>>>>>>>>>>> wfarner@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Aurora currently requires an executor, so setting it to
>>>>>>>>>>>>>> /dev/null will not work.  Happy to talk further about your thoughts around
>>>>>>>>>>>>>> sidestepping the executor.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As for working with the scheduler source code, it's a
>>>>>>>>>>>>>> standard gradle project and we tend to use intellij.  Docs to help ramp on
>>>>>>>>>>>>>> that:
>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As for builds - the .zip is a source distribution, so it
>>>>>>>>>>>>>> won't have any pre-built binaries.  If you're on debian, we have official
>>>>>>>>>>>>>> debs here: https://bintray.com/apache/aurora
>>>>>>>>>>>>>> You can see how they're built here (and can build your own
>>>>>>>>>>>>>> packages): https://github.com/apache/aurora-packaging
>>>>>>>>>>>>>> We're close to having official RPMs, but none to speak of yet.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 9:47 AM, Krish <
>>>>>>>>>>>>>> krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Stephan,
>>>>>>>>>>>>>>> I am trying to get started and run aurora without thermos
>>>>>>>>>>>>>>> executor (setting it to /dev/null does not help) - on a local linux box for
>>>>>>>>>>>>>>> now & planning to containerize/dockerize it later.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can you please point me to the right documentation (or a
>>>>>>>>>>>>>>> pointer to the cli parsing source code) which can help me resolve this?
>>>>>>>>>>>>>>> Also, are there any steps to import the source code into eclipse to
>>>>>>>>>>>>>>> browse & analyze code for this?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also, where do i find all the *.pex files? They are not
>>>>>>>>>>>>>>> present in the zip file nor anywhere in the built source code.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I know I am asking too many queries on a single thread here,
>>>>>>>>>>>>>>> & would appreciate the help.
>>>>>>>>>>>>>>> I think at the end of this, I will put the steps I followed
>>>>>>>>>>>>>>> in a gist/blog so others might find their way around, & not struggle as
>>>>>>>>>>>>>>> much.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan <
>>>>>>>>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Krish,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> you don't have to set framework_authentication_file and
>>>>>>>>>>>>>>>> zk_digest_credentials. The scheduler help text is misleading here as
>>>>>>>>>>>>>>>> everything will work fine if you leave those empty.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In addition, it looks like you are misunderstanding the usage
>>>>>>>>>>>>>>>> of the thermos_executor_path command line flag of the
>>>>>>>>>>>>>>>> scheduler. It is supposed to point to the binary containing the generic
>>>>>>>>>>>>>>>> Aurora executor (thermos_executor.pex).  You only need the hello_world.aurora
>>>>>>>>>>>>>>>> once your scheduler is up and running. It serves as an example input for the
>>>>>>>>>>>>>>>> aurora command line client, which can be used to schedule jobs and services
>>>>>>>>>>>>>>>> on an Aurora master.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Have you tried to use the vagrant box? Just type `vagrant
>>>>>>>>>>>>>>>> up` in a checkout of the Aurora source code. It gives you a running
>>>>>>>>>>>>>>>> scheduler to play with. Once you have understood how it works, you can
>>>>>>>>>>>>>>>> start trying to install it on your own (by reverse-engineering the vagrant
>>>>>>>>>>>>>>>> box).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hope this helps a little,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>>>>>>>>> *Sent:* Tuesday, October 20, 2015 11:39 AM
>>>>>>>>>>>>>>>> *To:* Bill Farner
>>>>>>>>>>>>>>>> *Cc:* user@aurora.apache.org; Erb, Stephan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Bill/Stephan,
>>>>>>>>>>>>>>>> I still get a stacktrace when running the aurora scheduler
>>>>>>>>>>>>>>>> CLI.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I do not know what to specify for
>>>>>>>>>>>>>>>>  -framework_authentication_file & -zk_digest_credentials, and they are
>>>>>>>>>>>>>>>> required arguments.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am not using any authentication on Mesos master, do I
>>>>>>>>>>>>>>>> still need the framework_authentication_file parameter?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> rm -rf /db /backup_dir
>>>>>>>>>>>>>>>> mesos-log initialize --path="/db"
>>>>>>>>>>>>>>>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
>>>>>>>>>>>>>>>> JAVA_OPTS="-Xmx1536m  -Xms256m"
>>>>>>>>>>>>>>>> /usr/local/aurora-scheduler/bin/aurora-scheduler -backup_dir=/backup_dir
>>>>>>>>>>>>>>>> -cluster_name=tc -mesos_master_address=zk://localhost:2181/mesos/master
>>>>>>>>>>>>>>>> -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181
>>>>>>>>>>>>>>>> -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false
>>>>>>>>>>>>>>>> -native_log_file_path=/db
>>>>>>>>>>>>>>>> -thermos_executor_path=/home/ubuntu/hello_world.aurora
>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>> INFO: Binding org.apache.aurora.scheduler.http.Utilization
>>>>>>>>>>>>>>>> to GuiceManagedComponentProvider with the scope "PerRequest"
>>>>>>>>>>>>>>>> Oct 20, 2015 9:27:40 AM
>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.cron.quartz.CronModule provideTimeZone
>>>>>>>>>>>>>>>> WARNING: Cron schedules are configured to fire according to
>>>>>>>>>>>>>>>> timezone Greenwich Mean Time but system timezone is set to
>>>>>>>>>>>>>>>> Coordinated Universal Time
>>>>>>>>>>>>>>>> Oct 20, 2015 9:27:41 AM
>>>>>>>>>>>>>>>> org.eclipse.jetty.server.AbstractConnector doStart
>>>>>>>>>>>>>>>> INFO: Started SelectChannelConnector@0.0.0.0:43843
>>>>>>>>>>>>>>>> E1020 09:27:41.290 THREAD1
>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.SchedulerLifecycle$9.execute:
>>>>>>>>>>>>>>>> Caught unchecked exception:
>>>>>>>>>>>>>>>> com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1) Error in custom provider,
>>>>>>>>>>>>>>>> java.lang.IllegalArgumentException: Path cannot be null at
>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>>>>>>   while locating
>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1 error
>>>>>>>>>>>>>>>> com.google.inject.ProvisionException: Guice provision
>>>>>>>>>>>>>>>> errors:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1) Error in custom provider,
>>>>>>>>>>>>>>>> java.lang.IllegalArgumentException: Path cannot be
>>>>>>>>>>>>>>>> null
>>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>>>>>>   while locating
>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1 error
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner <
>>>>>>>>>>>>>>>> wfarner@apache.org> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The typical flow is that you keep your .aurora file
>>>>>>>>>>>>>>>>> checked into git, and commit every time you deploy/update.  When you change
>>>>>>>>>>>>>>>>> your file, you will instruct Aurora to update the live job (have a look at aurora
>>>>>>>>>>>>>>>>> update -h).  Aurora will perform a rolling upgrade of
>>>>>>>>>>>>>>>>> your job to the new config.  You'll use this same flow for updating your
>>>>>>>>>>>>>>>>> job's software as well as resizing the job.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> For (3), you could set up alerting for stats that the
>>>>>>>>>>>>>>>>> scheduler exports.  Have a look here for monitoring background:
>>>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You'll want to look at scheduler stats related to
>>>>>>>>>>>>>>>>> 'pending'.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish <
>>>>>>>>>>>>>>>>> krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks for the pointer. Now I notice that the
>>>>>>>>>>>>>>>>>> aurora-scheduler script has the --thermos_executor_path as a mandatory
>>>>>>>>>>>>>>>>>> requirement.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I have a couple of questions on how the
>>>>>>>>>>>>>>>>>> thermos_executor/.aurora config file functions:
>>>>>>>>>>>>>>>>>> 1. Do we have to statically define the file beforehand?
>>>>>>>>>>>>>>>>>> 2. What happens when we want to dynamically change the
>>>>>>>>>>>>>>>>>> config, say increasing the number of instances of a service required? Does
>>>>>>>>>>>>>>>>>> aurora require a reboot then?
>>>>>>>>>>>>>>>>>> 3. How do I get notified about the message mesos sends
>>>>>>>>>>>>>>>>>> when it cannot schedule tasks for lack of resources? Should I depend on
>>>>>>>>>>>>>>>>>> aurora for this or try to look for a hook into mesos?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I think a little bit of context would help here.
>>>>>>>>>>>>>>>>>> What I plan to check is to run a very basic job/task
>>>>>>>>>>>>>>>>>> inside a docker container with aurora & wait for a 'resource not available'
>>>>>>>>>>>>>>>>>> message from mesos, and accordingly call an api to spin up a new node in my
>>>>>>>>>>>>>>>>>> cluster.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <
>>>>>>>>>>>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I believe you are missing the thermos_executor options
>>>>>>>>>>>>>>>>>>> that have to be passed to the scheduler command line.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> See
>>>>>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
>>>>>>>>>>>>>>>>>>> for an example
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>>>>>>>>>>>> *Sent:* Monday, October 19, 2015 8:45 AM
>>>>>>>>>>>>>>>>>>> *To:* user@aurora.apache.org
>>>>>>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>> I am a n00b with apache aurora & trying to experiment
>>>>>>>>>>>>>>>>>>> with some things on my local machine with zookeeper and mesos-master running
>>>>>>>>>>>>>>>>>>> locally. They have initialized properly. When I try to run aurora with the
>>>>>>>>>>>>>>>>>>> required options, I get the following error, & googling hasn't helped me
>>>>>>>>>>>>>>>>>>> much here.
>>>>>>>>>>>>>>>>>>> Appreciate any help. Thanks in advance.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>>> WARNING: Method [public void org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)]
>>>>>>>>>>>>>>>>>>> is synthetic and is being intercepted by [com.twitter.common.inject.TimedInterceptor@604c5de8].
>>>>>>>>>>>>>>>>>>> This could indicate a bug.  The method may be intercepted twice, or may not be intercepted at all.
>>>>>>>>>>>>>>>>>>> Exception in thread "main" com.google.inject.CreationException: Guice creation errors:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 1) An exception was caught and reported. Message: A value may only be retrieved from a variable that has a default or has been set.
>>>>>>>>>>>>>>>>>>>   at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2) Could not find a suitable constructor in org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have
>>>>>>>>>>>>>>>>>>> either one (and only one) constructor annotated with @Inject or a zero-argument constructor that is not private.
>>>>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>>>>>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 2 errors
>>>>>>>>>>>>>>>>>>>         at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>>>>>>>>>>>>>>>>>>>         at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>>>>>>>>>>>>>>>>>>>         at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>>>>>>>>>>>>>>>>>>>         at com.google.inject.Guice.createInjector(Guice.java:95)
>>>>>>>>>>>>>>>>>>>         at com.google.inject.Guice.createInjector(Guice.java:83)
>>>>>>>>>>>>>>>>>>>         at com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>>>>>>>>>>>>>>>>>>>         at com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>>>>>>>>>>>>>>>>>>>         at com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>>>>>>>>>>>>>>>>>>>         at com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>>>>>>>>>>>>>>>>>>>         at org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
>>>>>>>>>>>>>>>>>>> Caused by: java.lang.IllegalStateException: A value may only be retrieved from a variable that has a default or has been set.
>>>>>>>>>>>>>>>>>>>         at com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>>>>>>>>>>>>>>>>>>>         at com.twitter.common.args.Arg.get(Arg.java:82)
>>>>>>>>>>>>>>>>>>>         at org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>>>>>>>>>>>>>>>>>>>         at com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>>>>>>>>>>>>>>>>>>>         at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>>>>>         at com.google.inject.util.Modules$2.configure(Modules.java:114)
>>>>>>>>>>>>>>>>>>>         at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>>>>>         at com.google.inject.spi.Elements.getElements(Elements.java:101)
>>>>>>>>>>>>>>>>>>>         at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>>>>>         at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>>>>>>>>>>>>>>>>>>>         ... 7 more
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Complete logs are present @http://pastebin.com/i72HvbYi.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Zameer Manji
>>>>>>>>>>>>
>> --
>>
>> Thumb typed mail
>>
>>

Re: Stacktrace when running Apache Aurora

Posted by Bill Farner <wf...@apache.org>.

Likely in an existing page, preferably wherever you think would have saved
you the trial and error!

I look forward to the blog post, be sure to shoot a link here once it's up!

Thanks!

On Thursday, March 3, 2016, Krish <kr...@gmail.com> wrote:

> Can you guide me on how to do that? Should I start with a new page and then
> submit it, or would you like it as an entry in some existing doc?
> That will be the short-term (couple of hours) item on my checklist.
>
> Actually, as I said before, I have in mind to blog about my entire design
> and implementation process - the how and the why of docker configuration,
> private docker repo setup, coreos cluster setup, and zk, mesos master,
> aurora containerisation and setup, along with their monitoring (have
> decided on bosun.org with cAdvisor). And a short guide as to how to run
> both containerized and non containerized jobs in production.
> I had to refer to a dozen or more sites, blogs, manuals and source files
> to get this far, and got help from engineers on various mailing lists.
> A unified guide should be helpful, imho.
>
>
> On Thursday 3 March 2016, Bill Farner <wfarner@apache.org> wrote:
>
>> Wow!  I'm glad you got it working!  To help the next poor soul trying to
>> do this, would you be willing to put up a doc patch on our end?
>>
>> On Thursday, March 3, 2016, Krish <kr...@gmail.com> wrote:
>>
>>> TL;DR:
>>> Use only a file named .dockercfg for docker credentials in mesos
>>> tasks!
>>>
>>> Long story:
>>> ---------------
>>> Holy smokescreens!
>>> This is for reporting & documenting purposes only, so that others don't
>>> have to pull their hair out like I did for the past few evenings!
>>>
>>> A little background:
>>> I am running Ubuntu 14.04 on my system and docker stores its credentials
>>> in the ~/.docker/config.json as
>>> cat ~/.docker/config.json
>>> {
>>>   "auths": {
>>>     "repo.example.com:5000": {
>>>       "auth": "<snip>",
>>>       "email": "<snip>"
>>>     }
>>>   }
>>> }
>>>
>>> And I am doing all these experiments on a coreOS system which stores the
>>> credentials  in ~/.dockercfg as
>>> core@aurora-1 ~ $ cat ~/.dockercfg
>>> {
>>>   "repo.example.com:5000": {
>>>     "auth": "<snip>",
>>>     "email": "<snip>"
>>>   }
>>> }
>>>
>>> Since my container was an Ubuntu 14.04 container (as was my local
>>> system), I used the Ubuntu credential file format. As a result, the
>>> slave task could not read the docker credentials, because I had stored
>>> them as ~/.docker/config.json.
>>> After parsing through (a lot of find's, grep's and regex matching)
>>> aurora, mesos, and thermos source code, I saw in
>>> mesos/src/docker/docker.cpp:
>>>
>>>   // Set HOME variable to pick up .dockercfg.
>>>   map<string, string> environment = os::environment();
>>>
>>>   environment["HOME"] = directory;
>>>
>>>
>>> Changed the filename and the json content, changed the
>>> thermos_executor_resources, and bam, docker pull works!
>>>
>>> Well, the mesos documentation does say "To run an image from a private
>>> repository, one can include the URI pointing to a .dockercfg that contains
>>> login information." and I must have read it a dozen times!
>>> But I never thought that they literally meant '.dockercfg' as the name
>>> of the file!
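
A minimal sketch of the format change described above, assuming the only difference between the two files is the "auths" wrapper shown in the two listings. The function name to_legacy_dockercfg is made up for illustration; it is not part of docker, mesos, or aurora.

```python
import json


def to_legacy_dockercfg(config_json_text):
    """Return .dockercfg-style JSON text for a ~/.docker/config.json document.

    Assumption: the legacy file is simply the per-registry mapping that the
    newer file nests under the "auths" key, as in the listings above.
    """
    config = json.loads(config_json_text)
    auths = config.get("auths", {})
    return json.dumps(auths, indent=2, sort_keys=True)


# Example input mirroring the config.json shown above (values are placeholders).
example = json.dumps({
    "auths": {
        "repo.example.com:5000": {"auth": "<snip>", "email": "<snip>"}
    }
})

print(to_legacy_dockercfg(example))
```

The output would then be saved under the literal name .dockercfg and shipped to the sandbox, since that is the file name the mesos comment refers to.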
>>>
>>>
>>>
>>>
>>> --
>>> κρισhναν
>>>
>>> On Thu, Mar 3, 2016 at 1:45 PM, Krish <kr...@gmail.com> wrote:
>>>
>>>>
>>>> I have got the docker config file copied into the sandbox using the
>>>> thermos_executor_resources flag; however docker is still not able to find
>>>> the credentials file for doing an appropriate pull of image from a private
>>>> repo.
>>>>
>>>> When I try to use the library/hello-world:latest image from public
>>>> docker repo to check if everything works fine without the credentials, I
>>>> encounter a different problem:
>>>> exec: "/bin/sh": stat /bin/sh: no such file or directory
>>>> Error response from daemon: Cannot start container
>>>> de93dc344d44b41bccccff49e508001a97ff23a8964e637d32a506a31fd4d946: [8]
>>>> System error: exec: "/bin/sh": stat /bin/sh: no such file or directory
>>>>
>>>> I was referring to this email for guidance on setting up a mesos slave:
>>>> http://mail-archives.apache.org/mod_mbox/aurora-dev/201503.mbox/%3CCAKB1MkHR=+NOnesLLX9bUWtTDThsku46PW_wR4B+_Z9P59+XWQ@mail.gmail.com%3E
>>>>
>>>> So, I cannot get the credentials file to be used by docker, and if I
>>>> bypass authentication, I can do a docker pull, but encounter a weird error
>>>> in launching the hello-world image.
>>>>
>>>> Am I missing out on checking any log files generated? I currently refer
>>>> to mesos-slave stdout and the sandbox stderr file.
>>>> Any configuration parameter I am missing for this to happen?
>>>>
>>>> Any pointers will be really helpful. Thanks in advance.
>>>>
>>>>
>>>>
>>>> --
>>>> κρισhναν
>>>>
>>>> On Sun, Feb 28, 2016 at 3:37 PM, Krish <kr...@gmail.com>
>>>> wrote:
>>>>
>>>>> Continuing my earlier chain of thought, I found this in the mesos bug
>>>>> list:
>>>>> MESOS-4242 - Allow Docker private registry credentials to be passed
>>>>> from framework.
>>>>> How does one pass credentials using the framework? As it seems the
>>>>> .docker/config.json is not read from the slave.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> κρισhναν
>>>>>
>>>>> On Sat, Feb 27, 2016 at 11:46 PM, Krish <kr...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I couldn't complete my PoC project earlier (got busy with other
>>>>>> work). Well, it is never too late, and here's my update and issue.
>>>>>>
>>>>>> I have a 3 node zk (3.5.1 alpha), mesos-master (v0.24.1) & aurora
>>>>>> (v0.11.0) running.
>>>>>> I was stuck on a problem where I was using mesos 0.25.0 & aurora
>>>>>> 0.9.0 & got a protobuf field not set error - ExecutorInfo field.
>>>>>>
>>>>>> I have a mesos agent running in docker container on coreos and it can
>>>>>> access the host docker just fine.
>>>>>> I have also put the docker login credentials file at the right
>>>>>> location for it to access the private docker registry.
>>>>>> I can manually trigger a docker pull and docker run without issues
>>>>>> from the slave (which is also reflected properly outside the slave
>>>>>> container with docker images and docker ps).
>>>>>>
>>>>>> However, when I try to run an aurora job with hello-docker container,
>>>>>> the slave prints out the log that docker pull has failed; more specifically:
>>>>>> " failed to start: Failed to 'docker pull
>>>>>> private_repo.com:5000/krish/test:latest': exit status = exited with
>>>>>> status 1 stderr = Error: image krish/test:latest not found"
>>>>>>
>>>>>> My hunch is that when using docker run from aurora DSL, it does not
>>>>>> read the docker credentials file properly and hence fails. I can reproduce
>>>>>> the exact same error when I delete the credentials file from the slave and
>>>>>> trigger a pull.
>>>>>>
>>>>>> Is the hunch right? If yes, is there a way to resolve this? Maybe
>>>>>> source it some way before the run command?
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> κρισhναν
>>>>>>
>>>>>> On Tue, Oct 27, 2015 at 10:35 PM, Bill Farner <wf...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> (1) clusters.json is written by you, configuring the CLI client with
>>>>>>> instructions for what clusters are available and how to discover them.
>>>>>>>
>>>>>>> (2) That's expected - mesos only allows one active replica of a
>>>>>>> framework at a time, this signals which one is active.
>>>>>>>
>>>>>>> (3) The observer is essentially a web server that allows you to
>>>>>>> browse a task's sandbox directory and other information about it.  You will
>>>>>>> need to configure it to run on your worker/agent nodes for that
>>>>>>> functionality to work (it's linked from the scheduler web UI).
>>>>>>>
>>>>>>> (4) You could indeed implement that behavior externally.  There is a
>>>>>>> reason:
>>>>>>> https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L556-L559
>>>>>>>
>>>>>>> (5) That is correct.  The scheduler exposes a thrift API that you
>>>>>>> would use (a REST API is coming, but ground has not yet been broken).  If
>>>>>>> you go this route, i suggest you skip the DSL and use the JSON task
>>>>>>> description format that is shipped over the API.  There's not good
>>>>>>> documentation on this, but we can help you through it and would be grateful
>>>>>>> for a writeup of your approach!
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Oct 26, 2015 at 11:44 PM, Krish <kr...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Folks,
>>>>>>>> Firstly, thanks for all the help. Am happy to report that I have
>>>>>>>> set up zk, mesos & aurora, & can work further towards my idea of having an
>>>>>>>> auto-scaling cluster.
>>>>>>>> I have some further questions about the work done so far & things I
>>>>>>>> plan to do:
>>>>>>>>
>>>>>>>>    1. Is the /etc/aurora/clusters.json file created by the
>>>>>>>>    scheduler or does it need to be handcrafted? I had to manually edit the
>>>>>>>>    file to get my `aurora job ...` cli to work.
>>>>>>>>
>>>>>>>>    2. I am running a cluster of 3 coreOS VMs on vagrant with zk,
>>>>>>>>    mesos & aurora in a docker container. Only 1 of them outputs '1' when I
>>>>>>>>    look at the 'framework_registered' field. Is this expected? How do I verify
>>>>>>>>    that they are working as a cluster?
>>>>>>>>
>>>>>>>>    3. From the documentation, I see that there is an observer that
>>>>>>>>    needs to be listening on port 1338. What is the observer socket & its
>>>>>>>>    purpose? I have aurora listening only on ports 8081 (http port) & 8083
>>>>>>>>    (libprocess).
>>>>>>>>
>>>>>>>>    4. I read about the 'PENDING' field in the aurora documentation, as
>>>>>>>>    Bill suggested, & realize that it just shows that a task is waiting for
>>>>>>>>    some reason (for want of resources, in my case, as 0 slaves have
>>>>>>>>    registered). I was thinking of adding a hook to the pending state; say if a
>>>>>>>>    task is PENDING for 5 minutes for lack of resources in the cluster, then
>>>>>>>>    spin up a new machine. Is this the right approach to take? Does aurora
>>>>>>>>    provide reasons for why a task is in the PENDING state?
>>>>>>>>
>>>>>>>>    => aurora job status testcluster/$USER/test/hello_world
>>>>>>>>     INFO] Checking status of testcluster/ubuntu/test/hello_world
>>>>>>>>    Active tasks (1):
>>>>>>>>           Task role: ubuntu, env: test, name: hello_world,
>>>>>>>>    instance: 0, status:
>>>>>>>>    PENDING on None
>>>>>>>>              cpus: 0.1, ram: 16 MB, disk: 16 MB
>>>>>>>>              events:
>>>>>>>>               2015-10-23 04:55:33 PENDING: None
>>>>>>>>    Inactive tasks (0):
>>>>>>>>
>>>>>>>>    5. Aurora defines jobs in a .aurora config file & if I decide
>>>>>>>>    to increase/decrease the number of instances in my cluster, then I need to
>>>>>>>>    create/overwrite the relevant .aurora file and trigger the `aurora update
>>>>>>>>    ...` command. Is this right?
>>>>>>>>    If yes, is there an HTTP API I can invoke remotely which
>>>>>>>>    triggers this update?
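
The hook proposed in item 4 could be sketched roughly as below. This is only an illustration of the decision logic: how the PENDING timestamps are collected (polling `aurora job status`, scheduler stats, or something else) is left open, and the names tasks_needing_capacity, scale_up_if_needed and add_node are hypothetical, not aurora or mesos APIs.

```python
PENDING_LIMIT_SECS = 5 * 60  # the five-minute threshold mentioned in item 4


def tasks_needing_capacity(pending_since, now, limit=PENDING_LIMIT_SECS):
    """Return the task keys that have been PENDING for at least `limit` seconds.

    pending_since maps a task key to the epoch second at which the task was
    first observed in PENDING (collected by whatever poller you use).
    """
    return sorted(key for key, since in pending_since.items()
                  if now - since >= limit)


def scale_up_if_needed(pending_since, now, add_node):
    """Call add_node() once if any task has been stuck in PENDING too long."""
    stuck = tasks_needing_capacity(pending_since, now)
    if stuck:
        add_node()  # hypothetical callback that asks your cloud API for a machine
    return stuck


# Usage with a fake node launcher and made-up timestamps.
launched = []
seen = {"ubuntu/test/hello_world/0": 1000}
stuck = scale_up_if_needed(seen, now=1000 + 6 * 60,
                           add_node=lambda: launched.append("node"))
print(stuck, launched)
```

Note that a task can be PENDING for reasons other than missing resources, so a real hook would also need to check why the task is pending before adding a machine.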
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> κρισhναν
>>>>>>>>
>>>>>>>> On Fri, Oct 23, 2015 at 8:09 AM, Joshua Cohen <
>>>>>>>> jcohen@twopensource.com> wrote:
>>>>>>>>
>>>>>>>>> I suspect your error from `aurora job create ...` is due to the
>>>>>>>>> aurora config you're using referencing `/vagrant/hello_world.py` which does
>>>>>>>>> not exist (as you say: you're not even using Vagrant). Can you link the
>>>>>>>>> .aurora config you're using?
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>>
>>>>>>>>> Joshua
>>>>>>>>>
>>>>>>>>> On Thu, Oct 22, 2015 at 3:22 PM, Krish <kr...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks, Zameer.
>>>>>>>>>>
>>>>>>>>>> I had to modify  /etc/aurora/clusters.json:
>>>>>>>>>> [
>>>>>>>>>>   {
>>>>>>>>>>     "auth_mechanism": "UNAUTHENTICATED",
>>>>>>>>>>     "name": "testcluster",
>>>>>>>>>>     "scheduler_zk_path": "/scheduler/aurora",
>>>>>>>>>>     "slave_root": "/var/lib/mesos",
>>>>>>>>>>     "slave_run_directory": "latest",
>>>>>>>>>>     "zk": "127.0.1.1"
>>>>>>>>>>   }
>>>>>>>>>> ]
>>>>>>>>>>
>>>>>>>>>> I have a hello_world.aurora in my home folder. However the
>>>>>>>>>> following command errors out:
>>>>>>>>>> ~$ aurora job create testcluster/testrole/test/hellojob
>>>>>>>>>> ./hello_world.aurora
>>>>>>>>>> Error loading configuration: [Errno 2] No such file or directory:
>>>>>>>>>> '/vagrant/hello_world.py'
>>>>>>>>>>
>>>>>>>>>> A job list does work:
>>>>>>>>>> ~$ aurora job list testcluster
>>>>>>>>>>  INFO] Retrieving jobs for role None
>>>>>>>>>>
>>>>>>>>>> I am not even using Vagrant. I am using zk & mesos on the
>>>>>>>>>> same machine as aurora. How do I submit these job templates to aurora?
>>>>>>>>>>
>>>>>>>>>> Any pointers to documentation will be helpful.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> κρισhναν
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 21, 2015 at 11:09 PM, Zameer Manji <zmanji@apache.org
>>>>>>>>>> > wrote:
>>>>>>>>>>
>>>>>>>>>>> Version 0.9.0 does not have the gc executor. Version 0.9.0 uses
>>>>>>>>>>> Mesos' task reconciliation
>>>>>>>>>>> <http://mesos.apache.org/documentation/latest/reconciliation/> API
>>>>>>>>>>> instead.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Oct 21, 2015 at 9:28 AM, Krish <
>>>>>>>>>>> krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks Bill for the location to the debs. I was finally able to
>>>>>>>>>>>> run aurora. :)
>>>>>>>>>>>>
>>>>>>>>>>>> I did find thermos_executor.pex & thermos_observer after
>>>>>>>>>>>> installing aurora-executor. I still could not find gc_executor.pex on my
>>>>>>>>>>>> system.
>>>>>>>>>>>> Is there a location from where I can download the binaries for
>>>>>>>>>>>> *.pex or build them from scratch?
>>>>>>>>>>>>
>>>>>>>>>>>> root@dev:/# find . -name "*.pex"
>>>>>>>>>>>> ./usr/share/aurora/bin/thermos_executor.pex
>>>>>>>>>>>> ./usr/share/aurora/bin/kaurora_admin.pex
>>>>>>>>>>>> ./usr/share/aurora/bin/kaurora.pex
>>>>>>>>>>>> ./usr/share/aurora/bin/thermos.pex
>>>>>>>>>>>> ./usr/share/aurora/bin/thermos_observer.pex
>>>>>>>>>>>> ./home/ubuntu/.pex
>>>>>>>>>>>> ./root/.pex
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 20, 2015 at 11:46 PM, Bill Farner <
>>>>>>>>>>>> wfarner@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Aurora currently requires an executor, so setting it to
>>>>>>>>>>>>> /dev/null will not work.  Happy to talk further about your thoughts around
>>>>>>>>>>>>> sidestepping the executor.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As for working with the scheduler source code, it's a standard
>>>>>>>>>>>>> gradle project and we tend to use intellij.  Docs to help ramp on that:
>>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md
>>>>>>>>>>>>>
>>>>>>>>>>>>> As for builds - the .zip is a source distribution, so it won't
>>>>>>>>>>>>> have any pre-built binaries.  If you're on debian, we have official debs
>>>>>>>>>>>>> here: https://bintray.com/apache/aurora
>>>>>>>>>>>>> You can see how they're built (and can build your own
>>>>>>>>>>>>> packages) here: https://github.com/apache/aurora-packaging
>>>>>>>>>>>>> We're close to having official RPMs, but none to speak of yet.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 9:47 AM, Krish <
>>>>>>>>>>>>> krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Stephan,
>>>>>>>>>>>>>> I am trying to get started and run aurora without thermos
>>>>>>>>>>>>>> executor (setting it to /dev/null does not help) - on a local linux box for
>>>>>>>>>>>>>> now & planning to containerize/dockerize it later.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can you please point me to the right documentation (or a
>>>>>>>>>>>>>> pointer to the cli parsing source code) which can help me resolve this?
>>>>>>>>>>>>>> Also, are there any steps to import the source code into eclipse to
>>>>>>>>>>>>>> browse & analyze code for this?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also, where do i find all the *.pex files? They are not
>>>>>>>>>>>>>> present in the zip file nor anywhere in the built source code.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I know I am asking too many queries on a single thread here,
>>>>>>>>>>>>>> & would appreciate the help.
>>>>>>>>>>>>>> I think at the end of this, I will put the steps I followed
>>>>>>>>>>>>>> in a gist/blog so others might find their way around, & not struggle as
>>>>>>>>>>>>>> much.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan <
>>>>>>>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Krish,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> you don't have to set framework_authentication_file and
>>>>>>>>>>>>>>> zk_digest_credentials. The scheduler help text is misleading here as
>>>>>>>>>>>>>>> everything will work fine if you leave those empty.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In addition, it looks like you are misunderstanding the usage
>>>>>>>>>>>>>>> of the thermos_executor_path command line flag of the
>>>>>>>>>>>>>>> scheduler. It is supposed to point to the binary containing the generic
>>>>>>>>>>>>>>> Aurora executor (thermos_executor.pex).  You only need the hello_world.aurora
>>>>>>>>>>>>>>> once your scheduler is up and running. It serves as an example input for the
>>>>>>>>>>>>>>> aurora command line client which can be used to schedule jobs and services
>>>>>>>>>>>>>>> on an Aurora master.
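
To make the distinction concrete, here is a small sketch that assembles a scheduler command line from the flags quoted elsewhere in this thread, with thermos_executor_path pointing at the executor binary instead of a .aurora job file. The helper scheduler_command and its ".pex" check are illustrative assumptions, the paths are the ones reported in this thread and may differ on your system, and the flag set is not claimed to be complete.

```python
import shlex


def scheduler_command(executor_pex, zk="localhost:2181"):
    """Build an aurora-scheduler argv list (flags copied from this thread)."""
    # Illustrative sanity check only: the flag expects the executor binary,
    # not a job definition such as hello_world.aurora.
    if not executor_pex.endswith(".pex"):
        raise ValueError(
            "thermos_executor_path should point to thermos_executor.pex, "
            "not to a .aurora job file")
    return [
        "/usr/local/aurora-scheduler/bin/aurora-scheduler",
        "-backup_dir=/backup_dir",
        "-cluster_name=tc",
        "-mesos_master_address=zk://%s/mesos/master" % zk,
        "-serverset_path=/scheduler/aurora",
        "-zk_endpoints=%s" % zk,
        "-native_log_quorum_size=1",
        "-native_log_file_path=/db",
        "-thermos_executor_path=%s" % executor_pex,
    ]


cmd = scheduler_command("/usr/share/aurora/bin/thermos_executor.pex")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The hello_world.aurora file is then passed to the `aurora job create ...` client command once the scheduler is running, not to the scheduler itself.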
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Have you tried to use the vagrant box? Just type `vagrant
>>>>>>>>>>>>>>> up` in a checkout of the Aurora source code. It gives you a running
>>>>>>>>>>>>>>> scheduler to play with. Once you have understood how it works, you can
>>>>>>>>>>>>>>> start trying to install it on your own (by reverse-engineering the vagrant
>>>>>>>>>>>>>>> box).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hope this helps a little,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>>>>>>>> *Sent:* Tuesday, October 20, 2015 11:39 AM
>>>>>>>>>>>>>>> *To:* Bill Farner
>>>>>>>>>>>>>>> *Cc:* user@aurora.apache.org; Erb, Stephan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Bill/Stephen,
>>>>>>>>>>>>>>> I still get a stacktrace when running the aurora scheduler
>>>>>>>>>>>>>>> CLI.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I do not know what to specify for
>>>>>>>>>>>>>>>  -framework_authentication_file & -zk_digest_credentials, and they are
>>>>>>>>>>>>>>> required arguments.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am not using any authentication on Mesos master, do I
>>>>>>>>>>>>>>> still need the framework_authentication_file parameter?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> rm -rf /db /backup_dir
>>>>>>>>>>>>>>> mesos-log initialize --path="/db"
>>>>>>>>>>>>>>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
>>>>>>>>>>>>>>> JAVA_OPTS="-Xmx1536m  -Xms256m"
>>>>>>>>>>>>>>> /usr/local/aurora-scheduler/bin/aurora-scheduler -backup_dir=/backup_dir
>>>>>>>>>>>>>>> -cluster_name=tc -mesos_master_address=zk://localhost:2181/mesos/master
>>>>>>>>>>>>>>> -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181
>>>>>>>>>>>>>>> -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false
>>>>>>>>>>>>>>> -native_log_file_path=/db
>>>>>>>>>>>>>>> -thermos_executor_path=/home/ubuntu/hello_world.aurora
>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>> INFO: Binding org.apache.aurora.scheduler.http.Utilization to GuiceManagedComponentProvider with the scope "PerRequest"
>>>>>>>>>>>>>>> Oct 20, 2015 9:27:40 AM org.apache.aurora.scheduler.cron.quartz.CronModule provideTimeZone
>>>>>>>>>>>>>>> WARNING: Cron schedules are configured to fire according to timezone Greenwich Mean Time but system timezone is set to Coordinated Universal Time
>>>>>>>>>>>>>>> Oct 20, 2015 9:27:41 AM org.eclipse.jetty.server.AbstractConnector doStart
>>>>>>>>>>>>>>> INFO: Started SelectChannelConnector@0.0.0.0:43843
>>>>>>>>>>>>>>> E1020 09:27:41.290 THREAD1 org.apache.aurora.scheduler.SchedulerLifecycle$9.execute: Caught unchecked exception:
>>>>>>>>>>>>>>> com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1 error
>>>>>>>>>>>>>>> com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1 error
>>>>>>>>>>>>>>>         at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
>>>>>>>>>>>>>>>         at org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner <
>>>>>>>>>>>>>>> wfarner@apache.org> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The typical flow is that you keep your .aurora file checked
>>>>>>>>>>>>>>>> into git, and commit every time you deploy/update.  When you change your
>>>>>>>>>>>>>>>> file, you will instruct Aurora to update the live job (have a look at aurora
>>>>>>>>>>>>>>>> update -h).  Aurora will perform a rolling upgrade of your
>>>>>>>>>>>>>>>> job to the new config.  You'll use this same flow for updating your job's
>>>>>>>>>>>>>>>> software as well as resizing the job.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For (3), you could set up alerting for stats that the
>>>>>>>>>>>>>>>> scheduler exports.  Have a look here for monitoring background:
>>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> You'll want to look at scheduler stats related to
>>>>>>>>>>>>>>>> 'pending'.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish <
>>>>>>>>>>>>>>>> krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks for the pointer. Now I notice that the
>>>>>>>>>>>>>>>>> aurora-scheduler script has the --thermos_executor_path as a mandatory
>>>>>>>>>>>>>>>>> requirement.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I have a couple of questions on how the
>>>>>>>>>>>>>>>>> thermos_executor/.aurora config file functions:
>>>>>>>>>>>>>>>>> 1. Do we have to statically define the file beforehand?
>>>>>>>>>>>>>>>>> 2. What happens when we want to dynamically change the
>>>>>>>>>>>>>>>>> config, say increasing the number of instances of a service required? Does
>>>>>>>>>>>>>>>>> aurora require a reboot then?
>>>>>>>>>>>>>>>>> 3. How do I get notified about the message mesos sends
>>>>>>>>>>>>>>>>> when it cannot schedule tasks for lack of resources? Should I depend on
>>>>>>>>>>>>>>>>> aurora for this or try to look for a hook into mesos?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think a little bit of context would help here.
>>>>>>>>>>>>>>>>> What I plan to check is to run a very basic job/task
>>>>>>>>>>>>>>>>> inside a docker container with aurora & wait for a 'resource not available'
>>>>>>>>>>>>>>>>> message from mesos, and accordingly call an api to spin up a new node in my
>>>>>>>>>>>>>>>>> cluster.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <
>>>>>>>>>>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I believe you are missing the thermos_executor options
>>>>>>>>>>>>>>>>>> that have to be passed to the scheduler command line.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> See
>>>>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
>>>>>>>>>>>>>>>>>> for an example
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>>>>>>>>>>> *Sent:* Monday, October 19, 2015 8:45 AM
>>>>>>>>>>>>>>>>>> *To:* user@aurora.apache.org
>>>>>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>> I am a n00b with apache aurora & trying to experiment
>>>>>>>>>>>>>>>>>> some things on my local machine with zookeeper and mesos-master running
>>>>>>>>>>>>>>>>>> locally. They have initialized properly. When I try to run aurora with the
>>>>>>>>>>>>>>>>>> required options, I get the following error, & googling hasn't helped me
>>>>>>>>>>>>>>>>>> much here.
>>>>>>>>>>>>>>>>>> Appreciate any help. Thanks in advance.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>>> WARNING: Method [public void
>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)]
>>>>>>>>>>>>>>>>>> is synthetic and is being intercepted by
>>>>>>>>>>>>>>>>>> [com.twitter.common.inject.TimedInterceptor@604c5de8].
>>>>>>>>>>>>>>>>>> This could indicate a bug.  The method
>>>>>>>>>>>>>>>>>>  may be intercepted twice, or may not be intercepted at
>>>>>>>>>>>>>>>>>> all.
>>>>>>>>>>>>>>>>>> Exception in thread "main"
>>>>>>>>>>>>>>>>>> com.google.inject.CreationException: Guice creation errors:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1) An exception was caught and reported. Message: A value
>>>>>>>>>>>>>>>>>> may only be retrieved from a variable that has a default or has been
>>>>>>>>>>>>>>>>>> set.
>>>>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2) Could not find a suitable constructor in
>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have
>>>>>>>>>>>>>>>>>> either one (and only one) constructor annotated with @Inject or a
>>>>>>>>>>>>>>>>>> zero-argument constructor that is not private.
>>>>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>>>>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2 errors
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:95)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:83)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
>>>>>>>>>>>>>>>>>> Caused by: java.lang.IllegalStateException: A value may
>>>>>>>>>>>>>>>>>> only be retrieved from a variable that has a default or has been set.
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>>>>>>>>>>>>>>>>>>         at com.twitter.common.args.Arg.get(Arg.java:82)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.google.inject.util.Modules$2.configure(Modules.java:114)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.google.inject.spi.Elements.getElements(Elements.java:101)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>>>>>>>>>>>>>>>>>>         ... 7 more
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Complete logs are present @http://pastebin.com/i72HvbYi.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Zameer Manji
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>
> --
>
> Thumb typed mail
>
>

Re: Stacktrace when running Apache Aurora

Posted by Krish <kr...@gmail.com>.

Can you guide me on how to do that? Should I start with a new page and then
submit it, or would you like that as an entry in some existing doc?
That will be the short-term (couple of hours) item on my checklist.

Actually, as I said before, I plan to blog about my entire design
and implementation process: the how and the why of docker configuration,
private docker repo setup, coreos cluster setup, and zk, mesos master,
aurora containerisation and setup, along with their monitoring (I have
decided on bosun.org with cAdvisor), plus a short guide on how to run
both containerized and non-containerized jobs in production.
I had to refer to a dozen or more sites, blogs, manuals, and source code
to get this far, and got help from engineers on various mailing lists.
A unified guide should be helpful, imho.
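
As a starting point for such a doc entry, here is a minimal sketch of the
file conversion described in the quoted mail below. It assumes only the two
layouts shown there: ~/.docker/config.json nests the registry entries under
an "auths" key, while the legacy ~/.dockercfg keeps them at the top level.
The function name to_dockercfg and the sample values are placeholders, not
anything taken from Mesos, Aurora, or Docker.

```python
import json


def to_dockercfg(config_json_text):
    """Convert ~/.docker/config.json content to the legacy ~/.dockercfg layout.

    Assumes the layout shown in this thread: registry entries sit under a
    top-level "auths" key in config.json and at the top level in .dockercfg.
    """
    config = json.loads(config_json_text)
    if "auths" not in config:
        raise ValueError('no "auths" key found; is this a config.json file?')
    # Drop the "auths" wrapper; the per-registry entries are kept unchanged.
    return json.dumps(config["auths"], indent=2, sort_keys=True)


if __name__ == "__main__":
    # Placeholder credentials, mirroring the snipped example in this thread.
    sample = """
    {
      "auths": {
        "repo.example.com:5000": {"auth": "<snip>", "email": "<snip>"}
      }
    }
    """
    print(to_dockercfg(sample))
```

The output would then be saved in a file literally named .dockercfg and
shipped into the task sandbox (thermos_executor_resources was used for that
in this thread).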


On Thursday 3 March 2016, Bill Farner <wf...@apache.org> wrote:

> Wow!  I'm glad you got it working!  To help the next poor soul trying to
> do this, would you be willing to put up a doc patch on our end?
>
> On Thursday, March 3, 2016, Krish <krishnan.k.iyer@gmail.com> wrote:
>
>> TL;DR:
>> Use only a file named .dockercfg for docker credentials in mesos
>> tasks!
>>
>> Long story:
>> ---------------
>> Holy smokescreens!
>> This is for reporting & documenting purposes only, so that others don't
>> have to pull their hair like I did for the past few evenings!
>>
>> A little background:
>> I am running Ubuntu 14.04 on my system and docker stores its credentials
>> in the ~/.docker/config.json as
>> cat ~/.docker/config.json
>> {
>> "auths": {
>> "repo.example.com:5000": {
>> "auth": "<snip>",
>> "email": "<snip>"
>> }
>> }
>> }
>>
>> And I am doing all these experiments on a coreOS system which stores the
>> credentials  in ~/.dockercfg as
>> core@aurora-1 ~ $ cat ~/.dockercfg
>> {
>>   "repo.example.com:5000": {
>>     "auth": "<snip>",
>>     "email": "<snip>"
>>   }
>> }
>>
>> Since my container was an Ubuntu 14.04 container (as was my local
>> system), I used the ubuntu credential file format. As a result, I couldn't
>> get the slave task to read the docker credentials, because I had stored
>> them as ~/.docker/config.json.
>> After parsing through (a lot of find's, grep's and regex matching)
>> aurora, mesos, and thermos source code, I saw in
>> mesos/src/docker/docker.cpp:
>>
>>   // Set HOME variable to pick up .dockercfg.
>>   map<string, string> environment = os::environment();
>>
>>   environment["HOME"] = directory;
>>
>>
>> Changed the filename and the json content, changed the
>> thermos_executor_resources, and bam, docker pull works!
>>
>> Well, the mesos documentation does say "To run an image from a private
>> repository, one can include the URI pointing to a .dockercfg that contains
>> login information." and I must have read it a dozen times!
>> But I never thought that they literally meant '.dockercfg' as the name of
>> the file!
>>
>>
>>
>>
>> --
>> κρισhναν
>>
>> On Thu, Mar 3, 2016 at 1:45 PM, Krish <kr...@gmail.com> wrote:
>>
>>>
>>> I have got the docker config file copied into the sandbox using the
>>> thermos_executor_resources flag; however docker is still not able to find
>>> the credentials file for doing an appropriate pull of image from a private
>>> repo.
>>>
>>> When I try to use the library/hello-world:latest image from public
>>> docker repo to check if everything works fine without the credentials, I
>>> encounter a different problem:
>>> exec: "/bin/sh": stat /bin/sh: no such file or directory
>>> Error response from daemon: Cannot start container
>>> de93dc344d44b41bccccff49e508001a97ff23a8964e637d32a506a31fd4d946: [8]
>>> System error: exec: "/bin/sh": stat /bin/sh: no such file or directory
>>>
>>> I was referring to this email for guidance on setting up a mesos slave:
>>> http://mail-archives.apache.org/mod_mbox/aurora-dev/201503.mbox/%3CCAKB1MkHR=+NOnesLLX9bUWtTDThsku46PW_wR4B+_Z9P59+XWQ@mail.gmail.com%3E
>>>
>>> So, I cannot get the credentials file to be used by docker, and if I
>>> bypass authentication, I can do a docker pull, but encounter a weird error
>>> in launching the hello-world image.
>>>
>>> Am I missing out on checking any log files generated? I currently refer
>>> to mesos-slave stdout and the sandbox stderr file.
>>> Any configuration parameter I am missing for this to happen?
>>>
>>> Any pointers will be really helpful. Thanks in advance.
>>>
>>>
>>>
>>> --
>>> κρισhναν
>>>
>>> On Sun, Feb 28, 2016 at 3:37 PM, Krish <kr...@gmail.com>
>>> wrote:
>>>
>>>> Continuing my earlier chain of thought, I found this in the mesos bug
>>>> list:
>>>> MESOS-4242 - Allow Docker private registry credentials to be passed
>>>> from framework.
>>>> How does one pass credentials using the framework? As it seems the
>>>> .docker/config.json is not read from the slave.
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> κρισhναν
>>>>
>>>> On Sat, Feb 27, 2016 at 11:46 PM, Krish <kr...@gmail.com>
>>>> wrote:
>>>>
>>>>> I couldn't complete my PoC project earlier (got busy with other
>>>>> work). Well, it is never too late, and here's my update and issue.
>>>>>
>>>>> I have a 3 node zk (3.5.1 alpha), mesos-master (v0.24.1) & aurora
>>>>> (v0.11.0) running.
>>>>> I was stuck on a problem where I was using mesos 0.25.0 & aurora 0.9.0
>>>>> & got a 'protobuf field not set' error for the ExecutorInfo field.
>>>>>
>>>>> I have a mesos agent running in docker container on coreos and it can
>>>>> access the host docker just fine.
>>>>> I have also put the docker login credentials file at the right
>>>>> location for it to access the private docker registry.
>>>>> I can manually trigger a docker pull and docker run without issues
>>>>> from the slave (which is also reflected properly outside the slave
>>>>> container with docker images and docker ps).
>>>>>
>>>>> However, when I try to run an aurora job with hello-docker container,
>>>>> the slave prints out the log that docker pull has failed; more specifically:
>>>>> " failed to start: Failed to 'docker pull
>>>>> private_repo.com:5000/krish/test:latest': exit status = exited with
>>>>> status 1 stderr = Error: image krish/test:latest not found"
>>>>>
>>>>> My hunch is that when using docker run from aurora DSL, it does not
>>>>> read the docker credentials file properly and hence fails. I can reproduce
>>>>> the exact same error when I delete the credentials file from the slave and
>>>>> trigger a pull.
>>>>>
>>>>> Is the hunch right? If yes, is there a way to resolve this? Maybe
>>>>> source it some way before the run command?
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> κρισhναν
>>>>>
>>>>> On Tue, Oct 27, 2015 at 10:35 PM, Bill Farner <wf...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> (1) clusters.json is written by you, configuring the CLI client with
>>>>>> instructions for what clusters are available and how to discover them.
>>>>>>
>>>>>> (2) That's expected - mesos only allows one active replica of a
>>>>>> framework at a time, this signals which one is active.
>>>>>>
>>>>>> (3) The observer is essentially a web server that allows you to
>>>>>> browse a task's sandbox directory and other information about it.  You will
>>>>>> need to configure it to run on your worker/agent nodes for that
>>>>>> functionality to work (it's linked from the scheduler web UI).
>>>>>>
>>>>>> (4) You could indeed implement that behavior externally.  There is a
>>>>>> reason:
>>>>>> https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L556-L559
>>>>>>
>>>>>> (5) That is correct.  The scheduler exposes a thrift API that you
>>>>>> would use (a REST API is coming, but ground has not yet been broken).  If
>>>>>> you go this route, I suggest you skip the DSL and use the JSON task
>>>>>> description format that is shipped over the API.  There isn't good
>>>>>> documentation on this, but we can help you through it and would be grateful
>>>>>> for a writeup of your approach!
>>>>>>
>>>>>>
>>>>>> On Mon, Oct 26, 2015 at 11:44 PM, Krish <kr...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Folks,
>>>>>>> Firstly, thanks for all the help. Am happy to report that I have set
>>>>>>> up zk, mesos & aurora, & can work further towards my idea of having an
>>>>>>> auto-scaling cluster.
>>>>>>> I have some further questions about the work done so far & things I
>>>>>>> plan to do:
>>>>>>>
>>>>>>>    1. Is the /etc/aurora/clusters.json file created by the
>>>>>>>    scheduler or does it need to be handcrafted? I had to manually edit the
>>>>>>>    file to get my `aurora job ...` cli to work.
>>>>>>>
>>>>>>>    2. I am running a cluster of 3 coreOS VMs on vagrant with zk,
>>>>>>>    mesos & aurora in a docker container. Only 1 of them outputs '1' when I
>>>>>>>    look at the 'framework_registered' field. Is this expected? How do I verify
>>>>>>>    that they are working as a cluster?
>>>>>>>
>>>>>>>    3. From the documentation, I see that there is an observer that
>>>>>>>    needs to be listening on port 1338. What is the observer socket & its
>>>>>>>    purpose? I have aurora listening only on ports 8081 (http port) & 8083
>>>>>>>    (libprocess).
>>>>>>>
>>>>>>>    4. I read about the 'PENDING' field in aurora documentation, as
>>>>>>>    Bill suggested, & realize that it just shows that a task is waiting for
>>>>>>>    some reason (for want of resources, in my case, as 0 slaves have
>>>>>>>    registered). I was thinking of adding a hook to the pending state; say if a
>>>>>>>    task is PENDING for 5 minutes for lack of resources in the cluster, then
>>>>>>>    spin up a new machine. Is this the right approach to take? Does aurora
>>>>>>>    provide reasons for why a task is in the PENDING state?
>>>>>>>
>>>>>>>    => aurora job status testcluster/$USER/test/hello_world
>>>>>>>     INFO] Checking status of testcluster/ubuntu/test/hello_world
>>>>>>>    Active tasks (1):
>>>>>>>           Task role: ubuntu, env: test, name: hello_world,
>>>>>>>    instance: 0, status:
>>>>>>>    PENDING on None
>>>>>>>              cpus: 0.1, ram: 16 MB, disk: 16 MB
>>>>>>>              events:
>>>>>>>               2015-10-23 04:55:33 PENDING: None
>>>>>>>    Inactive tasks (0):
>>>>>>>
>>>>>>>    5. Aurora defines job/s in a .aurora config file & if I decide
>>>>>>>    to increase/decrease the number of instances in my cluster, then I need to
>>>>>>>    create/overwrite the concerned .aurora file and trigger the `aurora update
>>>>>>>    ...` command. Is this right?
>>>>>>>    If yes, is there an HTTP API I can invoke remotely which
>>>>>>>    triggers this update?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> κρισhναν
>>>>>>>
>>>>>>> On Fri, Oct 23, 2015 at 8:09 AM, Joshua Cohen <
>>>>>>> jcohen@twopensource.com> wrote:
>>>>>>>
>>>>>>>> I suspect your error from `aurora job create ...` is due to the
>>>>>>>> aurora config you're using referencing `/vagrant/hello_world.py` which does
>>>>>>>> not exist (as you say: you're not even using Vagrant). Can you link the
>>>>>>>> .aurora config you're using?
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Joshua
>>>>>>>>
>>>>>>>> On Thu, Oct 22, 2015 at 3:22 PM, Krish <kr...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks, Zameer.
>>>>>>>>>
>>>>>>>>> I had to modify  /etc/aurora/clusters.json:
>>>>>>>>> [
>>>>>>>>>   {
>>>>>>>>>     "auth_mechanism": "UNAUTHENTICATED",
>>>>>>>>>     "name": "testcluster",
>>>>>>>>>     "scheduler_zk_path": "/scheduler/aurora",
>>>>>>>>>     "slave_root": "/var/lib/mesos",
>>>>>>>>>     "slave_run_directory": "latest",
>>>>>>>>>     "zk": "127.0.1.1"
>>>>>>>>>   }
>>>>>>>>> ]
>>>>>>>>>
>>>>>>>>> I have a hello_world.aurora in my home folder. However the
>>>>>>>>> following command errors out:
>>>>>>>>> ~$ aurora job create testcluster/testrole/test/hellojob
>>>>>>>>> ./hello_world.aurora
>>>>>>>>> Error loading configuration: [Errno 2] No such file or directory:
>>>>>>>>> '/vagrant/hello_world.py'
>>>>>>>>>
>>>>>>>>> A job list does work:
>>>>>>>>> ~$ aurora job list testcluster
>>>>>>>>>  INFO] Retrieving jobs for role None
>>>>>>>>>
>>>>>>>>> I am not even using vagrant. I am using zk & mesos on the same
>>>>>>>>> machine as aurora. How do I submit these job templates to aurora?
>>>>>>>>>
>>>>>>>>> Any pointers to documentation will be helpful.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> κρισhναν
>>>>>>>>>
>>>>>>>>> On Wed, Oct 21, 2015 at 11:09 PM, Zameer Manji <zm...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Version 0.9.0 does not have the gc executor. Version 0.9.0 uses
>>>>>>>>>> Mesos' task reconciliation
>>>>>>>>>> <http://mesos.apache.org/documentation/latest/reconciliation/> API
>>>>>>>>>> instead.
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 21, 2015 at 9:28 AM, Krish <krishnan.k.iyer@gmail.com
>>>>>>>>>> > wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Bill for the location to the debs. I was finally able to
>>>>>>>>>>> run aurora. :)
>>>>>>>>>>>
>>>>>>>>>>> I did find thermos_executor.pex & thermos_observer after
>>>>>>>>>>> installing aurora-executor. I still could not find gc_executor.pex on my
>>>>>>>>>>> system.
>>>>>>>>>>> Is there a location from where I can download the binaries for
>>>>>>>>>>> *.pex or build them from scratch?
>>>>>>>>>>>
>>>>>>>>>>> root@dev:/# find . -name "*.pex"
>>>>>>>>>>> ./usr/share/aurora/bin/thermos_executor.pex
>>>>>>>>>>> ./usr/share/aurora/bin/kaurora_admin.pex
>>>>>>>>>>> ./usr/share/aurora/bin/kaurora.pex
>>>>>>>>>>> ./usr/share/aurora/bin/thermos.pex
>>>>>>>>>>> ./usr/share/aurora/bin/thermos_observer.pex
>>>>>>>>>>> ./home/ubuntu/.pex
>>>>>>>>>>> ./root/.pex
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> κρισhναν
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 20, 2015 at 11:46 PM, Bill Farner <
>>>>>>>>>>> wfarner@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Aurora currently requires an executor, so setting it to
>>>>>>>>>>>> /dev/null will not work.  Happy to talk further about your thoughts around
>>>>>>>>>>>> sidestepping the executor.
>>>>>>>>>>>>
>>>>>>>>>>>> As for working with the scheduler source code, it's a standard
>>>>>>>>>>>> gradle project and we tend to use intellij.  Docs to help ramp on that:
>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md
>>>>>>>>>>>>
>>>>>>>>>>>> As for builds - the .zip is a source distribution, so it won't
>>>>>>>>>>>> have any pre-built binaries.  If you're on debian, we have official debs
>>>>>>>>>>>> here: https://bintray.com/apache/aurora
>>>>>>>>>>>> You can see how they're built here (and can build your own)
>>>>>>>>>>>> packages: https://github.com/apache/aurora-packaging
>>>>>>>>>>>> We're close to having official RPMs, but none to speak of yet.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 20, 2015 at 9:47 AM, Krish <
>>>>>>>>>>>> krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Stephan,
>>>>>>>>>>>>> I am trying to get started and run aurora without thermos
>>>>>>>>>>>>> executor (setting it to /dev/null does not help) - on a local linux box for
>>>>>>>>>>>>> now & planning to containerize/dockerize it later.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you please point me to the right documentation (or a
>>>>>>>>>>>>> pointer to the cli parsing source code) which can help me resolve this?
>>>>>>>>>>>>> Also, are there any steps to import source code into eclipse to
>>>>>>>>>>>>> browse & analyze code for this?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also, where do I find all the *.pex files? They are not
>>>>>>>>>>>>> present in the zip file nor anywhere in the built source code.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I know I am asking too many queries on a single thread here, &
>>>>>>>>>>>>> would appreciate the help.
>>>>>>>>>>>>> I think at the end of this, I will put the steps I followed in
>>>>>>>>>>>>> a gist/blog so others might find their way around, & not struggle as much.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan <
>>>>>>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Krish,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> you don't have to set framework_authentication_file and
>>>>>>>>>>>>>> zk_digest_credentials. The scheduler help text is misleading here as
>>>>>>>>>>>>>> everything will work fine if you leave those empty.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In addition, it looks like you are misunderstanding the usage of
>>>>>>>>>>>>>> the thermos_executor_path command line flag of the
>>>>>>>>>>>>>> scheduler. It is supposed to point to the binary containing the generic
>>>>>>>>>>>>>> Aurora executor (thermos_executor.pex).  You only need the hello_world.aurora
>>>>>>>>>>>>>> once your scheduler is up and running. It serves as an example input for the
>>>>>>>>>>>>>> aurora command line client which can be used to schedule jobs and services
>>>>>>>>>>>>>> on an Aurora master.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Have you tried to use the vagrant box? Just type `vagrant
>>>>>>>>>>>>>> up` in a checkout of the Aurora source code. It gives you a running
>>>>>>>>>>>>>> scheduler to play with. Once you have understood how it works, you can
>>>>>>>>>>>>>> start trying to install it on your own (by reverse-engineering the vagrant
>>>>>>>>>>>>>> box).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hope this helps a little,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>>>>>>> *Sent:* Tuesday, October 20, 2015 11:39 AM
>>>>>>>>>>>>>> *To:* Bill Farner
>>>>>>>>>>>>>> *Cc:* user@aurora.apache.org; Erb, Stephan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Bill/Stephan,
>>>>>>>>>>>>>> I still get a stacktrace when running the aurora scheduler
>>>>>>>>>>>>>> CLI.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I do not know what to specify for
>>>>>>>>>>>>>>  -framework_authentication_file & -zk_digest_credentials, and they are
>>>>>>>>>>>>>> required arguments.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am not using any authentication on Mesos master, do I still
>>>>>>>>>>>>>> need the framework_authentication_file parameter?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> rm -rf /db /backup_dir
>>>>>>>>>>>>>> mesos-log initialize --path="/db"
>>>>>>>>>>>>>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
>>>>>>>>>>>>>> JAVA_OPTS="-Xmx1536m  -Xms256m"
>>>>>>>>>>>>>> /usr/local/aurora-scheduler/bin/aurora-scheduler -backup_dir=/backup_dir
>>>>>>>>>>>>>> -cluster_name=tc -mesos_master_address=zk://localhost:2181/mesos/master
>>>>>>>>>>>>>> -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181
>>>>>>>>>>>>>> -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false
>>>>>>>>>>>>>> -native_log_file_path=/db
>>>>>>>>>>>>>> -thermos_executor_path=/home/ubuntu/hello_world.aurora
>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>> INFO: Binding org.apache.aurora.scheduler.http.Utilization to GuiceManagedComponentProvider with the scope "PerRequest"
>>>>>>>>>>>>>> Oct 20, 2015 9:27:40 AM org.apache.aurora.scheduler.cron.quartz.CronModule provideTimeZone
>>>>>>>>>>>>>> WARNING: Cron schedules are configured to fire according to timezone Greenwich Mean Time but system timezone is set to Coordinated Universal Time
>>>>>>>>>>>>>> Oct 20, 2015 9:27:41 AM org.eclipse.jetty.server.AbstractConnector doStart
>>>>>>>>>>>>>> INFO: Started SelectChannelConnector@0.0.0.0:43843
>>>>>>>>>>>>>> E1020 09:27:41.290 THREAD1 org.apache.aurora.scheduler.SchedulerLifecycle$9.execute: Caught unchecked exception: com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1 error
>>>>>>>>>>>>>> com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1 error
>>>>>>>>>>>>>>         at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
>>>>>>>>>>>>>>         at org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner <
>>>>>>>>>>>>>> wfarner@apache.org> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The typical flow is that you keep your .aurora file checked
>>>>>>>>>>>>>>> into git, and commit every time you deploy/update.  When you change your
>>>>>>>>>>>>>>> file, you will instruct Aurora to update the live job (have a look at aurora
>>>>>>>>>>>>>>> update -h).  Aurora will perform a rolling upgrade of your
>>>>>>>>>>>>>>> job to the new config.  You'll use this same flow for updating your job's
>>>>>>>>>>>>>>> software as well as resizing the job.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For (3), you could set up alerting for stats that the
>>>>>>>>>>>>>>> scheduler exports.  Have a look here for monitoring background:
>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You'll want to look at scheduler stats related to
>>>>>>>>>>>>>>> 'pending'.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish <
>>>>>>>>>>>>>>> krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for the pointer. Now I notice that the
>>>>>>>>>>>>>>>> aurora-scheduler script has the --thermos_executor_path as a mandatory
>>>>>>>>>>>>>>>> requirement.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I have a couple of questions on how the
>>>>>>>>>>>>>>>> thermos_executor/.aurora config file functions:
>>>>>>>>>>>>>>>> 1. Do we have to statically define the file beforehand?
>>>>>>>>>>>>>>>> 2. What happens when we want to dynamically change the
>>>>>>>>>>>>>>>> config, say increasing the number of instances of a service required? Does
>>>>>>>>>>>>>>>> aurora require a reboot then?
>>>>>>>>>>>>>>>> 3. How do I get notified about the message mesos sends when
>>>>>>>>>>>>>>>> it cannot schedule tasks for lack of resources? Should I depend on aurora
>>>>>>>>>>>>>>>> for this or try to look for a hook into mesos?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think a little bit of context would help here.
>>>>>>>>>>>>>>>> What I plan to check is to run a very basic job/task inside
>>>>>>>>>>>>>>>> a docker container with aurora & wait for a 'resource not available'
>>>>>>>>>>>>>>>> message from mesos, and accordingly call an api to spin up a new node in my
>>>>>>>>>>>>>>>> cluster.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <
>>>>>>>>>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I believe you are missing the thermos_executor options
>>>>>>>>>>>>>>>>> that have to be passed to the scheduler command line.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> See
>>>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
>>>>>>>>>>>>>>>>> for an example
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>>>>>>>>>> *Sent:* Monday, October 19, 2015 8:45 AM
>>>>>>>>>>>>>>>>> *To:* user@aurora.apache.org
>>>>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>> I am a n00b with apache aurora & trying to experiment some
>>>>>>>>>>>>>>>>> things on my local machine with zookeeper and mesos-master running locally.
>>>>>>>>>>>>>>>>> They have initialized properly. When I try to run aurora with the required
>>>>>>>>>>>>>>>>> options, I get the following error, & googling hasn't helped me much here.
>>>>>>>>>>>>>>>>> Appreciate any help. Thanks in advance.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>>> WARNING: Method [public void
>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)]
>>>>>>>>>>>>>>>>> is synthetic and is being intercepted by
>>>>>>>>>>>>>>>>> [com.twitter.common.inject.TimedInterceptor@604c5de8].
>>>>>>>>>>>>>>>>> This could indicate a bug.  The method
>>>>>>>>>>>>>>>>>  may be intercepted twice, or may not be intercepted at
>>>>>>>>>>>>>>>>> all.
>>>>>>>>>>>>>>>>> Exception in thread "main"
>>>>>>>>>>>>>>>>> com.google.inject.CreationException: Guice creation errors:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1) An exception was caught and reported. Message: A value
>>>>>>>>>>>>>>>>> may only be retrieved from a variable that has a default or has been
>>>>>>>>>>>>>>>>> set.
>>>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2) Could not find a suitable constructor in
>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have
>>>>>>>>>>>>>>>>> either one (and only one) constructor annotated with @Inject or a
>>>>>>>>>>>>>>>>> zero-argument constructor that is not private.
>>>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>>>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2 errors
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:95)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:83)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
>>>>>>>>>>>>>>>>> Caused by: java.lang.IllegalStateException: A value may
>>>>>>>>>>>>>>>>> only be retrieved from a variable that has a default or has been set.
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>>>>>>>>>>>>>>>>>         at com.twitter.common.args.Arg.get(Arg.java:82)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.google.inject.util.Modules$2.configure(Modules.java:114)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.google.inject.spi.Elements.getElements(Elements.java:101)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>>>>>>>>>>>>>>>>>         ... 7 more
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Complete logs are present @http://pastebin.com/i72HvbYi.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Zameer Manji
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>

-- 

Thumb typed mail

Re: Stacktrace when running Apache Aurora

Posted by Bill Farner <wf...@apache.org>.

Wow!  I'm glad you got it working!  To help the next poor soul trying to do
this, would you be willing to put up a doc patch on our end?

On Thursday, March 3, 2016, Krish <kr...@gmail.com> wrote:

> TL;DR:
> Only use a file named .dockercfg for docker credentials in mesos
> tasks!
>
> Long story:
> ---------------
> Holy smokescreens!
> This is for reporting & documenting purposes only, so that others don't
> have to pull their hair out like I did for the past few evenings!
>
> A little background:
> I am running Ubuntu 14.04 on my system, where docker stores its credentials
> in ~/.docker/config.json as
> cat ~/.docker/config.json
> {
> "auths": {
> "repo.example.com:5000": {
> "auth": "<snip>",
> "email": "<snip>"
> }
> }
> }
>
> And I am doing all these experiments on a coreOS system which stores the
> credentials  in ~/.dockercfg as
> core@aurora-1 ~ $ cat ~/.dockercfg
> {
>   "repo.example.com:5000": {
>     "auth": "<snip>",
>     "email": "<snip>"
>   }
> }
>
> Since my container was an Ubuntu 14.04 container (as was my local system),
> I used the Ubuntu credential file format. As a result, I couldn't get the
> slave task to read the docker credentials, because I had stored them as
> ~/.docker/config.json.
> After parsing through (a lot of find's, grep's and regex matching) aurora,
> mesos, and thermos source code, I saw in mesos/src/docker/docker.cpp:
>
>   // Set HOME variable to pick up .dockercfg.
>   map<string, string> environment = os::environment();
>
>   environment["HOME"] = directory;
>
>
> I changed the filename and the json content, changed the
> thermos_executor_resources flag, and bam, docker pull works!
>
> Well, the mesos documentation does say "To run an image from a private
> repository, one can include the URI pointing to a .dockercfg that contains
> login information." and I must have read it a dozen times!
> But I never thought that they literally meant '.dockercfg' as the name of
> the file!
>
>
>
>
> --
> κρισhναν
>
> On Thu, Mar 3, 2016 at 1:45 PM, Krish <krishnan.k.iyer@gmail.com> wrote:
>
>>
>> I have got the docker config file copied into the sandbox using the
>> thermos_executor_resources flag; however docker is still not able to find
>> the credentials file for doing an appropriate pull of image from a private
>> repo.
>>
>> When I try to use the library/hello-world:latest image from public docker
>> repo to check if everything works fine without the credentials, I encounter
>> a different problem:
>> exec: "/bin/sh": stat /bin/sh: no such file or directory
>> Error response from daemon: Cannot start container
>> de93dc344d44b41bccccff49e508001a97ff23a8964e637d32a506a31fd4d946: [8]
>> System error: exec: "/bin/sh": stat /bin/sh: no such file or directory
>>
>> I was referring to this email for guidance on setting up a mesos slave:
>> http://mail-archives.apache.org/mod_mbox/aurora-dev/201503.mbox/%3CCAKB1MkHR=+NOnesLLX9bUWtTDThsku46PW_wR4B+_Z9P59+XWQ@mail.gmail.com%3E
>>
>> So, I cannot get the credentials file to be used by docker, and if I
>> bypass authentication, I can do a docker pull, but encounter a weird error
>> in launching the hello-world image.
>>
>> Am I missing out on checking any log files generated? I currently refer
>> to mesos-slave stdout and the sandbox stderr file.
>> Any configuration parameter I am missing for this to happen?
>>
>> Any pointers will be really helpful. Thanks in advance.
>>
>>
>>
>> --
>> κρισhναν
>>
>> On Sun, Feb 28, 2016 at 3:37 PM, Krish <krishnan.k.iyer@gmail.com> wrote:
>>
>>> Continuing my earlier chain of thought, I found this in the mesos bug
>>> list:
>>> MESOS-4242 - Allow Docker private registry credentials to be passed from
>>> framework.
>>> How does one pass credentials using the framework? As it seems the
>>> .docker/config.json is not read from the slave.
>>>
>>>
>>>
>>>
>>> --
>>> κρισhναν
>>>
>>> On Sat, Feb 27, 2016 at 11:46 PM, Krish <krishnan.k.iyer@gmail.com> wrote:
>>>
>>>> I couldn't complete my PoC project earlier (got busy with other
>>>> work). Well, it is never too late and here's my update and issue.
>>>>
>>>> I have a 3 node zk (3.5.1 alpha), mesos-master (v0.24.1) & aurora
>>>> (v0.11.0) running.
>>>> I was stuck in a problem where I was using mesos 0.25.0 & aurora 0.9.0
>>>> & got a protobuf field not set error - ExecutorInfo field.
>>>>
>>>> I have a mesos agent running in docker container on coreos and it can
>>>> access the host docker just fine.
>>>> I have also put the docker login credentials file at the right location
>>>> for it to access the private docker registry.
>>>> I can manually trigger a docker pull and docker run without issues from
>>>> the slave (which is also reflected properly outside the slave container
>>>> with docker images and docker ps).
>>>>
>>>> However, when I try to run an aurora job with hello-docker container,
>>>> the slave prints out the log that docker pull has failed; more specifically:
>>>> " failed to start: Failed to 'docker pull
>>>> private_repo.com:5000/krish/test:latest': exit status = exited with
>>>> status 1 stderr = Error: image krish/test:latest not found"
>>>>
>>>> My hunch is that when using docker run from aurora DSL, it does not
>>>> read the docker credentials file properly and hence fails. I can reproduce
>>>> the exact same error when I delete the credentials file from the slave and
>>>> trigger a pull.
>>>>
>>>> Is the hunch right? If yes, is there a way to resolve this? Maybe
>>>> source it some way before the run command?
>>>>
>>>>
>>>>
>>>> --
>>>> κρισhναν
>>>>
>>>> On Tue, Oct 27, 2015 at 10:35 PM, Bill Farner <wfarner@apache.org> wrote:
>>>>
>>>>> (1) clusters.json is written by you, configuring the CLI client with
>>>>> instructions for what clusters are available and how to discover them.
>>>>>
>>>>> (2) That's expected - mesos only allows one active replica of a
>>>>> framework at a time, this signals which one is active.
>>>>>
>>>>> (3) The observer is essentially a web server that allows you to browse
>>>>> a task's sandbox directory and other information about it.  You will need
>>>>> to configure it to run on your worker/agent nodes for that functionality to
>>>>> work (it's linked from the scheduler web UI).
>>>>>
>>>>> (4) You could indeed implement that behavior externally.  There is a
>>>>> reason:
>>>>> https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L556-L559
>>>>>
>>>>> (5) That is correct.  The scheduler exposes a thrift API that you
>>>>> would use (a REST API is coming, but ground has not yet been broken).  If
>>>>> you go this route, I suggest you skip the DSL and use the JSON task
>>>>> description format that is shipped over the API.  There's no good
>>>>> documentation on this, but we can help you through it and would be grateful
>>>>> for a writeup of your approach!
>>>>>
>>>>>
>>>>> On Mon, Oct 26, 2015 at 11:44 PM, Krish <krishnan.k.iyer@gmail.com> wrote:
>>>>>
>>>>>> Hi Folks,
>>>>>> Firstly, thanks for all the help. Am happy to report that I have set
>>>>>> up zk, mesos & aurora, & can work further towards my idea of having an
>>>>>> auto-scaling cluster.
>>>>>> I have some further questions about the work done so far & things I
>>>>>> plan to do:
>>>>>>
>>>>>>    1. Is the /etc/aurora/clusters.json file created by the scheduler
>>>>>>    or does it need to be handcrafted? I had to manually edit the file to get
>>>>>>    my `aurora job ...` cli to work.
>>>>>>
>>>>>>    2. I am running a cluster of 3 coreOS VMs on vagrant with zk,
>>>>>>    mesos & aurora in a docker container. Only 1 of them outputs '1' when I
>>>>>>    look at the 'framework_registered' field. Is this expected? How do I verify
>>>>>>    that they are working as a cluster?
>>>>>>
>>>>>>    3. From the documentation, I see that there is an observer that
>>>>>>    needs to be listening on port 1338. What is the observer socket & its
>>>>>>    purpose? I have aurora listening only on ports 8081 (http port) & 8083
>>>>>>    (libprocess).
>>>>>>
>>>>>>    4. I read about the 'PENDING' field in aurora documentation, as
>>>>>>    Bill suggested, & realize that it just shows that a task is waiting for
>>>>>>    some reasons (for want of resources, in my case, as 0 slaves have
>>>>>>    registered). I was thinking of adding a hook to the pending state; say if a
>>>>>>    task is PENDING for 5 minutes for lack of resources in the cluster, then
>>>>>>    spin up a new machine. Is this the right approach to take? Does aurora
>>>>>>    provide reasons for why a task is in the PENDING state?
>>>>>>
>>>>>>    => aurora job status testcluster/$USER/test/hello_world
>>>>>>     INFO] Checking status of testcluster/ubuntu/test/hello_world
>>>>>>    Active tasks (1):
>>>>>>           Task role: ubuntu, env: test, name: hello_world, instance:
>>>>>>    0, status:
>>>>>>    PENDING on None
>>>>>>              cpus: 0.1, ram: 16 MB, disk: 16 MB
>>>>>>              events:
>>>>>>               2015-10-23 04:55:33 PENDING: None
>>>>>>    Inactive tasks (0):
>>>>>>
>>>>>>    5. Aurora defines job/s in a .aurora config file & if I decide to
>>>>>>    increase/decrease the number of instances in my cluster, then I need to
>>>>>>    create/overwrite the relevant .aurora file and trigger the `aurora update
>>>>>>    ...` command. Is this right?
>>>>>>    If yes, is there an HTTP API I can invoke remotely which triggers
>>>>>>    this update?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> κρισhναν
>>>>>>
>>>>>> On Fri, Oct 23, 2015 at 8:09 AM, Joshua Cohen <jcohen@twopensource.com> wrote:
>>>>>>
>>>>>>> I suspect your error from `aurora job create ...` is due to the
>>>>>>> aurora config you're using referencing `/vagrant/hello_world.py` which does
>>>>>>> not exist (as you say: you're not even using Vagrant). Can you link the
>>>>>>> .aurora config you're using?
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Joshua
>>>>>>>
>>>>>>> On Thu, Oct 22, 2015 at 3:22 PM, Krish <krishnan.k.iyer@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks, Zameer.
>>>>>>>>
>>>>>>>> I had to modify  /etc/aurora/clusters.json:
>>>>>>>> [
>>>>>>>>   {
>>>>>>>>     "auth_mechanism": "UNAUTHENTICATED",
>>>>>>>>     "name": "testcluster",
>>>>>>>>     "scheduler_zk_path": "/scheduler/aurora",
>>>>>>>>     "slave_root": "/var/lib/mesos",
>>>>>>>>     "slave_run_directory": "latest",
>>>>>>>>     "zk": "127.0.1.1"
>>>>>>>>   }
>>>>>>>> ]
>>>>>>>>
>>>>>>>> I have a hello_world.aurora in my home folder. However the
>>>>>>>> following command errors out:
>>>>>>>> ~$ aurora job create testcluster/testrole/test/hellojob
>>>>>>>> ./hello_world.aurora
>>>>>>>> Error loading configuration: [Errno 2] No such file or directory:
>>>>>>>> '/vagrant/hello_world.py'
>>>>>>>>
>>>>>>>> A job list does work:
>>>>>>>> ~$ aurora job list testcluster
>>>>>>>>  INFO] Retrieving jobs for role None
>>>>>>>>
>>>>>>>> I am not even using the vagrant. I am using zk & mesos on the same
>>>>>>>> machine as aurora. How do I submit these job templates to aurora?
>>>>>>>>
>>>>>>>> Any pointers to documentation will be helpful.
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> κρισhναν
>>>>>>>>
>>>>>>>> On Wed, Oct 21, 2015 at 11:09 PM, Zameer Manji <zmanji@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Version 0.9.0 does not have the gc executor. Version 0.9.0 uses
>>>>>>>>> Mesos' task reconciliation
>>>>>>>>> <http://mesos.apache.org/documentation/latest/reconciliation/> API
>>>>>>>>> instead.
>>>>>>>>>
>>>>>>>>> On Wed, Oct 21, 2015 at 9:28 AM, Krish <krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks Bill for the location to the debs. I was finally able to
>>>>>>>>>> run aurora. :)
>>>>>>>>>>
>>>>>>>>>> I did find thermos_executor.pex & thermos_observer after
>>>>>>>>>> installing aurora-executor. I still could not find gc_executor.pex on my
>>>>>>>>>> system.
>>>>>>>>>> Is there a location from where I can download the binaries for
>>>>>>>>>> *.pex or build them from scratch?
>>>>>>>>>>
>>>>>>>>>> root@dev:/# find . -name "*.pex"
>>>>>>>>>> ./usr/share/aurora/bin/thermos_executor.pex
>>>>>>>>>> ./usr/share/aurora/bin/kaurora_admin.pex
>>>>>>>>>> ./usr/share/aurora/bin/kaurora.pex
>>>>>>>>>> ./usr/share/aurora/bin/thermos.pex
>>>>>>>>>> ./usr/share/aurora/bin/thermos_observer.pex
>>>>>>>>>> ./home/ubuntu/.pex
>>>>>>>>>> ./root/.pex
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> κρισhναν
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 20, 2015 at 11:46 PM, Bill Farner <wfarner@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Aurora currently requires an executor, so setting it to
>>>>>>>>>>> /dev/null will not work.  Happy to talk further about your thoughts around
>>>>>>>>>>> sidestepping the executor.
>>>>>>>>>>>
>>>>>>>>>>> As for working with the scheduler source code, it's a standard
>>>>>>>>>>> gradle project and we tend to use intellij.  Docs to help ramp on that:
>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md
>>>>>>>>>>>
>>>>>>>>>>> As for builds - the .zip is a source distribution, so it won't
>>>>>>>>>>> have any pre-built binaries.  If you're on debian, we have official debs
>>>>>>>>>>> here: https://bintray.com/apache/aurora
>>>>>>>>>>> You can see how they're built here (and can build your own)
>>>>>>>>>>> packages: https://github.com/apache/aurora-packaging
>>>>>>>>>>> We're close to having official RPMs, but none to speak of yet.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 20, 2015 at 9:47 AM, Krish <krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Stephan,
>>>>>>>>>>>> I am trying to get started and run aurora without thermos
>>>>>>>>>>>> executor (setting it to /dev/null does not help) - on a local linux box for
>>>>>>>>>>>> now & planning to containerize/dockerize it later.
>>>>>>>>>>>>
>>>>>>>>>>>> Can you please point me to the right documentation (or a
>>>>>>>>>>>> pointer to the cli parsing source code) which can help me resolve this?
>>>>>>>>>>>> Also, are there any steps to import the source code into eclipse to
>>>>>>>>>>>> browse & analyze code for this?
>>>>>>>>>>>>
>>>>>>>>>>>> Also, where do i find all the *.pex files? They are not present
>>>>>>>>>>>> in the zip file nor anywhere in the built source code.
>>>>>>>>>>>>
>>>>>>>>>>>> I know I am asking too many queries on a single thread here, &
>>>>>>>>>>>> would appreciate the help.
>>>>>>>>>>>> I think at the end of this, I will put the steps I followed in
>>>>>>>>>>>> a gist/blog so others might find their way around, & not struggle as much.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan <Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Krish,
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> you don't have to set framework_authentication_file and
>>>>>>>>>>>>> zk_digest_credentials. The scheduler help text is misleading here as
>>>>>>>>>>>>> everything will work fine if you leave those empty.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> In addition, it looks like you are misunderstanding the usage of
>>>>>>>>>>>>> the thermos_executor_path command line flag of the scheduler.
>>>>>>>>>>>>> It is supposed to point to the binary containing the generic
>>>>>>>>>>>>> Aurora executor (thermos_executor.pex).  You only need the hello_world.aurora
>>>>>>>>>>>>> once your scheduler is up and running. It serves as an example input for the
>>>>>>>>>>>>> aurora command line client which can be used to schedule jobs and services
>>>>>>>>>>>>> on an Aurora master.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Have you tried to use the vagrant box? Just type `vagrant up`
>>>>>>>>>>>>> in a checkout of the Aurora source code. It gives you a running
>>>>>>>>>>>>> scheduler to play with. Once you have understood how it works, you can
>>>>>>>>>>>>> start trying to install it on your own (by reverse-engineering the vagrant
>>>>>>>>>>>>> box).
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hope this helps a little,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>> *From:* Krish <krishnan.k.iyer@gmail.com>
>>>>>>>>>>>>> *Sent:* Tuesday, October 20, 2015 11:39 AM
>>>>>>>>>>>>> *To:* Bill Farner
>>>>>>>>>>>>> *Cc:* user@aurora.apache.org; Erb, Stephan
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>
>>>>>>>>>>>>> Bill/Stephan,
>>>>>>>>>>>>> I still get a stacktrace when running the aurora scheduler CLI.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I do not know what to specify for
>>>>>>>>>>>>>  -framework_authentication_file & -zk_digest_credentials, and they are
>>>>>>>>>>>>> required arguments.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am not using any authentication on Mesos master, do I still
>>>>>>>>>>>>> need the framework_authentication_file parameter?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> rm -rf /db /backup_dir
>>>>>>>>>>>>> mesos-log initialize --path="/db"
>>>>>>>>>>>>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
>>>>>>>>>>>>> JAVA_OPTS="-Xmx1536m  -Xms256m"
>>>>>>>>>>>>> /usr/local/aurora-scheduler/bin/aurora-scheduler -backup_dir=/backup_dir
>>>>>>>>>>>>> -cluster_name=tc -mesos_master_address=zk://localhost:2181/mesos/master
>>>>>>>>>>>>> -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181
>>>>>>>>>>>>> -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false
>>>>>>>>>>>>> -native_log_file_path=/db
>>>>>>>>>>>>> -thermos_executor_path=/home/ubuntu/hello_world.aurora
>>>>>>>>>>>>> ...
>>>>>>>>>>>>> ...
>>>>>>>>>>>>> INFO: Binding org.apache.aurora.scheduler.http.Utilization to
>>>>>>>>>>>>> GuiceManagedComponentProvider with the scope "PerRequest"
>>>>>>>>>>>>> Oct 20, 2015 9:27:40 AM
>>>>>>>>>>>>> org.apache.aurora.scheduler.cron.quartz.CronModule provideTimeZone
>>>>>>>>>>>>> WARNING: Cron schedules are configured to fire according to
>>>>>>>>>>>>> timezone Greenwich Mean Time but system timezone is set to Coordinated
>>>>>>>>>>>>> Universal Time
>>>>>>>>>>>>> Oct 20, 2015 9:27:41 AM
>>>>>>>>>>>>> org.eclipse.jetty.server.AbstractConnector doStart
>>>>>>>>>>>>> INFO: Started SelectChannelConnector@0.0.0.0:43843
>>>>>>>>>>>>> E1020 09:27:41.290 THREAD1
>>>>>>>>>>>>> org.apache.aurora.scheduler.SchedulerLifecycle$9.execute: Caught unchecked
>>>>>>>>>>>>> exception: com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) Error in custom provider,
>>>>>>>>>>>>> java.lang.IllegalArgumentException: Path cannot be null at
>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>   at
>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>>   at
>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>>>   while locating
>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1 error
>>>>>>>>>>>>> com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) Error in custom provider,
>>>>>>>>>>>>> java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>>>   at
>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>   at
>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>>   at
>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>>>   while locating
>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1 error
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner <wfarner@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The typical flow is that you keep your .aurora file checked
>>>>>>>>>>>>>> into git, and commit every time you deploy/update.  When you change your
>>>>>>>>>>>>>> file, you will instruct Aurora to update the live job (have a look at
>>>>>>>>>>>>>> `aurora update -h`).  Aurora will perform a rolling upgrade of your
>>>>>>>>>>>>>> job to the new config.  You'll use this same flow for updating your job's
>>>>>>>>>>>>>> software as well as resizing the job.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For (3), you could set up alerting for stats that the
>>>>>>>>>>>>>> scheduler exports.  Have a look here for monitoring background:
>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You'll want to look at scheduler stats related to
>>>>>>>>>>>>>> 'pending'.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish <krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for the pointer. Now I notice that the
>>>>>>>>>>>>>>> aurora-scheduler script has the --thermos_executor_path as a mandatory
>>>>>>>>>>>>>>> requirement.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have a couple of questions on how the
>>>>>>>>>>>>>>> thermos_executor/.aurora config file functions:
>>>>>>>>>>>>>>> 1. Do we have to statically define the file beforehand?
>>>>>>>>>>>>>>> 2. What happens when we want to dynamically change the
>>>>>>>>>>>>>>> config, say increasing the number of instances of a service required? Does
>>>>>>>>>>>>>>> aurora require a reboot then?
>>>>>>>>>>>>>>> 3. How do I get notified about the message mesos sends when
>>>>>>>>>>>>>>> it cannot schedule tasks for lack of resources? Should I depend on aurora
>>>>>>>>>>>>>>> for this or try to look for a hook into mesos?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think a little bit of context would help here.
>>>>>>>>>>>>>>> What I plan to check is to run a very basic job/task inside
>>>>>>>>>>>>>>> a docker container with aurora & wait for a 'resource not available'
>>>>>>>>>>>>>>> message from mesos, and accordingly call an api to spin up a new node in my
>>>>>>>>>>>>>>> cluster.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I believe you are missing the thermos_executor options that
>>>>>>>>>>>>>>>> have to be passed to the scheduler command line.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> See
>>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
>>>>>>>>>>>>>>>> for an example
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>>> *From:* Krish <krishnan.k.iyer@gmail.com>
>>>>>>>>>>>>>>>> *Sent:* Monday, October 19, 2015 8:45 AM
>>>>>>>>>>>>>>>> *To:* user@aurora.apache.org
>>>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>> I am a n00b with apache aurora & trying to experiment with some
>>>>>>>>>>>>>>>> things on my local machine with zookeeper and mesos-master running locally.
>>>>>>>>>>>>>>>> They have initialized properly. When I try to run aurora with the required
>>>>>>>>>>>>>>>> options, I get the following error, & googling hasn't helped me much here.
>>>>>>>>>>>>>>>> Appreciate any help. Thanks in advance.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>> WARNING: Method [public void
>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)]
>>>>>>>>>>>>>>>> is synthetic and is being intercepted by
>>>>>>>>>>>>>>>> [com.twitter.common.inject.TimedInterceptor@604c5de8].
>>>>>>>>>>>>>>>> This could indicate a bug.  The method
>>>>>>>>>>>>>>>>  may be intercepted twice, or may not be intercepted at all.
>>>>>>>>>>>>>>>> Exception in thread "main"
>>>>>>>>>>>>>>>> com.google.inject.CreationException: Guice creation errors:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1) An exception was caught and reported. Message: A value
>>>>>>>>>>>>>>>> may only be retrieved from a variable that has a default or has been
>>>>>>>>>>>>>>>> set.
>>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2) Could not find a suitable constructor in
>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have
>>>>>>>>>>>>>>>> either one (and only one) constructor annotated with @Inject or a
>>>>>>>>>>>>>>>> zero-argument constructor that is not private.
>>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2 errors
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:95)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:83)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
>>>>>>>>>>>>>>>> Caused by: java.lang.IllegalStateException: A value may
>>>>>>>>>>>>>>>> only be retrieved from a variable that has a default or has been set.
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>>>>>>>>>>>>>>>>         at com.twitter.common.args.Arg.get(Arg.java:82)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.google.inject.util.Modules$2.configure(Modules.java:114)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.google.inject.spi.Elements.getElements(Elements.java:101)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>>>>>>>>>>>>>>>>         ... 7 more
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Complete logs are present @http://pastebin.com/i72HvbYi.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Zameer Manji
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Stacktrace when running Apache Aurora

Posted by Krish <kr...@gmail.com>.

TL;DR:
Use only a file named .dockercfg for docker credentials in mesos
tasks!

Long story:
---------------
Holy smokescreens!
This is for reporting & documenting purposes only, so that others don't
have to pull their hair out like I did for the past few evenings!

A little background:
I am running Ubuntu 14.04 on my system, and docker stores its credentials in
~/.docker/config.json as
cat ~/.docker/config.json
{
  "auths": {
    "repo.example.com:5000": {
      "auth": "<snip>",
      "email": "<snip>"
    }
  }
}

And I am doing all these experiments on a CoreOS system, which stores the
credentials in ~/.dockercfg as
core@aurora-1 ~ $ cat ~/.dockercfg
{
  "repo.example.com:5000": {
    "auth": "<snip>",
    "email": "<snip>"
  }
}

Since my container was an Ubuntu 14.04 container (as was my local system),
I used the Ubuntu credential file format, i.e. I stored the credentials as
~/.docker/config.json, and I couldn't get the slave task to read them.
After digging through the aurora, mesos, and thermos source code (a lot of
find's, grep's and regex matching), I saw in mesos/src/docker/docker.cpp:

  // Set HOME variable to pick up .dockercfg.
  map<string, string> environment = os::environment();

  environment["HOME"] = directory;
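
In other words, the pull runs with HOME pointing at the task sandbox, so
docker looks for $HOME/.dockercfg there. A rough Python sketch of the same
idea follows. It is not Mesos code: the helper names pull_environment and
expected_credentials_path, and the sandbox path, are made up for illustration.

```python
import os


def pull_environment(sandbox_dir, base_env=None):
    """Return a copy of the environment with HOME set to the sandbox.

    Mirrors the idea of the C++ snippet above: docker is started with
    HOME pointing at the sandbox directory. Hypothetical helper for
    illustration only, not part of Mesos.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HOME"] = sandbox_dir
    return env


def expected_credentials_path(env):
    """Path where the credentials file would be looked up under HOME."""
    return os.path.join(env["HOME"], ".dockercfg")


if __name__ == "__main__":
    # Hypothetical sandbox path, only to show the resulting lookup location.
    env = pull_environment("/tmp/example-sandbox")
    print(expected_credentials_path(env))
```

So whatever gets copied into the sandbox has to be named .dockercfg, because
that is the only name looked up under the overridden HOME.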

I changed the filename and the JSON content, updated the
thermos_executor_resources flag, and bam, docker pull works!

Well, the mesos documentation does say "To run an image from a private
repository, one can include the URI pointing to a .dockercfg that contains
login information." and I must have read it a dozen times!
But I never thought that they literally meant '.dockercfg' as the name of
the file!
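
The "filename and JSON content" change can be sketched in Python roughly as
below. This is a minimal sketch, assuming the new-style file keeps its entries
under an "auths" key exactly as in the two examples earlier in this mail; the
function name to_dockercfg is made up for illustration.

```python
import json


def to_dockercfg(config_json_text):
    """Convert ~/.docker/config.json content to the older .dockercfg layout.

    Assumption: the new-style file wraps the registry entries in an
    "auths" object, and the old layout is that inner mapping at the top
    level. Input without an "auths" key is returned unchanged.
    """
    config = json.loads(config_json_text)
    auths = config.get("auths", config)
    return json.dumps(auths, indent=2, sort_keys=True)


if __name__ == "__main__":
    new_style = """
    {"auths": {"repo.example.com:5000": {"auth": "<snip>", "email": "<snip>"}}}
    """
    # Write the result to a file named .dockercfg and ship that to the sandbox.
    print(to_dockercfg(new_style))
```

The output is what goes into the file named .dockercfg that
thermos_executor_resources points at.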




--
κρισhναν

On Thu, Mar 3, 2016 at 1:45 PM, Krish <kr...@gmail.com> wrote:

>
> I have got the docker config file copied into the sandbox using the
> thermos_executor_resources flag; however docker is still not able to find
> the credentials file for doing an appropriate pull of image from a private
> repo.
>
> When I try to use the library/hello-world:latest image from public docker
> repo to check if everything works fine without the credentials, I encounter
> a different problem:
> exec: "/bin/sh": stat /bin/sh: no such file or directory
> Error response from daemon: Cannot start container
> de93dc344d44b41bccccff49e508001a97ff23a8964e637d32a506a31fd4d946: [8]
> System error: exec: "/bin/sh": stat /bin/sh: no such file or directory
>
> I was referring to this email for guidance on setting up a mesos slave:
> http://mail-archives.apache.org/mod_mbox/aurora-dev/201503.mbox/%3CCAKB1MkHR=+NOnesLLX9bUWtTDThsku46PW_wR4B+_Z9P59+XWQ@mail.gmail.com%3E
>
> So, I cannot get the credentials file to be used by docker, and if I
> bypass authentication, I can do a docker pull, but encounter a weird error
> in launching the hello-world image.
>
> Am I missing out on checking any log files generated? I currently refer to
> mesos-slave stdout and the sandbox stderr file.
> Any configuration parameter I am missing for this to happen?
>
> Any pointers will be really helpful. Thanks in advance.
>
>
>
> --
> κρισhναν
>
> On Sun, Feb 28, 2016 at 3:37 PM, Krish <kr...@gmail.com> wrote:
>
>> Continuing my earlier chain of thought, I found this in the mesos bug
>> list:
>> MESOS-4242 - Allow Docker private registry credentials to be passed from
>> framework.
>> How does one pass credentials using the framework? As it seems the
>> .docker/config.json is not read from the slave.
>>
>>
>>
>>
>> --
>> κρισhναν
>>
>> On Sat, Feb 27, 2016 at 11:46 PM, Krish <kr...@gmail.com>
>> wrote:
>>
>>> I couldn't complete my PoC project before (got busy with other
>>> work). Well, it is never too late and here's my update and issue.
>>>
>>> I have a 3 node zk (3.5.1 alpha), mesos-master (v0.24.1) & aurora
>>> (v0.11.0) running.
>>> I was stuck in a problem where I was using mesos 0.25.0 & aurora 0.9.0 &
>>> got a protobuf field not set error - ExecutorInfo field.
>>>
>>> I have a mesos agent running in docker container on coreos and it can
>>> access the host docker just fine.
>>> I have also put the docker login credentials file at the right location
>>> for it to access the private docker registry.
>>> I can manually trigger a docker pull and docker run without issues from
>>> the slave (which is also reflected properly outside the slave container
>>> with docker images and docker ps).
>>>
>>> However, when I try to run an aurora job with hello-docker container,
>>> the slave prints out the log that docker pull has failed; more specifically:
>>> " failed to start: Failed to 'docker pull
>>> private_repo.com:5000/krish/test:latest': exit status = exited with
>>> status 1 stderr = Error: image krish/test:latest not found"
>>>
>>> My hunch is that when using docker run from aurora DSL, it does not read
>>> the docker credentials file properly and hence fails. I can reproduce the
>>> exact same error when I delete the credentials file from the slave and
>>> trigger a pull.
>>>
>>> Is the hunch right? If yes, is there a way to resolve this? Maybe source
>>> it some way before the run command?
>>>
>>>
>>>
>>> --
>>> κρισhναν
>>>
>>> On Tue, Oct 27, 2015 at 10:35 PM, Bill Farner <wf...@apache.org>
>>> wrote:
>>>
>>>> (1) clusters.json is written by you, configuring the CLI client with
>>>> instructions for what clusters are available and how to discover them.
>>>>
>>>> (2) That's expected - mesos only allows one active replica of a
>>>> framework at a time, this signals which one is active.
>>>>
>>>> (3) The observer is essentially a web server that allows you to browse
>>>> a task's sandbox directory and other information about it.  You will need
>>>> to configure it to run on your worker/agent nodes for that functionality to
>>>> work (it's linked from the scheduler web UI).
>>>>
>>>> (4) You could indeed implement that behavior externally.  There is a
>>>> reason:
>>>> https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L556-L559
>>>>
>>>> (5) That is correct.  The scheduler exposes a thrift API that you would
>>>> use (a REST API is coming, but ground has not yet been broken).  If you go
>>>> this route, i suggest you skip the DSL and use the JSON task description
>>>> format that is shipped over the API.  There's not good documentation on
>>>> this, but we can help you through it and would be grateful for a writeup of
>>>> your approach!
>>>>
>>>>
>>>> On Mon, Oct 26, 2015 at 11:44 PM, Krish <kr...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Folks,
>>>>> Firstly, thanks for all the help. Am happy to report that I have set
>>>>> up zk, mesos & aurora, & can work further towards my idea of having an
>>>>> auto-scaling cluster.
>>>>> I have some further questions about the work done so far & things I
>>>>> plan to do:
>>>>>
>>>>>    1. Is the /etc/aurora/clusters.json file created by the scheduler
>>>>>    or does it need to be handcrafted? I had to manually edit the file to get
>>>>>    my `aurora job ...` cli to work.
>>>>>
>>>>>    2. I am running a cluster of 3 coreOS VMs on vagrant with zk,
>>>>>    mesos & aurora in a docker container. Only 1 of them outputs '1' when I
>>>>>    look at the 'framework_registered' field. Is this expected? How do I verify
>>>>>    that they are working as a cluster?
>>>>>
>>>>>    3. From the documentation, I see that there is an observer that
>>>>>    needs to be listening on port 1338. What is the observer socket & its
>>>>>    purpose? I have aurora listening only on ports 8081 (http port) & 8083
>>>>>    (libprocess).
>>>>>
>>>>>    4. I read about the 'PENDING' field in aurora documentation, as
>>>>>    Bill suggested, & realize that it just shows that a task is waiting for
>>>>>    some reasons (for want of resources, in my case, as 0 slaves have
>>>>>    registered). I was thinking of adding a hook to the pending state; say if a
>>>>>    task is PENDING for 5 minutes for lack of resources in the cluster, then
>>>>>    spin up a new machine. Is this the right approach to take? Does aurora
>>>>>    provide reasons for why a task is in PENDING state?
>>>>>
>>>>>    => aurora job status testcluster/$USER/test/hello_world
>>>>>     INFO] Checking status of testcluster/ubuntu/test/hello_world
>>>>>    Active tasks (1):
>>>>>           Task role: ubuntu, env: test, name: hello_world, instance:
>>>>>    0, status:
>>>>>    PENDING on None
>>>>>              cpus: 0.1, ram: 16 MB, disk: 16 MB
>>>>>              events:
>>>>>               2015-10-23 04:55:33 PENDING: None
>>>>>    Inactive tasks (0):
>>>>>
>>>>>    5. Aurora defines job/s in a .aurora config file & if I decide to
>>>>>    increase/decrease the number of instances in my cluster, then I need to
>>>>>    create/overwrite the concerned .aurora file and trigger the `aurora update
>>>>>    ...` command. Is this right?
>>>>>    If yes, is there an HTTP API I can invoke remotely which triggers
>>>>>    this update?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> κρισhναν
>>>>>
>>>>> On Fri, Oct 23, 2015 at 8:09 AM, Joshua Cohen <jcohen@twopensource.com
>>>>> > wrote:
>>>>>
>>>>>> I suspect your error from `aurora job create ...` is due to the
>>>>>> aurora config you're using referencing `/vagrant/hello_world.py` which does
>>>>>> not exist (as you say: you're not even using Vagrant). Can you link the
>>>>>> .aurora config you're using?
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Joshua
>>>>>>
>>>>>> On Thu, Oct 22, 2015 at 3:22 PM, Krish <kr...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks, Zameer.
>>>>>>>
>>>>>>> I had to modify  /etc/aurora/clusters.json:
>>>>>>> [
>>>>>>>   {
>>>>>>>     "auth_mechanism": "UNAUTHENTICATED",
>>>>>>>     "name": "testcluster",
>>>>>>>     "scheduler_zk_path": "/scheduler/aurora",
>>>>>>>     "slave_root": "/var/lib/mesos",
>>>>>>>     "slave_run_directory": "latest",
>>>>>>>     "zk": "127.0.1.1"
>>>>>>>   }
>>>>>>> ]
>>>>>>>
>>>>>>> I have a hello_world.aurora in my home folder. However the following
>>>>>>> command errors out:
>>>>>>> ~$ aurora job create testcluster/testrole/test/hellojob
>>>>>>> ./hello_world.aurora
>>>>>>> Error loading configuration: [Errno 2] No such file or directory:
>>>>>>> '/vagrant/hello_world.py'
>>>>>>>
>>>>>>> A job list does work:
>>>>>>> ~$ aurora job list testcluster
>>>>>>>  INFO] Retrieving jobs for role None
>>>>>>>
>>>>>>> I am not even using the vagrant. I am using zk & mesos on the same
>>>>>>> machine as aurora. How do I submit these job templates to aurora?
>>>>>>>
>>>>>>> Any pointers to documentation will be helpful.
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> κρισhναν
>>>>>>>
>>>>>>> On Wed, Oct 21, 2015 at 11:09 PM, Zameer Manji <zm...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Version 0.9.0 does not have the gc executor. Version 0.9.0 uses
>>>>>>>> Mesos' task reconciliation
>>>>>>>> <http://mesos.apache.org/documentation/latest/reconciliation/> API
>>>>>>>> instead.
>>>>>>>>
>>>>>>>> On Wed, Oct 21, 2015 at 9:28 AM, Krish <kr...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks Bill for the location to the debs. I was finally able to
>>>>>>>>> run aurora. :)
>>>>>>>>>
>>>>>>>>> I did find thermos_executor.pex & thermos_observer after
>>>>>>>>> installing aurora-executor. I still could not find gc_executor.pex on my
>>>>>>>>> system.
>>>>>>>>> Is there a location from where I can download the binaries for
>>>>>>>>> *.pex or build them from scratch?
>>>>>>>>>
>>>>>>>>> root@dev:/# find . -name "*.pex"
>>>>>>>>> ./usr/share/aurora/bin/thermos_executor.pex
>>>>>>>>> ./usr/share/aurora/bin/kaurora_admin.pex
>>>>>>>>> ./usr/share/aurora/bin/kaurora.pex
>>>>>>>>> ./usr/share/aurora/bin/thermos.pex
>>>>>>>>> ./usr/share/aurora/bin/thermos_observer.pex
>>>>>>>>> ./home/ubuntu/.pex
>>>>>>>>> ./root/.pex
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> κρισhναν
>>>>>>>>>
>>>>>>>>> On Tue, Oct 20, 2015 at 11:46 PM, Bill Farner <wf...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Aurora currently requires an executor, so setting it to /dev/null
>>>>>>>>>> will not work.  Happy to talk further about your thoughts around
>>>>>>>>>> sidestepping the executor.
>>>>>>>>>>
>>>>>>>>>> As for working with the scheduler source code, it's a standard
>>>>>>>>>> gradle project and we tend to use intellij.  Docs to help ramp on that:
>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md
>>>>>>>>>>
>>>>>>>>>> As for builds - the .zip is a source distribution, so it won't
>>>>>>>>>> have any pre-built binaries.  If you're on debian, we have official debs
>>>>>>>>>> here: https://bintray.com/apache/aurora
>>>>>>>>>> You can see how they're built here (and can build your own)
>>>>>>>>>> packages: https://github.com/apache/aurora-packaging
>>>>>>>>>> We're close to having official RPMs, but none to speak of yet.
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 20, 2015 at 9:47 AM, Krish <krishnan.k.iyer@gmail.com
>>>>>>>>>> > wrote:
>>>>>>>>>>
>>>>>>>>>>> Stephan,
>>>>>>>>>>> I am trying to get started and run aurora without thermos
>>>>>>>>>>> executor (setting it to /dev/null does not help) - on a local linux box for
>>>>>>>>>>> now & planning to containerize/dockerize it later.
>>>>>>>>>>>
>>>>>>>>>>> Can you please point me to the right documentation (or a pointer
>>>>>>>>>>> to the cli parsing source code) which can help me resolve this? Also, are
>>>>>>>>>>> there any steps to import source code into eclipse to browse &
>>>>>>>>>>> analyze code for this?
>>>>>>>>>>>
>>>>>>>>>>> Also, where do i find all the *.pex files? They are not present
>>>>>>>>>>> in the zip file nor anywhere in the built source code.
>>>>>>>>>>>
>>>>>>>>>>> I know I am asking too many queries on a single thread here, &
>>>>>>>>>>> would appreciate the help.
>>>>>>>>>>> I think at the end of this, I will put the steps I followed in a
>>>>>>>>>>> gist/blog so others might find their way around, & not struggle as much.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> κρισhναν
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan <
>>>>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Krish,
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> you don't have to set framework_authentication_file and
>>>>>>>>>>>> zk_digest_credentials. The scheduler help text is misleading here as
>>>>>>>>>>>> everything will work fine if you leave those empty.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> In addition, looks like you are misunderstanding the usage of
>>>>>>>>>>>> the thermos_executor_path command line flag of the scheduler.
>>>>>>>>>>>> It is supposed to point to the binary containing the generic
>>>>>>>>>>>> Aurora executor (thermos_executor.pex).  You only need the hello_world.aurora
>>>>>>>>>>>> once your scheduler is up and running. It serves as an example input for the
>>>>>>>>>>>> aurora command line client which can be used to schedule jobs and services
>>>>>>>>>>>> on an Aurora master.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Have you tried to use the vagrant box? Just type `vagrant up` in
>>>>>>>>>>>> a checkout of the Aurora source code. It gives you a running scheduler to
>>>>>>>>>>>> play with. Once you have understood how it works, you can start trying to
>>>>>>>>>>>> install it on your own (by reverse-engineering the vagrant box).
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Hope this helps a little,
>>>>>>>>>>>>
>>>>>>>>>>>> Stephan
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>>>>> *Sent:* Tuesday, October 20, 2015 11:39 AM
>>>>>>>>>>>> *To:* Bill Farner
>>>>>>>>>>>> *Cc:* user@aurora.apache.org; Erb, Stephan
>>>>>>>>>>>>
>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>
>>>>>>>>>>>> Bill/Stephan,
>>>>>>>>>>>> I still get a stacktrace when running the aurora scheduler CLI.
>>>>>>>>>>>>
>>>>>>>>>>>> I do not know what to specify for
>>>>>>>>>>>>  -framework_authentication_file & -zk_digest_credentials, and they are
>>>>>>>>>>>> required arguments.
>>>>>>>>>>>>
>>>>>>>>>>>> I am not using any authentication on Mesos master, do I still
>>>>>>>>>>>> need the framework_authentication_file parameter?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> rm -rf /db /backup_dir
>>>>>>>>>>>> mesos-log initialize --path="/db"
>>>>>>>>>>>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
>>>>>>>>>>>> JAVA_OPTS="-Xmx1536m  -Xms256m"
>>>>>>>>>>>> /usr/local/aurora-scheduler/bin/aurora-scheduler -backup_dir=/backup_dir
>>>>>>>>>>>> -cluster_name=tc -mesos_master_address=zk://localhost:2181/mesos/master
>>>>>>>>>>>> -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181
>>>>>>>>>>>> -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false
>>>>>>>>>>>> -native_log_file_path=/db
>>>>>>>>>>>> -thermos_executor_path=/home/ubuntu/hello_world.aurora
>>>>>>>>>>>> ...
>>>>>>>>>>>> ...
>>>>>>>>>>>> INFO: Binding org.apache.aurora.scheduler.http.Utilization to GuiceManagedComponentProvider with the scope "PerRequest"
>>>>>>>>>>>> Oct 20, 2015 9:27:40 AM org.apache.aurora.scheduler.cron.quartz.CronModule provideTimeZone
>>>>>>>>>>>> WARNING: Cron schedules are configured to fire according to timezone Greenwich Mean Time but system timezone is set to Coordinated Universal Time
>>>>>>>>>>>> Oct 20, 2015 9:27:41 AM org.eclipse.jetty.server.AbstractConnector doStart
>>>>>>>>>>>> INFO: Started SelectChannelConnector@0.0.0.0:43843
>>>>>>>>>>>> E1020 09:27:41.290 THREAD1 org.apache.aurora.scheduler.SchedulerLifecycle$9.execute: Caught unchecked exception: com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>
>>>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>
>>>>>>>>>>>> 1 error
>>>>>>>>>>>> com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>>
>>>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>>   at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>>
>>>>>>>>>>>> 1 error
>>>>>>>>>>>>         at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
>>>>>>>>>>>>         at org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner <
>>>>>>>>>>>> wfarner@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> The typical flow is that you keep your .aurora file checked
>>>>>>>>>>>>> into git, and commit every time you deploy/update.  When you change your
>>>>>>>>>>>>> file, you will instruct Aurora to update the live job (have a look at aurora
>>>>>>>>>>>>> update -h).  Aurora will perform a rolling upgrade of your
>>>>>>>>>>>>> job to the new config.  You'll use this same flow for updating your job's
>>>>>>>>>>>>> software as well as resizing the job.
>>>>>>>>>>>>>
>>>>>>>>>>>>> For (3), you could set up alerting for stats that the
>>>>>>>>>>>>> scheduler exports.  Have a look here for monitoring background:
>>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md
>>>>>>>>>>>>>
>>>>>>>>>>>>> You'll want to look at scheduler stats related to
>>>>>>>>>>>>> 'pending'.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish <
>>>>>>>>>>>>> krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for the pointer. Now I notice that the
>>>>>>>>>>>>>> aurora-scheduler script has the --thermos_executor_path as a mandatory
>>>>>>>>>>>>>> requirement.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have a couple of questions on how the
>>>>>>>>>>>>>> thermos_executor/.aurora config file functions:
>>>>>>>>>>>>>> 1. Do we have to statically define the file beforehand?
>>>>>>>>>>>>>> 2. What happens when we want to dynamically change the
>>>>>>>>>>>>>> config, say increasing the number of instances of a service required? Does
>>>>>>>>>>>>>> aurora require a reboot then?
>>>>>>>>>>>>>> 3. How do I get notified about the message mesos sends when
>>>>>>>>>>>>>> it cannot schedule tasks for lack of resources? Should I depend on aurora
>>>>>>>>>>>>>> for this or try to look for a hook into mesos?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think a little bit of context would help here.
>>>>>>>>>>>>>> What I plan to check is to run a very basic job/task inside a
>>>>>>>>>>>>>> docker container with aurora & wait for a 'resource not available' message
>>>>>>>>>>>>>> from mesos, and accordingly call an api to spin up a new node in my cluster.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <
>>>>>>>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I believe you are missing the thermos_executor options that
>>>>>>>>>>>>>>> have to be passed to the scheduler command line.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> See
>>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
>>>>>>>>>>>>>>> for an example
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>>>>>>>> *Sent:* Monday, October 19, 2015 8:45 AM
>>>>>>>>>>>>>>> *To:* user@aurora.apache.org
>>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>> I am a n00b with apache aurora & trying to experiment some
>>>>>>>>>>>>>>> things on my local machine with zookeeper and mesos-master running locally.
>>>>>>>>>>>>>>> They have initialized properly. When I try to run aurora with the required
>>>>>>>>>>>>>>> options, I get the following error, & googling hasn't helped me much here.
>>>>>>>>>>>>>>> Appreciate any help. Thanks in advance.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>> WARNING: Method [public void
>>>>>>>>>>>>>>> org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)]
>>>>>>>>>>>>>>> is synthetic and is being intercepted by
>>>>>>>>>>>>>>> [com.twitter.common.inject.TimedInterceptor@604c5de8]. This
>>>>>>>>>>>>>>> could indicate a bug.  The method
>>>>>>>>>>>>>>>  may be intercepted twice, or may not be intercepted at all.
>>>>>>>>>>>>>>> Exception in thread "main"
>>>>>>>>>>>>>>> com.google.inject.CreationException: Guice creation errors:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1) An exception was caught and reported. Message: A value
>>>>>>>>>>>>>>> may only be retrieved from a variable that has a default or has been
>>>>>>>>>>>>>>> set.
>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2) Could not find a suitable constructor in
>>>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have
>>>>>>>>>>>>>>> either one (and only one) constructor annotated with @Inject or a
>>>>>>>>>>>>>>> zero-argument constructor that is not private.
>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2 errors
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:95)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:83)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
>>>>>>>>>>>>>>> Caused by: java.lang.IllegalStateException: A value may only
>>>>>>>>>>>>>>> be retrieved from a variable that has a default or has been set.
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>>>>>>>>>>>>>>>         at com.twitter.common.args.Arg.get(Arg.java:82)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.google.inject.util.Modules$2.configure(Modules.java:114)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.google.inject.spi.Elements.getElements(Elements.java:101)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>>>>>>>>>>>>>>>         ... 7 more
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Complete logs are present @http://pastebin.com/i72HvbYi.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Zameer Manji
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Stacktrace when running Apache Aurora

Posted by Krish <kr...@gmail.com>.

I have got the docker config file copied into the sandbox using the
thermos_executor_resources flag; however, docker is still not able to find
the credentials file when pulling the image from a private repo.

When I try to use the library/hello-world:latest image from the public docker
repo to check if everything works fine without the credentials, I encounter
a different problem:
exec: "/bin/sh": stat /bin/sh: no such file or directory
Error response from daemon: Cannot start container
de93dc344d44b41bccccff49e508001a97ff23a8964e637d32a506a31fd4d946: [8]
System error: exec: "/bin/sh": stat /bin/sh: no such file or directory

I was referring to this email for guidance on setting up a mesos slave:
http://mail-archives.apache.org/mod_mbox/aurora-dev/201503.mbox/%3CCAKB1MkHR=+NOnesLLX9bUWtTDThsku46PW_wR4B+_Z9P59+XWQ@mail.gmail.com%3E

So, I cannot get the credentials file to be used by docker, and if I bypass
authentication, I can do a docker pull, but encounter a weird error in
launching the hello-world image.

Am I missing any log files I should be checking? I currently refer to
the mesos-slave stdout and the sandbox stderr file.
Is there any configuration parameter I am missing to make this work?

Any pointers will be really helpful. Thanks in advance.



--
κρισhναν

On Sun, Feb 28, 2016 at 3:37 PM, Krish <kr...@gmail.com> wrote:

> Continuing my earlier chain of thought, I found this in the mesos bug list:
> MESOS-4242 - Allow Docker private registry credentials to be passed from
> framework.
> How does one pass credentials using the framework? As it seems the
> .docker/config.json is not read from the slave.
>
>
>
>
> --
> κρισhναν
>
> On Sat, Feb 27, 2016 at 11:46 PM, Krish <kr...@gmail.com> wrote:
>
>> I couldn't complete my PoC project before (got busy with other
>> work). Well, it is never too late and here's my update and issue.
>>
>> I have a 3 node zk (3.5.1 alpha), mesos-master (v0.24.1) & aurora
>> (v0.11.0) running.
>> I was stuck in a problem where I was using mesos 0.25.0 & aurora 0.9.0 &
>> got a protobuf field not set error - ExecutorInfo field.
>>
>> I have a mesos agent running in docker container on coreos and it can
>> access the host docker just fine.
>> I have also put the docker login credentials file at the right location
>> for it to access the private docker registry.
>> I can manually trigger a docker pull and docker run without issues from
>> the slave (which is also reflected properly outside the slave container
>> with docker images and docker ps).
>>
>> However, when I try to run an aurora job with hello-docker container, the
>> slave prints out the log that docker pull has failed; more specifically:
>> " failed to start: Failed to 'docker pull
>> private_repo.com:5000/krish/test:latest': exit status = exited with
>> status 1 stderr = Error: image krish/test:latest not found"
>>
>> My hunch is that when using docker run from aurora DSL, it does not read
>> the docker credentials file properly and hence fails. I can reproduce the
>> exact same error when I delete the credentials file from the slave and
>> trigger a pull.
>>
>> Is the hunch right? If yes, is there a way to resolve this? Maybe source
>> it some way before the run command?
>>
>>
>>
>> --
>> κρισhναν
>>
>> On Tue, Oct 27, 2015 at 10:35 PM, Bill Farner <wf...@apache.org> wrote:
>>
>>> (1) clusters.json is written by you, configuring the CLI client with
>>> instructions for what clusters are available and how to discover them.
>>>
>>> (2) That's expected - mesos only allows one active replica of a
>>> framework at a time, this signals which one is active.
>>>
>>> (3) The observer is essentially a web server that allows you to browse a
>>> task's sandbox directory and other information about it.  You will need to
>>> configure it to run on your worker/agent nodes for that functionality to
>>> work (it's linked from the scheduler web UI).
>>>
>>> (4) You could indeed implement that behavior externally.  There is a
>>> reason:
>>> https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L556-L559
>>>
>>> (5) That is correct.  The scheduler exposes a thrift API that you would
>>> use (a REST API is coming, but ground has not yet been broken).  If you go
>>> this route, i suggest you skip the DSL and use the JSON task description
>>> format that is shipped over the API.  There's not good documentation on
>>> this, but we can help you through it and would be grateful for a writeup of
>>> your approach!
>>>
>>>
>>> On Mon, Oct 26, 2015 at 11:44 PM, Krish <kr...@gmail.com>
>>> wrote:
>>>
>>>> Hi Folks,
>>>> Firstly, thanks for all the help. Am happy to report that I have set up
>>>> zk, mesos & aurora, & can work further towards my idea of having an
>>>> auto-scaling cluster.
>>>> I have some further questions about the work done so far & things I
>>>> plan to do:
>>>>
>>>>    1. Is the /etc/aurora/clusters.json file created by the scheduler
>>>>    or does it need to be handcrafted? I had to manually edit the file to get
>>>>    my `aurora job ...` cli to work.
>>>>
>>>>    2. I am running a cluster of 3 coreOS VMs on vagrant with zk, mesos
>>>>    & aurora in a docker container. Only 1 of them outputs '1' when I look at
>>>>    the 'framework_registered' field. Is this expected? How do I verify that
>>>>    they are working as a cluster?
>>>>
>>>>    3. From the documentation, I see that there is an observer that
>>>>    needs to be listening on port 1338. What is the observer socket & its
>>>>    purpose? I have aurora listening only on ports 8081 (http port) & 8083
>>>>    (libprocess).
>>>>
>>>>    4. I read about the 'PENDING' field in aurora documentation, as
>>>>    Bill suggested, & realize that it just shows that a task is waiting for
>>>>    some reasons (for want of resources, in my case, as 0 slaves have
>>>>    registered). I was thinking of adding a hook to the pending state; say if a
>>>>    task is PENDING for 5 minutes for lack of resources in the cluster, then
>>>>    spin up a new machine. Is this the right approach to take? Does aurora
>>>>    provide reasons for why a task is in PENDING state?
>>>>
>>>>    => aurora job status testcluster/$USER/test/hello_world
>>>>     INFO] Checking status of testcluster/ubuntu/test/hello_world
>>>>    Active tasks (1):
>>>>           Task role: ubuntu, env: test, name: hello_world, instance:
>>>>    0, status:
>>>>    PENDING on None
>>>>              cpus: 0.1, ram: 16 MB, disk: 16 MB
>>>>              events:
>>>>               2015-10-23 04:55:33 PENDING: None
>>>>    Inactive tasks (0):
>>>>
>>>>    5. Aurora defines job/s in a .aurora config file & if I decide to
>>>>    increase/decrease the number of instances in my cluster, then I need to
>>>>    create/overwrite the concerned .aurora file and trigger the `aurora update
>>>>    ...` command. Is this right?
>>>>    If yes, is there an HTTP API I can invoke remotely which triggers
>>>>    this update?
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> κρισhναν
>>>>
>>>> On Fri, Oct 23, 2015 at 8:09 AM, Joshua Cohen <jc...@twopensource.com>
>>>> wrote:
>>>>
>>>>> I suspect your error from `aurora job create ...` is due to the aurora
>>>>> config you're using referencing `/vagrant/hello_world.py` which does not
>>>>> exist (as you say: you're not even using Vagrant). Can you link the .aurora
>>>>> config you're using?
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Joshua
>>>>>
>>>>> On Thu, Oct 22, 2015 at 3:22 PM, Krish <kr...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks, Zameer.
>>>>>>
>>>>>> I had to modify  /etc/aurora/clusters.json:
>>>>>> [
>>>>>>   {
>>>>>>     "auth_mechanism": "UNAUTHENTICATED",
>>>>>>     "name": "testcluster",
>>>>>>     "scheduler_zk_path": "/scheduler/aurora",
>>>>>>     "slave_root": "/var/lib/mesos",
>>>>>>     "slave_run_directory": "latest",
>>>>>>     "zk": "127.0.1.1"
>>>>>>   }
>>>>>> ]
>>>>>>
>>>>>> I have a hello_world.aurora in my home folder. However the following
>>>>>> command errors out:
>>>>>> ~$ aurora job create testcluster/testrole/test/hellojob
>>>>>> ./hello_world.aurora
>>>>>> Error loading configuration: [Errno 2] No such file or directory:
>>>>>> '/vagrant/hello_world.py'
>>>>>>
>>>>>> A job list does work:
>>>>>> ~$ aurora job list testcluster
>>>>>>  INFO] Retrieving jobs for role None
>>>>>>
>>>>>> I am not even using the vagrant. I am using zk & mesos on the same
>>>>>> machine as aurora. How do I submit these job templates to aurora?
>>>>>>
>>>>>> Any pointers to documentation will be helpful.
>>>>>>
>>>>>>
>>>>>> --
>>>>>> κρισhναν
>>>>>>
>>>>>> On Wed, Oct 21, 2015 at 11:09 PM, Zameer Manji <zm...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Version 0.9.0 does not have the gc executor. Version 0.9.0 uses
>>>>>>> Mesos' task reconciliation
>>>>>>> <http://mesos.apache.org/documentation/latest/reconciliation/> API
>>>>>>> instead.
>>>>>>>
>>>>>>> On Wed, Oct 21, 2015 at 9:28 AM, Krish <kr...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks Bill for the location to the debs. I was finally able to run
>>>>>>>> aurora. :)
>>>>>>>>
>>>>>>>> I did find thermos_executor.pex & thermos_observer after installing
>>>>>>>> aurora-executor. I still could not find gc_executor.pex on my system.
>>>>>>>> Is there a location from where I can download the binaries for
>>>>>>>> *.pex or build them from scratch?
>>>>>>>>
>>>>>>>> root@dev:/# find . -name "*.pex"
>>>>>>>> ./usr/share/aurora/bin/thermos_executor.pex
>>>>>>>> ./usr/share/aurora/bin/kaurora_admin.pex
>>>>>>>> ./usr/share/aurora/bin/kaurora.pex
>>>>>>>> ./usr/share/aurora/bin/thermos.pex
>>>>>>>> ./usr/share/aurora/bin/thermos_observer.pex
>>>>>>>> ./home/ubuntu/.pex
>>>>>>>> ./root/.pex
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> κρισhναν
>>>>>>>>
>>>>>>>> On Tue, Oct 20, 2015 at 11:46 PM, Bill Farner <wf...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Aurora currently requires an executor, so setting it to /dev/null
>>>>>>>>> will not work.  Happy to talk further about your thoughts around
>>>>>>>>> sidestepping the executor.
>>>>>>>>>
>>>>>>>>> As for working with the scheduler source code, it's a standard
>>>>>>>>> gradle project and we tend to use intellij.  Docs to help ramp on that:
>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md
>>>>>>>>>
>>>>>>>>> As for builds - the .zip is a source distribution, so it won't
>>>>>>>>> have any pre-built binaries.  If you're on debian, we have official debs
>>>>>>>>> here: https://bintray.com/apache/aurora
>>>>>>>>> You can see how they're built here (and can build your own)
>>>>>>>>> packages: https://github.com/apache/aurora-packaging
>>>>>>>>> We're close to having official RPMs, but none to speak of yet.
>>>>>>>>>
>>>>>>>>> On Tue, Oct 20, 2015 at 9:47 AM, Krish <kr...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Stephen,
>>>>>>>>>> I am trying to get started and run aurora without thermos
>>>>>>>>>> executor (setting it to /dev/null does not help) - on a local linux box for
>>>>>>>>>> now & planning to containerize/dockerize it later.
>>>>>>>>>>
>>>>>>>>>> Can you please point me to the right documentation (or a pointer
>>>>>>>>>> to the cli parsing source code) which can help me resolve this? Also, are
>>>>>>>>>> there any steps to import source code into eclipse to browse &
>>>>>>>>>> analyze code for this?
>>>>>>>>>>
>>>>>>>>>> Also, where do i find all the *.pex files? They are not present
>>>>>>>>>> in the zip file nor anywhere in the built source code.
>>>>>>>>>>
>>>>>>>>>> I know I am asking too many queries on a single thread here, &
>>>>>>>>>> would appreciate the help.
>>>>>>>>>> I think at the end of this, I will put the steps I followed in a
>>>>>>>>>> gist/blog so others might find their way around, & not struggle as much.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> κρισhναν
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan <
>>>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Krish,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> you don't have to set framework_authentication_file and
>>>>>>>>>>> zk_digest_credentials. The scheduler help text is misleading here as
>>>>>>>>>>> everything will work fine if you leave those empty.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> In addition, looks like you are misunderstanding the usage of
>>>>>>>>>>> the thermos_executor_path command line flag of the scheduler.
>>>>>>>>>>> It is supposed to point to the binary containing the generic
>>>>>>>>>>> Aurora executor (thermos_executor.pex).  You only need the hello_world.aurora
>>>>>>>>>>> once your scheduler is up and running. It serves as an example input for the
>>>>>>>>>>> aurora command line client which can be used to schedule jobs and services
>>>>>>>>>>> on an Aurora master.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Have you tried to use the vagrant box? Just type `vagrant up` in
>>>>>>>>>>> a checkout of the Aurora source code. It gives you a running scheduler to
>>>>>>>>>>> play with. Once you have understood how it works, you can start trying to
>>>>>>>>>>> install it on your own (by reverse-engineering the vagrant box).
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hope this helps a little,
>>>>>>>>>>>
>>>>>>>>>>> Stephan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ------------------------------
>>>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>>>> *Sent:* Tuesday, October 20, 2015 11:39 AM
>>>>>>>>>>> *To:* Bill Farner
>>>>>>>>>>> *Cc:* user@aurora.apache.org; Erb, Stephan
>>>>>>>>>>>
>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>
>>>>>>>>>>> Bill/Stephen,
>>>>>>>>>>> I still get a stacktrace when running the aurora scheduler CLI.
>>>>>>>>>>>
>>>>>>>>>>> I do not know what to specify for
>>>>>>>>>>>  -framework_authentication_file & -zk_digest_credentials, and they are
>>>>>>>>>>> required arguments.
>>>>>>>>>>>
>>>>>>>>>>> I am not using any authentication on Mesos master, do I still
>>>>>>>>>>> need the framework_authentication_file parameter?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> rm -rf /db /backup_dir
>>>>>>>>>>> mesos-log initialize --path="/db"
>>>>>>>>>>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
>>>>>>>>>>> JAVA_OPTS="-Xmx1536m  -Xms256m"
>>>>>>>>>>> /usr/local/aurora-scheduler/bin/aurora-scheduler -backup_dir=/backup_dir
>>>>>>>>>>> -cluster_name=tc -mesos_master_address=zk://localhost:2181/mesos/master
>>>>>>>>>>> -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181
>>>>>>>>>>> -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false
>>>>>>>>>>> -native_log_file_path=/db
>>>>>>>>>>> -thermos_executor_path=/home/ubuntu/hello_world.aurora
>>>>>>>>>>> ...
>>>>>>>>>>> ...
>>>>>>>>>>> INFO: Binding org.apache.aurora.scheduler.http.Utilization to
>>>>>>>>>>> GuiceManagedComponentProvider with the scope "PerRequest"
>>>>>>>>>>> Oct 20, 2015 9:27:40 AM
>>>>>>>>>>> org.apache.aurora.scheduler.cron.quartz.CronModule provideTimeZone
>>>>>>>>>>> WARNING: Cron schedules are configured to fire according to
>>>>>>>>>>> timezone Greenwich Mean Time but system timezone is set to
>>>>>>>>>>> Coordinated Universal Time
>>>>>>>>>>> Oct 20, 2015 9:27:41 AM
>>>>>>>>>>> org.eclipse.jetty.server.AbstractConnector doStart
>>>>>>>>>>> INFO: Started SelectChannelConnector@0.0.0.0:43843
>>>>>>>>>>> E1020 09:27:41.290 THREAD1
>>>>>>>>>>> org.apache.aurora.scheduler.SchedulerLifecycle$9.execute: Caught
>>>>>>>>>>> unchecked exception: com.google.inject.ProvisionException: Guice
>>>>>>>>>>> provision errors:
>>>>>>>>>>>
>>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException:
>>>>>>>>>>> Path cannot be null at
>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>   at
>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>   at
>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>   while locating
>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>
>>>>>>>>>>> 1 error
>>>>>>>>>>> com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>>
>>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException:
>>>>>>>>>>> Path cannot be null
>>>>>>>>>>>   at
>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>   at
>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>>   at
>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>>>>   while locating
>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>>
>>>>>>>>>>> 1 error
>>>>>>>>>>>         at
>>>>>>>>>>> com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> κρισhναν
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner <wfarner@apache.org
>>>>>>>>>>> > wrote:
>>>>>>>>>>>
>>>>>>>>>>>> The typical flow is that you keep your .aurora file checked
>>>>>>>>>>>> into git, and commit every time you deploy/update.  When you change your
>>>>>>>>>>>> file, you will instruct Aurora to update the live job (have a look at aurora
>>>>>>>>>>>> update -h).  Aurora will perform a rolling upgrade of your job
>>>>>>>>>>>> to the new config.  You'll use this same flow for updating your job's
>>>>>>>>>>>> software as well as resizing the job.
>>>>>>>>>>>>
>>>>>>>>>>>> For (3), you could set up alerting for stats that the scheduler
>>>>>>>>>>>> exports.  Have a look here for monitoring background:
>>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md
>>>>>>>>>>>>
>>>>>>>>>>>> You'll want to look at scheduler stats related to
>>>>>>>>>>>> 'pending'.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish <
>>>>>>>>>>>> krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for the pointer. Now I notice that the aurora-scheduler
>>>>>>>>>>>>> script has the --thermos_executor_path as a mandatory requirement.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have a couple of questions on how the
>>>>>>>>>>>>> thermos_executor/.aurora config file functions:
>>>>>>>>>>>>> 1. Do we have to statically define the file beforehand?
>>>>>>>>>>>>> 2. What happens when we want to dynamically change the config,
>>>>>>>>>>>>> say increasing the number of instances of a service required? Does aurora
>>>>>>>>>>>>> require a reboot then?
>>>>>>>>>>>>> 3. How do I get notified about the message mesos sends when it
>>>>>>>>>>>>> cannot schedule tasks for lack of resources? Should I depend on aurora for
>>>>>>>>>>>>> this or try to look for a hook into mesos?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think a little bit of context would help here.
>>>>>>>>>>>>> What I plan to check is to run a very basic job/task inside a
>>>>>>>>>>>>> docker container with aurora & wait for a 'resource not available' message
>>>>>>>>>>>>> from mesos, and accordingly call an api to spin up a new node in my cluster.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <
>>>>>>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I believe you are missing the thermos_executor options that
>>>>>>>>>>>>>> have to be passed to the scheduler command line.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> See
>>>>>>>>>>>>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
>>>>>>>>>>>>>> for an example
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>>>>>>> *Sent:* Monday, October 19, 2015 8:45 AM
>>>>>>>>>>>>>> *To:* user@aurora.apache.org
>>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> I am a n00b with apache aurora & trying to experiment with some
>>>>>>>>>>>>>> things on my local machine with zookeeper and mesos-master running locally.
>>>>>>>>>>>>>> They have initialized properly. When I try to run aurora with the required
>>>>>>>>>>>>>> options, I get the following error, & googling hasn't helped me much here.
>>>>>>>>>>>>>> Appreciate any help. Thanks in advance.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>> WARNING: Method [public void
>>>>>>>>>>>>>> org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)]
>>>>>>>>>>>>>> is synthetic and is being intercepted by
>>>>>>>>>>>>>> [com.twitter.common.inject.TimedInterceptor@604c5de8]. This
>>>>>>>>>>>>>> could indicate a bug.  The method
>>>>>>>>>>>>>>  may be intercepted twice, or may not be intercepted at all.
>>>>>>>>>>>>>> Exception in thread "main"
>>>>>>>>>>>>>> com.google.inject.CreationException: Guice creation errors:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1) An exception was caught and reported. Message: A value may
>>>>>>>>>>>>>> only be retrieved from a variable that has a default or has been
>>>>>>>>>>>>>> set.
>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2) Could not find a suitable constructor in
>>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have
>>>>>>>>>>>>>> either one (and only one) constructor annotated with @Inject or a
>>>>>>>>>>>>>> zero-argument constructor that is not private.
>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>>>>>>>>>>>>>>   at
>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2 errors
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:95)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:83)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
>>>>>>>>>>>>>> Caused by: java.lang.IllegalStateException: A value may only
>>>>>>>>>>>>>> be retrieved from a variable that has a default or has been set.
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>>>>>>>>>>>>>>         at com.twitter.common.args.Arg.get(Arg.java:82)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.google.inject.util.Modules$2.configure(Modules.java:114)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.google.inject.spi.Elements.getElements(Elements.java:101)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>>         at
>>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>>>>>>>>>>>>>>         ... 7 more
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Complete logs are present @http://pastebin.com/i72HvbYi.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Zameer Manji
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Stacktrace when running Apache Aurora

Posted by Krish <kr...@gmail.com>.

Continuing my earlier chain of thought, I found this in the mesos bug list:
MESOS-4242 - Allow Docker private registry credentials to be passed from
framework.
How does one pass credentials using the framework? It seems that the
.docker/config.json is not being read on the slave.
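
To narrow this down, a small check like the one below might help. It is only
a rough sketch, not anything Aurora or Mesos ships: the helper names
registry_of and has_credentials are made up for illustration. It assumes the
usual config.json layout with a top-level "auths" object keyed by registry
address, and it only tells you whether an entry exists for the registry in
the failing image name, not whether the slave process actually reads the file.

```python
import json
from pathlib import Path


def registry_of(image_ref):
    """Return the registry host of an image reference, or None if it has none."""
    first, sep, _rest = image_ref.partition("/")
    # The first path component counts as a registry host only if it looks
    # like one: it contains a '.' or ':' or is exactly 'localhost'.
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return None


def has_credentials(config_text, registry):
    """Check whether a config.json document has an 'auths' entry for registry."""
    auths = json.loads(config_text).get("auths", {})
    # Keys may carry a scheme and a path (e.g. 'https://host:5000/v1/'),
    # so compare on the host[:port] part only.
    return any(key.split("://")[-1].split("/")[0] == registry for key in auths)


if __name__ == "__main__":
    image = "private_repo.com:5000/krish/test:latest"  # image from the slave log
    registry = registry_of(image)
    # Assumption: the config lives under the invoking user's home directory.
    config_path = Path.home() / ".docker" / "config.json"
    if config_path.is_file():
        print(registry, has_credentials(config_path.read_text(), registry))
    else:
        print("no Docker config found at", config_path)
```

Running it as the same user, and inside the same container, as the slave
process should show whether that environment sees the same home directory
and credentials as my manual docker pull does.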




--
κρισhναν

On Sat, Feb 27, 2016 at 11:46 PM, Krish <kr...@gmail.com> wrote:

> I couldn't complete my PoC project earlier (got busy with other
> work). Well, it is never too late and here's my update and issue.
>
> I have a 3 node zk (3.5.1 alpha), mesos-master (v0.24.1) & aurora
> (v0.11.0) running.
> I was stuck in a problem where I was using mesos 0.25.0 & aurora 0.9.0 &
> got a protobuf field not set error - ExecutorInfo field.
>
> I have a mesos agent running in docker container on coreos and it can
> access the host docker just fine.
> I have also put the docker login credentials file at the right location
> for it to access the private docker registry.
> I can manually trigger a docker pull and docker run without issues from
> the slave (which is also reflected properly outside the slave container
> with docker images and docker ps).
>
> However, when I try to run an aurora job with hello-docker container, the
> slave prints out the log that docker pull has failed; more specifically:
> " failed to start: Failed to 'docker pull
> private_repo.com:5000/krish/test:latest': exit status = exited with
> status 1 stderr = Error: image krish/test:latest not found"
>
> My hunch is that when using docker run from aurora DSL, it does not read
> the docker credentials file properly and hence fails. I can reproduce the
> exact same error when I delete the credentials file from the slave and
> trigger a pull.
>
> Is the hunch right? If yes, is there a way to resolve this? Maybe source
> it some way before the run command?
>
>
>
> --
> κρισhναν
>
> On Tue, Oct 27, 2015 at 10:35 PM, Bill Farner <wf...@apache.org> wrote:
>
>> (1) clusters.json is written by you, configuring the CLI client with
>> instructions for what clusters are available and how to discover them.
>>
>> (2) That's expected - mesos only allows one active replica of a framework
>> at a time, this signals which one is active.
>>
>> (3) The observer is essentially a web server that allows you to browse a
>> task's sandbox directory and other information about it.  You will need to
>> configure it to run on your worker/agent nodes for that functionality to
>> work (it's linked from the scheduler web UI).
>>
>> (4) You could indeed implement that behavior externally.  There is a
>> reason:
>> https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L556-L559
>>
>> (5) That is correct.  The scheduler exposes a thrift API that you would
>> use (a REST API is coming, but ground has not yet been broken).  If you go
>> this route, i suggest you skip the DSL and use the JSON task description
>> format that is shipped over the API.  There's not good documentation on
>> this, but we can help you through it and would be grateful for a writeup of
>> your approach!
>>
>>
>> On Mon, Oct 26, 2015 at 11:44 PM, Krish <kr...@gmail.com>
>> wrote:
>>
>>> Hi Folks,
>>> Firstly, thanks for all the help. Am happy to report that I have set up
>>> zk, mesos & aurora, & can work further towards my idea of having an
>>> auto-scaling cluster.
>>> I have some further questions about the work done so far & things I plan
>>> to do:
>>>
>>>    1. Is the /etc/aurora/clusters.json file created by the scheduler or
>>>    does it need to be handcrafted? I had to manually edit the file to get my
>>>    `aurora job ...` cli to work.
>>>
>>>    2. I am running a cluster of 3 coreOS VMs on vagrant with zk, mesos
>>>    & aurora in a docker container. Only 1 of them outputs '1' when I look at
>>>    the 'framework_registered' field. Is this expected? How do I verify that
>>>    they are working as a cluster?
>>>
>>>    3. From the documentation, I see that there is an observer that
>>>    needs to be listening on port 1338. What is the observer socket & its
>>>    purpose? I have aurora listening only on ports 8081 (http port) & 8083
>>>    (libprocess).
>>>
>>>    4. I read about the 'PENDING' field in aurora documentation, as Bill
>>>    suggested, & realize that it just shows that a task is waiting for some
>>>    reasons (for want of resources, in my case, as 0 slaves have registered). I
>>>    was thinking of adding a hook to the pending state; say if a task is
>>>    PENDING for 5 minutes for lack of resources in the cluster, then spin up a
>>>    new machine. Is this the right approach to take? Does aurora provide
>>>    reasons for why a task is in PENDING state?
>>>
>>>    => aurora job status testcluster/$USER/test/hello_world
>>>     INFO] Checking status of testcluster/ubuntu/test/hello_world
>>>    Active tasks (1):
>>>           Task role: ubuntu, env: test, name: hello_world, instance: 0,
>>>    status:
>>>    PENDING on None
>>>              cpus: 0.1, ram: 16 MB, disk: 16 MB
>>>              events:
>>>               2015-10-23 04:55:33 PENDING: None
>>>    Inactive tasks (0):
>>>
>>>    5. Aurora defines job/s in a .aurora config file & if I decide to
>>>    increase/decrease the number of instances in my cluster, then I need to
>>>    create/overwrite the concerned .aurora file and trigger the `aurora update
>>>    ...` command. Is this right?
>>>    If yes, is there an HTTP API I can invoke remotely which triggers
>>>    this update?
>>>
>>>
>>>
>>>
>>> --
>>> κρισhναν
>>>
>>> On Fri, Oct 23, 2015 at 8:09 AM, Joshua Cohen <jc...@twopensource.com>
>>> wrote:
>>>
>>>> I suspect your error from `aurora job create ...` is due to the aurora
>>>> config you're using referencing `/vagrant/hello_world.py` which does not
>>>> exist (as you say: you're not even using Vagrant). Can you link the .aurora
>>>> config you're using?
>>>>
>>>> Cheers,
>>>>
>>>> Joshua
>>>>
>>>> On Thu, Oct 22, 2015 at 3:22 PM, Krish <kr...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks, Zameer.
>>>>>
>>>>> I had to modify  /etc/aurora/clusters.json:
>>>>> [
>>>>>   {
>>>>>     "auth_mechanism": "UNAUTHENTICATED",
>>>>>     "name": "testcluster",
>>>>>     "scheduler_zk_path": "/scheduler/aurora",
>>>>>     "slave_root": "/var/lib/mesos",
>>>>>     "slave_run_directory": "latest",
>>>>>     "zk": "127.0.1.1"
>>>>>   }
>>>>> ]
>>>>>
>>>>> I have a hello_world.aurora in my home folder. However the following
>>>>> command errors out:
>>>>> ~$ aurora job create testcluster/testrole/test/hellojob
>>>>> ./hello_world.aurora
>>>>> Error loading configuration: [Errno 2] No such file or directory:
>>>>> '/vagrant/hello_world.py'
>>>>>
>>>>> A job list does work:
>>>>> ~$ aurora job list testcluster
>>>>>  INFO] Retrieving jobs for role None
>>>>>
>>>>> I am not even using the vagrant. I am using zk & mesos on the same
>>>>> machine as aurora. How do I submit these job templates to aurora?
>>>>>
>>>>> Any pointers to documentation will be helpful.
>>>>>
>>>>>
>>>>> --
>>>>> κρισhναν
>>>>>
>>>>> On Wed, Oct 21, 2015 at 11:09 PM, Zameer Manji <zm...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Version 0.9.0 does not have the gc executor. Version 0.9.0 uses
>>>>>> Mesos' task reconciliation
>>>>>> <http://mesos.apache.org/documentation/latest/reconciliation/> API
>>>>>> instead.
>>>>>>
>>>>>> On Wed, Oct 21, 2015 at 9:28 AM, Krish <kr...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks Bill for the location to the debs. I was finally able to run
>>>>>>> aurora. :)
>>>>>>>
>>>>>>> I did find thermos_executor.pex & thermos_observer after installing
>>>>>>> aurora-executor. I still could not find gc_executor.pex on my system.
>>>>>>> Is there a location from where I can download the binaries for *.pex
>>>>>>> or build them from scratch?
>>>>>>>
>>>>>>> root@dev:/# find . -name "*.pex"
>>>>>>> ./usr/share/aurora/bin/thermos_executor.pex
>>>>>>> ./usr/share/aurora/bin/kaurora_admin.pex
>>>>>>> ./usr/share/aurora/bin/kaurora.pex
>>>>>>> ./usr/share/aurora/bin/thermos.pex
>>>>>>> ./usr/share/aurora/bin/thermos_observer.pex
>>>>>>> ./home/ubuntu/.pex
>>>>>>> ./root/.pex
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> κρισhναν
>>>>>>>
>>>>>>> On Tue, Oct 20, 2015 at 11:46 PM, Bill Farner <wf...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Aurora currently requires an executor, so setting it to /dev/null
>>>>>>>> will not work.  Happy to talk further about your thoughts around
>>>>>>>> sidestepping the executor.
>>>>>>>>
>>>>>>>> As for working with the scheduler source code, it's a standard
>>>>>>>> gradle project and we tend to use intellij.  Docs to help ramp on that:
>>>>>>>> https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md
>>>>>>>>
>>>>>>>> As for builds - the .zip is a source distribution, so it won't have
>>>>>>>> any pre-built binaries.  If you're on debian, we have official debs here:
>>>>>>>> https://bintray.com/apache/aurora
>>>>>>>> You can see how they're built here (and can build your own)
>>>>>>>> packages: https://github.com/apache/aurora-packaging
>>>>>>>> We're close to having official RPMs, but none to speak of yet.
>>>>>>>>
>>>>>>>> On Tue, Oct 20, 2015 at 9:47 AM, Krish <kr...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Stephen,
>>>>>>>>> I am trying to get started and run aurora without thermos executor
>>>>>>>>> (setting it to /dev/null does not help) - on a local linux box for now &
>>>>>>>>> planning to containerize/dockerize it later.
>>>>>>>>>
>>>>>>>>> Can you please point me to the right documentation (or a pointer
>>>>>>>>> to the cli parsing source code) which can help me resolve this? Also, are
>>>>>>>>> there any steps to import source code into eclipse to browse &
>>>>>>>>> analyze code for this?
>>>>>>>>>
>>>>>>>>> Also, where do i find all the *.pex files? They are not present in
>>>>>>>>> the zip file nor anywhere in the built source code.
>>>>>>>>>
>>>>>>>>> I know I am asking too many queries on a single thread here, &
>>>>>>>>> would appreciate the help.
>>>>>>>>> I think at the end of this, I will put the steps I followed in a
>>>>>>>>> gist/blog so others might find their way around, & not struggle as much.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> κρισhναν
>>>>>>>>>
>>>>>>>>> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan <
>>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Krish,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> you don't have to set framework_authentication_file and
>>>>>>>>>> zk_digest_credentials. The scheduler help text is misleading here as
>>>>>>>>>> everything will work fine if you leave those empty.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> In addition, looks like you are misunderstanding the usage of the
>>>>>>>>>> thermos_executor_path command line flag of the scheduler. It is
>>>>>>>>>> supposed to point to the binary containing the generic Aurora executor
>>>>>>>>>> (thermos_executor.pex).  You only need the hello_world.aurora
>>>>>>>>>> once your scheduler is up and running. It serves as an example input for the
>>>>>>>>>> aurora command line client which can be used to schedule jobs and services
>>>>>>>>>> on an Aurora master.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Have you tried to use the vagrant box? Just type 'vagrant up' in a
>>>>>>>>>> checkout of the Aurora source code. It gives you a running scheduler to
>>>>>>>>>> play with. Once you have understood how it works, you can start trying to
>>>>>>>>>> install it on your own (by reverse-engineering the vagrant box).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hope this helps a little,
>>>>>>>>>>
>>>>>>>>>> Stephan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ------------------------------
>>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>>> *Sent:* Tuesday, October 20, 2015 11:39 AM
>>>>>>>>>> *To:* Bill Farner
>>>>>>>>>> *Cc:* user@aurora.apache.org; Erb, Stephan
>>>>>>>>>>
>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>
>>>>>>>>>> Bill/Stephan,
>>>>>>>>>> I still get a stacktrace when running the aurora scheduler CLI.
>>>>>>>>>>
>>>>>>>>>> I do not know what to specify for  -framework_authentication_file
>>>>>>>>>> & -zk_digest_credentials, and they are required arguments.
>>>>>>>>>>
>>>>>>>>>> I am not using any authentication on Mesos master, do I still
>>>>>>>>>> need the framework_authentication_file parameter?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> rm -rf /db /backup_dir
>>>>>>>>>> mesos-log initialize --path="/db"
>>>>>>>>>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
>>>>>>>>>> JAVA_OPTS="-Xmx1536m  -Xms256m"
>>>>>>>>>> /usr/local/aurora-scheduler/bin/aurora-scheduler -backup_dir=/backup_dir
>>>>>>>>>> -cluster_name=tc -mesos_master_address=zk://localhost:2181/mesos/master
>>>>>>>>>> -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181
>>>>>>>>>> -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false
>>>>>>>>>> -native_log_file_path=/db
>>>>>>>>>> -thermos_executor_path=/home/ubuntu/hello_world.aurora
>>>>>>>>>> ...
>>>>>>>>>> ...
>>>>>>>>>> INFO: Binding org.apache.aurora.scheduler.http.Utilization to
>>>>>>>>>> GuiceManagedCompon
>>>>>>>>>> entProvider with the scope "PerRequest"
>>>>>>>>>> Oct 20, 2015 9:27:40 AM
>>>>>>>>>> org.apache.aurora.scheduler.cron.quartz.CronModule provi
>>>>>>>>>> deTimeZone
>>>>>>>>>> WARNING: Cron schedules are configured to fire according to
>>>>>>>>>> timezone Greenwich M
>>>>>>>>>> ean Time but system timezone is set to Coordinated Universal Time
>>>>>>>>>> Oct 20, 2015 9:27:41 AM
>>>>>>>>>> org.eclipse.jetty.server.AbstractConnector doStart
>>>>>>>>>> INFO: Started SelectChannelConnector@0.0.0.0:43843
>>>>>>>>>> E1020 09:27:41.290 THREAD1
>>>>>>>>>> org.apache.aurora.scheduler.SchedulerLifecycle$9.exec
>>>>>>>>>> ute: Caught unchecked exception:
>>>>>>>>>> com.google.inject.ProvisionException: Guice pro
>>>>>>>>>> vision errors:
>>>>>>>>>>
>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException:
>>>>>>>>>> Path cannot be null at
>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(Mesos
>>>>>>>>>> LogStreamModule.java:117)
>>>>>>>>>>   at
>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(Mesos
>>>>>>>>>> LogStreamModule.java:117)
>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>   at
>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterf
>>>>>>>>>> ace(MesosLogStreamModule.java:152)
>>>>>>>>>>   while locating
>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>
>>>>>>>>>> 1 error
>>>>>>>>>> com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>>
>>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException:
>>>>>>>>>> Path cannot be
>>>>>>>>>> null
>>>>>>>>>>   at
>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(Mesos
>>>>>>>>>> LogStreamModule.java:117)
>>>>>>>>>>   at
>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(Mesos
>>>>>>>>>> LogStreamModule.java:117)
>>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>>   at
>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterf
>>>>>>>>>> ace(MesosLogStreamModule.java:152)
>>>>>>>>>>   while locating
>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>>
>>>>>>>>>> 1 error
>>>>>>>>>>         at
>>>>>>>>>> com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
>>>>>>>>>>         at
>>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> κρισhναν
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner <wf...@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> The typical flow is that you keep your .aurora file checked into
>>>>>>>>>>> git, and commit every time you deploy/update.  When you change your file,
>>>>>>>>>>> you will instruct Aurora to update the live job (have a look at aurora
>>>>>>>>>>> update -h).  Aurora will perform a rolling upgrade of your job
>>>>>>>>>>> to the new config.  You'll use this same flow for updating your job's
>>>>>>>>>>> software as well as resizing the job.
>>>>>>>>>>>
>>>>>>>>>>> For (3), you could set up alerting for stats that the scheduler
>>>>>>>>>>> exports.  Have a look here for monitoring background:
>>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md
>>>>>>>>>>>
>>>>>>>>>>> You'll want to look at scheduler stats related to 'pending'.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish <
>>>>>>>>>>> krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks for the pointer. Now I notice that the aurora-scheduler
>>>>>>>>>>>> script has the --thermos_executor_path as a mandatory requirement.
>>>>>>>>>>>>
>>>>>>>>>>>> I have a couple of questions on how the
>>>>>>>>>>>> thermos_executor/.aurora config file functions:
>>>>>>>>>>>> 1. Do we have to statically define the file beforehand?
>>>>>>>>>>>> 2. What happens when we want to dynamically change the config,
>>>>>>>>>>>> say increasing the number of instances of a service required? Does aurora
>>>>>>>>>>>> require a reboot then?
>>>>>>>>>>>> 3. How do I get notified about the message mesos sends when it
>>>>>>>>>>>> cannot schedule tasks for lack of resources? Should I depend on aurora for
>>>>>>>>>>>> this or try to look for a hook into mesos?
>>>>>>>>>>>>
>>>>>>>>>>>> I think a little bit of context would help here.
>>>>>>>>>>>> What I plan to check is to run a very basic job/task inside a
>>>>>>>>>>>> docker container with aurora & wait for a 'resource not available' message
>>>>>>>>>>>> from mesos, and accordingly call an api to spin up a new node in my cluster.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <
>>>>>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I believe you are missing the thermos_executor options that
>>>>>>>>>>>>> have to be passed to the scheduler command line.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> See
>>>>>>>>>>>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
>>>>>>>>>>>>> for an example
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>>>>>> *Sent:* Monday, October 19, 2015 8:45 AM
>>>>>>>>>>>>> *To:* user@aurora.apache.org
>>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> I am a n00b with apache aurora & trying to experiment with some
>>>>>>>>>>>>> things on my local machine with zookeeper and mesos-master running locally.
>>>>>>>>>>>>> They have initialized properly. When I try to run aurora with the required
>>>>>>>>>>>>> options, I get the following error, & googling hasn't helped me much here.
>>>>>>>>>>>>> Appreciate any help. Thanks in advance.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ...
>>>>>>>>>>>>> ...
>>>>>>>>>>>>> WARNING: Method [public void
>>>>>>>>>>>>> org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)]
>>>>>>>>>>>>> is synthetic and is being intercepted by
>>>>>>>>>>>>> [com.twitter.common.inject.TimedInterceptor@604c5de8]. This
>>>>>>>>>>>>> could indicate a bug.  The method
>>>>>>>>>>>>>  may be intercepted twice, or may not be intercepted at all.
>>>>>>>>>>>>> Exception in thread "main"
>>>>>>>>>>>>> com.google.inject.CreationException: Guice creation errors:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) An exception was caught and reported. Message: A value may
>>>>>>>>>>>>> only be retrieved from a variable that has a default or has been
>>>>>>>>>>>>> set.
>>>>>>>>>>>>>   at
>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2) Could not find a suitable constructor in
>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have
>>>>>>>>>>>>> either one (a
>>>>>>>>>>>>> nd only one) constructor annotated with @Inject or a
>>>>>>>>>>>>> zero-argument constructor that is not private.
>>>>>>>>>>>>>   at
>>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>>>>>>>>>>>>>   at
>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2 errors
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:95)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.google.inject.Guice.createInjector(Guice.java:83)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
>>>>>>>>>>>>> Caused by: java.lang.IllegalStateException: A value may only
>>>>>>>>>>>>> be retrieved from a variable that has a default or has been set.
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>>>>>>>>>>>>>         at com.twitter.common.args.Arg.get(Arg.java:82)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.google.inject.util.Modules$2.configure(Modules.java:114)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.google.inject.spi.Elements.getElements(Elements.java:101)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>>>>>>>>>>>>>         ... 7 more
>>>>>>>>>>>>>
>>>>>>>>>>>>> Complete logs are present @http://pastebin.com/i72HvbYi.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Zameer Manji
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Stacktrace when running Apache Aurora

Posted by Krish <kr...@gmail.com>.

I couldn't complete my PoC project earlier (got busy with other
work). Well, it is never too late, and here's my update and issue.

I now have a 3-node zk (3.5.1 alpha), mesos-master (v0.24.1) & aurora (v0.11.0)
running.
Earlier I was stuck with mesos 0.25.0 & aurora 0.9.0, where I got a
"protobuf field not set" error for the ExecutorInfo field.

I have a mesos agent running in docker container on coreos and it can
access the host docker just fine.
I have also put the docker login credentials file at the right location for
it to access the private docker registry.
I can manually trigger a docker pull and docker run without issues from the
slave (which is also reflected properly outside the slave container with
docker images and docker ps).

However, when I try to run an aurora job with hello-docker container, the
slave prints out the log that docker pull has failed; more specifically:
" failed to start: Failed to 'docker pull
private_repo.com:5000/krish/test:latest': exit status = exited with status
1 stderr = Error: image krish/test:latest not found"

My hunch is that when using docker run from aurora DSL, it does not read
the docker credentials file properly and hence fails. I can reproduce the
exact same error when I delete the credentials file from the slave and
trigger a pull.

Is the hunch right? If yes, is there a way to resolve this? Maybe source it
some way before the run command?
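
One way to narrow this down is to check whether the credentials file that
the agent process actually sees has an entry for the registry host in the
image name. Below is a minimal Python sketch, assuming the credentials are
in a Docker client config file (either the newer config.json layout with an
"auths" map, or the legacy .dockercfg layout with registry URLs as top-level
keys). The function names are illustrative and not part of Aurora, Mesos, or
Docker, and a matching entry does not prove the pull will succeed; it only
rules out a missing or mismatched host key.

```python
import json


def registry_host(image):
    """Return the registry host[:port] of an image reference, or None.

    Docker treats the first path component as a registry only if it
    contains '.' or ':' or is 'localhost'; otherwise the image is looked
    up on the default registry.
    """
    first, sep, _rest = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return None


def hosts_with_credentials(config_text):
    """List registry hosts that have an entry in a Docker client config."""
    config = json.loads(config_text)
    if "auths" in config:
        # Newer layout: {"auths": {"host:port": {"auth": "..."}}}
        entries = config["auths"]
    else:
        # Legacy .dockercfg layout: {"https://host:port/v1/": {"auth": "..."}}
        entries = {k: v for k, v in config.items()
                   if isinstance(v, dict) and "auth" in v}
    hosts = []
    for key in entries:
        # Keys may be bare hosts or URLs; keep only the host[:port] part.
        hosts.append(key.split("://", 1)[-1].split("/", 1)[0])
    return hosts


def has_credentials(image, config_text):
    """True if the image's registry host has an entry in the config text."""
    host = registry_host(image)
    return host is not None and host in hosts_with_credentials(config_text)
```

Feeding has_credentials() the image name from the error message and the
contents of the credentials file, read from inside the agent container as
the user the agent runs as, would show whether private_repo.com:5000 is
listed there.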



--
κρισhναν

On Tue, Oct 27, 2015 at 10:35 PM, Bill Farner <wf...@apache.org> wrote:

> (1) clusters.json is written by you, configuring the CLI client with
> instructions for what clusters are available and how to discover them.
>
> (2) That's expected - mesos only allows one active replica of a framework
> at a time, this signals which one is active.
>
> (3) The observer is essentially a web server that allows you to browse a
> task's sandbox directory and other information about it.  You will need to
> configure it to run on your worker/agent nodes for that functionality to
> work (it's linked from the scheduler web UI).
>
> (4) You could indeed implement that behavior externally.  There is a
> reason:
> https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L556-L559
>
> (5) That is correct.  The scheduler exposes a thrift API that you would
> use (a REST API is coming, but ground has not yet been broken).  If you go
> this route, i suggest you skip the DSL and use the JSON task description
> format that is shipped over the API.  There's not good documentation on
> this, but we can help you through it and would be grateful for a writeup of
> your approach!
>
>
> On Mon, Oct 26, 2015 at 11:44 PM, Krish <kr...@gmail.com> wrote:
>
>> Hi Folks,
>> Firstly, thanks for all the help. Am happy to report that I have set up
>> zk, mesos & aurora, & can work further towards my idea of having an
>> auto-scaling cluster.
>> I have some further questions about the work done so far & things I plan
>> to do:
>>
>>    1. Is the /etc/aurora/clusters.json file created by the scheduler or
>>    does it need to be handcrafted? I had to manually edit the file to get my
>>    `aurora job ...` cli to work.
>>
>>    2. I am running a cluster of 3 coreOS VMs on vagrant with zk, mesos &
>>    aurora in a docker container. Only 1 of them outputs '1' when I look at the
>>    'framework_registered' field. Is this expected? How do I verify that they
>>    are working as a cluster?
>>
>>    3. From the documentation, I see that there is an observer that needs
>>    to be listening on port 1338. What is the observer socket & its purpose? I
>>    have aurora listening only on ports 8081 (http port) & 8083 (libprocess).
>>
>>    4. I read about the 'PENDING' field in aurora documentation, as Bill
>>    suggested, & realize that it just shows that a task is waiting for some
>>    reasons (for want of resources, in my case, as 0 slaves have registered). I
>>    was thinking of adding a hook to the pending state; say if a task is
>>    PENDING for 5 minutes for lack of resources in the cluster, then spin up a
>>    new machine. Is this the right approach to take? Does aurora provide
>>    reasons for why a task is in PENDING state?
>>
>>    => aurora job status testcluster/$USER/test/hello_world
>>     INFO] Checking status of testcluster/ubuntu/test/hello_world
>>    Active tasks (1):
>>           Task role: ubuntu, env: test, name: hello_world, instance: 0,
>>    status:
>>    PENDING on None
>>              cpus: 0.1, ram: 16 MB, disk: 16 MB
>>              events:
>>               2015-10-23 04:55:33 PENDING: None
>>    Inactive tasks (0):
>>
>>    5. Aurora defines job/s in a .aurora config file & if I decide to
>>    increase/decrease the number of instances in my cluster, then I need to
>>    create/overwrite the concerned .aurora file and trigger the `aurora update
>>    ...` command. Is this right?
>>    If yes, is there an HTTP API I can invoke remotely which triggers
>>    this update?
>>
>>
>>
>>
>> --
>> κρισhναν
>>
>> On Fri, Oct 23, 2015 at 8:09 AM, Joshua Cohen <jc...@twopensource.com>
>> wrote:
>>
>>> I suspect your error from `aurora job create ...` is due to the aurora
>>> config you're using referencing `/vagrant/hello_world.py` which does not
>>> exist (as you say: you're not even using Vagrant). Can you link the .aurora
>>> config you're using?
>>>
>>> Cheers,
>>>
>>> Joshua
>>>
>>> On Thu, Oct 22, 2015 at 3:22 PM, Krish <kr...@gmail.com>
>>> wrote:
>>>
>>>> Thanks, Zameer.
>>>>
>>>> I had to modify  /etc/aurora/clusters.json:
>>>> [
>>>>   {
>>>>     "auth_mechanism": "UNAUTHENTICATED",
>>>>     "name": "testcluster",
>>>>     "scheduler_zk_path": "/scheduler/aurora",
>>>>     "slave_root": "/var/lib/mesos",
>>>>     "slave_run_directory": "latest",
>>>>     "zk": "127.0.1.1"
>>>>   }
>>>> ]
>>>>
>>>> I have a hello_world.aurora in my home folder. However the following
>>>> command errors out:
>>>> ~$ aurora job create testcluster/testrole/test/hellojob
>>>> ./hello_world.aurora
>>>> Error loading configuration: [Errno 2] No such file or directory:
>>>> '/vagrant/hello_world.py'
>>>>
>>>> A job list does work:
>>>> ~$ aurora job list testcluster
>>>>  INFO] Retrieving jobs for role None
>>>>
>>>> I am not even using the vagrant. I am using zk & mesos on the same
>>>> machine as aurora. How do I submit these job templates to aurora?
>>>>
>>>> Any pointers to documentation will be helpful.
>>>>
>>>>
>>>> --
>>>> κρισhναν
>>>>
>>>> On Wed, Oct 21, 2015 at 11:09 PM, Zameer Manji <zm...@apache.org>
>>>> wrote:
>>>>
>>>>> Version 0.9.0 does not have the gc executor. Version 0.9.0 uses Mesos' task
>>>>> reconciliation
>>>>> <http://mesos.apache.org/documentation/latest/reconciliation/> API
>>>>> instead.
>>>>>
>>>>> On Wed, Oct 21, 2015 at 9:28 AM, Krish <kr...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks Bill for the location to the debs. I was finally able to run
>>>>>> aurora. :)
>>>>>>
>>>>>> I did find thermos_executor.pex & thermos_observer after installing
>>>>>> aurora-executor. I still could not find gc_executor.pex on my system.
>>>>>> Is there a location from where I can download the binaries for *.pex
>>>>>> or build them from scratch?
>>>>>>
>>>>>> root@dev:/# find . -name "*.pex"
>>>>>> ./usr/share/aurora/bin/thermos_executor.pex
>>>>>> ./usr/share/aurora/bin/kaurora_admin.pex
>>>>>> ./usr/share/aurora/bin/kaurora.pex
>>>>>> ./usr/share/aurora/bin/thermos.pex
>>>>>> ./usr/share/aurora/bin/thermos_observer.pex
>>>>>> ./home/ubuntu/.pex
>>>>>> ./root/.pex
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> κρισhναν
>>>>>>
>>>>>> On Tue, Oct 20, 2015 at 11:46 PM, Bill Farner <wf...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Aurora currently requires an executor, so setting it to /dev/null
>>>>>>> will not work.  Happy to talk further about your thoughts around
>>>>>>> sidestepping the executor.
>>>>>>>
>>>>>>> As for working with the scheduler source code, it's a standard
>>>>>>> gradle project and we tend to use intellij.  Docs to help ramp on that:
>>>>>>> https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md
>>>>>>>
>>>>>>> As for builds - the .zip is a source distribution, so it won't have
>>>>>>> any pre-built binaries.  If you're on debian, we have official debs here:
>>>>>>> https://bintray.com/apache/aurora
>>>>>>> You can see how they're built here (and can build your own)
>>>>>>> packages: https://github.com/apache/aurora-packaging
>>>>>>> We're close to having official RPMs, but none to speak of yet.
>>>>>>>
>>>>>>> On Tue, Oct 20, 2015 at 9:47 AM, Krish <kr...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Stephan,
>>>>>>>> I am trying to get started and run aurora without thermos executor
>>>>>>>> (setting it to /dev/null does not help) - on a local linux box for now &
>>>>>>>> planning to containerize/dockerize it later.
>>>>>>>>
>>>>>>>> Can you please point me to the right documentation (or a pointer to
>>>>>>>> the cli parsing source code) which can help me resolve this? Also, are
>>>>>>>> there any steps to import source code into eclipse to browse &
>>>>>>>> analyze code for this?
>>>>>>>>
>>>>>>>> Also, where do i find all the *.pex files? They are not present in
>>>>>>>> the zip file nor anywhere in the built source code.
>>>>>>>>
>>>>>>>> I know I am asking too many queries on a single thread here, &
>>>>>>>> would appreciate the help.
>>>>>>>> I think at the end of this, I will put the steps I followed in a
>>>>>>>> gist/blog so others might find their way around, & not struggle as much.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> κρισhναν
>>>>>>>>
>>>>>>>> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan <
>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Krish,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> you don't have to set framework_authentication_file and
>>>>>>>>> zk_digest_credentials. The scheduler help text is misleading here as
>>>>>>>>> everything will work fine if you leave those empty.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> In addition, looks like you are misunderstanding the usage of the
>>>>>>>>> thermos_executor_path command line flag of the scheduler. It is
>>>>>>>>> supposed to point to the binary containing the generic Aurora executor
>>>>>>>>> (thermos_executor.pex).  You only need the hello_world.aurora
>>>>>>>>> once your scheduler is up and running. It serves as an example input for the
>>>>>>>>> aurora command line client which can be used to schedule jobs and services
>>>>>>>>> on an Aurora master.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Have you tried to use the vagrant box? Just type 'vagrant up' in a
>>>>>>>>> checkout of the Aurora source code. It gives you a running scheduler to
>>>>>>>>> play with. Once you have understood how it works, you can start trying to
>>>>>>>>> install it on your own (by reverse-engineering the vagrant box).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hope this helps a little,
>>>>>>>>>
>>>>>>>>> Stephan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ------------------------------
>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>> *Sent:* Tuesday, October 20, 2015 11:39 AM
>>>>>>>>> *To:* Bill Farner
>>>>>>>>> *Cc:* user@aurora.apache.org; Erb, Stephan
>>>>>>>>>
>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>
>>>>>>>>> Bill/Stephan,
>>>>>>>>> I still get a stacktrace when running the aurora scheduler CLI.
>>>>>>>>>
>>>>>>>>> I do not know what to specify for  -framework_authentication_file
>>>>>>>>> & -zk_digest_credentials, and they are required arguments.
>>>>>>>>>
>>>>>>>>> I am not using any authentication on Mesos master, do I still need
>>>>>>>>> the framework_authentication_file parameter?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> rm -rf /db /backup_dir
>>>>>>>>> mesos-log initialize --path="/db"
>>>>>>>>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
>>>>>>>>> JAVA_OPTS="-Xmx1536m  -Xms256m"
>>>>>>>>> /usr/local/aurora-scheduler/bin/aurora-scheduler -backup_dir=/backup_dir
>>>>>>>>> -cluster_name=tc -mesos_master_address=zk://localhost:2181/mesos/master
>>>>>>>>> -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181
>>>>>>>>> -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false
>>>>>>>>> -native_log_file_path=/db
>>>>>>>>> -thermos_executor_path=/home/ubuntu/hello_world.aurora
>>>>>>>>> ...
>>>>>>>>> ...
>>>>>>>>> INFO: Binding org.apache.aurora.scheduler.http.Utilization to
>>>>>>>>> GuiceManagedCompon
>>>>>>>>> entProvider with the scope "PerRequest"
>>>>>>>>> Oct 20, 2015 9:27:40 AM
>>>>>>>>> org.apache.aurora.scheduler.cron.quartz.CronModule provi
>>>>>>>>> deTimeZone
>>>>>>>>> WARNING: Cron schedules are configured to fire according to
>>>>>>>>> timezone Greenwich M
>>>>>>>>> ean Time but system timezone is set to Coordinated Universal Time
>>>>>>>>> Oct 20, 2015 9:27:41 AM org.eclipse.jetty.server.AbstractConnector
>>>>>>>>> doStart
>>>>>>>>> INFO: Started SelectChannelConnector@0.0.0.0:43843
>>>>>>>>> E1020 09:27:41.290 THREAD1
>>>>>>>>> org.apache.aurora.scheduler.SchedulerLifecycle$9.exec
>>>>>>>>> ute: Caught unchecked exception:
>>>>>>>>> com.google.inject.ProvisionException: Guice pro
>>>>>>>>> vision errors:
>>>>>>>>>
>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException:
>>>>>>>>> Path cannot be null at
>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(Mesos
>>>>>>>>> LogStreamModule.java:117)
>>>>>>>>>   at
>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(Mesos
>>>>>>>>> LogStreamModule.java:117)
>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>   at
>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterf
>>>>>>>>> ace(MesosLogStreamModule.java:152)
>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>
>>>>>>>>> 1 error
>>>>>>>>> com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>>
>>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException:
>>>>>>>>> Path cannot be
>>>>>>>>> null
>>>>>>>>>   at
>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(Mesos
>>>>>>>>> LogStreamModule.java:117)
>>>>>>>>>   at
>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(Mesos
>>>>>>>>> LogStreamModule.java:117)
>>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>>   at
>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterf
>>>>>>>>> ace(MesosLogStreamModule.java:152)
>>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>>
>>>>>>>>> 1 error
>>>>>>>>>         at
>>>>>>>>> com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
>>>>>>>>>         at
>>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> κρισhναν
>>>>>>>>>
>>>>>>>>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner <wf...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> The typical flow is that you keep your .aurora file checked into
>>>>>>>>>> git, and commit every time you deploy/update.  When you change your file,
>>>>>>>>>> you will instruct Aurora to update the live job (have a look at aurora
>>>>>>>>>> update -h).  Aurora will perform a rolling upgrade of your job
>>>>>>>>>> to the new config.  You'll use this same flow for updating your job's
>>>>>>>>>> software as well as resizing the job.
>>>>>>>>>>
>>>>>>>>>> For (3), you could set up alerting for stats that the scheduler
>>>>>>>>>> exports.  Have a look here for monitoring background:
>>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md
>>>>>>>>>>
>>>>>>>>>> You'll want to look at scheduler stats related to 'pending'.
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish <
>>>>>>>>>> krishnan.k.iyer@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks for the pointer. Now I notice that the aurora-scheduler
>>>>>>>>>>> script has the --thermos_executor_path as a mandatory requirement.
>>>>>>>>>>>
>>>>>>>>>>> I have a couple of questions on how the thermos_executor/.aurora
>>>>>>>>>>> config file functions:
>>>>>>>>>>> 1. Do we have to statically define the file beforehand?
>>>>>>>>>>> 2. What happens when we want to dynamically change the config,
>>>>>>>>>>> say increasing the number of instances of a service required? Does aurora
>>>>>>>>>>> require a reboot then?
>>>>>>>>>>> 3. How do I get notified about the message mesos sends when it
>>>>>>>>>>> cannot schedule tasks for lack of resources? Should I depend on aurora for
>>>>>>>>>>> this or try to look for a hook into mesos?
>>>>>>>>>>>
>>>>>>>>>>> I think a little bit of context would help here.
>>>>>>>>>>> What I plan to check is to run a very basic job/task inside a
>>>>>>>>>>> docker container with aurora & wait for a 'resource not available' message
>>>>>>>>>>> from mesos, and accordingly call an api to spin up a new node in my cluster.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> κρισhναν
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <
>>>>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I believe you are missing the thermos_executor options that
>>>>>>>>>>>> have to be passed to the scheduler command line.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> See
>>>>>>>>>>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
>>>>>>>>>>>> for an example
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>
>>>>>>>>>>>> Stephan
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>>>>> *Sent:* Monday, October 19, 2015 8:45 AM
>>>>>>>>>>>> *To:* user@aurora.apache.org
>>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> I am a n00b with apache aurora & trying to experiment with some
>>>>>>>>>>>> things on my local machine with zookeeper and mesos-master running locally.
>>>>>>>>>>>> They have initialized properly. When I try to run aurora with the required
>>>>>>>>>>>> options, I get the following error, & googling hasn't helped me much here.
>>>>>>>>>>>> Appreciate any help. Thanks in advance.
>>>>>>>>>>>>
>>>>>>>>>>>> ...
>>>>>>>>>>>> ...
>>>>>>>>>>>> WARNING: Method [public void
>>>>>>>>>>>> org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)]
>>>>>>>>>>>> is synthetic and is being intercepted by
>>>>>>>>>>>> [com.twitter.common.inject.TimedInterceptor@604c5de8]. This
>>>>>>>>>>>> could indicate a bug.  The method
>>>>>>>>>>>>  may be intercepted twice, or may not be intercepted at all.
>>>>>>>>>>>> Exception in thread "main" com.google.inject.CreationException:
>>>>>>>>>>>> Guice creation errors:
>>>>>>>>>>>>
>>>>>>>>>>>> 1) An exception was caught and reported. Message: A value may
>>>>>>>>>>>> only be retrieved from a variable that has a default or has been
>>>>>>>>>>>> set.
>>>>>>>>>>>>   at
>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>
>>>>>>>>>>>> 2) Could not find a suitable constructor in
>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have
>>>>>>>>>>>> either one (and only one) constructor annotated with @Inject or a
>>>>>>>>>>>> zero-argument constructor that is not private.
>>>>>>>>>>>>   at
>>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>>>>>>>>>>>>   at
>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>>>>>>>>>>>>
>>>>>>>>>>>> 2 errors
>>>>>>>>>>>>         at
>>>>>>>>>>>> com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>>>>>>>>>>>>         at
>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>>>>>>>>>>>>         at
>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>>>>>>>>>>>>         at com.google.inject.Guice.createInjector(Guice.java:95)
>>>>>>>>>>>>         at com.google.inject.Guice.createInjector(Guice.java:83)
>>>>>>>>>>>>         at
>>>>>>>>>>>> com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>>>>>>>>>>>>         at
>>>>>>>>>>>> com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>>>>>>>>>>>>         at
>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>>>>>>>>>>>>         at
>>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
>>>>>>>>>>>> Caused by: java.lang.IllegalStateException: A value may only be
>>>>>>>>>>>> retrieved from a variable that has a default or has been set.
>>>>>>>>>>>>         at
>>>>>>>>>>>> com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>>>>>>>>>>>>         at com.twitter.common.args.Arg.get(Arg.java:82)
>>>>>>>>>>>>         at
>>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>>>>>>>>>>>>         at
>>>>>>>>>>>> com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>>>>>>>>>>>>         at
>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>         at
>>>>>>>>>>>> com.google.inject.util.Modules$2.configure(Modules.java:114)
>>>>>>>>>>>>         at
>>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>>         at
>>>>>>>>>>>> com.google.inject.spi.Elements.getElements(Elements.java:101)
>>>>>>>>>>>>         at
>>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>>         at
>>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>>>>>>>>>>>>         ... 7 more
>>>>>>>>>>>>
>>>>>>>>>>>> Complete logs are present @http://pastebin.com/i72HvbYi.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Zameer Manji
>>>>>
>>>>
>>>>
>>>
>>
>

Re: Stacktrace when running Apache Aurora

Posted by Bill Farner <wf...@apache.org>.

(1) clusters.json is written by you, configuring the CLI client with
instructions for what clusters are available and how to discover them.

(2) That's expected - mesos only allows one active replica of a framework
at a time, this signals which one is active.

(3) The observer is essentially a web server that allows you to browse a
task's sandbox directory and other information about it.  You will need to
configure it to run on your worker/agent nodes for that functionality to
work (it's linked from the scheduler web UI).

(4) You could indeed implement that behavior externally.  Aurora does
expose a reason for the PENDING state:
https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L556-L559

(5) That is correct.  The scheduler exposes a thrift API that you would use
(a REST API is coming, but ground has not yet been broken).  If you go this
route, I suggest you skip the DSL and use the JSON task description format
that is shipped over the API.  There isn't good documentation on this, but
we can help you through it and would be grateful for a writeup of your
approach!
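
As a rough illustration of (1), here is a minimal Python sketch that writes
a clusters.json by hand. The field names and values are copied from the
file Krish posted earlier in this thread; the helper name write_clusters
and the temporary output path are assumptions made for the example, not
part of Aurora.

```python
import json
import os
import tempfile

# Field names and values mirror the clusters.json posted earlier in this
# thread; adjust "zk" and the paths for your own setup.
TEST_CLUSTER = {
    "name": "testcluster",
    "zk": "127.0.1.1",
    "scheduler_zk_path": "/scheduler/aurora",
    "auth_mechanism": "UNAUTHENTICATED",
    "slave_root": "/var/lib/mesos",
    "slave_run_directory": "latest",
}


def write_clusters(path, clusters):
    """Write a list of cluster definitions as JSON (hypothetical helper)."""
    with open(path, "w") as handle:
        json.dump(clusters, handle, indent=2, sort_keys=True)
        handle.write("\n")


if __name__ == "__main__":
    # Written to a temp directory here; the client in this thread read
    # /etc/aurora/clusters.json.
    target = os.path.join(tempfile.mkdtemp(), "clusters.json")
    write_clusters(target, [TEST_CLUSTER])
    with open(target) as handle:
        print(handle.read())
```

The file is a JSON list, so further clusters can be added as extra
dictionaries in the same list.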


On Mon, Oct 26, 2015 at 11:44 PM, Krish <kr...@gmail.com> wrote:

> Hi Folks,
> Firstly, thanks for all the help. Am happy to report that I have set up
> zk, mesos & aurora, & can work further towards my idea of having an
> auto-scaling cluster.
> I have some further questions about the work done so far & things I plan
> to do:
>
>    1. Is the /etc/aurora/clusters.json file created by the scheduler or
>    does it need to be handcrafted? I had to manually edit the file to get my
>    `aurora job ...` cli to work.
>
>    2. I am running a cluster of 3 coreOS VMs on vagrant with zk, mesos &
>    aurora in a docker container. Only 1 of them outputs '1' when I look at the
>    'framework_registered' field. Is this expected? How do I verify that they
>    are working as a cluster?
>
>    3. From the documentation, I see that there is an observer that needs
>    to be listening on port 1338. What is the observer socket & its purpose? I
>    have aurora listening only on ports 8081 (http port) & 8083 (libprocess).
>
>    4. I read about the 'PENDING' field in aurora documentation, as Bill
>    suggested, & realize that it just shows that a task is waiting for some
>    reason (for want of resources, in my case, as 0 slaves have registered). I
>    was thinking of adding a hook to the pending state; say if a task is
>    PENDING for 5 minutes for lack of resources in the cluster, then spin up a
>    new machine. Is this the right approach to take? Does aurora provide
>    reasons for why a task is in the PENDING state?
>
>    => aurora job status testcluster/$USER/test/hello_world
>     INFO] Checking status of testcluster/ubuntu/test/hello_world
>    Active tasks (1):
>           Task role: ubuntu, env: test, name: hello_world, instance: 0,
>    status:
>    PENDING on None
>              cpus: 0.1, ram: 16 MB, disk: 16 MB
>              events:
>               2015-10-23 04:55:33 PENDING: None
>    Inactive tasks (0):
>
>    5. Aurora defines jobs in a .aurora config file & if I decide to
>    increase/decrease the number of instances in my cluster, then I need to
>    create/overwrite the concerned .aurora file and trigger the `aurora update
>    ...` command. Is this right?
>    If yes, is there an HTTP API I can invoke remotely which triggers this
>    update?
>
>
>
>
> --
> κρισhναν
>
> On Fri, Oct 23, 2015 at 8:09 AM, Joshua Cohen <jc...@twopensource.com>
> wrote:
>
>> I suspect your error from `aurora job create ...` is due to the aurora
>> config you're using referencing `/vagrant/hello_world.py` which does not
>> exist (as you say: you're not even using Vagrant). Can you link the .aurora
>> config you're using?
>>
>> Cheers,
>>
>> Joshua
>>
>> On Thu, Oct 22, 2015 at 3:22 PM, Krish <kr...@gmail.com> wrote:
>>
>>> Thanks, Zameer.
>>>
>>> I had to modify  /etc/aurora/clusters.json:
>>> [
>>>   {
>>>     "auth_mechanism": "UNAUTHENTICATED",
>>>     "name": "testcluster",
>>>     "scheduler_zk_path": "/scheduler/aurora",
>>>     "slave_root": "/var/lib/mesos",
>>>     "slave_run_directory": "latest",
>>>     "zk": "127.0.1.1"
>>>   }
>>> ]
>>>
>>> I have a hello_world.aurora in my home folder. However the following
>>> command errors out:
>>> ~$ aurora job create testcluster/testrole/test/hellojob
>>> ./hello_world.aurora
>>> Error loading configuration: [Errno 2] No such file or directory:
>>> '/vagrant/hello_world.py'
>>>
>>> A job list does work:
>>> ~$ aurora job list testcluster
>>>  INFO] Retrieving jobs for role None
>>>
>>> I am not even using the vagrant. I am using zk & mesos on the same
>>> machine as aurora. How do I submit these job templates to aurora?
>>>
>>> Any pointers to documentation will be helpful.
>>>
>>>
>>> --
>>> κρισhναν
>>>
>>> On Wed, Oct 21, 2015 at 11:09 PM, Zameer Manji <zm...@apache.org>
>>> wrote:
>>>
>>>> Version 0.9.0 does not have the gc executor. Version 0.9.0 uses Mesos' task
>>>> reconciliation
>>>> <http://mesos.apache.org/documentation/latest/reconciliation/> API
>>>> instead.
>>>>
>>>> On Wed, Oct 21, 2015 at 9:28 AM, Krish <kr...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks Bill for the location to the debs. I was finally able to run
>>>>> aurora. :)
>>>>>
>>>>> I did find thermos_executor.pex & thermos_observer after installing
>>>>> aurora-executor. I still could not find gc_executor.pex on my system.
>>>>> Is there a location from where I can download the binaries for *.pex
>>>>> or build them from scratch?
>>>>>
>>>>> root@dev:/# find . -name "*.pex"
>>>>> ./usr/share/aurora/bin/thermos_executor.pex
>>>>> ./usr/share/aurora/bin/kaurora_admin.pex
>>>>> ./usr/share/aurora/bin/kaurora.pex
>>>>> ./usr/share/aurora/bin/thermos.pex
>>>>> ./usr/share/aurora/bin/thermos_observer.pex
>>>>> ./home/ubuntu/.pex
>>>>> ./root/.pex
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> κρισhναν
>>>>>
>>>>> On Tue, Oct 20, 2015 at 11:46 PM, Bill Farner <wf...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Aurora currently requires an executor, so setting it to /dev/null
>>>>>> will not work.  Happy to talk further about your thoughts around
>>>>>> sidestepping the executor.
>>>>>>
>>>>>> As for working with the scheduler source code, it's a standard gradle
>>>>>> project and we tend to use intellij.  Docs to help ramp on that:
>>>>>> https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md
>>>>>>
>>>>>> As for builds - the .zip is a source distribution, so it won't have
>>>>>> any pre-built binaries.  If you're on debian, we have official debs here:
>>>>>> https://bintray.com/apache/aurora
>>>>>> You can see how the packages are built here (and can build your own):
>>>>>> https://github.com/apache/aurora-packaging
>>>>>> We're close to having official RPMs, but none to speak of yet.
>>>>>>
>>>>>> On Tue, Oct 20, 2015 at 9:47 AM, Krish <kr...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Stephen,
>>>>>>> I am trying to get started and run aurora without thermos executor
>>>>>>> (setting it to /dev/null does not help) - on a local linux box for now &
>>>>>>> planning to containerize/dockerize it later.
>>>>>>>
>>>>>>> Can you please point me to the right documentation (or a pointer to
>>>>>>> the cli parsing source code) which can help me resolve this? Also, are
>>>>>>> there any steps to import source code into eclipse to browse &
>>>>>>> analyze code for this?
>>>>>>>
>>>>>>> Also, where do i find all the *.pex files? They are not present in
>>>>>>> the zip file nor anywhere in the built source code.
>>>>>>>
>>>>>>> I know I am asking too many queries on a single thread here, & would
>>>>>>> appreciate the help.
>>>>>>> I think at the end of this, I will put the steps I followed in a
>>>>>>> gist/blog so others might find their way around, & not struggle as much.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> κρισhναν
>>>>>>>
>>>>>>> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan <
>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>
>>>>>>>> Hi Krish,
>>>>>>>>
>>>>>>>>
>>>>>>>> you don't have to set framework_authentication_file and
>>>>>>>> zk_digest_credentials. The scheduler help text is misleading here as
>>>>>>>> everything will work fine if you leave those empty.
>>>>>>>>
>>>>>>>>
>>>>>>>> In addition, looks like you are misunderstanding the usage of the
>>>>>>>> thermos_executor_path command line flag of the scheduler. It is
>>>>>>>> supposed to point to the binary containing the generic Aurora executor
>>>>>>>> (thermos_executor.pex).  You only need the hello_world.aurora once
>>>>>>>> your scheduler is up and running. It serves as an example input for the
>>>>>>>> aurora command line client which can be used to schedule jobs and services
>>>>>>>> on an Aurora master.
>>>>>>>>
>>>>>>>>
>>>>>>>> Have you tried to use the vagrant box? Just type `vagrant up` in a
>>>>>>>> checkout of the Aurora source code. It gives you a running scheduler to
>>>>>>>> play with. Once you have understood how it works, you can start trying to
>>>>>>>> install it on your own (by reverse-engineering the vagrant box).
>>>>>>>>
>>>>>>>>
>>>>>>>> Hope this helps a little,
>>>>>>>>
>>>>>>>> Stephan
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ------------------------------
>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>> *Sent:* Tuesday, October 20, 2015 11:39 AM
>>>>>>>> *To:* Bill Farner
>>>>>>>> *Cc:* user@aurora.apache.org; Erb, Stephan
>>>>>>>>
>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>
>>>>>>>> Bill/Stephen,
>>>>>>>> I still get a stacktrace when running the aurora scheduler CLI.
>>>>>>>>
>>>>>>>> I do not know what to specify for  -framework_authentication_file
>>>>>>>> & -zk_digest_credentials, and they are required arguments.
>>>>>>>>
>>>>>>>> I am not using any authentication on Mesos master, do I still need
>>>>>>>> the framework_authentication_file parameter?
>>>>>>>>
>>>>>>>>
>>>>>>>> rm -rf /db /backup_dir
>>>>>>>> mesos-log initialize --path="/db"
>>>>>>>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
>>>>>>>> JAVA_OPTS="-Xmx1536m  -Xms256m"
>>>>>>>> /usr/local/aurora-scheduler/bin/aurora-scheduler -backup_dir=/backup_dir
>>>>>>>> -cluster_name=tc -mesos_master_address=zk://localhost:2181/mesos/master
>>>>>>>> -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181
>>>>>>>> -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false
>>>>>>>> -native_log_file_path=/db
>>>>>>>> -thermos_executor_path=/home/ubuntu/hello_world.aurora
>>>>>>>> ...
>>>>>>>> ...
>>>>>>>> INFO: Binding org.apache.aurora.scheduler.http.Utilization to
>>>>>>>> GuiceManagedComponentProvider with the scope "PerRequest"
>>>>>>>> Oct 20, 2015 9:27:40 AM
>>>>>>>> org.apache.aurora.scheduler.cron.quartz.CronModule provideTimeZone
>>>>>>>> WARNING: Cron schedules are configured to fire according to
>>>>>>>> timezone Greenwich Mean Time but system timezone is set to
>>>>>>>> Coordinated Universal Time
>>>>>>>> Oct 20, 2015 9:27:41 AM org.eclipse.jetty.server.AbstractConnector
>>>>>>>> doStart
>>>>>>>> INFO: Started SelectChannelConnector@0.0.0.0:43843
>>>>>>>> E1020 09:27:41.290 THREAD1
>>>>>>>> org.apache.aurora.scheduler.SchedulerLifecycle$9.execute: Caught
>>>>>>>> unchecked exception: com.google.inject.ProvisionException: Guice
>>>>>>>> provision errors:
>>>>>>>>
>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException:
>>>>>>>> Path cannot be null at
>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>   at
>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>   at
>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>
>>>>>>>> 1 error
>>>>>>>> com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>
>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException:
>>>>>>>> Path cannot be null
>>>>>>>>   at
>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>   at
>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>   at
>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>
>>>>>>>> 1 error
>>>>>>>>         at
>>>>>>>> com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
>>>>>>>>         at
>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> κρισhναν
>>>>>>>>
>>>>>>>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner <wf...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> The typical flow is that you keep your .aurora file checked into
>>>>>>>>> git, and commit every time you deploy/update.  When you change your file,
>>>>>>>>> you will instruct Aurora to update the live job (have a look at aurora
>>>>>>>>> update -h).  Aurora will perform a rolling upgrade of your job to
>>>>>>>>> the new config.  You'll use this same flow for updating your job's software
>>>>>>>>> as well as resizing the job.
>>>>>>>>>
>>>>>>>>> For (3), you could set up alerting for stats that the scheduler
>>>>>>>>> exports.  Have a look here for monitoring background:
>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md
>>>>>>>>>
>>>>>>>>> You'll want to look at scheduler stats related to 'pending'.
>>>>>>>>>
>>>>>>>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish <krishnan.k.iyer@gmail.com
>>>>>>>>> > wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for the pointer. Now I notice that the aurora-scheduler
>>>>>>>>>> script has the --thermos_executor_path as a mandatory requirement.
>>>>>>>>>>
>>>>>>>>>> I have a couple of questions on how the thermos_executor/.aurora
>>>>>>>>>> config file functions:
>>>>>>>>>> 1. Do we have to statically define the file beforehand?
>>>>>>>>>> 2. What happens when we want to dynamically change the config,
>>>>>>>>>> say increasing the number of instances of a service required? Does aurora
>>>>>>>>>> require a reboot then?
>>>>>>>>>> 3. How do I get notified about the message mesos sends when it
>>>>>>>>>> cannot schedule tasks for lack of resources? Should I depend on aurora for
>>>>>>>>>> this or try to look for a hook into mesos?
>>>>>>>>>>
>>>>>>>>>> I think a little bit of context would help here.
>>>>>>>>>> What I plan to check is to run a very basic job/task inside a
>>>>>>>>>> docker container with aurora & wait for a 'resource not available' message
>>>>>>>>>> from mesos, and accordingly call an api to spin up a new node in my cluster.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> κρισhναν
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <
>>>>>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I believe you are missing the thermos_executor options that have
>>>>>>>>>>> to be passed to the scheduler command line.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> See
>>>>>>>>>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
>>>>>>>>>>> for an example
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Best Regards,
>>>>>>>>>>>
>>>>>>>>>>> Stephan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ------------------------------
>>>>>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>>>>>> *Sent:* Monday, October 19, 2015 8:45 AM
>>>>>>>>>>> *To:* user@aurora.apache.org
>>>>>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>> I am a n00b with apache aurora & trying to experiment with some
>>>>>>>>>>> things on my local machine with zookeeper and mesos-master running locally.
>>>>>>>>>>> They have initialized properly. When I try to run aurora with the required
>>>>>>>>>>> options, I get the following error, & googling hasn't helped me much here.
>>>>>>>>>>> Appreciate any help. Thanks in advance.
>>>>>>>>>>>
>>>>>>>>>>> ...
>>>>>>>>>>> ...
>>>>>>>>>>> WARNING: Method [public void
>>>>>>>>>>> org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)]
>>>>>>>>>>> is synthetic and is being intercepted by
>>>>>>>>>>> [com.twitter.common.inject.TimedInterceptor@604c5de8]. This
>>>>>>>>>>> could indicate a bug.  The method
>>>>>>>>>>>  may be intercepted twice, or may not be intercepted at all.
>>>>>>>>>>> Exception in thread "main" com.google.inject.CreationException:
>>>>>>>>>>> Guice creation errors:
>>>>>>>>>>>
>>>>>>>>>>> 1) An exception was caught and reported. Message: A value may
>>>>>>>>>>> only be retrieved from a variable that has a default or has been
>>>>>>>>>>> set.
>>>>>>>>>>>   at
>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>
>>>>>>>>>>> 2) Could not find a suitable constructor in
>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have
>>>>>>>>>>> either one (and only one) constructor annotated with @Inject or a
>>>>>>>>>>> zero-argument constructor that is not private.
>>>>>>>>>>>   at
>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>>>>>>>>>>>   at
>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>>>>>>>>>>>
>>>>>>>>>>> 2 errors
>>>>>>>>>>>         at
>>>>>>>>>>> com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>>>>>>>>>>>         at
>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>>>>>>>>>>>         at
>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>>>>>>>>>>>         at com.google.inject.Guice.createInjector(Guice.java:95)
>>>>>>>>>>>         at com.google.inject.Guice.createInjector(Guice.java:83)
>>>>>>>>>>>         at
>>>>>>>>>>> com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>>>>>>>>>>>         at
>>>>>>>>>>> com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>>>>>>>>>>>         at
>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>>>>>>>>>>>         at
>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
>>>>>>>>>>> Caused by: java.lang.IllegalStateException: A value may only be
>>>>>>>>>>> retrieved from a variable that has a default or has been set.
>>>>>>>>>>>         at
>>>>>>>>>>> com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>>>>>>>>>>>         at com.twitter.common.args.Arg.get(Arg.java:82)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>>>>>>>>>>>         at
>>>>>>>>>>> com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>>>>>>>>>>>         at
>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>         at
>>>>>>>>>>> com.google.inject.util.Modules$2.configure(Modules.java:114)
>>>>>>>>>>>         at
>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>         at
>>>>>>>>>>> com.google.inject.spi.Elements.getElements(Elements.java:101)
>>>>>>>>>>>         at
>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>         at
>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>>>>>>>>>>>         ... 7 more
>>>>>>>>>>>
>>>>>>>>>>> Complete logs are present @http://pastebin.com/i72HvbYi.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Zameer Manji
>>>>
>>>
>>>
>>
>

Re: Stacktrace when running Apache Aurora

Posted by Maxim Khutornenko <ma...@apache.org>.

1. Scheduler does not generate or consume this file. This is a config
file your aurora client needs to talk to the scheduler.

2. Yes, it's expected. Only one scheduler from an ensemble is the leader
at any given moment. All others are "hot" standbys that redirect all
requests to the leader.

3. Observer is a component we run on every host to monitor thermos
tasks and expose their process graph and sandbox files in the UI. If
you click the hostname (ip) of a particular task on a job page you are
redirected to the host observer UI.

4. I don't think you want to react to every PENDING status by adding
more machines to the cluster. There are plenty of stats on the
scheduler that you can monitor to draw conclusions about the
available capacity. Look at the "empty_slots_*" stats or
"sla_cluster_mtta_ms", for example. You can find more details in the
docs. As for the PENDING reason, the UI should show it when your task
fails to schedule. This may not work as expected, though, if no offers
have been received, as in your case. However, once you have offers, any
failed scheduling round will generate a veto reason exposed in the UI.

5. Correct. We have plans to make it easier eventually (AURORA-1258)
but for now you have to bump up the instance count in the .aurora
config and run "aurora update start" to scale up. There is no pure
HTTP (REST) API yet. This is another point of interest for us
(AURORA-987). For now, you can call the startJobUpdate thrift API directly
in case you can't use cli commands.
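
To make point 4 a bit more concrete, here is a small Python sketch of the
"pending for 5 minutes" idea from the question, written so that a brief
PENDING blip does not trigger a scale-up. Everything in it is an
assumption for illustration: the class name PendingWatcher, the 300 second
threshold, and the idea of feeding it a pending-task count that you sample
yourself (for example from the scheduler stats). It does not talk to Aurora.

```python
import time


class PendingWatcher:
    """Signals a scale-up once tasks have stayed pending for too long."""

    def __init__(self, threshold_secs=300, clock=time.time):
        self.threshold_secs = threshold_secs
        self.clock = clock
        # Time at which the pending count first became non-zero, or None.
        self.pending_since = None

    def observe(self, pending_count):
        """Record one sample; return True if a scale-up looks warranted."""
        now = self.clock()
        if pending_count <= 0:
            # Nothing is waiting, so forget any earlier pending streak.
            self.pending_since = None
            return False
        if self.pending_since is None:
            self.pending_since = now
        return now - self.pending_since >= self.threshold_secs


if __name__ == "__main__":
    # A fake clock keeps the demo instant; real use would rely on time.time.
    fake_now = [0.0]
    watcher = PendingWatcher(threshold_secs=300, clock=lambda: fake_now[0])
    for seconds, pending in [(0, 1), (120, 1), (300, 2), (360, 0), (400, 1)]:
        fake_now[0] = seconds
        print(seconds, pending, watcher.observe(pending))
```

The streak resets whenever the pending count drops back to zero, so only
a sustained shortage would lead to a call to whatever API adds a node.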

On Mon, Oct 26, 2015 at 11:44 PM, Krish <kr...@gmail.com> wrote:
> Hi Folks,
> Firstly, thanks for all the help. Am happy to report that I have set up zk,
> mesos & aurora, & can work further towards my idea of having an auto-scaling
> cluster.
> I have some further questions about the work done so far & things I plan to
> do:
>
> Is the /etc/aurora/clusters.json file created by the scheduler or does it
> need to be handcrafted? I had to manually edit the file to get my `aurora
> job ...` cli to work.
>
> I am running a cluster of 3 coreOS VMs on vagrant with zk, mesos & aurora in
> a docker container. Only 1 of them outputs '1' when I look at the
> 'framework_registered' field. Is this expected? How do I verify that they are
> working as a cluster?
>
> From the documentation, I see that there is an observer that needs to be
> listening on port 1338. What is the observer socket & its purpose? I have
> aurora listening only on ports 8081 (http port) & 8083 (libprocess).
>
> I read about the 'PENDING' field in aurora documentation, as Bill suggested,
> & realize that it just shows that a task is waiting for some reason (for
> want of resources, in my case, as 0 slaves have registered). I was thinking
> of adding a hook to the pending state; say if a task is PENDING for 5
> minutes for lack of resources in the cluster, then spin up a new machine. Is
> this the right approach to take? Does aurora provide reasons for why a
> task is in the PENDING state?
>
> => aurora job status testcluster/$USER/test/hello_world
>  INFO] Checking status of testcluster/ubuntu/test/hello_world
> Active tasks (1):
>        Task role: ubuntu, env: test, name: hello_world, instance: 0, status:
> PENDING on None
>           cpus: 0.1, ram: 16 MB, disk: 16 MB
>           events:
>            2015-10-23 04:55:33 PENDING: None
> Inactive tasks (0):
>
> Aurora defines jobs in a .aurora config file & if I decide to
> increase/decrease the number of instances in my cluster, then I need to
> create/overwrite the concerned .aurora file and trigger the `aurora update
> ...` command. Is this right?
> If yes, is there an HTTP API I can invoke remotely which triggers this
> update?
>
>
>
>
> --
> κρισhναν
>
> On Fri, Oct 23, 2015 at 8:09 AM, Joshua Cohen <jc...@twopensource.com>
> wrote:
>>
>> I suspect your error from `aurora job create ...` is due to the aurora
>> config you're using referencing `/vagrant/hello_world.py` which does not
>> exist (as you say: you're not even using Vagrant). Can you link the .aurora
>> config you're using?
>>
>> Cheers,
>>
>> Joshua
>>
>> On Thu, Oct 22, 2015 at 3:22 PM, Krish <kr...@gmail.com> wrote:
>>>
>>> Thanks, Zameer.
>>>
>>> I had to modify  /etc/aurora/clusters.json:
>>> [
>>>   {
>>>     "auth_mechanism": "UNAUTHENTICATED",
>>>     "name": "testcluster",
>>>     "scheduler_zk_path": "/scheduler/aurora",
>>>     "slave_root": "/var/lib/mesos",
>>>     "slave_run_directory": "latest",
>>>     "zk": "127.0.1.1"
>>>   }
>>> ]
>>>
>>> I have a hello_world.aurora in my home folder. However the following
>>> command errors out:
>>> ~$ aurora job create testcluster/testrole/test/hellojob
>>> ./hello_world.aurora
>>> Error loading configuration: [Errno 2] No such file or directory:
>>> '/vagrant/hello_world.py'
>>>
>>> A job list does work:
>>> ~$ aurora job list testcluster
>>>  INFO] Retrieving jobs for role None
>>>
>>> I am not even using the vagrant. I am using zk & mesos on the same
>>> machine as aurora. How do I submit these job templates to aurora?
>>>
>>> Any pointers to documentation will be helpful.
>>>
>>>
>>> --
>>> κρισhναν
>>>
>>> On Wed, Oct 21, 2015 at 11:09 PM, Zameer Manji <zm...@apache.org> wrote:
>>>>
>>>> Version 0.9.0 does not have the gc executor. Version 0.9.0 uses Mesos'
>>>> task reconciliation API instead.
>>>>
>>>> On Wed, Oct 21, 2015 at 9:28 AM, Krish <kr...@gmail.com>
>>>> wrote:
>>>>>
>>>>> Thanks Bill for the location to the debs. I was finally able to run
>>>>> aurora. :)
>>>>>
>>>>> I did find thermos_executor.pex & thermos_observer after installing
>>>>> aurora-executor. I still could not find gc_executor.pex on my system.
>>>>> Is there a location from where I can download the binaries for *.pex or
>>>>> build them from scratch?
>>>>>
>>>>> root@dev:/# find . -name "*.pex"
>>>>> ./usr/share/aurora/bin/thermos_executor.pex
>>>>> ./usr/share/aurora/bin/kaurora_admin.pex
>>>>> ./usr/share/aurora/bin/kaurora.pex
>>>>> ./usr/share/aurora/bin/thermos.pex
>>>>> ./usr/share/aurora/bin/thermos_observer.pex
>>>>> ./home/ubuntu/.pex
>>>>> ./root/.pex
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> κρισhναν
>>>>>
>>>>> On Tue, Oct 20, 2015 at 11:46 PM, Bill Farner <wf...@apache.org>
>>>>> wrote:
>>>>>>
>>>>>> Aurora currently requires an executor, so setting it to /dev/null will
>>>>>> not work.  Happy to talk further about your thoughts around sidestepping the
>>>>>> executor.
>>>>>>
>>>>>> As for working with the scheduler source code, it's a standard gradle
>>>>>> project and we tend to use intellij.  Docs to help ramp on that:
>>>>>> https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md
>>>>>>
>>>>>> As for builds - the .zip is a source distribution, so it won't have
>>>>>> any pre-built binaries.  If you're on debian, we have official debs here:
>>>>>> https://bintray.com/apache/aurora
>>>>>> You can see how the packages are built here (and can build your own):
>>>>>> https://github.com/apache/aurora-packaging
>>>>>> We're close to having official RPMs, but none to speak of yet.
>>>>>>
>>>>>> On Tue, Oct 20, 2015 at 9:47 AM, Krish <kr...@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> Stephen,
>>>>>>> I am trying to get started and run aurora without thermos executor
>>>>>>> (setting it to /dev/null does not help) - on a local linux box for now &
>>>>>>> planning to containerize/dockerize it later.
>>>>>>>
>>>>>>> Can you please point me to the right documentation (or a pointer to
>>>>>>> the cli parsing source code) which can help me resolve this? Also, are there
>>>>>>> any steps to import source code into eclipse to browse & analyze code
>>>>>>> for this?
>>>>>>>
>>>>>>> Also, where do i find all the *.pex files? They are not present in
>>>>>>> the zip file nor anywhere in the built source code.
>>>>>>>
>>>>>>> I know I am asking too many queries on a single thread here, & would
>>>>>>> appreciate the help.
>>>>>>> I think at the end of this, I will put the steps I followed in a
>>>>>>> gist/blog so others might find their way around, & not struggle as much.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> κρισhναν
>>>>>>>
>>>>>>> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan
>>>>>>> <St...@blue-yonder.com> wrote:
>>>>>>>>
>>>>>>>> Hi Krish,
>>>>>>>>
>>>>>>>>
>>>>>>>> you don't have to set framework_authentication_file and
>>>>>>>> zk_digest_credentials. The scheduler help text is misleading here as
>>>>>>>> everything will work fine if you leave those empty.
>>>>>>>>
>>>>>>>>
>>>>>>>> In addition, it looks like you are misunderstanding the usage of the
>>>>>>>> thermos_executor_path command line flag of the scheduler. It is supposed to
>>>>>>>> point to the binary containing the generic Aurora executor
>>>>>>>> (thermos_executor.pex).  You only need the hello_world.aurora once your
>>>>>>>> scheduler is up and running. It serves as an example input for the aurora
>>>>>>>> command line client which can be used to schedule jobs and services on an
>>>>>>>> Aurora master.
>>>>>>>>
>>>>>>>>
>>>>>>>> Have you tried to use the vagrant box? Just type 'vagrant up' in a
>>>>>>>> checkout of the Aurora source code. It gives you a running scheduler to play
>>>>>>>> with. Once you have understood how it works, you can start trying to install
>>>>>>>> it on your own (by reverse-engineering the vagrant box).
>>>>>>>>
>>>>>>>>
>>>>>>>> Hope this helps a little,
>>>>>>>>
>>>>>>>> Stephan
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ________________________________
>>>>>>>> From: Krish <kr...@gmail.com>
>>>>>>>> Sent: Tuesday, October 20, 2015 11:39 AM
>>>>>>>> To: Bill Farner
>>>>>>>> Cc: user@aurora.apache.org; Erb, Stephan
>>>>>>>>
>>>>>>>> Subject: Re: Stacktrace when running Apache Aurora
>>>>>>>>
>>>>>>>> Bill/Stephen,
>>>>>>>> I still get a stacktrace when running the aurora scheduler CLI.
>>>>>>>>
>>>>>>>> I do not know what to specify for  -framework_authentication_file &
>>>>>>>> -zk_digest_credentials, and they are required arguments.
>>>>>>>>
>>>>>>>> I am not using any authentication on Mesos master, do I still need
>>>>>>>> the framework_authentication_file parameter?
>>>>>>>>
>>>>>>>>
>>>>>>>> rm -rf /db /backup_dir
>>>>>>>> mesos-log initialize --path="/db"
>>>>>>>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
>>>>>>>> JAVA_OPTS="-Xmx1536m  -Xms256m"
>>>>>>>> /usr/local/aurora-scheduler/bin/aurora-scheduler -backup_dir=/backup_dir
>>>>>>>> -cluster_name=tc -mesos_master_address=zk://localhost:2181/mesos/master
>>>>>>>> -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181
>>>>>>>> -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false
>>>>>>>> -native_log_file_path=/db
>>>>>>>> -thermos_executor_path=/home/ubuntu/hello_world.aurora
>>>>>>>> ...
>>>>>>>> ...
>>>>>>>> INFO: Binding org.apache.aurora.scheduler.http.Utilization to
>>>>>>>> GuiceManagedComponentProvider with the scope "PerRequest"
>>>>>>>> Oct 20, 2015 9:27:40 AM
>>>>>>>> org.apache.aurora.scheduler.cron.quartz.CronModule provideTimeZone
>>>>>>>> WARNING: Cron schedules are configured to fire according to timezone
>>>>>>>> Greenwich Mean Time but system timezone is set to Coordinated Universal Time
>>>>>>>> Oct 20, 2015 9:27:41 AM org.eclipse.jetty.server.AbstractConnector
>>>>>>>> doStart
>>>>>>>> INFO: Started SelectChannelConnector@0.0.0.0:43843
>>>>>>>> E1020 09:27:41.290 THREAD1
>>>>>>>> org.apache.aurora.scheduler.SchedulerLifecycle$9.execute: Caught unchecked exception:
>>>>>>>> com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>
>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException:
>>>>>>>> Path cannot be null
>>>>>>>>   at
>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>   at
>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>   at
>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>
>>>>>>>> 1 error
>>>>>>>> com.google.inject.ProvisionException: Guice provision errors:
>>>>>>>>
>>>>>>>> 1) Error in custom provider, java.lang.IllegalArgumentException:
>>>>>>>> Path cannot be null
>>>>>>>>   at
>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>   at
>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>>>>>   while locating org.apache.mesos.Log
>>>>>>>>   at
>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>>>>>
>>>>>>>> 1 error
>>>>>>>>         at
>>>>>>>> com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
>>>>>>>>         at
>>>>>>>> org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> κρισhναν
>>>>>>>>
>>>>>>>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner <wf...@apache.org>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> The typical flow is that you keep your .aurora file checked into
>>>>>>>>> git, and commit every time you deploy/update.  When you change your file,
>>>>>>>>> you will instruct Aurora to update the live job (have a look at aurora
>>>>>>>>> update -h).  Aurora will perform a rolling upgrade of your job to the new
>>>>>>>>> config.  You'll use this same flow for updating your job's software as well
>>>>>>>>> as resizing the job.
>>>>>>>>>
>>>>>>>>> For (3), you could set up alerting for stats that the scheduler
>>>>>>>>> exports.  Have a look here for monitoring background:
>>>>>>>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md
>>>>>>>>>
>>>>>>>>> You'll want to look at scheduler stats related to 'pending'.
>>>>>>>>>
>>>>>>>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish <kr...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Thanks for the pointer. Now I notice that the aurora-scheduler
>>>>>>>>>> script has the --thermos_executor_path as a mandatory requirement.
>>>>>>>>>>
>>>>>>>>>> I have a couple of questions on how the thermos_executor/.aurora
>>>>>>>>>> config file functions:
>>>>>>>>>> 1. Do we have to statically define the file beforehand?
>>>>>>>>>> 2. What happens when we want to dynamically change the config, say
>>>>>>>>>> increasing the number of instances of a service required? Does aurora
>>>>>>>>>> require a reboot then?
>>>>>>>>>> 3. How do I get notified about the message mesos sends when it
>>>>>>>>>> cannot schedule tasks for lack of resources? Should I depend on aurora for
>>>>>>>>>> this or try to look for a hook into mesos?
>>>>>>>>>>
>>>>>>>>>> I think a little bit of context would help here.
>>>>>>>>>> What I plan to check is to run a very basic job/task inside a
>>>>>>>>>> docker container with aurora & wait for a 'resource not available' message
>>>>>>>>>> from mesos, and accordingly call an api to spin up a new node in my cluster.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> κρισhναν
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan
>>>>>>>>>> <St...@blue-yonder.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> I believe you are missing the thermos_executor options that have
>>>>>>>>>>> to be passed to the scheduler command line.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> See
>>>>>>>>>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
>>>>>>>>>>> for an example
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Best Regards,
>>>>>>>>>>>
>>>>>>>>>>> Stephan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ________________________________
>>>>>>>>>>> From: Krish <kr...@gmail.com>
>>>>>>>>>>> Sent: Monday, October 19, 2015 8:45 AM
>>>>>>>>>>> To: user@aurora.apache.org
>>>>>>>>>>> Subject: Re: Stacktrace when running Apache Aurora
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>> I am a n00b with apache aurora & trying to experiment with some things
>>>>>>>>>>> on my local machine with zookeeper and mesos-master running locally. They
>>>>>>>>>>> have initialized properly. When I try to run aurora with the required
>>>>>>>>>>> options, I get the following error, & googling hasn't helped me much here.
>>>>>>>>>>> Appreciate any help. Thanks in advance.
>>>>>>>>>>>
>>>>>>>>>>> ...
>>>>>>>>>>> ...
>>>>>>>>>>> WARNING: Method [public void
>>>>>>>>>>> org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)]
>>>>>>>>>>> is synthetic and is being intercepted by
>>>>>>>>>>> [com.twitter.common.inject.TimedInterceptor@604c5de8]. This could indicate a
>>>>>>>>>>> bug.  The method may be intercepted twice, or may not be intercepted at all.
>>>>>>>>>>> Exception in thread "main" com.google.inject.CreationException:
>>>>>>>>>>> Guice creation errors:
>>>>>>>>>>>
>>>>>>>>>>> 1) An exception was caught and reported. Message: A value may
>>>>>>>>>>> only be retrieved from a variable that has a default or has been
>>>>>>>>>>> set.
>>>>>>>>>>>   at
>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>
>>>>>>>>>>> 2) Could not find a suitable constructor in
>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have either
>>>>>>>>>>> one (and only one) constructor annotated with @Inject or a
>>>>>>>>>>> zero-argument constructor that is not private.
>>>>>>>>>>>   at
>>>>>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>>>>>>>>>>>   at
>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>>>>>>>>>>>
>>>>>>>>>>> 2 errors
>>>>>>>>>>>         at
>>>>>>>>>>> com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>>>>>>>>>>>         at
>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>>>>>>>>>>>         at
>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>>>>>>>>>>>         at com.google.inject.Guice.createInjector(Guice.java:95)
>>>>>>>>>>>         at com.google.inject.Guice.createInjector(Guice.java:83)
>>>>>>>>>>>         at
>>>>>>>>>>> com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>>>>>>>>>>>         at
>>>>>>>>>>> com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>>>>>>>>>>>         at
>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>>>>>>>>>>>         at
>>>>>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
>>>>>>>>>>> Caused by: java.lang.IllegalStateException: A value may only be
>>>>>>>>>>> retrieved from a variable that has a default or has been set.
>>>>>>>>>>>         at
>>>>>>>>>>> com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>>>>>>>>>>>         at com.twitter.common.args.Arg.get(Arg.java:82)
>>>>>>>>>>>         at
>>>>>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>>>>>>>>>>>         at
>>>>>>>>>>> com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>>>>>>>>>>>         at
>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>         at
>>>>>>>>>>> com.google.inject.util.Modules$2.configure(Modules.java:114)
>>>>>>>>>>>         at
>>>>>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>>>>>         at
>>>>>>>>>>> com.google.inject.spi.Elements.getElements(Elements.java:101)
>>>>>>>>>>>         at
>>>>>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>>>>>         at
>>>>>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>>>>>>>>>>>         ... 7 more
>>>>>>>>>>>
>>>>>>>>>>> Complete logs are present @http://pastebin.com/i72HvbYi.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> κρισhναν
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Zameer Manji
>>>
>>>
>>
>

Re: Stacktrace when running Apache Aurora

Posted by Krish <kr...@gmail.com>.

Hi Folks,
Firstly, thanks for all the help. I am happy to report that I have set up zk,
mesos & aurora, & can work further towards my idea of having an
auto-scaling cluster.
I have some further questions about the work done so far & things I plan to
do:

   1. Is the /etc/aurora/clusters.json file created by the scheduler or
   does it need to be handcrafted? I had to manually edit the file to get my
   `aurora job ...` cli to work.

   2. I am running a cluster of 3 CoreOS VMs on vagrant with zk, mesos &
   aurora in a docker container. Only 1 of them outputs '1' when I look at the
   'framework_registered' field. Is this expected? How do I verify that they
   are working as a cluster?

   3. From the documentation, I see that there is an observer that needs to
   be listening on port 1338. What is the observer socket & its purpose? I
   have aurora listening only on ports 8081 (http port) & 8083 (libprocess).

   4. I read about the 'PENDING' field in aurora documentation, as Bill
   suggested, & realize that it just shows that a task is waiting for some
   reason (for want of resources, in my case, as 0 slaves have registered). I
   was thinking of adding a hook to the pending state; say if a task is
   PENDING for 5 minutes for lack of resources in the cluster, then spin up a
   new machine. Is this the right approach to take? Does aurora provide
   reasons for why a task is in the PENDING state?

   => aurora job status testcluster/$USER/test/hello_world
    INFO] Checking status of testcluster/ubuntu/test/hello_world
   Active tasks (1):
          Task role: ubuntu, env: test, name: hello_world, instance: 0, status: PENDING on None
             cpus: 0.1, ram: 16 MB, disk: 16 MB
             events:
              2015-10-23 04:55:33 PENDING: None
   Inactive tasks (0):

   5. Aurora defines jobs in a .aurora config file & if I decide to
   increase/decrease the number of instances in my cluster, then I need to
   create/overwrite the relevant .aurora file and trigger the `aurora update
   ...` command. Is this right?
   If yes, is there an HTTP API I can invoke remotely which triggers this
   update?
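
To make the idea in question 4 concrete, here is a rough sketch of the
decision I have in mind. It assumes the pending time is derived from the event
timestamps that `aurora job status` prints (as in the output above). The
function name, the 5-minute threshold and the tuple layout are my own
placeholders, not anything Aurora provides, and the call that actually spins
up a machine is left out.

```python
from datetime import datetime, timedelta

# Hypothetical threshold, taken from the "5 minutes" idea above.
PENDING_LIMIT = timedelta(minutes=5)
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # matches "2015-10-23 04:55:33"


def should_scale_up(events, now, limit=PENDING_LIMIT):
    """Return True if the latest event is PENDING and at least `limit` old.

    events: list of (timestamp_string, status) tuples, oldest first,
            e.g. [("2015-10-23 04:55:33", "PENDING")].
    now:    datetime to compare against (passed in so this stays testable).
    """
    if not events:
        return False
    timestamp, status = events[-1]
    if status != "PENDING":
        return False
    pending_since = datetime.strptime(timestamp, TIMESTAMP_FORMAT)
    return now - pending_since >= limit


events = [("2015-10-23 04:55:33", "PENDING")]
# Pending for under 5 minutes, then for over 5 minutes.
print(should_scale_up(events, datetime(2015, 10, 23, 4, 58, 0)))
print(should_scale_up(events, datetime(2015, 10, 23, 5, 1, 0)))
```

Note that this only looks at how long the task has been PENDING; it does not
know why the task is pending, which is exactly what I am asking about.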




--
κρισhναν

On Fri, Oct 23, 2015 at 8:09 AM, Joshua Cohen <jc...@twopensource.com>
wrote:

> I suspect your error from `aurora job create ...` is due to the aurora
> config you're using referencing `/vagrant/hello_world.py` which does not
> exist (as you say: you're not even using Vagrant). Can you link the .aurora
> config you're using?
>
> Cheers,
>
> Joshua

Re: Stacktrace when running Apache Aurora

Posted by Joshua Cohen <jc...@twopensource.com>.

I suspect the error from `aurora job create ...` comes from the .aurora
config you're using: it references `/vagrant/hello_world.py`, which does not
exist on your machine (as you say, you're not even using Vagrant). Can you
link the .aurora config you're using?
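
To illustrate the failure mode: the tutorial's hello_world.aurora (as far as
I recall) reads the file at its package path while the config is being
loaded, so a missing file fails in the client before anything reaches the
scheduler. A minimal Python sketch of that behavior, assuming your config
does something similar; `try_load_package` is a hypothetical helper for
illustration and not part of Aurora:

```python
import errno
import hashlib


def try_load_package(pkg_path):
    """Mimic a config that checksums a file at load time.

    Returns (checksum, None) on success or (None, errno) on failure.
    """
    try:
        with open(pkg_path, 'rb') as f:
            return hashlib.md5(f.read()).hexdigest(), None
    except (IOError, OSError) as e:
        return None, e.errno


if __name__ == '__main__':
    # '/vagrant/hello_world.py' is the path from the error message.
    checksum, err = try_load_package('/vagrant/hello_world.py')
    if err == errno.ENOENT:
        print('No such file; point the config at a path that exists locally')
    elif err is not None:
        print('Could not read the file (errno %d)' % err)
    else:
        print('checksum: %s' % checksum)
```

If that is what is happening, changing the path in the config to a file that
exists on your machine should get you past the load error.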

Cheers,

Joshua

On Thu, Oct 22, 2015 at 3:22 PM, Krish <kr...@gmail.com> wrote:

> Thanks, Zameer.
>
> I had to modify  /etc/aurora/clusters.json:
> [
>   {
>     "auth_mechanism": "UNAUTHENTICATED",
>     "name": "testcluster",
>     "scheduler_zk_path": "/scheduler/aurora",
>     "slave_root": "/var/lib/mesos",
>     "slave_run_directory": "latest",
>     "zk": "127.0.1.1"
>   }
> ]
>
> I have a hello_world.aurora in my home folder. However, the following
> command errors out:
> ~$ aurora job create testcluster/testrole/test/hellojob ./hello_world.aurora
> Error loading configuration: [Errno 2] No such file or directory: '/vagrant/hello_world.py'
>
> A job list does work:
> ~$ aurora job list testcluster
>  INFO] Retrieving jobs for role None
>
> I am not even using Vagrant. I am running ZooKeeper & Mesos on the same
> machine as Aurora. How do I submit these job templates to Aurora?
>
> Any pointers to documentation would be helpful.
>
>
> --
> κρισhναν
>

Re: Stacktrace when running Apache Aurora

Posted by Krish <kr...@gmail.com>.

Thanks, Zameer.

I had to modify  /etc/aurora/clusters.json:
[
  {
    "auth_mechanism": "UNAUTHENTICATED",
    "name": "testcluster",
    "scheduler_zk_path": "/scheduler/aurora",
    "slave_root": "/var/lib/mesos",
    "slave_run_directory": "latest",
    "zk": "127.0.1.1"
  }
]

I have a hello_world.aurora in my home folder. However, the following
command errors out:
~$ aurora job create testcluster/testrole/test/hellojob ./hello_world.aurora
Error loading configuration: [Errno 2] No such file or directory: '/vagrant/hello_world.py'

A job list does work:
~$ aurora job list testcluster
 INFO] Retrieving jobs for role None
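
For reference, a job key has the form cluster/role/environment/name, and its
first component has to match a "name" entry in clusters.json. A minimal
Python sketch of that lookup, using the file contents and job key from this
message; `split_job_key` and `cluster_for_job_key` are hypothetical helpers
for illustration, not functions of the aurora client:

```python
import json

# Contents of /etc/aurora/clusters.json as shown in this message.
CLUSTERS_JSON = """
[
  {
    "auth_mechanism": "UNAUTHENTICATED",
    "name": "testcluster",
    "scheduler_zk_path": "/scheduler/aurora",
    "slave_root": "/var/lib/mesos",
    "slave_run_directory": "latest",
    "zk": "127.0.1.1"
  }
]
"""


def split_job_key(job_key):
    """Split 'cluster/role/environment/name' into its four parts."""
    parts = job_key.split('/')
    if len(parts) != 4 or not all(parts):
        raise ValueError('expected cluster/role/environment/name, got %r' % job_key)
    return tuple(parts)


def cluster_for_job_key(clusters_text, job_key):
    """Return the clusters.json entry matching the job key's cluster, or None."""
    cluster = split_job_key(job_key)[0]
    for entry in json.loads(clusters_text):
        if entry.get('name') == cluster:
            return entry
    return None


if __name__ == '__main__':
    entry = cluster_for_job_key(CLUSTERS_JSON, 'testcluster/testrole/test/hellojob')
    print(entry['zk'] if entry else 'cluster not defined in clusters.json')
```

The cluster part of the key resolves here, which is consistent with the job
list working; the create failure happens while the config file is loaded.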

I am not even using Vagrant. I am running ZooKeeper & Mesos on the same
machine as Aurora. How do I submit these job templates to Aurora?

Any pointers to documentation would be helpful.


--
κρισhναν

On Wed, Oct 21, 2015 at 11:09 PM, Zameer Manji <zm...@apache.org> wrote:

> Version 0.9.0 does not have the gc executor. Version 0.9.0 uses Mesos' task
> reconciliation
> <http://mesos.apache.org/documentation/latest/reconciliation/> API
> instead.
>
> --
> Zameer Manji
>

Re: Stacktrace when running Apache Aurora

Posted by Zameer Manji <zm...@apache.org>.

Version 0.9.0 does not have the gc executor. Version 0.9.0 uses Mesos' task
reconciliation
<http://mesos.apache.org/documentation/latest/reconciliation/> API instead.

On Wed, Oct 21, 2015 at 9:28 AM, Krish <kr...@gmail.com> wrote:

> Thanks Bill for the location to the debs. I was finally able to run
> aurora. :)
>
> I did find thermos_executor.pex & thermos_observer after installing
> aurora-executor. I still could not find gc_executor.pex on my system.
> Is there a location from where I can download the binaries for *.pex or
> build them from scratch?
>
> root@dev:/# find . -name "*.pex"
> ./usr/share/aurora/bin/thermos_executor.pex
> ./usr/share/aurora/bin/kaurora_admin.pex
> ./usr/share/aurora/bin/kaurora.pex
> ./usr/share/aurora/bin/thermos.pex
> ./usr/share/aurora/bin/thermos_observer.pex
> ./home/ubuntu/.pex
> ./root/.pex
>
>
>
>
>
> --
> κρισhναν
>
> On Tue, Oct 20, 2015 at 11:46 PM, Bill Farner <wf...@apache.org> wrote:
>
>> Aurora currently requires an executor, so setting it to /dev/null will
>> not work.  Happy to talk further about your thoughts around sidestepping
>> the executor.
>>
>> As for working with the scheduler source code, it's a standard Gradle
>> project and we tend to use IntelliJ.  Docs to help you ramp up on that:
>> https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md
>>
>> As for builds - the .zip is a source distribution, so it won't have any
>> pre-built binaries.  If you're on debian, we have official debs here:
>> https://bintray.com/apache/aurora
>> You can see how the packages are built (and build your own) here:
>> https://github.com/apache/aurora-packaging
>> We're close to having official RPMs, but none to speak of yet.
>>
>> On Tue, Oct 20, 2015 at 9:47 AM, Krish <kr...@gmail.com> wrote:
>>
>>> Stephen,
>>> I am trying to get started and run aurora without thermos executor
>>> (setting it to /dev/null does not help) - on a local linux box for now &
>>> planning to containerize/dockerize it later.
>>>
>>> Can you please point me to the right documentation (or a pointer to the
>>> cli parsing source code) which can help me resolve this? Also, are there
>>> any steps to import the source code into eclipse to browse & analyze the code
>>> for this?
>>>
>>> Also, where do i find all the *.pex files? They are not present in the
>>> zip file nor anywhere in the built source code.
>>>
>>> I know I am asking too many queries on a single thread here, & would
>>> appreciate the help.
>>> I think at the end of this, I will put the steps I followed in a
>>> gist/blog so others might find their way around, & not struggle as much.
>>>
>>>
>>>
>>> --
>>> κρισhναν
>>>
>>> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan <
>>> Stephan.Erb@blue-yonder.com> wrote:
>>>
>>>> Hi Krish,
>>>>
>>>>
>>>> you don't have to set framework_authentication_file and
>>>> zk_digest_credentials. The scheduler help text is misleading here as
>>>> everything will work fine if you leave those empty.
>>>>
>>>>
>>>> In addition, looks like you are misunderstanding the usage of the
>>>> thermos_executor_path command line flag of the scheduler. It is
>>>> supposed to point to the binary containing the generic Aurora executor
>>>> (thermos_executor.pex).  You only need the hello_world.aurora once
>>>> your scheduler is up and running. It serves as an example input for the
>>>> aurora command line client, which can be used to schedule jobs and services
>>>> on an Aurora master.
>>>>
>>>>
>>>> Have you tried to use the vagrant box? Just type 'vagrant up' in a
>>>> checkout of the Aurora source code. It gives you a running scheduler to
>>>> play with. Once you have understood how it works, you can start trying to
>>>> install it on your own (by reverse-engineering the vagrant box).
>>>>
>>>>
>>>> Hope this helps a little,
>>>>
>>>> Stephan
>>>>
>>>>
>>>>
>>>> ------------------------------
>>>> *From:* Krish <kr...@gmail.com>
>>>> *Sent:* Tuesday, October 20, 2015 11:39 AM
>>>> *To:* Bill Farner
>>>> *Cc:* user@aurora.apache.org; Erb, Stephan
>>>>
>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>
>>>> Bill/Stephen,
>>>> I still get a stacktrace when running the aurora scheduler CLI.
>>>>
>>>> I do not know what to specify for  -framework_authentication_file
>>>> & -zk_digest_credentials, and they are required arguments.
>>>>
>>>> I am not using any authentication on Mesos master, do I still need the
>>>> framework_authentication_file parameter?
>>>>
>>>>
>>>> rm -rf /db /backup_dir
>>>> mesos-log initialize --path="/db"
>>>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
>>>> JAVA_OPTS="-Xmx1536m  -Xms256m"
>>>> /usr/local/aurora-scheduler/bin/aurora-scheduler -backup_dir=/backup_dir
>>>> -cluster_name=tc -mesos_master_address=zk://localhost:2181/mesos/master
>>>> -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181
>>>> -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false
>>>> -native_log_file_path=/db
>>>> -thermos_executor_path=/home/ubuntu/hello_world.aurora
>>>> ...
>>>> ...
>>>> INFO: Binding org.apache.aurora.scheduler.http.Utilization to
>>>> GuiceManagedComponentProvider with the scope "PerRequest"
>>>> Oct 20, 2015 9:27:40 AM
>>>> org.apache.aurora.scheduler.cron.quartz.CronModule provideTimeZone
>>>> WARNING: Cron schedules are configured to fire according to timezone
>>>> Greenwich Mean Time but system timezone is set to Coordinated Universal Time
>>>> Oct 20, 2015 9:27:41 AM org.eclipse.jetty.server.AbstractConnector doStart
>>>> INFO: Started SelectChannelConnector@0.0.0.0:43843
>>>> E1020 09:27:41.290 THREAD1
>>>> org.apache.aurora.scheduler.SchedulerLifecycle$9.execute: Caught unchecked
>>>> exception: com.google.inject.ProvisionException: Guice provision errors:
>>>>
>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path
>>>> cannot be null
>>>>   at
>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>   at
>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>   while locating org.apache.mesos.Log
>>>>   at
>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>
>>>> 1 error
>>>> com.google.inject.ProvisionException: Guice provision errors:
>>>>
>>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path
>>>> cannot be null
>>>>   at
>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>   at
>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>>   while locating org.apache.mesos.Log
>>>>   at
>>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>>
>>>> 1 error
>>>>         at
>>>> com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
>>>>         at
>>>> org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136
>>>>
>>>>
>>>>
>>>> --
>>>> κρισhναν
>>>>
>>>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner <wf...@apache.org>
>>>> wrote:
>>>>
>>>>> The typical flow is that you keep your .aurora file checked into git,
>>>>> and commit every time you deploy/update.  When you change your file, you
>>>>> will instruct Aurora to update the live job (have a look at aurora
>>>>> update -h).  Aurora will perform a rolling upgrade of your job to the
>>>>> new config.  You'll use this same flow for updating your job's software as
>>>>> well as resizing the job.
>>>>>
>>>>> For (3), you could set up alerting for stats that the scheduler
>>>>> exports.  Have a look here for monitoring background:
>>>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md
>>>>>
>>>>> You'll want to look at scheduler stats related to 'pending'.
>>>>>
>>>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish <kr...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks for the pointer. Now I notice that the aurora-scheduler script
>>>>>> has the --thermos_executor_path as a mandatory requirement.
>>>>>>
>>>>>> I have a couple of questions on how the thermos_executor/.aurora
>>>>>> config file functions:
>>>>>> 1. Do we have to statically define the file beforehand?
>>>>>> 2. What happens when we want to dynamically change the config, say
>>>>>> increasing the number of instances of a service required? Does aurora
>>>>>> require a reboot then?
>>>>>> 3. How do I get notified about the message mesos sends when it cannot
>>>>>> schedule tasks for lack of resources? Should I depend on aurora for this or
>>>>>> try to look for a hook into mesos?
>>>>>>
>>>>>> I think a little bit of context would help here.
>>>>>> What I plan to check is to run a very basic job/task inside a docker
>>>>>> container with aurora & wait for a 'resource not available' message from
>>>>>> mesos, and accordingly call an api to spin up a new node in my cluster.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> κρισhναν
>>>>>>
>>>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <
>>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>>
>>>>>>> I believe you are missing the thermos_executor options that have to
>>>>>>> be passed to the scheduler command line.
>>>>>>>
>>>>>>>
>>>>>>> See
>>>>>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
>>>>>>> for an example
>>>>>>>
>>>>>>>
>>>>>>> Best Regards,
>>>>>>>
>>>>>>> Stephan
>>>>>>>
>>>>>>>
>>>>>>> ------------------------------
>>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>>> *Sent:* Monday, October 19, 2015 8:45 AM
>>>>>>> *To:* user@aurora.apache.org
>>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>>
>>>>>>> Hi,
>>>>>>> I am a n00b with Apache Aurora & trying to experiment with some things on
>>>>>>> my local machine with zookeeper and mesos-master running locally. They have
>>>>>>> initialized properly. When I try to run aurora with the required options, I
>>>>>>> get the following error, & googling hasn't helped me much here.
>>>>>>> Appreciate any help. Thanks in advance.
>>>>>>>
>>>>>>> ...
>>>>>>> ...
>>>>>>> WARNING: Method [public void
>>>>>>> org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)]
>>>>>>> is synthetic and is being intercepted by
>>>>>>> [com.twitter.common.inject.TimedInterceptor@604c5de8]. This could
>>>>>>> indicate a bug.  The method
>>>>>>>  may be intercepted twice, or may not be intercepted at all.
>>>>>>> Exception in thread "main" com.google.inject.CreationException:
>>>>>>> Guice creation errors:
>>>>>>>
>>>>>>> 1) An exception was caught and reported. Message: A value may only
>>>>>>> be retrieved from a variable that has a default or has been
>>>>>>> set.
>>>>>>>   at
>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>
>>>>>>> 2) Could not find a suitable constructor in
>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have
>>>>>>> either one (and only one) constructor annotated with @Inject or a
>>>>>>> zero-argument constructor that is not private.
>>>>>>>   at
>>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>>>>>>>   at
>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>>>>>>>
>>>>>>> 2 errors
>>>>>>>         at
>>>>>>> com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>>>>>>>         at
>>>>>>> com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>>>>>>>         at
>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>>>>>>>         at com.google.inject.Guice.createInjector(Guice.java:95)
>>>>>>>         at com.google.inject.Guice.createInjector(Guice.java:83)
>>>>>>>         at
>>>>>>> com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>>>>>>>         at
>>>>>>> com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>>>>>>>         at
>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>>>>>>>         at
>>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>>>>>>>         at
>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
>>>>>>> Caused by: java.lang.IllegalStateException: A value may only be
>>>>>>> retrieved from a variable that has a default or has been set.
>>>>>>>         at
>>>>>>> com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>>>>>>>         at com.twitter.common.args.Arg.get(Arg.java:82)
>>>>>>>         at
>>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>>>>>>>         at
>>>>>>> com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>>>>>>>         at
>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>         at
>>>>>>> com.google.inject.util.Modules$2.configure(Modules.java:114)
>>>>>>>         at
>>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>>         at
>>>>>>> com.google.inject.spi.Elements.getElements(Elements.java:101)
>>>>>>>         at
>>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>>         at
>>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>>>>>>>         ... 7 more
>>>>>>>
>>>>>>> Complete logs are present @http://pastebin.com/i72HvbYi.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> --
>>>>>>>> κρισhναν
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


-- 
Zameer Manji

Re: Stacktrace when running Apache Aurora

Posted by Krish <kr...@gmail.com>.

Thanks, Bill, for pointing me to the debs. I was finally able to run aurora.
:)

I did find thermos_executor.pex & thermos_observer after installing
aurora-executor. I still could not find gc_executor.pex on my system.
Is there a location from which I can download the *.pex binaries, or a way
to build them from scratch?

root@dev:/# find . -name "*.pex"
./usr/share/aurora/bin/thermos_executor.pex
./usr/share/aurora/bin/kaurora_admin.pex
./usr/share/aurora/bin/kaurora.pex
./usr/share/aurora/bin/thermos.pex
./usr/share/aurora/bin/thermos_observer.pex
./home/ubuntu/.pex
./root/.pex





--
κρισhναν

On Tue, Oct 20, 2015 at 11:46 PM, Bill Farner <wf...@apache.org> wrote:

> Aurora currently requires an executor, so setting it to /dev/null will not
> work.  Happy to talk further about your thoughts around sidestepping the
> executor.
>
> As for working with the scheduler source code, it's a standard Gradle
> project and we tend to use IntelliJ.  Docs to help you ramp up on that:
> https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md
>
> As for builds - the .zip is a source distribution, so it won't have any
> pre-built binaries.  If you're on debian, we have official debs here:
> https://bintray.com/apache/aurora
> You can see how the packages are built (and build your own) here:
> https://github.com/apache/aurora-packaging
> We're close to having official RPMs, but none to speak of yet.
>
> On Tue, Oct 20, 2015 at 9:47 AM, Krish <kr...@gmail.com> wrote:
>
>> Stephen,
>> I am trying to get started and run aurora without thermos executor
>> (setting it to /dev/null does not help) - on a local linux box for now &
>> planning to containerize/dockerize it later.
>>
>> Can you please point me to the right documentation (or a pointer to the
>> cli parsing source code) which can help me resolve this? Also, are there
>> any steps to import the source code into eclipse to browse & analyze the code
>> for this?
>>
>> Also, where do i find all the *.pex files? They are not present in the
>> zip file nor anywhere in the built source code.
>>
>> I know I am asking too many queries on a single thread here, & would
>> appreciate the help.
>> I think at the end of this, I will put the steps I followed in a
>> gist/blog so others might find their way around, & not struggle as much.
>>
>>
>>
>> --
>> κρισhναν
>>
>> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan <
>> Stephan.Erb@blue-yonder.com> wrote:
>>
>>> Hi Krish,
>>>
>>>
>>> you don't have to set framework_authentication_file and
>>> zk_digest_credentials. The scheduler help text is misleading here as
>>> everything will work fine if you leave those empty.
>>>
>>>
>>> In addition, looks like you are misunderstanding the usage of the
>>> thermos_executor_path command line flag of the scheduler. It is
>>> supposed to point to the binary containing the generic Aurora executor
>>> (thermos_executor.pex).  You only need the hello_world.aurora once your
>>> scheduler is up and running. It serves as an example input for the aurora
>>> command line client, which can be used to schedule jobs and services on an
>>> Aurora master.
>>>
>>>
>>> Have you tried to use the vagrant box? Just type 'vagrant up' in a
>>> checkout of the Aurora source code. It gives you a running scheduler to
>>> play with. Once you have understood how it works, you can start trying to
>>> install it on your own (by reverse-engineering the vagrant box).
>>>
>>>
>>> Hope this helps a little,
>>>
>>> Stephan
>>>
>>>
>>>
>>> ------------------------------
>>> *From:* Krish <kr...@gmail.com>
>>> *Sent:* Tuesday, October 20, 2015 11:39 AM
>>> *To:* Bill Farner
>>> *Cc:* user@aurora.apache.org; Erb, Stephan
>>>
>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>
>>> Bill/Stephen,
>>> I still get a stacktrace when running the aurora scheduler CLI.
>>>
>>> I do not know what to specify for  -framework_authentication_file
>>> & -zk_digest_credentials, and they are required arguments.
>>>
>>> I am not using any authentication on Mesos master, do I still need the
>>> framework_authentication_file parameter?
>>>
>>>
>>> rm -rf /db /backup_dir
>>> mesos-log initialize --path="/db"
>>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
>>> JAVA_OPTS="-Xmx1536m  -Xms256m"
>>> /usr/local/aurora-scheduler/bin/aurora-scheduler -backup_dir=/backup_dir
>>> -cluster_name=tc -mesos_master_address=zk://localhost:2181/mesos/master
>>> -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181
>>> -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false
>>> -native_log_file_path=/db
>>> -thermos_executor_path=/home/ubuntu/hello_world.aurora
>>> ...
>>> ...
>>> INFO: Binding org.apache.aurora.scheduler.http.Utilization to
>>> GuiceManagedComponentProvider with the scope "PerRequest"
>>> Oct 20, 2015 9:27:40 AM
>>> org.apache.aurora.scheduler.cron.quartz.CronModule provideTimeZone
>>> WARNING: Cron schedules are configured to fire according to timezone
>>> Greenwich Mean Time but system timezone is set to Coordinated Universal Time
>>> Oct 20, 2015 9:27:41 AM org.eclipse.jetty.server.AbstractConnector doStart
>>> INFO: Started SelectChannelConnector@0.0.0.0:43843
>>> E1020 09:27:41.290 THREAD1
>>> org.apache.aurora.scheduler.SchedulerLifecycle$9.execute: Caught unchecked
>>> exception: com.google.inject.ProvisionException: Guice provision errors:
>>>
>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path
>>> cannot be null
>>>   at
>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>   at
>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>   while locating org.apache.mesos.Log
>>>   at
>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>
>>> 1 error
>>> com.google.inject.ProvisionException: Guice provision errors:
>>>
>>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path
>>> cannot be null
>>>   at
>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>   at
>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>>   while locating org.apache.mesos.Log
>>>   at
>>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>>
>>> 1 error
>>>         at
>>> com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
>>>         at
>>> org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136
>>>
>>>
>>>
>>> --
>>> κρισhναν
>>>
>>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner <wf...@apache.org> wrote:
>>>
>>>> The typical flow is that you keep your .aurora file checked into git,
>>>> and commit every time you deploy/update.  When you change your file, you
>>>> will instruct Aurora to update the live job (have a look at aurora
>>>> update -h).  Aurora will perform a rolling upgrade of your job to the
>>>> new config.  You'll use this same flow for updating your job's software as
>>>> well as resizing the job.
>>>>
>>>> For (3), you could set up alerting for stats that the scheduler
>>>> exports.  Have a look here for monitoring background:
>>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md
>>>>
>>>> You'll want to look at scheduler stats related to 'pending'.
>>>>
>>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish <kr...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks for the pointer. Now I notice that the aurora-scheduler script
>>>>> has the --thermos_executor_path as a mandatory requirement.
>>>>>
>>>>> I have a couple of questions on how the thermos_executor/.aurora
>>>>> config file functions:
>>>>> 1. Do we have to statically define the file beforehand?
>>>>> 2. What happens when we want to dynamically change the config, say
>>>>> increasing the number of instances of a service required? Does aurora
>>>>> require a reboot then?
>>>>> 3. How do I get notified about the message mesos sends when it cannot
>>>>> schedule tasks for lack of resources? Should I depend on aurora for this or
>>>>> try to look for a hook into mesos?
>>>>>
>>>>> I think a little bit of context would help here.
>>>>> What I plan to check is to run a very basic job/task inside a docker
>>>>> container with aurora & wait for a 'resource not available' message from
>>>>> mesos, and accordingly call an api to spin up a new node in my cluster.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> κρισhναν
>>>>>
>>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <
>>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>>
>>>>>> I believe you are missing the thermos_executor options that have to
>>>>>> be passed to the scheduler command line.
>>>>>>
>>>>>>
>>>>>> See
>>>>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
>>>>>> for an example
>>>>>>
>>>>>>
>>>>>> Best Regards,
>>>>>>
>>>>>> Stephan
>>>>>>
>>>>>>
>>>>>> ------------------------------
>>>>>> *From:* Krish <kr...@gmail.com>
>>>>>> *Sent:* Monday, October 19, 2015 8:45 AM
>>>>>> *To:* user@aurora.apache.org
>>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>>
>>>>>> Hi,
>>>>>> I am a n00b with Apache Aurora & trying to experiment with some things on
>>>>>> my local machine with zookeeper and mesos-master running locally. They have
>>>>>> initialized properly. When I try to run aurora with the required options, I
>>>>>> get the following error, & googling hasn't helped me much here.
>>>>>> Appreciate any help. Thanks in advance.
>>>>>>
>>>>>> ...
>>>>>> ...
>>>>>> WARNING: Method [public void
>>>>>> org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)]
>>>>>> is synthetic and is being intercepted by
>>>>>> [com.twitter.common.inject.TimedInterceptor@604c5de8]. This could
>>>>>> indicate a bug.  The method
>>>>>>  may be intercepted twice, or may not be intercepted at all.
>>>>>> Exception in thread "main" com.google.inject.CreationException: Guice
>>>>>> creation errors:
>>>>>>
>>>>>> 1) An exception was caught and reported. Message: A value may only be
>>>>>> retrieved from a variable that has a default or has been
>>>>>> set.
>>>>>>   at
>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>
>>>>>> 2) Could not find a suitable constructor in
>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have
>>>>>> either one (and only one) constructor annotated with @Inject or a
>>>>>> zero-argument constructor that is not private.
>>>>>>   at
>>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>>>>>>   at
>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>>>>>>
>>>>>> 2 errors
>>>>>>         at
>>>>>> com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>>>>>>         at
>>>>>> com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>>>>>>         at
>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>>>>>>         at com.google.inject.Guice.createInjector(Guice.java:95)
>>>>>>         at com.google.inject.Guice.createInjector(Guice.java:83)
>>>>>>         at
>>>>>> com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>>>>>>         at
>>>>>> com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>>>>>>         at
>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>>>>>>         at
>>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>>>>>>         at
>>>>>> org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
>>>>>> Caused by: java.lang.IllegalStateException: A value may only be
>>>>>> retrieved from a variable that has a default or has been set.
>>>>>>         at
>>>>>> com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>>>>>>         at com.twitter.common.args.Arg.get(Arg.java:82)
>>>>>>         at
>>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>>>>>>         at
>>>>>> com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>>>>>>         at
>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>         at
>>>>>> com.google.inject.util.Modules$2.configure(Modules.java:114)
>>>>>>         at
>>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>>         at
>>>>>> com.google.inject.spi.Elements.getElements(Elements.java:101)
>>>>>>         at
>>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>>         at
>>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>>>>>>         ... 7 more
>>>>>>
>>>>>> Complete logs are present @http://pastebin.com/i72HvbYi.
>>>>>>
>>>>>>
>>>>>>
>>>>>>> --
>>>>>>> κρισhναν
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Stacktrace when running Apache Aurora

Posted by Bill Farner <wf...@apache.org>.

Aurora currently requires an executor, so setting it to /dev/null will not
work.  Happy to talk further about your thoughts around sidestepping the
executor.
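
To make the executor requirement concrete, here is a rough pre-flight sketch,
not something Aurora ships. The helper name check_executor_path is made up for
illustration, and the pex location is an assumption based on where the
aurora-executor deb placed it in the find output in this thread. The scheduler
flags are the ones from the command quoted below, minus the two logging flags;
only the executor path differs. The script just prints the command instead of
running it.

```shell
#!/bin/sh
# Sketch only: check_executor_path is a hypothetical helper, not part of Aurora.
# It succeeds only for a regular, non-empty file, so /dev/null (a character
# device) and missing paths are rejected.
check_executor_path() {
    if [ ! -f "$1" ]; then
        echo "not a regular file: $1" >&2
        return 1
    fi
    if [ ! -s "$1" ]; then
        echo "file is empty: $1" >&2
        return 1
    fi
    return 0
}

# Assumed location: where the deb install put the pex in this thread.
EXECUTOR="${EXECUTOR:-/usr/share/aurora/bin/thermos_executor.pex}"

if check_executor_path "$EXECUTOR"; then
    # Printed rather than executed, so the sketch is safe to run anywhere.
    echo /usr/local/aurora-scheduler/bin/aurora-scheduler \
        -backup_dir=/backup_dir -cluster_name=tc \
        -mesos_master_address=zk://localhost:2181/mesos/master \
        -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181 \
        -native_log_quorum_size=1 -native_log_file_path=/db \
        -thermos_executor_path="$EXECUTOR"
fi
```

Set EXECUTOR in the environment to check a different path. A passing check
only says the file exists and is non-empty; it does not prove the file is a
working executor.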

As for working with the scheduler source code, it's a standard Gradle
project and we tend to use IntelliJ.  Docs to help you ramp up on that:
https://github.com/apache/aurora/blob/master/docs/developing-aurora-scheduler.md

As for builds - the .zip is a source distribution, so it won't have any
pre-built binaries.  If you're on debian, we have official debs here:
https://bintray.com/apache/aurora
You can see how the packages are built (and build your own) here:
https://github.com/apache/aurora-packaging
We're close to having official RPMs, but none to speak of yet.

On Tue, Oct 20, 2015 at 9:47 AM, Krish <kr...@gmail.com> wrote:

> Stephen,
> I am trying to get started and run aurora without thermos executor
> (setting it to /dev/null does not help) - on a local linux box for now &
> planning to containerize/dockerize it later.
>
> Can you please point me to the right documentation (or a pointer to the
> cli parsing source code) which can help me resolve this? Also, are there
> any steps to import the source code into eclipse to browse & analyze the code
> for this?
>
> Also, where do i find all the *.pex files? They are not present in the zip
> file nor anywhere in the built source code.
>
> I know I am asking too many queries on a single thread here, & would
> appreciate the help.
> I think at the end of this, I will put the steps I followed in a gist/blog
> so others might find their way around, & not struggle as much.
>
>
>
> --
> κρισhναν
>
> On Tue, Oct 20, 2015 at 4:09 PM, Erb, Stephan <Stephan.Erb@blue-yonder.com
> > wrote:
>
>> Hi Krish,
>>
>>
>> you don't have to set framework_authentication_file and
>> zk_digest_credentials. The scheduler help text is misleading here as
>> everything will work fine if you leave those empty.
>>
>>
>> In addition, looks like you are misunderstanding the usage of the
>> thermos_executor_path command line flag of the scheduler. It is supposed
>> to point to the binary containing the generic Aurora executor
>> (thermos_executor.pex).  You only need the hello_world.aurora once your
>> scheduler is up and running. It serves as an example input for the aurora
>> command line client, which can be used to schedule jobs and services on an
>> Aurora master.
>>
>>
>> Have you tried to use the vagrant box? Just type 'vagrant up' in a
>> checkout of the Aurora source code. It gives you a running scheduler to
>> play with. Once you have understood how it works, you can start trying to
>> install it on your own (by reverse-engineering the vagrant box).
>>
>>
>> Hope this helps a little,
>>
>> Stephan
>>
>>
>>
>> ------------------------------
>> *From:* Krish <kr...@gmail.com>
>> *Sent:* Tuesday, October 20, 2015 11:39 AM
>> *To:* Bill Farner
>> *Cc:* user@aurora.apache.org; Erb, Stephan
>>
>> *Subject:* Re: Stacktrace when running Apache Aurora
>>
>> Bill/Stephen,
>> I still get a stacktrace when running the aurora scheduler CLI.
>>
>> I do not know what to specify for  -framework_authentication_file
>> & -zk_digest_credentials, and they are required arguments.
>>
>> I am not using any authentication on Mesos master, do I still need the
>> framework_authentication_file parameter?
>>
>>
>> rm -rf /db /backup_dir
>> mesos-log initialize --path="/db"
>> export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
>> JAVA_OPTS="-Xmx1536m  -Xms256m"
>> /usr/local/aurora-scheduler/bin/aurora-scheduler -backup_dir=/backup_dir
>> -cluster_name=tc -mesos_master_address=zk://localhost:2181/mesos/master
>> -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181
>> -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false
>> -native_log_file_path=/db
>> -thermos_executor_path=/home/ubuntu/hello_world.aurora
>> ...
>> ...
>> INFO: Binding org.apache.aurora.scheduler.http.Utilization to
>> GuiceManagedComponentProvider with the scope "PerRequest"
>> Oct 20, 2015 9:27:40 AM
>> org.apache.aurora.scheduler.cron.quartz.CronModule provideTimeZone
>> WARNING: Cron schedules are configured to fire according to timezone
>> Greenwich Mean Time but system timezone is set to Coordinated Universal Time
>> Oct 20, 2015 9:27:41 AM org.eclipse.jetty.server.AbstractConnector doStart
>> INFO: Started SelectChannelConnector@0.0.0.0:43843
>> E1020 09:27:41.290 THREAD1
>> org.apache.aurora.scheduler.SchedulerLifecycle$9.execute: Caught unchecked
>> exception: com.google.inject.ProvisionException: Guice provision errors:
>>
>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path
>> cannot be null
>>   at
>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>   at
>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>   while locating org.apache.mesos.Log
>>   at
>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>
>> 1 error
>> com.google.inject.ProvisionException: Guice provision errors:
>>
>> 1) Error in custom provider, java.lang.IllegalArgumentException: Path
>> cannot be null
>>   at
>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>   at
>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
>>   while locating org.apache.mesos.Log
>>   at
>> org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
>>   while locating org.apache.aurora.scheduler.log.mesos.LogInterface
>>
>> 1 error
>>         at
>> com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
>>         at
>> org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136
>>
>>
>>
>> --
>> κρισhναν
>>
>> On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner <wf...@apache.org> wrote:
>>
>>> The typical flow is that you keep your .aurora file checked into git,
>>> and commit every time you deploy/update.  When you change your file, you
>>> will instruct Aurora to update the live job (have a look at aurora
>>> update -h).  Aurora will perform a rolling upgrade of your job to the
>>> new config.  You'll use this same flow for updating your job's software as
>>> well as resizing the job.
>>>
>>> For (3), you could set up alerting for stats that the scheduler
>>> exports.  Have a look here for monitoring background:
>>> https://github.com/apache/aurora/blob/master/docs/monitoring.md
>>>
>>> You'll want to look at scheduler stats related to 'pending'.
>>>
>>> On Mon, Oct 19, 2015 at 12:16 PM, Krish <kr...@gmail.com>
>>> wrote:
>>>
>>>> Thanks for the pointer. Now I notice that the aurora-scheduler script
>>>> has the --thermos_executor_path as a mandatory requirement.
>>>>
>>>> I have a couple of questions on how the thermos_executor/.aurora config
>>>> file functions:
>>>> 1. Do we have to statically define the file beforehand?
>>>> 2. What happens when we want to dynamically change the config, say
>>>> increasing the number of instances of a service required? Does aurora
>>>> require a reboot then?
>>>> 3. How do I get notified about the message mesos sends when it cannot
>>>> schedule tasks for lack of resources? Should I depend on aurora for this or
>>>> try to look for a hook into mesos?
>>>>
>>>> I think a little bit of context would help here.
>>>> What I plan to check is to run a very basic job/task inside a docker
>>>> container with aurora & wait for a 'resource not available' message from
>>>> mesos, and accordingly call an api to spin up a new node in my cluster.
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> κρισhναν
>>>>
>>>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <
>>>> Stephan.Erb@blue-yonder.com> wrote:
>>>>
>>>>> I believe you are missing the thermos_executor options that have to be
>>>>> passed to the scheduler command line.
>>>>>
>>>>>
>>>>> See
>>>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
>>>>> for an example
>>>>>
>>>>>
>>>>> Best Regards,
>>>>>
>>>>> Stephan
>>>>>
>>>>>
>>>>> ------------------------------
>>>>> *From:* Krish <kr...@gmail.com>
>>>>> *Sent:* Monday, October 19, 2015 8:45 AM
>>>>> *To:* user@aurora.apache.org
>>>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>>>
>>>>> Hi,
>>>>> I am a n00b with apache aurora & trying to experiment some things on
>>>>> my local machine with zookeeper and mesos-master running locally. They have
>>>>> initialized properly. When I try to run aurora with the required options, I
>>>>> get the following error, & googling hasn't helped me much here.
>>>>> Appreciate any help. Thanks in advance.
>>>>>
>>>>> ...
>>>>> ...
>>>>> WARNING: Method [public void
>>>>> org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)]
>>>>> is synthetic and is being intercepted by
>>>>> [com.twitter.common.inject.TimedInterceptor@604c5de8]. This could
>>>>> indicate a bug.  The method
>>>>>  may be intercepted twice, or may not be intercepted at all.
>>>>> Exception in thread "main" com.google.inject.CreationException: Guice
>>>>> creation errors:
>>>>>
>>>>> 1) An exception was caught and reported. Message: A value may only be
>>>>> retrieved from a variable that has a default or has been
>>>>> set.
>>>>>   at
>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>
>>>>> 2) Could not find a suitable constructor in
>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have
>>>>> either one (a
>>>>> nd only one) constructor annotated with @Inject or a zero-argument
>>>>> constructor that is not private.
>>>>>   at
>>>>> org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>>>>>   at
>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>>>>>
>>>>> 2 errors
>>>>>         at
>>>>> com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>>>>>         at
>>>>> com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>>>>>         at
>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>>>>>         at com.google.inject.Guice.createInjector(Guice.java:95)
>>>>>         at com.google.inject.Guice.createInjector(Guice.java:83)
>>>>>         at
>>>>> com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>>>>>         at
>>>>> com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>>>>>         at
>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>>>>>         at
>>>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>>>>>         at
>>>>> org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
>>>>> Caused by: java.lang.IllegalStateException: A value may only be
>>>>> retrieved from a variable that has a default or has been set.
>>>>>         at
>>>>> com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>>>>>         at com.twitter.common.args.Arg.get(Arg.java:82)
>>>>>         at
>>>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>>>>>         at
>>>>> com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>>>>>         at
>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>         at com.google.inject.util.Modules$2.configure(Modules.java:114)
>>>>>         at
>>>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>>>         at
>>>>> com.google.inject.spi.Elements.getElements(Elements.java:101)
>>>>>         at
>>>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>>>         at
>>>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>>>>>         ... 7 more
>>>>>
>>>>> Complete logs are present @http://pastebin.com/i72HvbYi.
>>>>>
>>>>>
>>>>>
>>>>>> --
>>>>>> κρισhναν
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Stacktrace when running Apache Aurora

Posted by Krish <kr...@gmail.com>.

Stephan,
I am trying to get started by running Aurora without the thermos executor
(setting the executor path to /dev/null does not help) on a local Linux box
for now; I plan to containerize/dockerize it later.

Can you please point me to the right documentation (or to the CLI parsing
source code) that can help me resolve this? Also, are there any steps for
importing the source code into Eclipse to browse & analyze it?

Also, where do I find all the *.pex files? They are not present in the zip
file, nor anywhere in the built source code.

I know I am asking too many questions in a single thread here, & would
appreciate the help.
Once this works, I will put the steps I followed in a gist/blog post so
others can find their way around & not struggle as much.



--
κρισhναν


Re: Stacktrace when running Apache Aurora

Posted by "Erb, Stephan" <St...@blue-yonder.com>.

Hi Krish,


You don't have to set framework_authentication_file and zk_digest_credentials. The scheduler help text is misleading here, as everything will work fine if you leave those empty.


In addition, it looks like you are misunderstanding the usage of the thermos_executor_path command line flag of the scheduler. It is supposed to point to the binary containing the generic Aurora executor (thermos_executor.pex).  You only need hello_world.aurora once your scheduler is up and running. It serves as an example input for the aurora command line client, which can be used to schedule jobs and services on an Aurora master.
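
To make the distinction concrete, here is a minimal shell sketch of a pre-flight check one could run before starting the scheduler. The function name check_executor_path is an assumption for illustration and is not part of Aurora; the only thing taken from this thread is that -thermos_executor_path should point to the thermos_executor.pex binary rather than to a .aurora job config.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Aurora): sanity-check the value that is
# about to be passed as -thermos_executor_path.
check_executor_path() {
  local path="$1"
  case "$path" in
    *.aurora)
      # .aurora files are input for the aurora client, not the executor binary.
      echo "error: $path is a job config, not the executor binary" >&2
      return 1
      ;;
  esac
  # Rejects missing files and special files such as /dev/null.
  if [ ! -f "$path" ]; then
    echo "error: $path does not exist or is not a regular file" >&2
    return 1
  fi
  echo "ok: $path"
}
```

Calling check_executor_path with /home/ubuntu/hello_world.aurora would fail under this check, while a path to an existing thermos_executor.pex would pass. Where that file lives on a given machine is not stated in this thread.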


Have you tried to use the vagrant box? Just type 'vagrant up' in a checkout of the Aurora source code. It gives you a running scheduler to play with. Once you have understood how it works, you can start trying to install it on your own (by reverse-engineering the vagrant box).
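
As a rough sketch of the vagrant route, the commands below clone the GitHub repository that is linked elsewhere in this thread and bring the box up. The checkout directory name, the DRY_RUN switch, and the helper names run and aurora_vagrant_up are assumptions added for illustration only.

```shell
#!/usr/bin/env bash
# Sketch only. `run` prints the command when DRY_RUN is set, otherwise
# executes it, so the sequence can be inspected before anything happens.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Runs in a subshell (note the parentheses) so the caller's working
# directory is left unchanged.
aurora_vagrant_up() (
  checkout="${1:-aurora}"   # hypothetical checkout directory name
  [ -d "$checkout" ] || run git clone https://github.com/apache/aurora.git "$checkout" || exit 1
  run cd "$checkout" || exit 1
  run vagrant up
)
```

Setting DRY_RUN=1 before calling aurora_vagrant_up only prints the three commands. Once the box is up, 'vagrant ssh' (a standard Vagrant command) logs into it, which is a convenient way to inspect how the scheduler is started there.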


Hope this helps a little,

Stephan




Re: Stacktrace when running Apache Aurora

Posted by Krish <kr...@gmail.com>.

Bill/Stephan,
I still get a stacktrace when running the aurora scheduler CLI.

I do not know what to specify for -framework_authentication_file
& -zk_digest_credentials, and they are required arguments.

I am not using any authentication on the Mesos master. Do I still need the
framework_authentication_file parameter?


rm -rf /db /backup_dir
mesos-log initialize --path="/db"
export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
JAVA_OPTS="-Xmx1536m  -Xms256m" \
/usr/local/aurora-scheduler/bin/aurora-scheduler -backup_dir=/backup_dir \
  -cluster_name=tc -mesos_master_address=zk://localhost:2181/mesos/master \
  -serverset_path=/scheduler/aurora -zk_endpoints=localhost:2181 \
  -native_log_quorum_size=1 -vlog=SEVERE -logtostderr=false \
  -native_log_file_path=/db \
  -thermos_executor_path=/home/ubuntu/hello_world.aurora
...
...
INFO: Binding org.apache.aurora.scheduler.http.Utilization to GuiceManagedComponentProvider with the scope "PerRequest"
Oct 20, 2015 9:27:40 AM org.apache.aurora.scheduler.cron.quartz.CronModule provideTimeZone
WARNING: Cron schedules are configured to fire according to timezone Greenwich Mean Time but system timezone is set to Coordinated Universal Time
Oct 20, 2015 9:27:41 AM org.eclipse.jetty.server.AbstractConnector doStart
INFO: Started SelectChannelConnector@0.0.0.0:43843
E1020 09:27:41.290 THREAD1 org.apache.aurora.scheduler.SchedulerLifecycle$9.execute: Caught unchecked exception: com.google.inject.ProvisionException: Guice provision errors:

1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
  at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
  while locating org.apache.mesos.Log
  at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
  while locating org.apache.aurora.scheduler.log.mesos.LogInterface

1 error
com.google.inject.ProvisionException: Guice provision errors:

1) Error in custom provider, java.lang.IllegalArgumentException: Path cannot be null
  at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
  at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLog(MesosLogStreamModule.java:117)
  while locating org.apache.mesos.Log
  at org.apache.aurora.scheduler.log.mesos.MesosLogStreamModule.provideLogInterface(MesosLogStreamModule.java:152)
  while locating org.apache.aurora.scheduler.log.mesos.LogInterface

1 error
        at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:987)
        at org.apache.aurora.scheduler.log.mesos.MesosLog.open(MesosLog.java:136



--
κρισhναν

On Tue, Oct 20, 2015 at 6:14 AM, Bill Farner <wf...@apache.org> wrote:

> The typical flow is that you keep your .aurora file checked into git, and
> commit every time you deploy/update.  When you change your file, you will
> instruct Aurora to update the live job (have a look at aurora update -h).
> Aurora will perform a rolling upgrade of your job to the new config.
> You'll use this same flow for updating your job's software as well as
> resizing the job.
>
> For (3), you could set up alerting for stats that the scheduler exports.
> Have a look here for monitoring background:
> https://github.com/apache/aurora/blob/master/docs/monitoring.md
>
> You'll want to look at scheduler stats related to 'pending'.
>
> On Mon, Oct 19, 2015 at 12:16 PM, Krish <kr...@gmail.com> wrote:
>
>> Thanks for the pointer. Now I notice that the aurora-scheduler script has
>> the --thermos_executor_path as a mandatory requirement.
>>
>> I have a couple of questions on how the thermos_executor/.aurora config
>> file functions:
>> 1. Do we have to statically define the file beforehand?
>> 2. What happens when we want to dynamically change the config, say
>> increasing the number of instances of a service required? Does aurora
>> require a reboot then?
>> 3. How do I get notified about the message mesos sends when it cannot
>> schedule tasks for lack of resources? Should I depend on aurora for this or
>> try to look for a hook into mesos?
>>
>> I think a little bit of context would help here.
>> What I plan to check is to run a very basic job/task inside a docker
>> container with aurora & wait for a 'resource not available' message from
>> mesos, and accordingly call an api to spin up a new node in my cluster.
>>
>>
>>
>>
>> --
>> κρισhναν
>>
>> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <
>> Stephan.Erb@blue-yonder.com> wrote:
>>
>>> I believe you are missing the thermos_executor options that have to be
>>> passed to the scheduler command line.
>>>
>>>
>>> See
>>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
>>> for an example
>>>
>>>
>>> Best Regards,
>>>
>>> Stephan
>>>
>>>
>>> ------------------------------
>>> *From:* Krish <kr...@gmail.com>
>>> *Sent:* Monday, October 19, 2015 8:45 AM
>>> *To:* user@aurora.apache.org
>>> *Subject:* Re: Stacktrace when running Apache Aurora
>>>
>>> Hi,
>>> I am a n00b with Apache Aurora & trying to experiment with some things on
>>> my local machine with zookeeper and mesos-master running locally. They have
>>> initialized properly. When I try to run aurora with the required options, I
>>> get the following error, & googling hasn't helped me much here.
>>> Appreciate any help. Thanks in advance.
>>>
>>> ...
>>> ...
>>> WARNING: Method [public void
>>> org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)]
>>> is synthetic and is being intercepted by
>>> [com.twitter.common.inject.TimedInterceptor@604c5de8]. This could
>>> indicate a bug.  The method
>>>  may be intercepted twice, or may not be intercepted at all.
>>> Exception in thread "main" com.google.inject.CreationException: Guice
>>> creation errors:
>>>
>>> 1) An exception was caught and reported. Message: A value may only be
>>> retrieved from a variable that has a default or has been
>>> set.
>>>   at
>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>
>>> 2) Could not find a suitable constructor in
>>> org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have
>>> either one (and only one) constructor annotated with @Inject or a
>>> zero-argument constructor that is not private.
>>>   at
>>> org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>>>   at
>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>>>
>>> 2 errors
>>>         at
>>> com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>>>         at
>>> com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>>>         at
>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>>>         at com.google.inject.Guice.createInjector(Guice.java:95)
>>>         at com.google.inject.Guice.createInjector(Guice.java:83)
>>>         at
>>> com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>>>         at
>>> com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>>>         at
>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>>>         at
>>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>>>         at
>>> org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
>>> Caused by: java.lang.IllegalStateException: A value may only be
>>> retrieved from a variable that has a default or has been set.
>>>         at
>>> com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>>>         at com.twitter.common.args.Arg.get(Arg.java:82)
>>>         at
>>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>>>         at
>>> com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>>>         at
>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>         at com.google.inject.util.Modules$2.configure(Modules.java:114)
>>>         at
>>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>>         at com.google.inject.spi.Elements.getElements(Elements.java:101)
>>>         at
>>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>>         at
>>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>>>         ... 7 more
>>>
>>> Complete logs are present @http://pastebin.com/i72HvbYi.
>>>
>>>
>>>
>>>> --
>>>> κρισhναν
>>>>
>>>
>>>
>>
>

Re: Stacktrace when running Apache Aurora

Posted by Bill Farner <wf...@apache.org>.

The typical flow is that you keep your .aurora file checked into git, and
commit every time you deploy/update.  When you change your file, you will
instruct Aurora to update the live job (have a look at aurora update -h).
Aurora will perform a rolling upgrade of your job to the new config.
You'll use this same flow for updating your job's software as well as
resizing the job.

For (3), you could set up alerting for stats that the scheduler exports.
Have a look here for monitoring background:
https://github.com/apache/aurora/blob/master/docs/monitoring.md

You'll want to look at scheduler stats related to 'pending'.
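
As a rough sketch of what "look at stats related to 'pending'" could mean in
practice: the snippet below parses a plain-text dump of "name value" lines and
keeps the entries whose name mentions "pending". The stat names in SAMPLE
(such as task_store_PENDING), the dump format, and the /vars URL in the
comment are assumptions made for illustration; check the monitoring doc and
your own scheduler's output for the real names.

```python
from urllib.request import urlopen


def fetch_vars(url):
    """Fetch a stats dump as text. The URL is supplied by the caller;
    something like http://<scheduler-host>:<port>/vars is an assumption."""
    with urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def parse_vars(text):
    """Parse 'name value' lines into a dict.
    Numeric values become floats; anything else stays a string."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(" ")
        if not sep:
            continue  # skip blank or malformed lines
        try:
            stats[name] = float(value)
        except ValueError:
            stats[name] = value
    return stats


def pending_stats(stats):
    """Keep only the stats whose name mentions 'pending' (case-insensitive)."""
    return {k: v for k, v in stats.items() if "pending" in k.lower()}


# Hypothetical sample dump; the names and values are made up for illustration.
SAMPLE = """\
jvm_uptime_secs 120
task_store_PENDING 3
task_store_RUNNING 7
build_version 0.9.0
"""

if __name__ == "__main__":
    print(pending_stats(parse_vars(SAMPLE)))
```

An alerting rule would then compare the filtered values against a threshold of
your choosing.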

On Mon, Oct 19, 2015 at 12:16 PM, Krish <kr...@gmail.com> wrote:

> Thanks for the pointer. Now I notice that the aurora-scheduler script has
> the --thermos_executor_path as a mandatory requirement.
>
> I have a couple of questions on how the thermos_executor/.aurora config
> file functions:
> 1. Do we have to statically define the file beforehand?
> 2. What happens when we want to dynamically change the config, say
> increasing the number of instances of a service required? Does aurora
> require a reboot then?
> 3. How do I get notified about the message mesos sends when it cannot
> schedule tasks for lack of resources? Should I depend on aurora for this or
> try to look for a hook into mesos?
>
> I think a little bit of context would help here.
> What I plan to check is to run a very basic job/task inside a docker
> container with aurora & wait for a 'resource not available' message from
> mesos, and accordingly call an api to spin up a new node in my cluster.
>
>
>
>
> --
> κρισhναν
>
> On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <Stephan.Erb@blue-yonder.com
> > wrote:
>
>> I believe you are missing the thermos_executor options that have to be
>> passed to the scheduler command line.
>>
>>
>> See
>> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
>> for an example
>>
>>
>> Best Regards,
>>
>> Stephan
>>
>>
>> ------------------------------
>> *From:* Krish <kr...@gmail.com>
>> *Sent:* Monday, October 19, 2015 8:45 AM
>> *To:* user@aurora.apache.org
>> *Subject:* Re: Stacktrace when running Apache Aurora
>>
>> Hi,
>> I am a n00b with Apache Aurora & trying to experiment with some things on
>> my local machine with zookeeper and mesos-master running locally. They have
>> initialized properly. When I try to run aurora with the required options, I
>> get the following error, & googling hasn't helped me much here.
>> Appreciate any help. Thanks in advance.
>>
>> ...
>> ...
>> WARNING: Method [public void
>> org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)]
>> is synthetic and is being intercepted by
>> [com.twitter.common.inject.TimedInterceptor@604c5de8]. This could
>> indicate a bug.  The method
>>  may be intercepted twice, or may not be intercepted at all.
>> Exception in thread "main" com.google.inject.CreationException: Guice
>> creation errors:
>>
>> 1) An exception was caught and reported. Message: A value may only be
>> retrieved from a variable that has a default or has been
>> set.
>>   at
>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>
>> 2) Could not find a suitable constructor in
>> org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have
>> either one (and only one) constructor annotated with @Inject or a
>> zero-argument constructor that is not private.
>>   at
>> org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>>   at
>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>>
>> 2 errors
>>         at
>> com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>>         at
>> com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>>         at
>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>>         at com.google.inject.Guice.createInjector(Guice.java:95)
>>         at com.google.inject.Guice.createInjector(Guice.java:83)
>>         at
>> com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>>         at
>> com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>>         at
>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>>         at
>> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>>         at
>> org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
>> Caused by: java.lang.IllegalStateException: A value may only be retrieved
>> from a variable that has a default or has been set.
>>         at
>> com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>>         at com.twitter.common.args.Arg.get(Arg.java:82)
>>         at
>> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>>         at
>> com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>>         at
>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>         at com.google.inject.util.Modules$2.configure(Modules.java:114)
>>         at
>> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>>         at com.google.inject.spi.Elements.getElements(Elements.java:101)
>>         at
>> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>>         at
>> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>>         ... 7 more
>>
>> Complete logs are present @http://pastebin.com/i72HvbYi.
>>
>>
>>
>>> --
>>> κρισhναν
>>>
>>
>>
>

Re: Stacktrace when running Apache Aurora

Posted by Krish <kr...@gmail.com>.

Thanks for the pointer. Now I notice that the aurora-scheduler script has
the --thermos_executor_path as a mandatory requirement.

I have a few questions about how the thermos_executor/.aurora config
file works:
1. Do we have to statically define the file beforehand?
2. What happens when we want to dynamically change the config, say
increasing the number of instances of a service required? Does aurora
require a reboot then?
3. How do I get notified about the message mesos sends when it cannot
schedule tasks for lack of resources? Should I depend on aurora for this or
try to look for a hook into mesos?

A little context may help here.
My plan is to run a very basic job/task inside a Docker container with
Aurora, wait for a 'resource not available' message from Mesos, and then
call an API to spin up a new node in my cluster.
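
To make the plan concrete, here is a minimal sketch of the decision logic
only. It assumes some other piece of code produces a series of "number of
pending tasks" samples, and that add_node is whatever call spins up a machine.
Both are hypothetical names for this illustration, not part of Aurora or
Mesos.

```python
def should_scale_up(pending_history, threshold=1, consecutive=3):
    """Return True when the last `consecutive` samples are all >= threshold.
    Requiring several samples in a row avoids reacting to a brief blip."""
    if len(pending_history) < consecutive:
        return False
    return all(n >= threshold for n in pending_history[-consecutive:])


def run_autoscaler(samples, add_node, threshold=1, consecutive=3):
    """Feed pending-task samples through the rule above.
    Calls add_node() each time the rule fires and returns the call count."""
    history = []
    calls = 0
    for n in samples:
        history.append(n)
        if should_scale_up(history, threshold, consecutive):
            add_node()       # hypothetical hook: your cloud/provisioning API
            calls += 1
            history.clear()  # start counting afresh after scaling
    return calls


if __name__ == "__main__":
    added = []
    count = run_autoscaler([0, 2, 2, 2, 0, 1, 1], lambda: added.append("node"))
    print(count, added)
```

The threshold and the number of consecutive samples are arbitrary defaults
that would need tuning for a real cluster.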




--
κρισhναν

On Mon, Oct 19, 2015 at 1:24 PM, Erb, Stephan <St...@blue-yonder.com>
wrote:

> I believe you are missing the thermos_executor options that have to be
> passed to the scheduler command line.
>
>
> See
> https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39
> for an example
>
>
> Best Regards,
>
> Stephan
>
>
> ------------------------------
> *From:* Krish <kr...@gmail.com>
> *Sent:* Monday, October 19, 2015 8:45 AM
> *To:* user@aurora.apache.org
> *Subject:* Re: Stacktrace when running Apache Aurora
>
> Hi,
> I am a n00b with Apache Aurora & trying to experiment with some things on
> my local machine with zookeeper and mesos-master running locally. They have
> initialized properly. When I try to run aurora with the required options, I
> get the following error, & googling hasn't helped me much here.
> Appreciate any help. Thanks in advance.
>
> ...
> ...
> WARNING: Method [public void
> org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)]
> is synthetic and is being intercepted by
> [com.twitter.common.inject.TimedInterceptor@604c5de8]. This could
> indicate a bug.  The method
>  may be intercepted twice, or may not be intercepted at all.
> Exception in thread "main" com.google.inject.CreationException: Guice
> creation errors:
>
> 1) An exception was caught and reported. Message: A value may only be
> retrieved from a variable that has a default or has been
> set.
>   at
> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>
> 2) Could not find a suitable constructor in
> org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have
> either one (and only one) constructor annotated with @Inject or a
> zero-argument constructor that is not private.
>   at
> org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
>   at
> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)
>
> 2 errors
>         at
> com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
>         at
> com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
>         at
> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
>         at com.google.inject.Guice.createInjector(Guice.java:95)
>         at com.google.inject.Guice.createInjector(Guice.java:83)
>         at
> com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
>         at
> com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
>         at
> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
>         at
> com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
>         at
> org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
> Caused by: java.lang.IllegalStateException: A value may only be retrieved
> from a variable that has a default or has been set.
>         at
> com.google.common.base.Preconditions.checkState(Preconditions.java:176)
>         at com.twitter.common.args.Arg.get(Arg.java:82)
>         at
> org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
>         at
> com.google.inject.AbstractModule.configure(AbstractModule.java:59)
>         at
> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>         at com.google.inject.util.Modules$2.configure(Modules.java:114)
>         at
> com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
>         at com.google.inject.spi.Elements.getElements(Elements.java:101)
>         at
> com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
>         at
> com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
>         ... 7 more
>
> Complete logs are present @http://pastebin.com/i72HvbYi.
>
>
>
>> --
>> κρισhναν
>>
>
>

Re: Stacktrace when running Apache Aurora

Posted by "Erb, Stephan" <St...@blue-yonder.com>.

I believe you are missing the thermos_executor options that have to be passed to the scheduler command line.


See https://github.com/apache/aurora/blob/33d7e2170a86f54722a02a2dc9cb1e09fb52df25/examples/vagrant/upstart/aurora-scheduler.conf#L39 for an example
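
If it helps, a small pre-flight check along these lines could catch the
missing option before the scheduler is launched. This is only a sketch: the
helper names are made up, the flag name thermos_executor_path is taken from
this thread, and the example arguments and paths in the usage section are
placeholders rather than recommended values.

```python
import os


def find_flag(argv, name):
    """Return the value of -name=value or --name=value in argv, else None."""
    for arg in argv:
        if not arg.startswith("-"):
            continue
        key, sep, value = arg.lstrip("-").partition("=")
        if sep and key == name:
            return value
    return None


def check_executor_flag(argv, exists=os.path.isfile):
    """Return an error message if the executor path flag is missing or
    does not point to a file; return None when the check passes."""
    path = find_flag(argv, "thermos_executor_path")
    if path is None:
        return "missing thermos_executor_path"
    if not exists(path):
        return "thermos_executor_path does not point to a file: " + path
    return None


if __name__ == "__main__":
    # Placeholder arguments for illustration only.
    args = ["-cluster_name=example"]
    print(check_executor_flag(args))
```

The exists parameter is there so the check can be exercised without touching
the filesystem.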


Best Regards,

Stephan


________________________________
From: Krish <kr...@gmail.com>
Sent: Monday, October 19, 2015 8:45 AM
To: user@aurora.apache.org
Subject: Re: Stacktrace when running Apache Aurora

Hi,
I am a n00b with Apache Aurora & trying to experiment with some things on my local machine with zookeeper and mesos-master running locally. They have initialized properly. When I try to run aurora with the required options, I get the following error, & googling hasn't helped me much here.
Appreciate any help. Thanks in advance.

...
...
WARNING: Method [public void org.apache.aurora.scheduler.storage.log.SnapshotStoreImpl.applySnapshot(java.lang.Object)] is synthetic and is being intercepted by [com.twitter.common.inject.TimedInterceptor@604c5de8]. This could indicate a bug.  The method may be intercepted twice, or may not be intercepted at all.
Exception in thread "main" com.google.inject.CreationException: Guice creation errors:

1) An exception was caught and reported. Message: A value may only be retrieved from a variable that has a default or has been set.
  at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)

2) Could not find a suitable constructor in org.apache.aurora.scheduler.mesos.ExecutorSettings. Classes must have either one (and only one) constructor annotated with @Inject or a zero-argument constructor that is not private.
  at org.apache.aurora.scheduler.mesos.ExecutorSettings.class(ExecutorSettings.java:43)
  at org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:204)

2 errors
        at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:435)
        at com.google.inject.internal.InternalInjectorCreator.initializeStatically(InternalInjectorCreator.java:154)
        at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:106)
        at com.google.inject.Guice.createInjector(Guice.java:95)
        at com.google.inject.Guice.createInjector(Guice.java:83)
        at com.twitter.common.application.AppLauncher.configureInjection(AppLauncher.java:120)
        at com.twitter.common.application.AppLauncher.run(AppLauncher.java:87)
        at com.twitter.common.application.AppLauncher.launch(AppLauncher.java:181)
        at com.twitter.common.application.AppLauncher.launch(AppLauncher.java:142)
        at org.apache.aurora.scheduler.app.SchedulerMain.main(SchedulerMain.java:263)
Caused by: java.lang.IllegalStateException: A value may only be retrieved from a variable that has a default or has been set.
        at com.google.common.base.Preconditions.checkState(Preconditions.java:176)
        at com.twitter.common.args.Arg.get(Arg.java:82)
        at org.apache.aurora.scheduler.app.SchedulerMain$3.configure(SchedulerMain.java:206)
        at com.google.inject.AbstractModule.configure(AbstractModule.java:59)
        at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
        at com.google.inject.util.Modules$2.configure(Modules.java:114)
        at com.google.inject.spi.Elements$RecordingBinder.install(Elements.java:223)
        at com.google.inject.spi.Elements.getElements(Elements.java:101)
        at com.google.inject.internal.InjectorShell$Builder.build(InjectorShell.java:133)
        at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:103)
        ... 7 more

Complete logs are present @http://pastebin.com/i72HvbYi.


--
κρισhναν