You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@aurora.apache.org by Maxim Khutornenko <ma...@apache.org> on 2015/09/15 01:14:23 UTC

JobConfig diff API

Problem:
We currently don't have a good user experience around "aurora job
diff" command. The task configs are dumped as "prettified" JSON
strings and diffed with the system diff tool. Anyone who tried to use
it knows it can be very hard to read especially when it comes to
executor data deltas. Also, the implementation is done completely
within the Aurora client making it hard to reuse this feature by other
clients (e.g.: an external deploy coordination tool).

Proposal:
Move the diff logic to the scheduler and expose it via a new
jobConfigDiff thrift API.

Benefits:
- Client will no longer have the custom non-reusable logic moving us
closer towards a "thin client" goal.
- The new RPC can be fully used by any existing or new API clients.
- The diff output will be improved via leveraging third party POJO
and/or JSON diff libraries [1,2,3, etc.].
- The server updater can be partially/fully unified with the new diff
logic further improving the overall DRY-ness.

Concerns:
- The executor data is currently treated as an opaque string blob on
the scheduler side. In reality, it's almost guaranteed to be JSON. In
order to deliver the best UX, the scheduler would have to start
requiring ExecutorConfig.data to be a valid JSON.

Any other concerns/objections/comments? I would like to formalize the
proposal be EOW if we reach consensus quickly.

Thanks,
Maxim

[1] - http://java-object-diff.readthedocs.org/en/latest/getting-started/#getting-started
[2] - http://javers.org/documentation/diff-examples/
[3] - https://github.com/skyscreamer/JSONassert

Re: JobConfig diff API

Posted by Maxim Khutornenko <ma...@apache.org>.

Here is a brief summary of proposed changes:
https://docs.google.com/document/d/1Fc_YhhV7fc4D9Xv6gJzpfooxbK4YWZcvzw6Bd3qVTL8

On Fri, Oct 2, 2015 at 1:47 PM, Bill Farner <wf...@apache.org> wrote:
> This matches what I was nudging towards.  I think it offers what we want
> without taking on a large technical challenge and placing restrictions on
> the executor data blob.
>
> On Friday, October 2, 2015, Maxim Khutornenko <ma...@apache.org> wrote:
>
>> Thanks David. I agree pivoting this proposal around job update driven
>> instructions would give us more incremental value for much less
>> effort. I have researched and evaluated various Java and JSON diff
>> libraries available under Apache license and could not find a
>> reasonable solution that would give us exactly what we need. I started
>> thinking about building a custom JSON diff java tool, which is
>> certainly beyond the effort I hoped it would take.
>>
>> With something like 'getJobUpdateInstructions(JobUpdateRequest r)' API
>> we could immediately improve the client diff story by avoiding
>> duplicate diff results. We would also be able to have something like
>> '--dry-run' option on a 'aurora update start' command listing the
>> affected instances and actions about to be taken.
>>
>> As for the actual TaskConfig diff details, this seems like a natural
>> improvement we could add later when/if we find a better JSON diffing
>> tool or dare to implement our own.
>>
>> Any thoughts/objections to the above?
>>
>> On Fri, Oct 2, 2015 at 1:29 PM, David McLaughlin <david@dmclaughlin.com
>> <javascript:;>> wrote:
>> > I'd like to propose an alternative - that we start off by having an API
>> > endpoint which simply returns the JobUpdateInstructions that describes
>> the
>> > changes that would happen if a given JobUpdateRequest was applied.
>> >
>> > There is a lot of value in having clients ask the scheduler to tell them
>> > what is going to happen rather than try and duplicate that diff algorithm
>> > in both places. Of course this does not solve the "pretty" diff command
>> > when trying to *display* the differences, but it does remove a lot of the
>> > work in implementing the diff command and I think for the different types
>> > of clients (UI/CLI/etc.) displaying the diffs is going to be a much
>> bigger
>> > problem anyway.
>> >
>> > In the future if the scheduler does calculate and return some
>> > representation of the diff, I'd argue that it should be stored in
>> > JobUpdateInstructions anyway, so we can always add that functionality
>> later
>> > and keep the APIs the same.
>> >
>> > Thanks,
>> > David
>> >
>> >
>> > On Wed, Sep 16, 2015 at 10:23 AM, Bill Farner <wfarner@apache.org
>> <javascript:;>> wrote:
>> >
>> >> I think it's fine to provide an 'enhanced' experience when the format is
>> >> JSON, but i don't think we should force that.
>> >>
>> >> On Wed, Sep 16, 2015 at 10:22 AM, Maxim Khutornenko <maxim@apache.org
>> <javascript:;>>
>> >> wrote:
>> >>
>> >> > The benefit of assuming a certain format is the richer experience we
>> >> > can give to our users. The *blob* may be too large to make any sense
>> >> > of it during diffing. I don't propose to enforce any schema though,
>> >> > that would be too restrictive. I do however believe assuming JSON
>> >> > format would be an acceptable tradeoff.
>> >> >
>> >> > Alternatively, we may allow non-JSON (e.g. binary) executor data blobs
>> >> > and disabling JSON diffs for any executor types that don't follow the
>> >> > guidance. This will result in a degraded user experience but may be
>> >> > the middle ground here. Thoughts?
>> >> >
>> >> > On Tue, Sep 15, 2015 at 11:43 AM,  <me...@yahoo.com.invalid>
>> wrote:
>> >> > > Isn't this data supposed to be any blob that transparently passes
>> in to
>> >> > the executor through mesos data blob. Why would we want to impose any
>> >> sort
>> >> > of format? It could be a binary blob. Executor writers should be able
>> to
>> >> > move between different schedulers/frameworks without any rework
>> ideally.
>> >> > This field seems like more like garbage in and garbage out and only
>> >> > understood by end client and the executor. Scheduler may stay out of
>> it.
>> >> > > If you compute hash and indicate same or different data between 2
>> job
>> >> > update diff, that may be reasonable.
>> >> > >
>> >> > > My 2 cents.
>> >> > >
>> >> > > Thx
>> >> > >
>> >> > > Sent from my iPhone
>> >> > >
>> >> > >> On Sep 15, 2015, at 11:01 AM, Zameer Manji <zmanji@apache.org
>> <javascript:;>> wrote:
>> >> > >>
>> >> > >> I'm a proponent of firming up our executor <-> scheduler contract.
>> >> > Since we
>> >> > >> are going to get multiple executor support soon I think it would be
>> >> > nice if
>> >> > >> we said that ExecutorConfig.data was JSON.
>> >> > >>
>> >> > >> On Tue, Sep 15, 2015 at 10:47 AM, Maxim Khutornenko <
>> maxim@apache.org <javascript:;>
>> >> >
>> >> > >> wrote:
>> >> > >>
>> >> > >>> | I hope this doesn't mean we would be returning a textual
>> >> > >>> representation of a diff
>> >> > >>>
>> >> > >>> If we can make an assumption that executor data is always JSON, we
>> >> can
>> >> > >>> deliver a much more specific answer by applying JSON diff tools.
>> >> > >>> Something like:
>> >> > >>>
>> >> > >>> - "environment": "prod"
>> >> > >>> + "environment": "test"
>> >> > >>>
>> >> > >>> Otherwise, we would have to output the entire ExecutorConfig.data
>> >> blob
>> >> > >>> content for both left and right sides and let users figure out the
>> >> > >>> problem. I don't think that's acceptable.
>> >> > >>>
>> >> > >>> Does it make sense? Any suggestions on the output format of the
>> diff?
>> >> > >>> I think it should be structured but at the same time we have to
>> get
>> >> > >>> down to text level at some point to report concrete discrepancies.
>> >> > >>>
>> >> > >>>> On Mon, Sep 14, 2015 at 8:58 PM, Bill Farner <wfarner@apache.org
>> <javascript:;>>
>> >> > wrote:
>> >> > >>>> The 'blob'-iness of ExecutorConfig is intentional so that we can
>> >> > support
>> >> > >>>> alternative executors.  I'd hate for that to go away.
>> >> > >>>>
>> >> > >>>>> On Mon, Sep 14, 2015 at 8:56 PM, Jake Farrell <
>> jfarrell@apache.org <javascript:;>
>> >> >
>> >> > >>>> wrote:
>> >> > >>>>
>> >> > >>>>> This is one of the hoops encountered when using the Thrift api
>> >> > directly
>> >> > >>> and
>> >> > >>>>> not using the client, I'd love to see ExecutorConfig.data move
>> to a
>> >> > >>> thrift
>> >> > >>>>> object and not be a string blob
>> >> > >>>>>
>> >> > >>>>> -Jake
>> >> > >>>>>
>> >> > >>>>> On Mon, Sep 14, 2015 at 9:28 PM, Bill Farner <
>> wfarner@apache.org <javascript:;>>
>> >> > >>> wrote:
>> >> > >>>>>
>> >> > >>>>>> I like the idea of adding this API, but i don't see why it
>> >> requires
>> >> > >>> us to
>> >> > >>>>>> make assumptions about ExecutorConfig.data.  I hope this
>> doesn't
>> >> > mean
>> >> > >>> we
>> >> > >>>>>> would be returning a textual representation of a diff.  Can you
>> >> > >>> elaborate
>> >> > >>>>>> on that?
>> >> > >>>>>>
>> >> > >>>>>> On Mon, Sep 14, 2015 at 4:14 PM, Maxim Khutornenko <
>> >> > maxim@apache.org <javascript:;>>
>> >> > >>>>>> wrote:
>> >> > >>>>>>
>> >> > >>>>>>> Problem:
>> >> > >>>>>>> We currently don't have a good user experience around "aurora
>> job
>> >> > >>>>>>> diff" command. The task configs are dumped as "prettified"
>> JSON
>> >> > >>>>>>> strings and diffed with the system diff tool. Anyone who
>> tried to
>> >> > >>> use
>> >> > >>>>>>> it knows it can be very hard to read especially when it comes
>> to
>> >> > >>>>>>> executor data deltas. Also, the implementation is done
>> completely
>> >> > >>>>>>> within the Aurora client making it hard to reuse this feature
>> by
>> >> > >>> other
>> >> > >>>>>>> clients (e.g.: an external deploy coordination tool).
>> >> > >>>>>>>
>> >> > >>>>>>> Proposal:
>> >> > >>>>>>> Move the diff logic to the scheduler and expose it via a new
>> >> > >>>>>>> jobConfigDiff thrift API.
>> >> > >>>>>>>
>> >> > >>>>>>> Benefits:
>> >> > >>>>>>> - Client will no longer have the custom non-reusable logic
>> moving
>> >> > us
>> >> > >>>>>>> closer towards a "thin client" goal.
>> >> > >>>>>>> - The new RPC can be fully used by any existing or new API
>> >> clients.
>> >> > >>>>>>> - The diff output will be improved via leveraging third party
>> >> POJO
>> >> > >>>>>>> and/or JSON diff libraries [1,2,3, etc.].
>> >> > >>>>>>> - The server updater can be partially/fully unified with the
>> new
>> >> > >>> diff
>> >> > >>>>>>> logic further improving the overall DRY-ness.
>> >> > >>>>>>>
>> >> > >>>>>>> Concerns:
>> >> > >>>>>>> - The executor data is currently treated as an opaque string
>> blob
>> >> > on
>> >> > >>>>>>> the scheduler side. In reality, it's almost guaranteed to be
>> >> JSON.
>> >> > >>> In
>> >> > >>>>>>> order to deliver the best UX, the scheduler would have to
>> start
>> >> > >>>>>>> requiring ExecutorConfig.data to be a valid JSON.
>> >> > >>>>>>>
>> >> > >>>>>>> Any other concerns/objections/comments? I would like to
>> formalize
>> >> > >>> the
>> >> > >>>>>>> proposal be EOW if we reach consensus quickly.
>> >> > >>>>>>>
>> >> > >>>>>>> Thanks,
>> >> > >>>>>>> Maxim
>> >> > >>>>>>>
>> >> > >>>>>>> [1] -
>> >> > >>>
>> >> >
>> >>
>> http://java-object-diff.readthedocs.org/en/latest/getting-started/#getting-started
>> >> > >>>>>>> [2] - http://javers.org/documentation/diff-examples/
>> >> > >>>>>>> [3] - https://github.com/skyscreamer/JSONassert
>> >> > >>>
>> >> > >>> --
>> >> > >>> Zameer Manji
>> >> > >>>
>> >> > >>>
>> >> >
>> >>
>>

Re: JobConfig diff API

Posted by Bill Farner <wf...@apache.org>.

This matches what I was nudging towards.  I think it offers what we want
without taking on a large technical challenge and placing restrictions on
the executor data blob.

On Friday, October 2, 2015, Maxim Khutornenko <ma...@apache.org> wrote:

> Thanks David. I agree pivoting this proposal around job update driven
> instructions would give us more incremental value for much less
> effort. I have researched and evaluated various Java and JSON diff
> libraries available under Apache license and could not find a
> reasonable solution that would give us exactly what we need. I started
> thinking about building a custom JSON diff java tool, which is
> certainly beyond the effort I hoped it would take.
>
> With something like 'getJobUpdateInstructions(JobUpdateRequest r)' API
> we could immediately improve the client diff story by avoiding
> duplicate diff results. We would also be able to have something like
> '--dry-run' option on a 'aurora update start' command listing the
> affected instances and actions about to be taken.
>
> As for the actual TaskConfig diff details, this seems like a natural
> improvement we could add later when/if we find a better JSON diffing
> tool or dare to implement our own.
>
> Any thoughts/objections to the above?
>
> On Fri, Oct 2, 2015 at 1:29 PM, David McLaughlin <david@dmclaughlin.com
> <javascript:;>> wrote:
> > I'd like to propose an alternative - that we start off by having an API
> > endpoint which simply returns the JobUpdateInstructions that describes
> the
> > changes that would happen if a given JobUpdateRequest was applied.
> >
> > There is a lot of value in having clients ask the scheduler to tell them
> > what is going to happen rather than try and duplicate that diff algorithm
> > in both places. Of course this does not solve the "pretty" diff command
> > when trying to *display* the differences, but it does remove a lot of the
> > work in implementing the diff command and I think for the different types
> > of clients (UI/CLI/etc.) displaying the diffs is going to be a much
> bigger
> > problem anyway.
> >
> > In the future if the scheduler does calculate and return some
> > representation of the diff, I'd argue that it should be stored in
> > JobUpdateInstructions anyway, so we can always add that functionality
> later
> > and keep the APIs the same.
> >
> > Thanks,
> > David
> >
> >
> > On Wed, Sep 16, 2015 at 10:23 AM, Bill Farner <wfarner@apache.org
> <javascript:;>> wrote:
> >
> >> I think it's fine to provide an 'enhanced' experience when the format is
> >> JSON, but i don't think we should force that.
> >>
> >> On Wed, Sep 16, 2015 at 10:22 AM, Maxim Khutornenko <maxim@apache.org
> <javascript:;>>
> >> wrote:
> >>
> >> > The benefit of assuming a certain format is the richer experience we
> >> > can give to our users. The *blob* may be too large to make any sense
> >> > of it during diffing. I don't propose to enforce any schema though,
> >> > that would be too restrictive. I do however believe assuming JSON
> >> > format would be an acceptable tradeoff.
> >> >
> >> > Alternatively, we may allow non-JSON (e.g. binary) executor data blobs
> >> > and disabling JSON diffs for any executor types that don't follow the
> >> > guidance. This will result in a degraded user experience but may be
> >> > the middle ground here. Thoughts?
> >> >
> >> > On Tue, Sep 15, 2015 at 11:43 AM,  <me...@yahoo.com.invalid>
> wrote:
> >> > > Isn't this data supposed to be any blob that transparently passes
> in to
> >> > the executor through mesos data blob. Why would we want to impose any
> >> sort
> >> > of format? It could be a binary blob. Executor writers should be able
> to
> >> > move between different schedulers/frameworks without any rework
> ideally.
> >> > This field seems like more like garbage in and garbage out and only
> >> > understood by end client and the executor. Scheduler may stay out of
> it.
> >> > > If you compute hash and indicate same or different data between 2
> job
> >> > update diff, that may be reasonable.
> >> > >
> >> > > My 2 cents.
> >> > >
> >> > > Thx
> >> > >
> >> > > Sent from my iPhone
> >> > >
> >> > >> On Sep 15, 2015, at 11:01 AM, Zameer Manji <zmanji@apache.org
> <javascript:;>> wrote:
> >> > >>
> >> > >> I'm a proponent of firming up our executor <-> scheduler contract.
> >> > Since we
> >> > >> are going to get multiple executor support soon I think it would be
> >> > nice if
> >> > >> we said that ExecutorConfig.data was JSON.
> >> > >>
> >> > >> On Tue, Sep 15, 2015 at 10:47 AM, Maxim Khutornenko <
> maxim@apache.org <javascript:;>
> >> >
> >> > >> wrote:
> >> > >>
> >> > >>> | I hope this doesn't mean we would be returning a textual
> >> > >>> representation of a diff
> >> > >>>
> >> > >>> If we can make an assumption that executor data is always JSON, we
> >> can
> >> > >>> deliver a much more specific answer by applying JSON diff tools.
> >> > >>> Something like:
> >> > >>>
> >> > >>> - "environment": "prod"
> >> > >>> + "environment": "test"
> >> > >>>
> >> > >>> Otherwise, we would have to output the entire ExecutorConfig.data
> >> blob
> >> > >>> content for both left and right sides and let users figure out the
> >> > >>> problem. I don't think that's acceptable.
> >> > >>>
> >> > >>> Does it make sense? Any suggestions on the output format of the
> diff?
> >> > >>> I think it should be structured but at the same time we have to
> get
> >> > >>> down to text level at some point to report concrete discrepancies.
> >> > >>>
> >> > >>>> On Mon, Sep 14, 2015 at 8:58 PM, Bill Farner <wfarner@apache.org
> <javascript:;>>
> >> > wrote:
> >> > >>>> The 'blob'-iness of ExecutorConfig is intentional so that we can
> >> > support
> >> > >>>> alternative executors.  I'd hate for that to go away.
> >> > >>>>
> >> > >>>>> On Mon, Sep 14, 2015 at 8:56 PM, Jake Farrell <
> jfarrell@apache.org <javascript:;>
> >> >
> >> > >>>> wrote:
> >> > >>>>
> >> > >>>>> This is one of the hoops encountered when using the Thrift api
> >> > directly
> >> > >>> and
> >> > >>>>> not using the client, I'd love to see ExecutorConfig.data move
> to a
> >> > >>> thrift
> >> > >>>>> object and not be a string blob
> >> > >>>>>
> >> > >>>>> -Jake
> >> > >>>>>
> >> > >>>>> On Mon, Sep 14, 2015 at 9:28 PM, Bill Farner <
> wfarner@apache.org <javascript:;>>
> >> > >>> wrote:
> >> > >>>>>
> >> > >>>>>> I like the idea of adding this API, but i don't see why it
> >> requires
> >> > >>> us to
> >> > >>>>>> make assumptions about ExecutorConfig.data.  I hope this
> doesn't
> >> > mean
> >> > >>> we
> >> > >>>>>> would be returning a textual representation of a diff.  Can you
> >> > >>> elaborate
> >> > >>>>>> on that?
> >> > >>>>>>
> >> > >>>>>> On Mon, Sep 14, 2015 at 4:14 PM, Maxim Khutornenko <
> >> > maxim@apache.org <javascript:;>>
> >> > >>>>>> wrote:
> >> > >>>>>>
> >> > >>>>>>> Problem:
> >> > >>>>>>> We currently don't have a good user experience around "aurora
> job
> >> > >>>>>>> diff" command. The task configs are dumped as "prettified"
> JSON
> >> > >>>>>>> strings and diffed with the system diff tool. Anyone who
> tried to
> >> > >>> use
> >> > >>>>>>> it knows it can be very hard to read especially when it comes
> to
> >> > >>>>>>> executor data deltas. Also, the implementation is done
> completely
> >> > >>>>>>> within the Aurora client making it hard to reuse this feature
> by
> >> > >>> other
> >> > >>>>>>> clients (e.g.: an external deploy coordination tool).
> >> > >>>>>>>
> >> > >>>>>>> Proposal:
> >> > >>>>>>> Move the diff logic to the scheduler and expose it via a new
> >> > >>>>>>> jobConfigDiff thrift API.
> >> > >>>>>>>
> >> > >>>>>>> Benefits:
> >> > >>>>>>> - Client will no longer have the custom non-reusable logic
> moving
> >> > us
> >> > >>>>>>> closer towards a "thin client" goal.
> >> > >>>>>>> - The new RPC can be fully used by any existing or new API
> >> clients.
> >> > >>>>>>> - The diff output will be improved via leveraging third party
> >> POJO
> >> > >>>>>>> and/or JSON diff libraries [1,2,3, etc.].
> >> > >>>>>>> - The server updater can be partially/fully unified with the
> new
> >> > >>> diff
> >> > >>>>>>> logic further improving the overall DRY-ness.
> >> > >>>>>>>
> >> > >>>>>>> Concerns:
> >> > >>>>>>> - The executor data is currently treated as an opaque string
> blob
> >> > on
> >> > >>>>>>> the scheduler side. In reality, it's almost guaranteed to be
> >> JSON.
> >> > >>> In
> >> > >>>>>>> order to deliver the best UX, the scheduler would have to
> start
> >> > >>>>>>> requiring ExecutorConfig.data to be a valid JSON.
> >> > >>>>>>>
> >> > >>>>>>> Any other concerns/objections/comments? I would like to
> formalize
> >> > >>> the
> >> > >>>>>>> proposal be EOW if we reach consensus quickly.
> >> > >>>>>>>
> >> > >>>>>>> Thanks,
> >> > >>>>>>> Maxim
> >> > >>>>>>>
> >> > >>>>>>> [1] -
> >> > >>>
> >> >
> >>
> http://java-object-diff.readthedocs.org/en/latest/getting-started/#getting-started
> >> > >>>>>>> [2] - http://javers.org/documentation/diff-examples/
> >> > >>>>>>> [3] - https://github.com/skyscreamer/JSONassert
> >> > >>>
> >> > >>> --
> >> > >>> Zameer Manji
> >> > >>>
> >> > >>>
> >> >
> >>
>

Re: JobConfig diff API

Posted by Maxim Khutornenko <ma...@apache.org>.

Thanks David. I agree pivoting this proposal around job update driven
instructions would give us more incremental value for much less
effort. I have researched and evaluated various Java and JSON diff
libraries available under Apache license and could not find a
reasonable solution that would give us exactly what we need. I started
thinking about building a custom JSON diff java tool, which is
certainly beyond the effort I hoped it would take.

With something like 'getJobUpdateInstructions(JobUpdateRequest r)' API
we could immediately improve the client diff story by avoiding
duplicate diff results. We would also be able to have something like
'--dry-run' option on a 'aurora update start' command listing the
affected instances and actions about to be taken.

As for the actual TaskConfig diff details, this seems like a natural
improvement we could add later when/if we find a better JSON diffing
tool or dare to implement our own.

Any thoughts/objections to the above?

On Fri, Oct 2, 2015 at 1:29 PM, David McLaughlin <da...@dmclaughlin.com> wrote:
> I'd like to propose an alternative - that we start off by having an API
> endpoint which simply returns the JobUpdateInstructions that describes the
> changes that would happen if a given JobUpdateRequest was applied.
>
> There is a lot of value in having clients ask the scheduler to tell them
> what is going to happen rather than try and duplicate that diff algorithm
> in both places. Of course this does not solve the "pretty" diff command
> when trying to *display* the differences, but it does remove a lot of the
> work in implementing the diff command and I think for the different types
> of clients (UI/CLI/etc.) displaying the diffs is going to be a much bigger
> problem anyway.
>
> In the future if the scheduler does calculate and return some
> representation of the diff, I'd argue that it should be stored in
> JobUpdateInstructions anyway, so we can always add that functionality later
> and keep the APIs the same.
>
> Thanks,
> David
>
>
> On Wed, Sep 16, 2015 at 10:23 AM, Bill Farner <wf...@apache.org> wrote:
>
>> I think it's fine to provide an 'enhanced' experience when the format is
>> JSON, but i don't think we should force that.
>>
>> On Wed, Sep 16, 2015 at 10:22 AM, Maxim Khutornenko <ma...@apache.org>
>> wrote:
>>
>> > The benefit of assuming a certain format is the richer experience we
>> > can give to our users. The *blob* may be too large to make any sense
>> > of it during diffing. I don't propose to enforce any schema though,
>> > that would be too restrictive. I do however believe assuming JSON
>> > format would be an acceptable tradeoff.
>> >
>> > Alternatively, we may allow non-JSON (e.g. binary) executor data blobs
>> > and disabling JSON diffs for any executor types that don't follow the
>> > guidance. This will result in a degraded user experience but may be
>> > the middle ground here. Thoughts?
>> >
>> > On Tue, Sep 15, 2015 at 11:43 AM,  <me...@yahoo.com.invalid> wrote:
>> > > Isn't this data supposed to be any blob that transparently passes in to
>> > the executor through mesos data blob. Why would we want to impose any
>> sort
>> > of format? It could be a binary blob. Executor writers should be able to
>> > move between different schedulers/frameworks without any rework ideally.
>> > This field seems like more like garbage in and garbage out and only
>> > understood by end client and the executor. Scheduler may stay out of it.
>> > > If you compute hash and indicate same or different data between 2 job
>> > update diff, that may be reasonable.
>> > >
>> > > My 2 cents.
>> > >
>> > > Thx
>> > >
>> > > Sent from my iPhone
>> > >
>> > >> On Sep 15, 2015, at 11:01 AM, Zameer Manji <zm...@apache.org> wrote:
>> > >>
>> > >> I'm a proponent of firming up our executor <-> scheduler contract.
>> > Since we
>> > >> are going to get multiple executor support soon I think it would be
>> > nice if
>> > >> we said that ExecutorConfig.data was JSON.
>> > >>
>> > >> On Tue, Sep 15, 2015 at 10:47 AM, Maxim Khutornenko <maxim@apache.org
>> >
>> > >> wrote:
>> > >>
>> > >>> | I hope this doesn't mean we would be returning a textual
>> > >>> representation of a diff
>> > >>>
>> > >>> If we can make an assumption that executor data is always JSON, we
>> can
>> > >>> deliver a much more specific answer by applying JSON diff tools.
>> > >>> Something like:
>> > >>>
>> > >>> - "environment": "prod"
>> > >>> + "environment": "test"
>> > >>>
>> > >>> Otherwise, we would have to output the entire ExecutorConfig.data
>> blob
>> > >>> content for both left and right sides and let users figure out the
>> > >>> problem. I don't think that's acceptable.
>> > >>>
>> > >>> Does it make sense? Any suggestions on the output format of the diff?
>> > >>> I think it should be structured but at the same time we have to get
>> > >>> down to text level at some point to report concrete discrepancies.
>> > >>>
>> > >>>> On Mon, Sep 14, 2015 at 8:58 PM, Bill Farner <wf...@apache.org>
>> > wrote:
>> > >>>> The 'blob'-iness of ExecutorConfig is intentional so that we can
>> > support
>> > >>>> alternative executors.  I'd hate for that to go away.
>> > >>>>
>> > >>>>> On Mon, Sep 14, 2015 at 8:56 PM, Jake Farrell <jfarrell@apache.org
>> >
>> > >>>> wrote:
>> > >>>>
>> > >>>>> This is one of the hoops encountered when using the Thrift api
>> > directly
>> > >>> and
>> > >>>>> not using the client, I'd love to see ExecutorConfig.data move to a
>> > >>> thrift
>> > >>>>> object and not be a string blob
>> > >>>>>
>> > >>>>> -Jake
>> > >>>>>
>> > >>>>> On Mon, Sep 14, 2015 at 9:28 PM, Bill Farner <wf...@apache.org>
>> > >>> wrote:
>> > >>>>>
>> > >>>>>> I like the idea of adding this API, but i don't see why it
>> requires
>> > >>> us to
>> > >>>>>> make assumptions about ExecutorConfig.data.  I hope this doesn't
>> > mean
>> > >>> we
>> > >>>>>> would be returning a textual representation of a diff.  Can you
>> > >>> elaborate
>> > >>>>>> on that?
>> > >>>>>>
>> > >>>>>> On Mon, Sep 14, 2015 at 4:14 PM, Maxim Khutornenko <
>> > maxim@apache.org>
>> > >>>>>> wrote:
>> > >>>>>>
>> > >>>>>>> Problem:
>> > >>>>>>> We currently don't have a good user experience around "aurora job
>> > >>>>>>> diff" command. The task configs are dumped as "prettified" JSON
>> > >>>>>>> strings and diffed with the system diff tool. Anyone who tried to
>> > >>> use
>> > >>>>>>> it knows it can be very hard to read especially when it comes to
>> > >>>>>>> executor data deltas. Also, the implementation is done completely
>> > >>>>>>> within the Aurora client making it hard to reuse this feature by
>> > >>> other
>> > >>>>>>> clients (e.g.: an external deploy coordination tool).
>> > >>>>>>>
>> > >>>>>>> Proposal:
>> > >>>>>>> Move the diff logic to the scheduler and expose it via a new
>> > >>>>>>> jobConfigDiff thrift API.
>> > >>>>>>>
>> > >>>>>>> Benefits:
>> > >>>>>>> - Client will no longer have the custom non-reusable logic moving
>> > us
>> > >>>>>>> closer towards a "thin client" goal.
>> > >>>>>>> - The new RPC can be fully used by any existing or new API
>> clients.
>> > >>>>>>> - The diff output will be improved via leveraging third party
>> POJO
>> > >>>>>>> and/or JSON diff libraries [1,2,3, etc.].
>> > >>>>>>> - The server updater can be partially/fully unified with the new
>> > >>> diff
>> > >>>>>>> logic further improving the overall DRY-ness.
>> > >>>>>>>
>> > >>>>>>> Concerns:
>> > >>>>>>> - The executor data is currently treated as an opaque string blob
>> > on
>> > >>>>>>> the scheduler side. In reality, it's almost guaranteed to be
>> JSON.
>> > >>> In
>> > >>>>>>> order to deliver the best UX, the scheduler would have to start
>> > >>>>>>> requiring ExecutorConfig.data to be a valid JSON.
>> > >>>>>>>
>> > >>>>>>> Any other concerns/objections/comments? I would like to formalize
>> > >>> the
>> > >>>>>>> proposal be EOW if we reach consensus quickly.
>> > >>>>>>>
>> > >>>>>>> Thanks,
>> > >>>>>>> Maxim
>> > >>>>>>>
>> > >>>>>>> [1] -
>> > >>>
>> >
>> http://java-object-diff.readthedocs.org/en/latest/getting-started/#getting-started
>> > >>>>>>> [2] - http://javers.org/documentation/diff-examples/
>> > >>>>>>> [3] - https://github.com/skyscreamer/JSONassert
>> > >>>
>> > >>> --
>> > >>> Zameer Manji
>> > >>>
>> > >>>
>> >
>>

Re: JobConfig diff API

Posted by David McLaughlin <da...@dmclaughlin.com>.

I'd like to propose an alternative - that we start off by having an API
endpoint which simply returns the JobUpdateInstructions that describes the
changes that would happen if a given JobUpdateRequest was applied.

There is a lot of value in having clients ask the scheduler to tell them
what is going to happen rather than try and duplicate that diff algorithm
in both places. Of course this does not solve the "pretty" diff command
when trying to *display* the differences, but it does remove a lot of the
work in implementing the diff command and I think for the different types
of clients (UI/CLI/etc.) displaying the diffs is going to be a much bigger
problem anyway.

In the future if the scheduler does calculate and return some
representation of the diff, I'd argue that it should be stored in
JobUpdateInstructions anyway, so we can always add that functionality later
and keep the APIs the same.

Thanks,
David


On Wed, Sep 16, 2015 at 10:23 AM, Bill Farner <wf...@apache.org> wrote:

> I think it's fine to provide an 'enhanced' experience when the format is
> JSON, but i don't think we should force that.
>
> On Wed, Sep 16, 2015 at 10:22 AM, Maxim Khutornenko <ma...@apache.org>
> wrote:
>
> > The benefit of assuming a certain format is the richer experience we
> > can give to our users. The *blob* may be too large to make any sense
> > of it during diffing. I don't propose to enforce any schema though,
> > that would be too restrictive. I do however believe assuming JSON
> > format would be an acceptable tradeoff.
> >
> > Alternatively, we may allow non-JSON (e.g. binary) executor data blobs
> > and disabling JSON diffs for any executor types that don't follow the
> > guidance. This will result in a degraded user experience but may be
> > the middle ground here. Thoughts?
> >
> > On Tue, Sep 15, 2015 at 11:43 AM,  <me...@yahoo.com.invalid> wrote:
> > > Isn't this data supposed to be any blob that transparently passes in to
> > the executor through mesos data blob. Why would we want to impose any
> sort
> > of format? It could be a binary blob. Executor writers should be able to
> > move between different schedulers/frameworks without any rework ideally.
> > This field seems like more like garbage in and garbage out and only
> > understood by end client and the executor. Scheduler may stay out of it.
> > > If you compute hash and indicate same or different data between 2 job
> > update diff, that may be reasonable.
> > >
> > > My 2 cents.
> > >
> > > Thx
> > >
> > > Sent from my iPhone
> > >
> > >> On Sep 15, 2015, at 11:01 AM, Zameer Manji <zm...@apache.org> wrote:
> > >>
> > >> I'm a proponent of firming up our executor <-> scheduler contract.
> > Since we
> > >> are going to get multiple executor support soon I think it would be
> > nice if
> > >> we said that ExecutorConfig.data was JSON.
> > >>
> > >> On Tue, Sep 15, 2015 at 10:47 AM, Maxim Khutornenko <maxim@apache.org
> >
> > >> wrote:
> > >>
> > >>> | I hope this doesn't mean we would be returning a textual
> > >>> representation of a diff
> > >>>
> > >>> If we can make an assumption that executor data is always JSON, we
> can
> > >>> deliver a much more specific answer by applying JSON diff tools.
> > >>> Something like:
> > >>>
> > >>> - "environment": "prod"
> > >>> + "environment": "test"
> > >>>
> > >>> Otherwise, we would have to output the entire ExecutorConfig.data
> blob
> > >>> content for both left and right sides and let users figure out the
> > >>> problem. I don't think that's acceptable.
> > >>>
> > >>> Does it make sense? Any suggestions on the output format of the diff?
> > >>> I think it should be structured but at the same time we have to get
> > >>> down to text level at some point to report concrete discrepancies.
> > >>>
> > >>>> On Mon, Sep 14, 2015 at 8:58 PM, Bill Farner <wf...@apache.org>
> > wrote:
> > >>>> The 'blob'-iness of ExecutorConfig is intentional so that we can
> > support
> > >>>> alternative executors.  I'd hate for that to go away.
> > >>>>
> > >>>>> On Mon, Sep 14, 2015 at 8:56 PM, Jake Farrell <jfarrell@apache.org
> >
> > >>>> wrote:
> > >>>>
> > >>>>> This is one of the hoops encountered when using the Thrift api
> > directly
> > >>> and
> > >>>>> not using the client, I'd love to see ExecutorConfig.data move to a
> > >>> thrift
> > >>>>> object and not be a string blob
> > >>>>>
> > >>>>> -Jake
> > >>>>>
> > >>>>> On Mon, Sep 14, 2015 at 9:28 PM, Bill Farner <wf...@apache.org>
> > >>> wrote:
> > >>>>>
> > >>>>>> I like the idea of adding this API, but i don't see why it
> requires
> > >>> us to
> > >>>>>> make assumptions about ExecutorConfig.data.  I hope this doesn't
> > mean
> > >>> we
> > >>>>>> would be returning a textual representation of a diff.  Can you
> > >>> elaborate
> > >>>>>> on that?
> > >>>>>>
> > >>>>>> On Mon, Sep 14, 2015 at 4:14 PM, Maxim Khutornenko <
> > maxim@apache.org>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> Problem:
> > >>>>>>> We currently don't have a good user experience around "aurora job
> > >>>>>>> diff" command. The task configs are dumped as "prettified" JSON
> > >>>>>>> strings and diffed with the system diff tool. Anyone who tried to
> > >>> use
> > >>>>>>> it knows it can be very hard to read especially when it comes to
> > >>>>>>> executor data deltas. Also, the implementation is done completely
> > >>>>>>> within the Aurora client making it hard to reuse this feature by
> > >>> other
> > >>>>>>> clients (e.g.: an external deploy coordination tool).
> > >>>>>>>
> > >>>>>>> Proposal:
> > >>>>>>> Move the diff logic to the scheduler and expose it via a new
> > >>>>>>> jobConfigDiff thrift API.
> > >>>>>>>
> > >>>>>>> Benefits:
> > >>>>>>> - Client will no longer have the custom non-reusable logic moving
> > us
> > >>>>>>> closer towards a "thin client" goal.
> > >>>>>>> - The new RPC can be fully used by any existing or new API
> clients.
> > >>>>>>> - The diff output will be improved via leveraging third party
> POJO
> > >>>>>>> and/or JSON diff libraries [1,2,3, etc.].
> > >>>>>>> - The server updater can be partially/fully unified with the new
> > >>> diff
> > >>>>>>> logic further improving the overall DRY-ness.
> > >>>>>>>
> > >>>>>>> Concerns:
> > >>>>>>> - The executor data is currently treated as an opaque string blob
> > on
> > >>>>>>> the scheduler side. In reality, it's almost guaranteed to be
> JSON.
> > >>> In
> > >>>>>>> order to deliver the best UX, the scheduler would have to start
> > >>>>>>> requiring ExecutorConfig.data to be a valid JSON.
> > >>>>>>>
> > >>>>>>> Any other concerns/objections/comments? I would like to formalize
> > >>> the
> > >>>>>>> proposal be EOW if we reach consensus quickly.
> > >>>>>>>
> > >>>>>>> Thanks,
> > >>>>>>> Maxim
> > >>>>>>>
> > >>>>>>> [1] -
> > >>>
> >
> http://java-object-diff.readthedocs.org/en/latest/getting-started/#getting-started
> > >>>>>>> [2] - http://javers.org/documentation/diff-examples/
> > >>>>>>> [3] - https://github.com/skyscreamer/JSONassert
> > >>>
> > >>> --
> > >>> Zameer Manji
> > >>>
> > >>>
> >
>

Re: JobConfig diff API

Posted by Bill Farner <wf...@apache.org>.

I think it's fine to provide an 'enhanced' experience when the format is
JSON, but i don't think we should force that.

On Wed, Sep 16, 2015 at 10:22 AM, Maxim Khutornenko <ma...@apache.org>
wrote:

> The benefit of assuming a certain format is the richer experience we
> can give to our users. The *blob* may be too large to make any sense
> of it during diffing. I don't propose to enforce any schema though,
> that would be too restrictive. I do however believe assuming JSON
> format would be an acceptable tradeoff.
>
> Alternatively, we may allow non-JSON (e.g. binary) executor data blobs
> and disabling JSON diffs for any executor types that don't follow the
> guidance. This will result in a degraded user experience but may be
> the middle ground here. Thoughts?
>
> On Tue, Sep 15, 2015 at 11:43 AM,  <me...@yahoo.com.invalid> wrote:
> > Isn't this data supposed to be any blob that transparently passes in to
> the executor through mesos data blob. Why would we want to impose any sort
> of format? It could be a binary blob. Executor writers should be able to
> move between different schedulers/frameworks without any rework ideally.
> This field seems like more like garbage in and garbage out and only
> understood by end client and the executor. Scheduler may stay out of it.
> > If you compute hash and indicate same or different data between 2 job
> update diff, that may be reasonable.
> >
> > My 2 cents.
> >
> > Thx
> >
> > Sent from my iPhone
> >
> >> On Sep 15, 2015, at 11:01 AM, Zameer Manji <zm...@apache.org> wrote:
> >>
> >> I'm a proponent of firming up our executor <-> scheduler contract.
> Since we
> >> are going to get multiple executor support soon I think it would be
> nice if
> >> we said that ExecutorConfig.data was JSON.
> >>
> >> On Tue, Sep 15, 2015 at 10:47 AM, Maxim Khutornenko <ma...@apache.org>
> >> wrote:
> >>
> >>> | I hope this doesn't mean we would be returning a textual
> >>> representation of a diff
> >>>
> >>> If we can make an assumption that executor data is always JSON, we can
> >>> deliver a much more specific answer by applying JSON diff tools.
> >>> Something like:
> >>>
> >>> - "environment": "prod"
> >>> + "environment": "test"
> >>>
> >>> Otherwise, we would have to output the entire ExecutorConfig.data blob
> >>> content for both left and right sides and let users figure out the
> >>> problem. I don't think that's acceptable.
> >>>
> >>> Does it make sense? Any suggestions on the output format of the diff?
> >>> I think it should be structured but at the same time we have to get
> >>> down to text level at some point to report concrete discrepancies.
> >>>
> >>>> On Mon, Sep 14, 2015 at 8:58 PM, Bill Farner <wf...@apache.org>
> wrote:
> >>>> The 'blob'-iness of ExecutorConfig is intentional so that we can
> support
> >>>> alternative executors.  I'd hate for that to go away.
> >>>>
> >>>>> On Mon, Sep 14, 2015 at 8:56 PM, Jake Farrell <jf...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> This is one of the hoops encountered when using the Thrift api
> directly
> >>> and
> >>>>> not using the client, I'd love to see ExecutorConfig.data move to a
> >>> thrift
> >>>>> object and not be a string blob
> >>>>>
> >>>>> -Jake
> >>>>>
> >>>>> On Mon, Sep 14, 2015 at 9:28 PM, Bill Farner <wf...@apache.org>
> >>> wrote:
> >>>>>
> >>>>>> I like the idea of adding this API, but i don't see why it requires
> >>> us to
> >>>>>> make assumptions about ExecutorConfig.data.  I hope this doesn't
> mean
> >>> we
> >>>>>> would be returning a textual representation of a diff.  Can you
> >>> elaborate
> >>>>>> on that?
> >>>>>>
> >>>>>> On Mon, Sep 14, 2015 at 4:14 PM, Maxim Khutornenko <
> maxim@apache.org>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Problem:
> >>>>>>> We currently don't have a good user experience around "aurora job
> >>>>>>> diff" command. The task configs are dumped as "prettified" JSON
> >>>>>>> strings and diffed with the system diff tool. Anyone who tried to
> >>> use
> >>>>>>> it knows it can be very hard to read especially when it comes to
> >>>>>>> executor data deltas. Also, the implementation is done completely
> >>>>>>> within the Aurora client making it hard to reuse this feature by
> >>> other
> >>>>>>> clients (e.g.: an external deploy coordination tool).
> >>>>>>>
> >>>>>>> Proposal:
> >>>>>>> Move the diff logic to the scheduler and expose it via a new
> >>>>>>> jobConfigDiff thrift API.
> >>>>>>>
> >>>>>>> Benefits:
> >>>>>>> - Client will no longer have the custom non-reusable logic moving
> us
> >>>>>>> closer towards a "thin client" goal.
> >>>>>>> - The new RPC can be fully used by any existing or new API clients.
> >>>>>>> - The diff output will be improved via leveraging third party POJO
> >>>>>>> and/or JSON diff libraries [1,2,3, etc.].
> >>>>>>> - The server updater can be partially/fully unified with the new
> >>> diff
> >>>>>>> logic further improving the overall DRY-ness.
> >>>>>>>
> >>>>>>> Concerns:
> >>>>>>> - The executor data is currently treated as an opaque string blob
> on
> >>>>>>> the scheduler side. In reality, it's almost guaranteed to be JSON.
> >>> In
> >>>>>>> order to deliver the best UX, the scheduler would have to start
> >>>>>>> requiring ExecutorConfig.data to be a valid JSON.
> >>>>>>>
> >>>>>>> Any other concerns/objections/comments? I would like to formalize
> >>> the
> >>>>>>> proposal be EOW if we reach consensus quickly.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Maxim
> >>>>>>>
> >>>>>>> [1] -
> >>>
> http://java-object-diff.readthedocs.org/en/latest/getting-started/#getting-started
> >>>>>>> [2] - http://javers.org/documentation/diff-examples/
> >>>>>>> [3] - https://github.com/skyscreamer/JSONassert
> >>>
> >>> --
> >>> Zameer Manji
> >>>
> >>>
>

Re: JobConfig diff API

Posted by Maxim Khutornenko <ma...@apache.org>.

The benefit of assuming a certain format is the richer experience we
can give to our users. The *blob* may be too large to make any sense
of it during diffing. I don't propose to enforce any schema though,
that would be too restrictive. I do however believe assuming JSON
format would be an acceptable tradeoff.

Alternatively, we may allow non-JSON (e.g. binary) executor data blobs
and disabling JSON diffs for any executor types that don't follow the
guidance. This will result in a degraded user experience but may be
the middle ground here. Thoughts?

On Tue, Sep 15, 2015 at 11:43 AM,  <me...@yahoo.com.invalid> wrote:
> Isn't this data supposed to be any blob that transparently passes in to the executor through mesos data blob. Why would we want to impose any sort of format? It could be a binary blob. Executor writers should be able to move between different schedulers/frameworks without any rework ideally. This field seems like more like garbage in and garbage out and only understood by end client and the executor. Scheduler may stay out of it.
> If you compute hash and indicate same or different data between 2 job update diff, that may be reasonable.
>
> My 2 cents.
>
> Thx
>
> Sent from my iPhone
>
>> On Sep 15, 2015, at 11:01 AM, Zameer Manji <zm...@apache.org> wrote:
>>
>> I'm a proponent of firming up our executor <-> scheduler contract. Since we
>> are going to get multiple executor support soon I think it would be nice if
>> we said that ExecutorConfig.data was JSON.
>>
>> On Tue, Sep 15, 2015 at 10:47 AM, Maxim Khutornenko <ma...@apache.org>
>> wrote:
>>
>>> | I hope this doesn't mean we would be returning a textual
>>> representation of a diff
>>>
>>> If we can make an assumption that executor data is always JSON, we can
>>> deliver a much more specific answer by applying JSON diff tools.
>>> Something like:
>>>
>>> - "environment": "prod"
>>> + "environment": "test"
>>>
>>> Otherwise, we would have to output the entire ExecutorConfig.data blob
>>> content for both left and right sides and let users figure out the
>>> problem. I don't think that's acceptable.
>>>
>>> Does it make sense? Any suggestions on the output format of the diff?
>>> I think it should be structured but at the same time we have to get
>>> down to text level at some point to report concrete discrepancies.
>>>
>>>> On Mon, Sep 14, 2015 at 8:58 PM, Bill Farner <wf...@apache.org> wrote:
>>>> The 'blob'-iness of ExecutorConfig is intentional so that we can support
>>>> alternative executors.  I'd hate for that to go away.
>>>>
>>>>> On Mon, Sep 14, 2015 at 8:56 PM, Jake Farrell <jf...@apache.org>
>>>> wrote:
>>>>
>>>>> This is one of the hoops encountered when using the Thrift api directly
>>> and
>>>>> not using the client, I'd love to see ExecutorConfig.data move to a
>>> thrift
>>>>> object and not be a string blob
>>>>>
>>>>> -Jake
>>>>>
>>>>> On Mon, Sep 14, 2015 at 9:28 PM, Bill Farner <wf...@apache.org>
>>> wrote:
>>>>>
>>>>>> I like the idea of adding this API, but i don't see why it requires
>>> us to
>>>>>> make assumptions about ExecutorConfig.data.  I hope this doesn't mean
>>> we
>>>>>> would be returning a textual representation of a diff.  Can you
>>> elaborate
>>>>>> on that?
>>>>>>
>>>>>> On Mon, Sep 14, 2015 at 4:14 PM, Maxim Khutornenko <ma...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Problem:
>>>>>>> We currently don't have a good user experience around "aurora job
>>>>>>> diff" command. The task configs are dumped as "prettified" JSON
>>>>>>> strings and diffed with the system diff tool. Anyone who tried to
>>> use
>>>>>>> it knows it can be very hard to read especially when it comes to
>>>>>>> executor data deltas. Also, the implementation is done completely
>>>>>>> within the Aurora client making it hard to reuse this feature by
>>> other
>>>>>>> clients (e.g.: an external deploy coordination tool).
>>>>>>>
>>>>>>> Proposal:
>>>>>>> Move the diff logic to the scheduler and expose it via a new
>>>>>>> jobConfigDiff thrift API.
>>>>>>>
>>>>>>> Benefits:
>>>>>>> - Client will no longer have the custom non-reusable logic moving us
>>>>>>> closer towards a "thin client" goal.
>>>>>>> - The new RPC can be fully used by any existing or new API clients.
>>>>>>> - The diff output will be improved via leveraging third party POJO
>>>>>>> and/or JSON diff libraries [1,2,3, etc.].
>>>>>>> - The server updater can be partially/fully unified with the new
>>> diff
>>>>>>> logic further improving the overall DRY-ness.
>>>>>>>
>>>>>>> Concerns:
>>>>>>> - The executor data is currently treated as an opaque string blob on
>>>>>>> the scheduler side. In reality, it's almost guaranteed to be JSON.
>>> In
>>>>>>> order to deliver the best UX, the scheduler would have to start
>>>>>>> requiring ExecutorConfig.data to be a valid JSON.
>>>>>>>
>>>>>>> Any other concerns/objections/comments? I would like to formalize
>>> the
>>>>>>> proposal be EOW if we reach consensus quickly.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Maxim
>>>>>>>
>>>>>>> [1] -
>>> http://java-object-diff.readthedocs.org/en/latest/getting-started/#getting-started
>>>>>>> [2] - http://javers.org/documentation/diff-examples/
>>>>>>> [3] - https://github.com/skyscreamer/JSONassert
>>>
>>> --
>>> Zameer Manji
>>>
>>>

Re: JobConfig diff API

Posted by me...@yahoo.com.INVALID.

Isn't this data supposed to be any blob that transparently passes in to the executor through mesos data blob. Why would we want to impose any sort of format? It could be a binary blob. Executor writers should be able to move between different schedulers/frameworks without any rework ideally. This field seems like more like garbage in and garbage out and only understood by end client and the executor. Scheduler may stay out of it. 
If you compute hash and indicate same or different data between 2 job update diff, that may be reasonable.

My 2 cents.

Thx

Sent from my iPhone

> On Sep 15, 2015, at 11:01 AM, Zameer Manji <zm...@apache.org> wrote:
> 
> I'm a proponent of firming up our executor <-> scheduler contract. Since we
> are going to get multiple executor support soon I think it would be nice if
> we said that ExecutorConfig.data was JSON.
> 
> On Tue, Sep 15, 2015 at 10:47 AM, Maxim Khutornenko <ma...@apache.org>
> wrote:
> 
>> | I hope this doesn't mean we would be returning a textual
>> representation of a diff
>> 
>> If we can make an assumption that executor data is always JSON, we can
>> deliver a much more specific answer by applying JSON diff tools.
>> Something like:
>> 
>> - "environment": "prod"
>> + "environment": "test"
>> 
>> Otherwise, we would have to output the entire ExecutorConfig.data blob
>> content for both left and right sides and let users figure out the
>> problem. I don't think that's acceptable.
>> 
>> Does it make sense? Any suggestions on the output format of the diff?
>> I think it should be structured but at the same time we have to get
>> down to text level at some point to report concrete discrepancies.
>> 
>>> On Mon, Sep 14, 2015 at 8:58 PM, Bill Farner <wf...@apache.org> wrote:
>>> The 'blob'-iness of ExecutorConfig is intentional so that we can support
>>> alternative executors.  I'd hate for that to go away.
>>> 
>>>> On Mon, Sep 14, 2015 at 8:56 PM, Jake Farrell <jf...@apache.org>
>>> wrote:
>>> 
>>>> This is one of the hoops encountered when using the Thrift api directly
>> and
>>>> not using the client, I'd love to see ExecutorConfig.data move to a
>> thrift
>>>> object and not be a string blob
>>>> 
>>>> -Jake
>>>> 
>>>> On Mon, Sep 14, 2015 at 9:28 PM, Bill Farner <wf...@apache.org>
>> wrote:
>>>> 
>>>>> I like the idea of adding this API, but i don't see why it requires
>> us to
>>>>> make assumptions about ExecutorConfig.data.  I hope this doesn't mean
>> we
>>>>> would be returning a textual representation of a diff.  Can you
>> elaborate
>>>>> on that?
>>>>> 
>>>>> On Mon, Sep 14, 2015 at 4:14 PM, Maxim Khutornenko <ma...@apache.org>
>>>>> wrote:
>>>>> 
>>>>>> Problem:
>>>>>> We currently don't have a good user experience around "aurora job
>>>>>> diff" command. The task configs are dumped as "prettified" JSON
>>>>>> strings and diffed with the system diff tool. Anyone who tried to
>> use
>>>>>> it knows it can be very hard to read especially when it comes to
>>>>>> executor data deltas. Also, the implementation is done completely
>>>>>> within the Aurora client making it hard to reuse this feature by
>> other
>>>>>> clients (e.g.: an external deploy coordination tool).
>>>>>> 
>>>>>> Proposal:
>>>>>> Move the diff logic to the scheduler and expose it via a new
>>>>>> jobConfigDiff thrift API.
>>>>>> 
>>>>>> Benefits:
>>>>>> - Client will no longer have the custom non-reusable logic moving us
>>>>>> closer towards a "thin client" goal.
>>>>>> - The new RPC can be fully used by any existing or new API clients.
>>>>>> - The diff output will be improved via leveraging third party POJO
>>>>>> and/or JSON diff libraries [1,2,3, etc.].
>>>>>> - The server updater can be partially/fully unified with the new
>> diff
>>>>>> logic further improving the overall DRY-ness.
>>>>>> 
>>>>>> Concerns:
>>>>>> - The executor data is currently treated as an opaque string blob on
>>>>>> the scheduler side. In reality, it's almost guaranteed to be JSON.
>> In
>>>>>> order to deliver the best UX, the scheduler would have to start
>>>>>> requiring ExecutorConfig.data to be a valid JSON.
>>>>>> 
>>>>>> Any other concerns/objections/comments? I would like to formalize
>> the
>>>>>> proposal be EOW if we reach consensus quickly.
>>>>>> 
>>>>>> Thanks,
>>>>>> Maxim
>>>>>> 
>>>>>> [1] -
>> http://java-object-diff.readthedocs.org/en/latest/getting-started/#getting-started
>>>>>> [2] - http://javers.org/documentation/diff-examples/
>>>>>> [3] - https://github.com/skyscreamer/JSONassert
>> 
>> --
>> Zameer Manji
>> 
>>

Re: JobConfig diff API

Posted by Zameer Manji <zm...@apache.org>.

I'm a proponent of firming up our executor <-> scheduler contract. Since we
are going to get multiple executor support soon I think it would be nice if
we said that ExecutorConfig.data was JSON.

On Tue, Sep 15, 2015 at 10:47 AM, Maxim Khutornenko <ma...@apache.org>
wrote:

> | I hope this doesn't mean we would be returning a textual
> representation of a diff
>
> If we can make an assumption that executor data is always JSON, we can
> deliver a much more specific answer by applying JSON diff tools.
> Something like:
>
> - "environment": "prod"
> + "environment": "test"
>
> Otherwise, we would have to output the entire ExecutorConfig.data blob
> content for both left and right sides and let users figure out the
> problem. I don't think that's acceptable.
>
> Does it make sense? Any suggestions on the output format of the diff?
> I think it should be structured but at the same time we have to get
> down to text level at some point to report concrete discrepancies.
>
> On Mon, Sep 14, 2015 at 8:58 PM, Bill Farner <wf...@apache.org> wrote:
> > The 'blob'-iness of ExecutorConfig is intentional so that we can support
> > alternative executors.  I'd hate for that to go away.
> >
> > On Mon, Sep 14, 2015 at 8:56 PM, Jake Farrell <jf...@apache.org>
> wrote:
> >
> >> This is one of the hoops encountered when using the Thrift api directly
> and
> >> not using the client, I'd love to see ExecutorConfig.data move to a
> thrift
> >> object and not be a string blob
> >>
> >> -Jake
> >>
> >> On Mon, Sep 14, 2015 at 9:28 PM, Bill Farner <wf...@apache.org>
> wrote:
> >>
> >> > I like the idea of adding this API, but i don't see why it requires
> us to
> >> > make assumptions about ExecutorConfig.data.  I hope this doesn't mean
> we
> >> > would be returning a textual representation of a diff.  Can you
> elaborate
> >> > on that?
> >> >
> >> > On Mon, Sep 14, 2015 at 4:14 PM, Maxim Khutornenko <ma...@apache.org>
> >> > wrote:
> >> >
> >> > > Problem:
> >> > > We currently don't have a good user experience around "aurora job
> >> > > diff" command. The task configs are dumped as "prettified" JSON
> >> > > strings and diffed with the system diff tool. Anyone who tried to
> use
> >> > > it knows it can be very hard to read especially when it comes to
> >> > > executor data deltas. Also, the implementation is done completely
> >> > > within the Aurora client making it hard to reuse this feature by
> other
> >> > > clients (e.g.: an external deploy coordination tool).
> >> > >
> >> > > Proposal:
> >> > > Move the diff logic to the scheduler and expose it via a new
> >> > > jobConfigDiff thrift API.
> >> > >
> >> > > Benefits:
> >> > > - Client will no longer have the custom non-reusable logic moving us
> >> > > closer towards a "thin client" goal.
> >> > > - The new RPC can be fully used by any existing or new API clients.
> >> > > - The diff output will be improved via leveraging third party POJO
> >> > > and/or JSON diff libraries [1,2,3, etc.].
> >> > > - The server updater can be partially/fully unified with the new
> diff
> >> > > logic further improving the overall DRY-ness.
> >> > >
> >> > > Concerns:
> >> > > - The executor data is currently treated as an opaque string blob on
> >> > > the scheduler side. In reality, it's almost guaranteed to be JSON.
> In
> >> > > order to deliver the best UX, the scheduler would have to start
> >> > > requiring ExecutorConfig.data to be a valid JSON.
> >> > >
> >> > > Any other concerns/objections/comments? I would like to formalize
> the
> >> > > proposal be EOW if we reach consensus quickly.
> >> > >
> >> > > Thanks,
> >> > > Maxim
> >> > >
> >> > > [1] -
> >> > >
> >> >
> >>
> http://java-object-diff.readthedocs.org/en/latest/getting-started/#getting-started
> >> > > [2] - http://javers.org/documentation/diff-examples/
> >> > > [3] - https://github.com/skyscreamer/JSONassert
> >> > >
> >> >
> >>
>
> --
> Zameer Manji
>
>

Re: JobConfig diff API

Posted by Maxim Khutornenko <ma...@apache.org>.

| I hope this doesn't mean we would be returning a textual
representation of a diff

If we can make an assumption that executor data is always JSON, we can
deliver a much more specific answer by applying JSON diff tools.
Something like:

- "environment": "prod"
+ "environment": "test"

Otherwise, we would have to output the entire ExecutorConfig.data blob
content for both left and right sides and let users figure out the
problem. I don't think that's acceptable.

Does it make sense? Any suggestions on the output format of the diff?
I think it should be structured but at the same time we have to get
down to text level at some point to report concrete discrepancies.

On Mon, Sep 14, 2015 at 8:58 PM, Bill Farner <wf...@apache.org> wrote:
> The 'blob'-iness of ExecutorConfig is intentional so that we can support
> alternative executors.  I'd hate for that to go away.
>
> On Mon, Sep 14, 2015 at 8:56 PM, Jake Farrell <jf...@apache.org> wrote:
>
>> This is one of the hoops encountered when using the Thrift api directly and
>> not using the client, I'd love to see ExecutorConfig.data move to a thrift
>> object and not be a string blob
>>
>> -Jake
>>
>> On Mon, Sep 14, 2015 at 9:28 PM, Bill Farner <wf...@apache.org> wrote:
>>
>> > I like the idea of adding this API, but i don't see why it requires us to
>> > make assumptions about ExecutorConfig.data.  I hope this doesn't mean we
>> > would be returning a textual representation of a diff.  Can you elaborate
>> > on that?
>> >
>> > On Mon, Sep 14, 2015 at 4:14 PM, Maxim Khutornenko <ma...@apache.org>
>> > wrote:
>> >
>> > > Problem:
>> > > We currently don't have a good user experience around "aurora job
>> > > diff" command. The task configs are dumped as "prettified" JSON
>> > > strings and diffed with the system diff tool. Anyone who tried to use
>> > > it knows it can be very hard to read especially when it comes to
>> > > executor data deltas. Also, the implementation is done completely
>> > > within the Aurora client making it hard to reuse this feature by other
>> > > clients (e.g.: an external deploy coordination tool).
>> > >
>> > > Proposal:
>> > > Move the diff logic to the scheduler and expose it via a new
>> > > jobConfigDiff thrift API.
>> > >
>> > > Benefits:
>> > > - Client will no longer have the custom non-reusable logic moving us
>> > > closer towards a "thin client" goal.
>> > > - The new RPC can be fully used by any existing or new API clients.
>> > > - The diff output will be improved via leveraging third party POJO
>> > > and/or JSON diff libraries [1,2,3, etc.].
>> > > - The server updater can be partially/fully unified with the new diff
>> > > logic further improving the overall DRY-ness.
>> > >
>> > > Concerns:
>> > > - The executor data is currently treated as an opaque string blob on
>> > > the scheduler side. In reality, it's almost guaranteed to be JSON. In
>> > > order to deliver the best UX, the scheduler would have to start
>> > > requiring ExecutorConfig.data to be a valid JSON.
>> > >
>> > > Any other concerns/objections/comments? I would like to formalize the
>> > > proposal be EOW if we reach consensus quickly.
>> > >
>> > > Thanks,
>> > > Maxim
>> > >
>> > > [1] -
>> > >
>> >
>> http://java-object-diff.readthedocs.org/en/latest/getting-started/#getting-started
>> > > [2] - http://javers.org/documentation/diff-examples/
>> > > [3] - https://github.com/skyscreamer/JSONassert
>> > >
>> >
>>

Re: JobConfig diff API

Posted by Bill Farner <wf...@apache.org>.

The 'blob'-iness of ExecutorConfig is intentional so that we can support
alternative executors.  I'd hate for that to go away.

On Mon, Sep 14, 2015 at 8:56 PM, Jake Farrell <jf...@apache.org> wrote:

> This is one of the hoops encountered when using the Thrift api directly and
> not using the client, I'd love to see ExecutorConfig.data move to a thrift
> object and not be a string blob
>
> -Jake
>
> On Mon, Sep 14, 2015 at 9:28 PM, Bill Farner <wf...@apache.org> wrote:
>
> > I like the idea of adding this API, but i don't see why it requires us to
> > make assumptions about ExecutorConfig.data.  I hope this doesn't mean we
> > would be returning a textual representation of a diff.  Can you elaborate
> > on that?
> >
> > On Mon, Sep 14, 2015 at 4:14 PM, Maxim Khutornenko <ma...@apache.org>
> > wrote:
> >
> > > Problem:
> > > We currently don't have a good user experience around "aurora job
> > > diff" command. The task configs are dumped as "prettified" JSON
> > > strings and diffed with the system diff tool. Anyone who tried to use
> > > it knows it can be very hard to read especially when it comes to
> > > executor data deltas. Also, the implementation is done completely
> > > within the Aurora client making it hard to reuse this feature by other
> > > clients (e.g.: an external deploy coordination tool).
> > >
> > > Proposal:
> > > Move the diff logic to the scheduler and expose it via a new
> > > jobConfigDiff thrift API.
> > >
> > > Benefits:
> > > - Client will no longer have the custom non-reusable logic moving us
> > > closer towards a "thin client" goal.
> > > - The new RPC can be fully used by any existing or new API clients.
> > > - The diff output will be improved via leveraging third party POJO
> > > and/or JSON diff libraries [1,2,3, etc.].
> > > - The server updater can be partially/fully unified with the new diff
> > > logic further improving the overall DRY-ness.
> > >
> > > Concerns:
> > > - The executor data is currently treated as an opaque string blob on
> > > the scheduler side. In reality, it's almost guaranteed to be JSON. In
> > > order to deliver the best UX, the scheduler would have to start
> > > requiring ExecutorConfig.data to be a valid JSON.
> > >
> > > Any other concerns/objections/comments? I would like to formalize the
> > > proposal be EOW if we reach consensus quickly.
> > >
> > > Thanks,
> > > Maxim
> > >
> > > [1] -
> > >
> >
> http://java-object-diff.readthedocs.org/en/latest/getting-started/#getting-started
> > > [2] - http://javers.org/documentation/diff-examples/
> > > [3] - https://github.com/skyscreamer/JSONassert
> > >
> >
>

Re: JobConfig diff API

Posted by Jake Farrell <jf...@apache.org>.

This is one of the hoops encountered when using the Thrift api directly and
not using the client, I'd love to see ExecutorConfig.data move to a thrift
object and not be a string blob

-Jake

On Mon, Sep 14, 2015 at 9:28 PM, Bill Farner <wf...@apache.org> wrote:

> I like the idea of adding this API, but i don't see why it requires us to
> make assumptions about ExecutorConfig.data.  I hope this doesn't mean we
> would be returning a textual representation of a diff.  Can you elaborate
> on that?
>
> On Mon, Sep 14, 2015 at 4:14 PM, Maxim Khutornenko <ma...@apache.org>
> wrote:
>
> > Problem:
> > We currently don't have a good user experience around "aurora job
> > diff" command. The task configs are dumped as "prettified" JSON
> > strings and diffed with the system diff tool. Anyone who tried to use
> > it knows it can be very hard to read especially when it comes to
> > executor data deltas. Also, the implementation is done completely
> > within the Aurora client making it hard to reuse this feature by other
> > clients (e.g.: an external deploy coordination tool).
> >
> > Proposal:
> > Move the diff logic to the scheduler and expose it via a new
> > jobConfigDiff thrift API.
> >
> > Benefits:
> > - Client will no longer have the custom non-reusable logic moving us
> > closer towards a "thin client" goal.
> > - The new RPC can be fully used by any existing or new API clients.
> > - The diff output will be improved via leveraging third party POJO
> > and/or JSON diff libraries [1,2,3, etc.].
> > - The server updater can be partially/fully unified with the new diff
> > logic further improving the overall DRY-ness.
> >
> > Concerns:
> > - The executor data is currently treated as an opaque string blob on
> > the scheduler side. In reality, it's almost guaranteed to be JSON. In
> > order to deliver the best UX, the scheduler would have to start
> > requiring ExecutorConfig.data to be a valid JSON.
> >
> > Any other concerns/objections/comments? I would like to formalize the
> > proposal be EOW if we reach consensus quickly.
> >
> > Thanks,
> > Maxim
> >
> > [1] -
> >
> http://java-object-diff.readthedocs.org/en/latest/getting-started/#getting-started
> > [2] - http://javers.org/documentation/diff-examples/
> > [3] - https://github.com/skyscreamer/JSONassert
> >
>

Re: JobConfig diff API

Posted by Bill Farner <wf...@apache.org>.

I like the idea of adding this API, but i don't see why it requires us to
make assumptions about ExecutorConfig.data.  I hope this doesn't mean we
would be returning a textual representation of a diff.  Can you elaborate
on that?

On Mon, Sep 14, 2015 at 4:14 PM, Maxim Khutornenko <ma...@apache.org> wrote:

> Problem:
> We currently don't have a good user experience around "aurora job
> diff" command. The task configs are dumped as "prettified" JSON
> strings and diffed with the system diff tool. Anyone who tried to use
> it knows it can be very hard to read especially when it comes to
> executor data deltas. Also, the implementation is done completely
> within the Aurora client making it hard to reuse this feature by other
> clients (e.g.: an external deploy coordination tool).
>
> Proposal:
> Move the diff logic to the scheduler and expose it via a new
> jobConfigDiff thrift API.
>
> Benefits:
> - Client will no longer have the custom non-reusable logic moving us
> closer towards a "thin client" goal.
> - The new RPC can be fully used by any existing or new API clients.
> - The diff output will be improved via leveraging third party POJO
> and/or JSON diff libraries [1,2,3, etc.].
> - The server updater can be partially/fully unified with the new diff
> logic further improving the overall DRY-ness.
>
> Concerns:
> - The executor data is currently treated as an opaque string blob on
> the scheduler side. In reality, it's almost guaranteed to be JSON. In
> order to deliver the best UX, the scheduler would have to start
> requiring ExecutorConfig.data to be a valid JSON.
>
> Any other concerns/objections/comments? I would like to formalize the
> proposal be EOW if we reach consensus quickly.
>
> Thanks,
> Maxim
>
> [1] -
> http://java-object-diff.readthedocs.org/en/latest/getting-started/#getting-started
> [2] - http://javers.org/documentation/diff-examples/
> [3] - https://github.com/skyscreamer/JSONassert
>