You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@aurora.apache.org by Zameer Manji <zm...@apache.org> on 2016/09/01 21:24:06 UTC

[PROPOSAL] New RPC `fetchJobUpdates`

Hey,

I've noticed a hole in our current API which makes it difficult to write
external clients and other tooling around fetching the state of updates.

Currently, to fetch updates we are given two RPCs:
````
/** Gets job update summaries. */
Response getJobUpdateSummaries(1: JobUpdateQuery jobUpdateQuery)

/** Gets job update details. */
Response getJobUpdateDetails(1: JobUpdateKey key)

````

The `getJobUpdateSummaries` RPC is not scoped to a single update and
returns a
set of `JobUpdateSummary` structs. The struct is defined:
````
/** Summary of the job update including job key, user and current state. */
struct JobUpdateSummary {
  /** Unique identifier for the update. */
  5: JobUpdateKey key

  /** User initiated an update. */
  3: string user

  /** Current job update state. */
  4: JobUpdateState state
}
````

The `getJobUpdateDetails` RPC is scoped to a single update and returns the
following struct:

````
struct JobUpdateDetails {
  /** Update definition. */
  1: JobUpdate update

  /** History for this update. */
  2: list<JobUpdateEvent> updateEvents

  /** History for the individual instances updated. */
  3: list<JobInstanceUpdateEvent> instanceEvents
}

````

Maxim mentioned to me yesterday that this RPC is scoped to a single update
because assembling the `instanceEvents` can be extremely expensive. A query
that
could span more than a single update risks taking down the scheduler in a
large
cluster.


The problem I discovered is that there is no batch API to get the
inexpensive
information inside the `JobUpdate` struct. For reference this struct
contains:

````
/** Full definition of the job update. */
struct JobUpdate {
  /** Update summary. */
  1: JobUpdateSummary summary

  /** Update configuration. */
  2: JobUpdateInstructions instructions
}
````

Consumers are forced to make several `getJobUpdateDetails` calls to get
multiple
`JobUpdate` structs. Since the `JobUpdate` struct is not expensive to
assemble,
I'm proposing a new RPC that will allow consumers to get several `JobUpdate`
structs in a single call.

````
/** Gets job updates. */
Response getJobUpdates(1: JobUpdateQuery jobUpdateQuery)
````

If there are no objections, I will file tickets and put up a patch to
implement
this.

--
Zameer Manji

Re: [PROPOSAL] New RPC `fetchJobUpdates`

Posted by Zameer Manji <zm...@apache.org>.
Maxim and I discussed offline after the Aurora dev sync today and we think
the best way forward is improving the `getJobUpdateDetails` RPC to take a
`JobUpdateQuery`.

I'll be filing a ticket and moving forward on this.

On Fri, Sep 2, 2016 at 3:36 PM, Maxim Khutornenko <ma...@apache.org> wrote:

> The pulseJobUpdate RPC was added as an entirely new feature. This
> proposal suggests a read-only "convenience" method to pull data that
> is already available via existing means.
>
> As for "taking down the scheduler", this argument is moot. It's
> possible to DDOS scheduler via existing RPCs today and the suggested
> alternative has the same inherent risk of crafting an unscoped query
> that will pull all updates from the store. Also, unless there are a
> LOT of those queries (read dos-attack that OOMs scheduler) it's
> unlikely that the scheduler will sustain any damage as read-only
> queries don't acquire any global or table locks.
>
> On Fri, Sep 2, 2016 at 3:24 PM, Zameer Manji <zm...@apache.org> wrote:
> > I'm not convinced by your argument about adding a read only RPC that's
> not
> > covered by the traditional integration test is going to cause bit rot. We
> > already have an RPC that is not covered by the traditional integration
> test
> > `pulseJobUpdate` and it hasn't bit rotted AFAIK.
> >
> > Further some members of the community have been writing alternative
> clients
> > around our API and I think adding this RPC will better support their use
> > cases. `JobUpdate` is a first class struct in our API and I think it
> makes
> > sense to expose a query interface for it.
> >
> > If integration tests are your primary concern, I can also add an
> > integration test for this.
> >
> > Regarding adding `JobUpdateQuery` to `getJobUpdateDetails`, I'm no longer
> > convinced that it's the best way to go because of the risk it contains.
> > It's all to easy to craft a query that can pull a lot of data that can
> take
> > down the scheduler. In my case, I sometimes see it being useful for
> getting
> > all of the `JobUpdates` of a role (completed and active), but that risks
> > pulling a lot of unnecessary data.
> >
> > I will also add that since `JobUpdate` is a first class struct in our
> > storage and APIs, this RPC contains very minimal code. I think it's
> highly
> > unlikely that it will bit rot.
> >
> > On Fri, Sep 2, 2016 at 2:52 PM, Maxim Khutornenko <ma...@apache.org>
> wrote:
> >
> >> As I mentioned in Slack, I am ok with the new RPC as long as there is
> >> a use for it elsewhere in the client or UI. Adding a read-only RPC
> >> that isn't going to be called by our traditional integration test
> >> clients sets a fertile ground for bit rot.
> >>
> >> I am actually warming up to your original proposal of adding
> >> JobUpdateQuery into the existing getJobUpdateDetails RPC. While it may
> >> be more expensive to pull multiple updates, we don't necessarily risk
> >> much after we migrated to MVStore on the H2 side. There are no table
> >> locks acquired and the only downside would be pulling events along
> >> with what you need. Provided the query is narrowly scoped, that should
> >> deliver acceptable performance.
> >>
> >> On Thu, Sep 1, 2016 at 2:24 PM, Zameer Manji <zm...@apache.org> wrote:
> >> > Hey,
> >> >
> >> > I've noticed a hole in our current API which makes it difficult to
> write
> >> > external clients and other tooling around fetching the state of
> updates.
> >> >
> >> > Currently, to fetch updates we are given two RPCs:
> >> > ````
> >> > /** Gets job update summaries. */
> >> > Response getJobUpdateSummaries(1: JobUpdateQuery jobUpdateQuery)
> >> >
> >> > /** Gets job update details. */
> >> > Response getJobUpdateDetails(1: JobUpdateKey key)
> >> >
> >> > ````
> >> >
> >> > The `getJobUpdateSummaries` RPC is not scoped to a single update and
> >> > returns a
> >> > set of `JobUpdateSummary` structs. The struct is defined:
> >> > ````
> >> > /** Summary of the job update including job key, user and current
> state.
> >> */
> >> > struct JobUpdateSummary {
> >> >   /** Unique identifier for the update. */
> >> >   5: JobUpdateKey key
> >> >
> >> >   /** User initiated an update. */
> >> >   3: string user
> >> >
> >> >   /** Current job update state. */
> >> >   4: JobUpdateState state
> >> > }
> >> > ````
> >> >
> >> > The `getJobUpdateDetails` RPC is scoped to a single update and returns
> >> the
> >> > following struct:
> >> >
> >> > ````
> >> > struct JobUpdateDetails {
> >> >   /** Update definition. */
> >> >   1: JobUpdate update
> >> >
> >> >   /** History for this update. */
> >> >   2: list<JobUpdateEvent> updateEvents
> >> >
> >> >   /** History for the individual instances updated. */
> >> >   3: list<JobInstanceUpdateEvent> instanceEvents
> >> > }
> >> >
> >> > ````
> >> >
> >> > Maxim mentioned to me yesterday that this RPC is scoped to a single
> >> update
> >> > because assembling the `instanceEvents` can be extremely expensive. A
> >> query
> >> > that
> >> > could span more than a single update risks taking down the scheduler
> in a
> >> > large
> >> > cluster.
> >> >
> >> >
> >> > The problem I discovered is that there is no batch API to get the
> >> > inexpensive
> >> > information inside the `JobUpdate` struct. For reference this struct
> >> > contains:
> >> >
> >> > ````
> >> > /** Full definition of the job update. */
> >> > struct JobUpdate {
> >> >   /** Update summary. */
> >> >   1: JobUpdateSummary summary
> >> >
> >> >   /** Update configuration. */
> >> >   2: JobUpdateInstructions instructions
> >> > }
> >> > ````
> >> >
> >> > Consumers are forced to make several `getJobUpdateDetails` calls to
> get
> >> > multiple
> >> > `JobUpdate` structs. Since the `JobUpdate` struct is not expensive to
> >> > assemble,
> >> > I'm proposing a new RPC that will allow consumers to get several
> >> `JobUpdate`
> >> > structs in a single call.
> >> >
> >> > ````
> >> > /** Gets job updates. */
> >> > Response getJobUpdates(1: JobUpdateQuery jobUpdateQuery)
> >> > ````
> >> >
> >> > If there are no objections, I will file tickets and put up a patch to
> >> > implement
> >> > this.
> >> >
> >> > --
> >> > Zameer Manji
> >>
>

Re: [PROPOSAL] New RPC `fetchJobUpdates`

Posted by Maxim Khutornenko <ma...@apache.org>.
The pulseJobUpdate RPC was added as an entirely new feature. This
proposal suggests a read-only "convenience" method to pull data that
is already available via existing means.

As for "taking down the scheduler", this argument is moot. It's
possible to DDOS scheduler via existing RPCs today and the suggested
alternative has the same inherent risk of crafting an unscoped query
that will pull all updates from the store. Also, unless there are a
LOT of those queries (read dos-attack that OOMs scheduler) it's
unlikely that the scheduler will sustain any damage as read-only
queries don't acquire any global or table locks.

On Fri, Sep 2, 2016 at 3:24 PM, Zameer Manji <zm...@apache.org> wrote:
> I'm not convinced by your argument about adding a read only RPC that's not
> covered by the traditional integration test is going to cause bit rot. We
> already have an RPC that is not covered by the traditional integration test
> `pulseJobUpdate` and it hasn't bit rotted AFAIK.
>
> Further some members of the community have been writing alternative clients
> around our API and I think adding this RPC will better support their use
> cases. `JobUpdate` is a first class struct in our API and I think it makes
> sense to expose a query interface for it.
>
> If integration tests are your primary concern, I can also add an
> integration test for this.
>
> Regarding adding `JobUpdateQuery` to `getJobUpdateDetails`, I'm no longer
> convinced that it's the best way to go because of the risk it contains.
> It's all to easy to craft a query that can pull a lot of data that can take
> down the scheduler. In my case, I sometimes see it being useful for getting
> all of the `JobUpdates` of a role (completed and active), but that risks
> pulling a lot of unnecessary data.
>
> I will also add that since `JobUpdate` is a first class struct in our
> storage and APIs, this RPC contains very minimal code. I think it's highly
> unlikely that it will bit rot.
>
> On Fri, Sep 2, 2016 at 2:52 PM, Maxim Khutornenko <ma...@apache.org> wrote:
>
>> As I mentioned in Slack, I am ok with the new RPC as long as there is
>> a use for it elsewhere in the client or UI. Adding a read-only RPC
>> that isn't going to be called by our traditional integration test
>> clients sets a fertile ground for bit rot.
>>
>> I am actually warming up to your original proposal of adding
>> JobUpdateQuery into the existing getJobUpdateDetails RPC. While it may
>> be more expensive to pull multiple updates, we don't necessarily risk
>> much after we migrated to MVStore on the H2 side. There are no table
>> locks acquired and the only downside would be pulling events along
>> with what you need. Provided the query is narrowly scoped, that should
>> deliver acceptable performance.
>>
>> On Thu, Sep 1, 2016 at 2:24 PM, Zameer Manji <zm...@apache.org> wrote:
>> > Hey,
>> >
>> > I've noticed a hole in our current API which makes it difficult to write
>> > external clients and other tooling around fetching the state of updates.
>> >
>> > Currently, to fetch updates we are given two RPCs:
>> > ````
>> > /** Gets job update summaries. */
>> > Response getJobUpdateSummaries(1: JobUpdateQuery jobUpdateQuery)
>> >
>> > /** Gets job update details. */
>> > Response getJobUpdateDetails(1: JobUpdateKey key)
>> >
>> > ````
>> >
>> > The `getJobUpdateSummaries` RPC is not scoped to a single update and
>> > returns a
>> > set of `JobUpdateSummary` structs. The struct is defined:
>> > ````
>> > /** Summary of the job update including job key, user and current state.
>> */
>> > struct JobUpdateSummary {
>> >   /** Unique identifier for the update. */
>> >   5: JobUpdateKey key
>> >
>> >   /** User initiated an update. */
>> >   3: string user
>> >
>> >   /** Current job update state. */
>> >   4: JobUpdateState state
>> > }
>> > ````
>> >
>> > The `getJobUpdateDetails` RPC is scoped to a single update and returns
>> the
>> > following struct:
>> >
>> > ````
>> > struct JobUpdateDetails {
>> >   /** Update definition. */
>> >   1: JobUpdate update
>> >
>> >   /** History for this update. */
>> >   2: list<JobUpdateEvent> updateEvents
>> >
>> >   /** History for the individual instances updated. */
>> >   3: list<JobInstanceUpdateEvent> instanceEvents
>> > }
>> >
>> > ````
>> >
>> > Maxim mentioned to me yesterday that this RPC is scoped to a single
>> update
>> > because assembling the `instanceEvents` can be extremely expensive. A
>> query
>> > that
>> > could span more than a single update risks taking down the scheduler in a
>> > large
>> > cluster.
>> >
>> >
>> > The problem I discovered is that there is no batch API to get the
>> > inexpensive
>> > information inside the `JobUpdate` struct. For reference this struct
>> > contains:
>> >
>> > ````
>> > /** Full definition of the job update. */
>> > struct JobUpdate {
>> >   /** Update summary. */
>> >   1: JobUpdateSummary summary
>> >
>> >   /** Update configuration. */
>> >   2: JobUpdateInstructions instructions
>> > }
>> > ````
>> >
>> > Consumers are forced to make several `getJobUpdateDetails` calls to get
>> > multiple
>> > `JobUpdate` structs. Since the `JobUpdate` struct is not expensive to
>> > assemble,
>> > I'm proposing a new RPC that will allow consumers to get several
>> `JobUpdate`
>> > structs in a single call.
>> >
>> > ````
>> > /** Gets job updates. */
>> > Response getJobUpdates(1: JobUpdateQuery jobUpdateQuery)
>> > ````
>> >
>> > If there are no objections, I will file tickets and put up a patch to
>> > implement
>> > this.
>> >
>> > --
>> > Zameer Manji
>>

Re: [PROPOSAL] New RPC `fetchJobUpdates`

Posted by Zameer Manji <zm...@apache.org>.
I'm not convinced by your argument about adding a read only RPC that's not
covered by the traditional integration test is going to cause bit rot. We
already have an RPC that is not covered by the traditional integration test
`pulseJobUpdate` and it hasn't bit rotted AFAIK.

Further some members of the community have been writing alternative clients
around our API and I think adding this RPC will better support their use
cases. `JobUpdate` is a first class struct in our API and I think it makes
sense to expose a query interface for it.

If integration tests are your primary concern, I can also add an
integration test for this.

Regarding adding `JobUpdateQuery` to `getJobUpdateDetails`, I'm no longer
convinced that it's the best way to go because of the risk it contains.
It's all to easy to craft a query that can pull a lot of data that can take
down the scheduler. In my case, I sometimes see it being useful for getting
all of the `JobUpdates` of a role (completed and active), but that risks
pulling a lot of unnecessary data.

I will also add that since `JobUpdate` is a first class struct in our
storage and APIs, this RPC contains very minimal code. I think it's highly
unlikely that it will bit rot.

On Fri, Sep 2, 2016 at 2:52 PM, Maxim Khutornenko <ma...@apache.org> wrote:

> As I mentioned in Slack, I am ok with the new RPC as long as there is
> a use for it elsewhere in the client or UI. Adding a read-only RPC
> that isn't going to be called by our traditional integration test
> clients sets a fertile ground for bit rot.
>
> I am actually warming up to your original proposal of adding
> JobUpdateQuery into the existing getJobUpdateDetails RPC. While it may
> be more expensive to pull multiple updates, we don't necessarily risk
> much after we migrated to MVStore on the H2 side. There are no table
> locks acquired and the only downside would be pulling events along
> with what you need. Provided the query is narrowly scoped, that should
> deliver acceptable performance.
>
> On Thu, Sep 1, 2016 at 2:24 PM, Zameer Manji <zm...@apache.org> wrote:
> > Hey,
> >
> > I've noticed a hole in our current API which makes it difficult to write
> > external clients and other tooling around fetching the state of updates.
> >
> > Currently, to fetch updates we are given two RPCs:
> > ````
> > /** Gets job update summaries. */
> > Response getJobUpdateSummaries(1: JobUpdateQuery jobUpdateQuery)
> >
> > /** Gets job update details. */
> > Response getJobUpdateDetails(1: JobUpdateKey key)
> >
> > ````
> >
> > The `getJobUpdateSummaries` RPC is not scoped to a single update and
> > returns a
> > set of `JobUpdateSummary` structs. The struct is defined:
> > ````
> > /** Summary of the job update including job key, user and current state.
> */
> > struct JobUpdateSummary {
> >   /** Unique identifier for the update. */
> >   5: JobUpdateKey key
> >
> >   /** User initiated an update. */
> >   3: string user
> >
> >   /** Current job update state. */
> >   4: JobUpdateState state
> > }
> > ````
> >
> > The `getJobUpdateDetails` RPC is scoped to a single update and returns
> the
> > following struct:
> >
> > ````
> > struct JobUpdateDetails {
> >   /** Update definition. */
> >   1: JobUpdate update
> >
> >   /** History for this update. */
> >   2: list<JobUpdateEvent> updateEvents
> >
> >   /** History for the individual instances updated. */
> >   3: list<JobInstanceUpdateEvent> instanceEvents
> > }
> >
> > ````
> >
> > Maxim mentioned to me yesterday that this RPC is scoped to a single
> update
> > because assembling the `instanceEvents` can be extremely expensive. A
> query
> > that
> > could span more than a single update risks taking down the scheduler in a
> > large
> > cluster.
> >
> >
> > The problem I discovered is that there is no batch API to get the
> > inexpensive
> > information inside the `JobUpdate` struct. For reference this struct
> > contains:
> >
> > ````
> > /** Full definition of the job update. */
> > struct JobUpdate {
> >   /** Update summary. */
> >   1: JobUpdateSummary summary
> >
> >   /** Update configuration. */
> >   2: JobUpdateInstructions instructions
> > }
> > ````
> >
> > Consumers are forced to make several `getJobUpdateDetails` calls to get
> > multiple
> > `JobUpdate` structs. Since the `JobUpdate` struct is not expensive to
> > assemble,
> > I'm proposing a new RPC that will allow consumers to get several
> `JobUpdate`
> > structs in a single call.
> >
> > ````
> > /** Gets job updates. */
> > Response getJobUpdates(1: JobUpdateQuery jobUpdateQuery)
> > ````
> >
> > If there are no objections, I will file tickets and put up a patch to
> > implement
> > this.
> >
> > --
> > Zameer Manji
>

Re: [PROPOSAL] New RPC `fetchJobUpdates`

Posted by Maxim Khutornenko <ma...@apache.org>.
As I mentioned in Slack, I am ok with the new RPC as long as there is
a use for it elsewhere in the client or UI. Adding a read-only RPC
that isn't going to be called by our traditional integration test
clients sets a fertile ground for bit rot.

I am actually warming up to your original proposal of adding
JobUpdateQuery into the existing getJobUpdateDetails RPC. While it may
be more expensive to pull multiple updates, we don't necessarily risk
much after we migrated to MVStore on the H2 side. There are no table
locks acquired and the only downside would be pulling events along
with what you need. Provided the query is narrowly scoped, that should
deliver acceptable performance.

On Thu, Sep 1, 2016 at 2:24 PM, Zameer Manji <zm...@apache.org> wrote:
> Hey,
>
> I've noticed a hole in our current API which makes it difficult to write
> external clients and other tooling around fetching the state of updates.
>
> Currently, to fetch updates we are given two RPCs:
> ````
> /** Gets job update summaries. */
> Response getJobUpdateSummaries(1: JobUpdateQuery jobUpdateQuery)
>
> /** Gets job update details. */
> Response getJobUpdateDetails(1: JobUpdateKey key)
>
> ````
>
> The `getJobUpdateSummaries` RPC is not scoped to a single update and
> returns a
> set of `JobUpdateSummary` structs. The struct is defined:
> ````
> /** Summary of the job update including job key, user and current state. */
> struct JobUpdateSummary {
>   /** Unique identifier for the update. */
>   5: JobUpdateKey key
>
>   /** User initiated an update. */
>   3: string user
>
>   /** Current job update state. */
>   4: JobUpdateState state
> }
> ````
>
> The `getJobUpdateDetails` RPC is scoped to a single update and returns the
> following struct:
>
> ````
> struct JobUpdateDetails {
>   /** Update definition. */
>   1: JobUpdate update
>
>   /** History for this update. */
>   2: list<JobUpdateEvent> updateEvents
>
>   /** History for the individual instances updated. */
>   3: list<JobInstanceUpdateEvent> instanceEvents
> }
>
> ````
>
> Maxim mentioned to me yesterday that this RPC is scoped to a single update
> because assembling the `instanceEvents` can be extremely expensive. A query
> that
> could span more than a single update risks taking down the scheduler in a
> large
> cluster.
>
>
> The problem I discovered is that there is no batch API to get the
> inexpensive
> information inside the `JobUpdate` struct. For reference this struct
> contains:
>
> ````
> /** Full definition of the job update. */
> struct JobUpdate {
>   /** Update summary. */
>   1: JobUpdateSummary summary
>
>   /** Update configuration. */
>   2: JobUpdateInstructions instructions
> }
> ````
>
> Consumers are forced to make several `getJobUpdateDetails` calls to get
> multiple
> `JobUpdate` structs. Since the `JobUpdate` struct is not expensive to
> assemble,
> I'm proposing a new RPC that will allow consumers to get several `JobUpdate`
> structs in a single call.
>
> ````
> /** Gets job updates. */
> Response getJobUpdates(1: JobUpdateQuery jobUpdateQuery)
> ````
>
> If there are no objections, I will file tickets and put up a patch to
> implement
> this.
>
> --
> Zameer Manji