Posted to dev@spark.apache.org by Reynold Xin <rx...@databricks.com> on 2016/05/12 21:29:24 UTC

[discuss] separate API annotation into two components: InterfaceAudience & InterfaceStability

We currently have three levels of interface annotation:

- unannotated: stable public API
- DeveloperApi: A lower-level, unstable API intended for developers.
- Experimental: An experimental user-facing API.


After using this annotation for ~ 2 years, I would like to propose the
following changes:

1. Require explicit annotation for public APIs. This reduces the chance
of us accidentally exposing private APIs.

2. Separate interface annotation into two components: one that describes
intended audience, and the other that describes stability, similar to what
Hadoop does. This allows us to define "low level" APIs that are stable,
e.g. the data source API (I'd argue this is the API that should be more
stable than end-user-facing APIs).

InterfaceAudience: Public, Developer

InterfaceStability: Stable, Experimental
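
For illustration, a minimal sketch of how the two components could look in
Java, using nested annotation types in the style Hadoop uses. This is an
assumption about the shape, not code from any Spark patch: the wrapper class
ApiAnnotationSketch, the example types ExampleDataSource and ExampleUserApi,
and the describe helper are all hypothetical names.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class ApiAnnotationSketch {

    /** Audience component: who the API is meant for. */
    public static final class InterfaceAudience {
        private InterfaceAudience() {}

        /** Intended for end users writing applications. */
        @Documented @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
        public @interface Public {}

        /** Intended for library developers building on lower-level APIs. */
        @Documented @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
        public @interface Developer {}
    }

    /** Stability component: what compatibility promise is made. */
    public static final class InterfaceStability {
        private InterfaceStability() {}

        @Documented @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
        public @interface Stable {}

        @Documented @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
        public @interface Experimental {}
    }

    /** Hypothetical low-level API: developer audience, yet stable. */
    @InterfaceAudience.Developer
    @InterfaceStability.Stable
    public interface ExampleDataSource {}

    /** Hypothetical end-user API that is still experimental. */
    @InterfaceAudience.Public
    @InterfaceStability.Experimental
    public static class ExampleUserApi {}

    /** Reads both components back via reflection as "audience/stability". */
    public static String describe(Class<?> api) {
        String audience =
            api.isAnnotationPresent(InterfaceAudience.Public.class) ? "Public"
            : api.isAnnotationPresent(InterfaceAudience.Developer.class) ? "Developer"
            : "unannotated";
        String stability =
            api.isAnnotationPresent(InterfaceStability.Stable.class) ? "Stable"
            : api.isAnnotationPresent(InterfaceStability.Experimental.class) ? "Experimental"
            : "unannotated";
        return audience + "/" + stability;
    }

    public static void main(String[] args) {
        System.out.println(describe(ExampleDataSource.class)); // prints "Developer/Stable"
        System.out.println(describe(ExampleUserApi.class));    // prints "Public/Experimental"
    }
}
```

Because the two components are independent, a combination such as
Developer plus Stable can be expressed, which the single DeveloperApi
annotation cannot say.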


What do you think?

Re: [discuss] separate API annotation into two components: InterfaceAudience & InterfaceStability

Posted by Reynold Xin <rx...@databricks.com>.
Hm, LimitedPrivate is not the intention. Those APIs (e.g. data source) are
by no means private. They are just lower level APIs whose intended audience
is library developers, not end users.

On Thu, May 12, 2016 at 8:32 PM, Sean Busbey <bu...@cloudera.com> wrote:

> We could switch to the Audience Annotation from Apache Yetus[1], and
> then rely on Public for end-users and LimitedPrivate for those things
> we intend as lower-level things with particular non-end-user
> audiences.
>
> [1]:
> http://yetus.apache.org/documentation/in-progress/#yetus-audience-annotations
>
> On Thu, May 12, 2016 at 3:35 PM, Reynold Xin <rx...@databricks.com> wrote:
> > That's true. I think I want to differentiate end-user vs developer. Public
> > isn't the best word. Maybe EndUser?
> >
> > On Thu, May 12, 2016 at 3:34 PM, Shivaram Venkataraman
> > <sh...@eecs.berkeley.edu> wrote:
> >>
> >> On Thu, May 12, 2016 at 2:29 PM, Reynold Xin <rx...@databricks.com> wrote:
> >> > We currently have three levels of interface annotation:
> >> >
> >> > - unannotated: stable public API
> >> > - DeveloperApi: A lower-level, unstable API intended for developers.
> >> > - Experimental: An experimental user-facing API.
> >> >
> >> >
> >> > After using this annotation for ~ 2 years, I would like to propose the
> >> > following changes:
> >> >
> >> > 1. Require explicit annotation for public APIs. This reduces the chance
> >> > of us accidentally exposing private APIs.
> >> >
> >> +1
> >>
> >> > 2. Separate interface annotation into two components: one that describes
> >> > intended audience, and the other that describes stability, similar to
> >> > what Hadoop does. This allows us to define "low level" APIs that are
> >> > stable, e.g. the data source API (I'd argue this is the API that should
> >> > be more stable than end-user-facing APIs).
> >> >
> >> > InterfaceAudience: Public, Developer
> >> >
> >> > InterfaceStability: Stable, Experimental
> >> >
> >> I'm not very sure about this. What advantage do we get from Public vs.
> >> Developer? Also, somebody needs to make a judgement call on that, which
> >> might not always be easy to do.
> >> >
> >> > What do you think?
> >
> >
>
>
>
> --
> busbey
>

Re: [discuss] separate API annotation into two components: InterfaceAudience & InterfaceStability

Posted by Sean Busbey <bu...@cloudera.com>.
We could switch to the Audience Annotation from Apache Yetus[1], and
then rely on Public for end-users and LimitedPrivate for those things
we intend as lower-level things with particular non-end-user
audiences.

[1]: http://yetus.apache.org/documentation/in-progress/#yetus-audience-annotations

On Thu, May 12, 2016 at 3:35 PM, Reynold Xin <rx...@databricks.com> wrote:
> That's true. I think I want to differentiate end-user vs developer. Public
> isn't the best word. Maybe EndUser?
>
> On Thu, May 12, 2016 at 3:34 PM, Shivaram Venkataraman
> <sh...@eecs.berkeley.edu> wrote:
>>
>> On Thu, May 12, 2016 at 2:29 PM, Reynold Xin <rx...@databricks.com> wrote:
>> > We currently have three levels of interface annotation:
>> >
>> > - unannotated: stable public API
>> > - DeveloperApi: A lower-level, unstable API intended for developers.
>> > - Experimental: An experimental user-facing API.
>> >
>> >
>> > After using this annotation for ~ 2 years, I would like to propose the
>> > following changes:
>> >
>> > 1. Require explicit annotation for public APIs. This reduces the chance
>> > of us accidentally exposing private APIs.
>> >
>> +1
>>
>> > 2. Separate interface annotation into two components: one that describes
>> > intended audience, and the other that describes stability, similar to
>> > what Hadoop does. This allows us to define "low level" APIs that are
>> > stable, e.g. the data source API (I'd argue this is the API that should
>> > be more stable than end-user-facing APIs).
>> >
>> > InterfaceAudience: Public, Developer
>> >
>> > InterfaceStability: Stable, Experimental
>> >
>> I'm not very sure about this. What advantage do we get from Public vs.
>> Developer? Also, somebody needs to make a judgement call on that, which
>> might not always be easy to do.
>> >
>> > What do you think?
>
>



-- 
busbey

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: [discuss] separate API annotation into two components: InterfaceAudience & InterfaceStability

Posted by Reynold Xin <rx...@databricks.com>.
That's true. I think I want to differentiate end-user vs developer. Public
isn't the best word. Maybe EndUser?

On Thu, May 12, 2016 at 3:34 PM, Shivaram Venkataraman
<shivaram@eecs.berkeley.edu> wrote:

> On Thu, May 12, 2016 at 2:29 PM, Reynold Xin <rx...@databricks.com> wrote:
> > We currently have three levels of interface annotation:
> >
> > - unannotated: stable public API
> > - DeveloperApi: A lower-level, unstable API intended for developers.
> > - Experimental: An experimental user-facing API.
> >
> >
> > After using this annotation for ~ 2 years, I would like to propose the
> > following changes:
> >
> > 1. Require explicit annotation for public APIs. This reduces the chance
> > of us accidentally exposing private APIs.
> >
> +1
>
> > 2. Separate interface annotation into two components: one that describes
> > intended audience, and the other that describes stability, similar to what
> > Hadoop does. This allows us to define "low level" APIs that are stable,
> > e.g. the data source API (I'd argue this is the API that should be more
> > stable than end-user-facing APIs).
> >
> > InterfaceAudience: Public, Developer
> >
> > InterfaceStability: Stable, Experimental
> >
> I'm not very sure about this. What advantage do we get from Public vs.
> Developer? Also, somebody needs to make a judgement call on that, which
> might not always be easy to do.
> >
> > What do you think?
>

Re: [discuss] separate API annotation into two components: InterfaceAudience & InterfaceStability

Posted by Shivaram Venkataraman <sh...@eecs.berkeley.edu>.
On Thu, May 12, 2016 at 2:29 PM, Reynold Xin <rx...@databricks.com> wrote:
> We currently have three levels of interface annotation:
>
> - unannotated: stable public API
> - DeveloperApi: A lower-level, unstable API intended for developers.
> - Experimental: An experimental user-facing API.
>
>
> After using this annotation for ~ 2 years, I would like to propose the
> following changes:
>
> 1. Require explicit annotation for public APIs. This reduces the chance of
> us accidentally exposing private APIs.
>
+1
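
One way the explicit-annotation requirement in point 1 could be checked is
with a reflection pass that flags public classes carrying no API marker. The
sketch below is only an assumption about how such a check might work: the
PublicApi marker, the example classes, and findUnannotatedPublic are
hypothetical names, and a real check would have to discover the classes to
scan rather than take them as arguments.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class ExplicitAnnotationCheck {

    /** Hypothetical marker meaning "this type is deliberately public API". */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface PublicApi {}

    /** Deliberately exposed and annotated as such. */
    @PublicApi
    public static class AnnotatedApi {}

    /** Public by visibility, but nobody declared it to be API. */
    public static class AccidentallyPublic {}

    /** Not public, so the check ignores it. */
    static class InternalHelper {}

    /** Returns simple names of public classes that lack the marker. */
    public static List<String> findUnannotatedPublic(Class<?>... classes) {
        List<String> offenders = new ArrayList<>();
        for (Class<?> c : classes) {
            boolean isPublic = Modifier.isPublic(c.getModifiers());
            if (isPublic && !c.isAnnotationPresent(PublicApi.class)) {
                offenders.add(c.getSimpleName());
            }
        }
        return offenders;
    }

    public static void main(String[] args) {
        // prints "[AccidentallyPublic]"
        System.out.println(findUnannotatedPublic(
            AnnotatedApi.class, AccidentallyPublic.class, InternalHelper.class));
    }
}
```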

> 2. Separate interface annotation into two components: one that describes
> intended audience, and the other that describes stability, similar to what
> Hadoop does. This allows us to define "low level" APIs that are stable, e.g.
> the data source API (I'd argue this is the API that should be more stable
> than end-user-facing APIs).
>
> InterfaceAudience: Public, Developer
>
> InterfaceStability: Stable, Experimental
>
I'm not very sure about this. What advantage do we get from Public vs.
Developer? Also, somebody needs to make a judgement call on that, which
might not always be easy to do.
>
> What do you think?



Re: [discuss] separate API annotation into two components: InterfaceAudience & InterfaceStability

Posted by Reynold Xin <rx...@databricks.com>.
I think this is fairly important to do so I went ahead and created a PR for
the first mini step: https://github.com/apache/spark/pull/15374



On Wed, Aug 24, 2016 at 9:48 AM, Reynold Xin <rx...@databricks.com> wrote:

> Looks like in general people like it. Next step is for somebody to take
> the lead and implement it.
>
> Tom, do you have cycles to do this?
>
>
> On Wednesday, August 24, 2016, Tom Graves <tg...@yahoo.com> wrote:
>
>> ping, did this discussion conclude or did we decide what we are doing?
>>
>> Tom
>>
>>
>> On Friday, May 13, 2016 3:19 PM, Michael Armbrust <mi...@databricks.com>
>> wrote:
>>
>>
>> +1 to the general structure of Reynold's proposal.  I've found what we do
>> currently a little confusing.  In particular, it doesn't make much sense
>> that @DeveloperApi things are always labeled as possibly changing.  For
>> example the Data Source API should arguably be one of the most stable
>> interfaces since it's very difficult for users to recompile libraries that
>> might break when there are changes.
>>
>> For a similar reason, I don't really see the point of LimitedPrivate.
>> The goal here should be communication of promises of stability or future
>> stability.
>>
>> Regarding Developer vs. Public. I don't care too much about the naming,
>> but it does seem useful to differentiate APIs that we expect end users to
>> consume from those that are used to augment Spark. "Library" and
>> "Application" also seem reasonable.
>>
>> On Fri, May 13, 2016 at 11:15 AM, Marcelo Vanzin <va...@cloudera.com>
>> wrote:
>>
>> On Fri, May 13, 2016 at 10:18 AM, Sean Busbey <bu...@cloudera.com>
>> wrote:
>> > I think LimitedPrivate gets a bad rap due to the way it is misused in
>> > Hadoop. The use case here -- "we offer this to developers of
>> > intermediate layers; those willing to update their software as we
>> > update ours"
>>
>> I think "LimitedPrivate" is a rather confusing name for that. I think
>> Reynold's first e-mail better matches that use case: this would be
>> "InterfaceAudience(Developer)" and "InterfaceStability(Experimental)".
>>
>> But I don't really like "Developer" as a name here, because it's
>> ambiguous. Developer of what? Theoretically everybody writing Spark or
>> on top of its APIs is a developer. In that sense, I prefer using
>> something like "Library" and "Application" instead of "Developer" and
>> "Public".
>>
>> Personally, in fact, I don't see a lot of gain in differentiating
>> between the target users of an interface... knowing whether it's a
>> stable interface or not is a lot more useful. If you're equating a
>> "developer API" with "it's not really stable", then you don't really
>> need two annotations for that - just say it's not stable.
>>
>> --
>> Marcelo
>>

Re: [discuss] separate API annotation into two components: InterfaceAudience & InterfaceStability

Posted by Reynold Xin <rx...@databricks.com>.
Looks like in general people like it. Next step is for somebody to take
the lead and implement it.

Tom, do you have cycles to do this?

On Wednesday, August 24, 2016, Tom Graves <tg...@yahoo.com> wrote:

> ping, did this discussion conclude or did we decide what we are doing?
>
> Tom
>
>
> On Friday, May 13, 2016 3:19 PM, Michael Armbrust <michael@databricks.com>
> wrote:
>
>
> +1 to the general structure of Reynold's proposal.  I've found what we do
> currently a little confusing.  In particular, it doesn't make much sense
> that @DeveloperApi things are always labeled as possibly changing.  For
> example the Data Source API should arguably be one of the most stable
> interfaces since it's very difficult for users to recompile libraries that
> might break when there are changes.
>
> For a similar reason, I don't really see the point of LimitedPrivate.
> The goal here should be communication of promises of stability or future
> stability.
>
> Regarding Developer vs. Public. I don't care too much about the naming,
> but it does seem useful to differentiate APIs that we expect end users to
> consume from those that are used to augment Spark. "Library" and
> "Application" also seem reasonable.
>
> On Fri, May 13, 2016 at 11:15 AM, Marcelo Vanzin <vanzin@cloudera.com>
> wrote:
>
> On Fri, May 13, 2016 at 10:18 AM, Sean Busbey <busbey@cloudera.com>
> wrote:
> > I think LimitedPrivate gets a bad rap due to the way it is misused in
> > Hadoop. The use case here -- "we offer this to developers of
> > intermediate layers; those willing to update their software as we
> > update ours"
>
> I think "LimitedPrivate" is a rather confusing name for that. I think
> Reynold's first e-mail better matches that use case: this would be
> "InterfaceAudience(Developer)" and "InterfaceStability(Experimental)".
>
> But I don't really like "Developer" as a name here, because it's
> ambiguous. Developer of what? Theoretically everybody writing Spark or
> on top of its APIs is a developer. In that sense, I prefer using
> something like "Library" and "Application" instead of "Developer" and
> "Public".
>
> Personally, in fact, I don't see a lot of gain in differentiating
> between the target users of an interface... knowing whether it's a
> stable interface or not is a lot more useful. If you're equating a
> "developer API" with "it's not really stable", then you don't really
> need two annotations for that - just say it's not stable.
>
> --
> Marcelo
>

Re: [discuss] separate API annotation into two components: InterfaceAudience & InterfaceStability

Posted by Tom Graves <tg...@yahoo.com.INVALID>.
ping, did this discussion conclude or did we decide what we are doing?
Tom 

On Friday, May 13, 2016 3:19 PM, Michael Armbrust <mi...@databricks.com> wrote:

+1 to the general structure of Reynold's proposal.  I've found what we do currently a little confusing.  In particular, it doesn't make much sense that @DeveloperApi things are always labeled as possibly changing.  For example the Data Source API should arguably be one of the most stable interfaces since it's very difficult for users to recompile libraries that might break when there are changes.

For a similar reason, I don't really see the point of LimitedPrivate.  The goal here should be communication of promises of stability or future stability.

Regarding Developer vs. Public. I don't care too much about the naming, but it does seem useful to differentiate APIs that we expect end users to consume from those that are used to augment Spark. "Library" and "Application" also seem reasonable.

On Fri, May 13, 2016 at 11:15 AM, Marcelo Vanzin <va...@cloudera.com> wrote:

On Fri, May 13, 2016 at 10:18 AM, Sean Busbey <bu...@cloudera.com> wrote:
> I think LimitedPrivate gets a bad rap due to the way it is misused in
> Hadoop. The use case here -- "we offer this to developers of
> intermediate layers; those willing to update their software as we
> update ours"

I think "LimitedPrivate" is a rather confusing name for that. I think
Reynold's first e-mail better matches that use case: this would be
"InterfaceAudience(Developer)" and "InterfaceStability(Experimental)".

But I don't really like "Developer" as a name here, because it's
ambiguous. Developer of what? Theoretically everybody writing Spark or
on top of its APIs is a developer. In that sense, I prefer using
something like "Library" and "Application" instead of "Developer" and
"Public".

Personally, in fact, I don't see a lot of gain in differentiating
between the target users of an interface... knowing whether it's a
stable interface or not is a lot more useful. If you're equating a
"developer API" with "it's not really stable", then you don't really
need two annotations for that - just say it's not stable.

--
Marcelo


Re: [discuss] separate API annotation into two components: InterfaceAudience & InterfaceStability

Posted by Michael Armbrust <mi...@databricks.com>.
+1 to the general structure of Reynold's proposal.  I've found what we do
currently a little confusing.  In particular, it doesn't make much sense
that @DeveloperApi things are always labeled as possibly changing.  For
example the Data Source API should arguably be one of the most stable
interfaces since it's very difficult for users to recompile libraries that
might break when there are changes.

For a similar reason, I don't really see the point of LimitedPrivate.  The
goal here should be communication of promises of stability or future
stability.

Regarding Developer vs. Public. I don't care too much about the naming, but
it does seem useful to differentiate APIs that we expect end users to
consume from those that are used to augment Spark. "Library" and
"Application" also seem reasonable.

On Fri, May 13, 2016 at 11:15 AM, Marcelo Vanzin <va...@cloudera.com>
wrote:

> On Fri, May 13, 2016 at 10:18 AM, Sean Busbey <bu...@cloudera.com> wrote:
> > I think LimitedPrivate gets a bad rap due to the way it is misused in
> > Hadoop. The use case here -- "we offer this to developers of
> > intermediate layers; those willing to update their software as we
> > update ours"
>
> I think "LimitedPrivate" is a rather confusing name for that. I think
> Reynold's first e-mail better matches that use case: this would be
> "InterfaceAudience(Developer)" and "InterfaceStability(Experimental)".
>
> But I don't really like "Developer" as a name here, because it's
> ambiguous. Developer of what? Theoretically everybody writing Spark or
> on top of its APIs is a developer. In that sense, I prefer using
> something like "Library" and "Application" instead of "Developer" and
> "Public".
>
> Personally, in fact, I don't see a lot of gain in differentiating
> between the target users of an interface... knowing whether it's a
> stable interface or not is a lot more useful. If you're equating a
> "developer API" with "it's not really stable", then you don't really
> need two annotations for that - just say it's not stable.
>
> --
> Marcelo
>

Re: [discuss] separate API annotation into two components: InterfaceAudience & InterfaceStability

Posted by Marcelo Vanzin <va...@cloudera.com>.
On Fri, May 13, 2016 at 10:18 AM, Sean Busbey <bu...@cloudera.com> wrote:
> I think LimitedPrivate gets a bad rap due to the way it is misused in
> Hadoop. The use case here -- "we offer this to developers of
> intermediate layers; those willing to update their software as we
> update ours"

I think "LimitedPrivate" is a rather confusing name for that. I think
Reynold's first e-mail better matches that use case: this would be
"InterfaceAudience(Developer)" and "InterfaceStability(Experimental)".

But I don't really like "Developer" as a name here, because it's
ambiguous. Developer of what? Theoretically everybody writing Spark or
on top of its APIs is a developer. In that sense, I prefer using
something like "Library" and "Application" instead of "Developer" and
"Public".

Personally, in fact, I don't see a lot of gain in differentiating
between the target users of an interface... knowing whether it's a
stable interface or not is a lot more useful. If you're equating a
"developer API" with "it's not really stable", then you don't really
need two annotations for that - just say it's not stable.

-- 
Marcelo



Re: [discuss] separate API annotation into two components: InterfaceAudience & InterfaceStability

Posted by Sean Busbey <bu...@cloudera.com>.
On Fri, May 13, 2016 at 6:37 AM, Tom Graves
<tg...@yahoo.com.invalid> wrote:
> So we definitely need to be careful here.  I know you didn't mention it but
> it was mentioned by others, so I would not recommend using LimitedPrivate.
> I had started a discussion on Hadoop about some of this due to the way Spark
> needed to use some of the APIs.
> https://issues.apache.org/jira/browse/HADOOP-10506
>


I think LimitedPrivate gets a bad rap due to the way it is misused in
Hadoop. The use case here -- "we offer this to developers of
intermediate layers; those willing to update their software as we
update ours" -- is a perfectly acceptable distinction from the "this
is just for us" and "this is something folks can rely on enough to
contract out their software development". Essentially,
LimitedPrivate(LIBRARY) or LimitedPrivate(PORCELAIN) (to borrow from
git's distinction on interfaces for tool makers vs end users).
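
As a rough sketch of that idea, an audience annotation can carry the set of
consumers an API is offered to. Hadoop's LimitedPrivate takes a list of
strings; the enum used here, along with the names LimitedPrivateSketch,
Audience, ExampleConnectorHook and isOfferedTo, is a hypothetical stand-in
chosen only to illustrate LIBRARY vs PORCELAIN, and is not the Hadoop or
Yetus definition.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;

public class LimitedPrivateSketch {

    /** Hypothetical audiences, named after the two examples in the message. */
    public enum Audience { LIBRARY, PORCELAIN }

    /** Hypothetical parameterized audience annotation. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface LimitedPrivate {
        Audience[] value();
    }

    /** Offered to developers of intermediate layers only. */
    @LimitedPrivate(Audience.LIBRARY)
    public interface ExampleConnectorHook {}

    /** True if the type is annotated and lists the given audience. */
    public static boolean isOfferedTo(Class<?> api, Audience who) {
        LimitedPrivate lp = api.getAnnotation(LimitedPrivate.class);
        return lp != null && Arrays.asList(lp.value()).contains(who);
    }

    public static void main(String[] args) {
        System.out.println(isOfferedTo(ExampleConnectorHook.class, Audience.LIBRARY));   // prints "true"
        System.out.println(isOfferedTo(ExampleConnectorHook.class, Audience.PORCELAIN)); // prints "false"
    }
}
```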



-- 
busbey



Re: [discuss] separate API annotation into two components: InterfaceAudience & InterfaceStability

Posted by Tom Graves <tg...@yahoo.com.INVALID>.
So we definitely need to be careful here.  I know you didn't mention it but it was mentioned by others, so I would not recommend using LimitedPrivate.  I had started a discussion on Hadoop about some of this due to the way Spark needed to use some of the APIs.
https://issues.apache.org/jira/browse/HADOOP-10506

Overall it seems like a good idea, but we definitely need definitions for these, and we need to make sure they are clear to the end user looking at the code or docs.

I assume Developer really means to be used only within Spark? Developer is a pretty broad term which could mean end user developer or Spark internal developer, etc.  Hadoop uses Private for this. I think from an end user point of view PRIVATE makes it more obvious that they shouldn't be using it. So perhaps something other than Developer (INTERNAL, PROJECT_PRIVATE, etc.).
Tom
 

On Thursday, May 12, 2016 4:29 PM, Reynold Xin <rx...@databricks.com> wrote:

We currently have three levels of interface annotation:

- unannotated: stable public API
- DeveloperApi: A lower-level, unstable API intended for developers.
- Experimental: An experimental user-facing API.

After using this annotation for ~ 2 years, I would like to propose the following changes:

1. Require explicit annotation for public APIs. This reduces the chance of us accidentally exposing private APIs.

2. Separate interface annotation into two components: one that describes intended audience, and the other that describes stability, similar to what Hadoop does. This allows us to define "low level" APIs that are stable, e.g. the data source API (I'd argue this is the API that should be more stable than end-user-facing APIs).

InterfaceAudience: Public, Developer

InterfaceStability: Stable, Experimental

What do you think?

  

Re: [discuss] separate API annotation into two components: InterfaceAudience & InterfaceStability

Posted by Steve Loughran <st...@hortonworks.com>.
> On 12 May 2016, at 22:29, Reynold Xin <rx...@databricks.com> wrote:
> 
> We currently have three levels of interface annotation:
> 
> - unannotated: stable public API
> - DeveloperApi: A lower-level, unstable API intended for developers.
> - Experimental: An experimental user-facing API.
> 
> 
> After using this annotation for ~ 2 years, I would like to propose the following changes:
> 
> 1. Require explicit annotation for public APIs. This reduces the chance of us accidentally exposing private APIs.

+1

> 
> 2. Separate interface annotation into two components: one that describes intended audience, and the other that describes stability, similar to what Hadoop does. This allows us to define "low level" APIs that are stable, e.g. the data source API (I'd argue this is the API that should be more stable than end-user-facing APIs).
> 
> InterfaceAudience: Public, Developer
> 
> InterfaceStability: Stable, Experimental
> 
> 
> What do you think?


You should know there's a bit of a "discussion" in Hadoop right now about what "LimitedPrivate" means. That is: things marked "LimitedPrivate(MapReduce)" are pretty much universally used in YARN apps, and other things tagged as private (UGI) are so universally used that the tag is meaningless. In other words, even if you tag something as Developer, it may end up being used so widely that it becomes public. The hard part then becomes recognising which classes and methods have such a use, which ends up needing an IDE with everything loaded in.

Java 9 is going to open up a lot more in terms of modularization, though I don't know what that will mean for Scala. For Java projects, it may allow isolation to be more explicit.
