Posted to dev@spark.apache.org by Wenchen Fan <cl...@gmail.com> on 2018/11/14 02:25:39 UTC

which classes/methods are considered as private in Spark?

Hi all,

Recently I updated the MiMa exclusion rules, and found MiMa tracks some
private classes/methods unexpectedly.

Note that "private" here means we make no compatibility guarantees. We
don't document these classes/methods, and users take the risk when using
them.

The API documentation also lists some obviously private classes, e.g.
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.serializer.DummySerializerInstance
, which is not expected either.

I looked around and can't find a clear definition of "private" in Spark.

AFAIK, we have several rules:
1. everything that is really private and that end users can't access, e.g.
package-private classes, private methods, etc.
2. classes under certain packages. I don't know if we have a list, but the
catalyst package is considered a private package.
3. everything which has a @Private annotation.
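
To make the three rules concrete, here is a minimal Java sketch of a check that treats a class as "private" if any rule applies. It is only an illustration, not how MiMa or the doc generator actually decide: the `Private` annotation below is a local stand-in for Spark's annotation, and `PrivacyRules`, `PRIVATE_PACKAGES`, and the sample nested classes are hypothetical names. The package list holds only the catalyst package, since that is the only one named in this thread.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Spark's @Private annotation, declared locally
// so the sketch compiles on its own. RUNTIME retention is an assumption
// made here so that reflection can see it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Private {}

final class PrivacyRules {
    // Assumed list of "private by fiat" packages; only catalyst is named in the thread.
    static final List<String> PRIVATE_PACKAGES =
        Arrays.asList("org.apache.spark.sql.catalyst");

    // Rule 2: the class lives in (or below) a package declared private.
    static boolean isInPrivatePackage(String className) {
        for (String pkg : PRIVATE_PACKAGES) {
            if (className.startsWith(pkg + ".")) {
                return true;
            }
        }
        return false;
    }

    static boolean isPrivate(Class<?> cls) {
        // Rule 1: not public at the bytecode level, so end users cannot access it.
        if (!Modifier.isPublic(cls.getModifiers())) {
            return true;
        }
        // Rule 2: private package.
        if (isInPrivatePackage(cls.getName())) {
            return true;
        }
        // Rule 3: explicitly flagged with the annotation.
        return cls.isAnnotationPresent(Private.class);
    }

    // Sample classes used only to exercise the rules.
    public static class PublicApi {}

    @Private
    public static class FlaggedInternal {}

    static class PackagePrivateHelper {}
}
```

Note that rule 1 is evaluated on bytecode-level modifiers, so a class that is public only for technical reasons would still need rule 2 or rule 3 to be recognized as private.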

I'm sending this email to collect more feedback, and hope we can come up
with a clear definition of what is "private".

Thanks,
Wenchen

Re: which classes/methods are considered as private in Spark?

Posted by Sean Owen <sr...@gmail.com>.
You should find that 'surprisingly public' classes are there because
of language technicalities. For example DummySerializerInstance is
public because it's a Java class, and can't be used outside its
package otherwise.

Likewise I think MiMa just looks at bytecode, and private[spark]
classes are public in the bytecode for similar reasons (although Scala
enforces the access within Scala as expected). Hence it will flag
changes to "nonpublic" private[spark] classes.

I think things that are meant to be marked private are, well, marked
private, or else as private as possible and flagged with annotations
like @Private. (It does sound like DummySerializerInstance should be
so annotated?) Yes, the catalyst package in its entirety is one big
exception - private by fiat, not by painstaking flagging of every
class.

The issue to me is really docs. If we have java/scaladoc of private
classes, and there's a way to avoid that like with annotations, that
should be fixed.

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: which classes/methods are considered as private in Spark?

Posted by Reynold Xin <rx...@databricks.com>.
I used to, before each release during the RC phase, go through every single doc page to make sure we didn’t unintentionally leave things public. Unfortunately I no longer have time to do that. I found it very useful because I always caught some mistakes that had crept in through organic development.

Re: which classes/methods are considered as private in Spark?

Posted by Wenchen Fan <cl...@gmail.com>.
> Could you clarify what you mean here? Mima has some known limitations
> such as not handling "private[blah]" very well

Yes that's what I mean.

What I want to know here is which classes/methods we expect to be
private. I think things marked as "private[blabla]" are expected to be
private for sure; it's just that MiMa and the doc generator can't handle
them well. We can fix that later, probably by using the @Private annotation.

> seems like it's tracked by a bunch of exclusions in the Unidoc object

That's good. At least we have a clear definition of which packages are
meant to be private. We should make it consistent between MiMa and the doc
generator though.

Re: which classes/methods are considered as private in Spark?

Posted by Marcelo Vanzin <va...@cloudera.com.INVALID>.
On Tue, Nov 13, 2018 at 6:26 PM Wenchen Fan <cl...@gmail.com> wrote:
> Recently I updated the MiMa exclusion rules, and found MiMa tracks some private classes/methods unexpectedly.

Could you clarify what you mean here? Mima has some known limitations
such as not handling "private[blah]" very well (because that means
public in Java). Spark has (had?) a tool to generate an exclusions
file for Mima, but I'm not sure how up-to-date it is.

> AFAIK, we have several rules:
> 1. everything which is really private that end users can't access, e.g. package private classes, private methods, etc.
> 2. classes under certain packages. I don't know if we have a list, the catalyst package is considered as a private package.
> 3. everything which has a @Private annotation.

That's my understanding of the scope of the rules.

(2) to me means "things that show up in the public API docs". That's,
AFAIK, tracked in SparkBuild.scala; seems like it's tracked by a bunch
of exclusions in the Unidoc object (I remember that being different in
the past).

(3) might be a limitation of the doc generation tool? Not sure if it's
easy to say "do not document classes that have @Private". At the very
least, that annotation seems to be missing the "@Documented"
annotation, which would make that info present in the javadoc. I do
not know if the scala doc tool handles that.
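
As a rough illustration of the "@Documented" point, the Java sketch below declares two marker annotations that differ only in whether they carry `@Documented`, and checks for that meta-annotation via reflection. The annotation and class names (`UndocumentedMarker`, `DocumentedMarker`, `DocumentedCheck`) are hypothetical and are not Spark's own. The check only tells whether javadoc would list the marker on annotated elements; it says nothing about how the Scala doc tool behaves.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Hypothetical marker without @Documented: javadoc does not list it
// on the elements it annotates.
@Retention(RetentionPolicy.RUNTIME)
@interface UndocumentedMarker {}

// Hypothetical marker with @Documented: javadoc includes it in the
// generated documentation of annotated elements.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@interface DocumentedMarker {}

final class DocumentedCheck {
    // True if the annotation type itself is meta-annotated with @Documented.
    // This works via reflection because @Documented has runtime retention.
    static boolean appearsInJavadoc(Class<? extends Annotation> annotationType) {
        return annotationType.isAnnotationPresent(Documented.class);
    }
}
```

A check like `DocumentedCheck.appearsInJavadoc(DocumentedMarker.class)` could be pointed at the real annotation class to confirm whether it is missing "@Documented".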

-- 
Marcelo
