Posted to dev@beam.apache.org by Kenneth Knowles <ke...@apache.org> on 2020/03/06 22:54:38 UTC

Re: [DISCUSS] @Experimental annotations - processes and alternatives

OK, I tried to make a tiny bit of progress on this. With `grep --ignore-case
--line-number --recursive '@experimental' .` there are 578 occurrences
(including website and comments). Piping that through `| cut -d ':' -f 1 |
sort | uniq | wc -l` gives 377 distinct code files.
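
For anyone who prefers not to use the shell pipeline, the same two counts can
be approximated with a minimal Python sketch. The function name `scan` is made
up for illustration. It counts matching lines, as grep does, but its handling
of binary and unreadable files is only an approximation of grep's behavior.

```python
import os
import re

# Case-insensitive, like `grep --ignore-case '@experimental'`.
PATTERN = re.compile(r"@experimental", re.IGNORECASE)


def scan(root):
    """Return (matching_lines, distinct_files) for all files under root."""
    matching_lines = 0
    files = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    hits = sum(1 for line in handle if PATTERN.search(line))
            except OSError:
                continue  # unreadable file: skip it
            if hits:
                matching_lines += hits
                files.add(path)
    return matching_lines, len(files)


if __name__ == "__main__":
    lines, distinct = scan(".")
    print(f"{lines} matching lines in {distinct} distinct files")
```

Run from the repository root, this prints both numbers in one pass.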

So that's a big project, but it scales easily across contributors. I suggest
we crowdsource a bit.

I created
https://docs.google.com/spreadsheets/d/1T98I7tFoUgwW2tegS5xbNRjaVDvZiBBLn7jg0Ef_IwU/edit?usp=sharing
where you can use suggest/comment mode to add your name next to a file to
volunteer to own going through it.

I have not checked git history to try to find owners.

Kenn

On Mon, Dec 2, 2019 at 10:26 AM Alexey Romanenko <ar...@gmail.com>
wrote:

> Thank you Kenn for starting this discussion.
>
> As I see it, for now the main goal for the “@Experimental” annotation is
> to revive it and make it useful in the sense its name implies (this is
> obviously not the case at the moment). I'd suggest a somewhat simplified
> scenario for this:
>
> 1. We review all “@Experimental” annotation uses now. For the code
> (IOs/libs/etc.) that we are 100% sure has been used in production for a
> long time with the current stable API, we just take this annotation away
> since it's no longer needed.
>
> 2. For the code that is left after step 1, we leave it as “@Experimental”,
> wait for N releases (N=3?) and then take the annotation away if no
> breaking changes have happened. We may want to add a new argument to
> “@Experimental” to keep track of the release in which it was added.
>
> 3. We would need a regular “Experimental annotation report” (like we have
> for dependencies) sent to dev@, which will allow us to track new and
> outdated annotations.
>
> 4. And of course we update the contributor documentation about that.
>
> The idea of graduation by voting seems a bit complicated. For me it means
> that all newly added user APIs would go through this process, and I’m
> afraid that in the end we could be overwhelmed by the number of such
> polls. I think that several releases of maturation and a final decision by
> the person(s) responsible for the component should be enough.
>
> At the same time, I like Andrew’s idea about checking for breaking changes
> with an external tool. It could assure us that we can remove the
> experimental state without fear of breaking the API.
>
> In case of breaking changes to a stable API that cannot be avoided, we can
> still use @Deprecated and wait for 3 releases before removal (as we
> already did before). So, having up-to-date @Experimental and @Deprecated
> annotations won’t be confusing for users.
>
>
>
>
>
> On 28 Nov 2019, at 04:48, Kenneth Knowles <ke...@apache.org> wrote:
>
>
>
> On Wed, Nov 27, 2019 at 1:04 PM Elliotte Rusty Harold <el...@ibiblio.org>
> wrote:
>
>> On Wed, Nov 27, 2019 at 1:12 PM Kenneth Knowles <ke...@apache.org> wrote:
>> >
>>
>> > *Opt-in*: This is a powerful idea that I think changes everything.
>> >    - for an experimental new IO, a separate artifact; this way we can
>> >      also see downloads
>> >    - for experimental code fragments, add checkState that the relevant
>> >      experiment is turned on via flags
>>
>> To be clear the experimental artifact would have the same group ID and
>> artifact ID but a different version than the non-experimental
>> artifacts?  E.g.
>> org.apache.beam:beam-runners-gcp-gcemd:2.4.0-experimental
>>
>> That could work. Changing the artifact ID or the package name would
>> risk split package issues and diamond dependency problems. We'd still
>> need to be careful about mixing experimental and non-experimental
>> artifacts.
>>
>
> That's clever! I think using the classifier might be better than a
> modified version number, e.g.
> org.apache.beam:beam-io-mydb:2.4.0:experimental
>
> My prior idea was much less clever: for any version 2.X there would either
> be beam-io-mydb-experimental or beam-io-mydb (after graduation) so no
> problem with a split package. There would be no "same artifact id" concern.
>
> Your idea would allow us to ship two variants of the library, if we
> developed the tooling for it. I think stripping the experimental bits and
> ensuring both variants compile might be tricky unless we are stripping
> rather disjoint pieces of the library.
>
> Kenn
>
>
>

Re: [DISCUSS] @Experimental annotations - processes and alternatives

Posted by Kenneth Knowles <ke...@apache.org>.
I have always been unsure whether the "kind" is useful. For users, I doubt it
is. Now I see that it can help you find all the occurrences, by using an IDE
to find references, when you want to delete them. But many of the "kind" enum
values are not really categories that you can delete all at once. For
example, "SOURCE_SINK" is so generic that you cannot treat its uses as a
group. I would really just group by artifact first, then package/directory,
then file. Some judgment can be applied and I think it will not be very
difficult.
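
To make that grouping concrete, here is a rough Python sketch. It assumes a
Maven/Gradle-style layout in which the artifact directory is everything
before the first `src` path component. That rule and the names `split_path`
and `group_for_triage` are illustrative assumptions, not something the sheet
does today.

```python
from collections import defaultdict


def split_path(path):
    """Split a repo-relative path into (artifact, package_dir, file_name).

    Assumption: the artifact is the directory prefix before the first
    "src" component; paths without one get an empty artifact.
    """
    parts = path.strip("/").split("/")
    file_name = parts[-1]
    dirs = parts[:-1]
    if "src" in dirs:
        i = dirs.index("src")
        artifact = "/".join(dirs[:i])
        package_dir = "/".join(dirs[i:])
    else:
        artifact, package_dir = "", "/".join(dirs)
    return artifact, package_dir, file_name


def group_for_triage(paths):
    """Nest paths as {artifact: {package_dir: [file_name, ...]}}, sorted."""
    grouped = defaultdict(lambda: defaultdict(list))
    for path in sorted(set(paths)):
        artifact, package_dir, file_name = split_path(path)
        grouped[artifact][package_dir].append(file_name)
    return {artifact: dict(pkgs) for artifact, pkgs in grouped.items()}
```

Feeding it the file column of the sheet would give one block per artifact, so
a volunteer can claim a whole artifact or package rather than single files.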

Kenn

On Mon, Mar 9, 2020 at 12:01 PM Alexey Romanenko <ar...@gmail.com>
wrote:

> Thanks Kenn for moving this forward.
>
> Though what still bugs me is: do we have a consensus about what we
> actually do with the different types of annotations?
> Can we say, for example, that
> “@Experimental(Experimental.Kind.SOURCE_SINK)” is useless and we can get
> rid of it easily? Or, since the Schema API is still under development, is
> “@Experimental(Kind.SCHEMAS)” required everywhere Schema is used in a
> public API? And so on…
> Does it make sense to break this list down by type of experimental
> annotation, with the final decision for each type depending on that?
>
> On 9 Mar 2020, at 04:39, Kenneth Knowles <ke...@apache.org> wrote:
>
> On Sun, Mar 8, 2020 at 1:55 PM Ismaël Mejía <ie...@gmail.com> wrote:
>
>> Kenn can you adjust the script to match only source code files
>> ... otherwise it produces a lot of extra false positives
>
>
> I think the sheet only had false matches in build/ directories. Removed.
> Can you comment on other cells that look like a new class of false
> positives?
>
>
>> Also can we extract the full annotation as a column so we can
>> filter/group for the full kind (type) of the experimental annotation e.g.
>> @Experimental(Kind.SCHEMAS),
>>
>
> This was already done. It is column D. Maybe it is off the side of the
> screen for you?
>
>
>> we agreed with Luke Cwik was to remove the Experimental annotations from
>> ‘runners/core*’
>>
>
> Makes sense; this was never end-user facing.
>
>
>> It is probably worth re-running the script against the latest master
>> because the results in the spreadsheet do not correspond to the current
>> master.
>
>
> Hmmm. I just checked and the directory I ran it in has a detached
> github/master checked out. So it might be a little stale, but not much.
> Since people have started to sign up, it would be a shame to reset the
> sheet. The files are probably still worth looking at, even if the line
> numbers don't match, and if a file was already processed that is an easy
> case.
>
>
>> We also introduced package-level Experimental annotations
>> (package-info.java), so these can easily account for 50 duplicates that
>> should probably be trimmed and handled by the same person who is
>> covering the corresponding files in the package. With all these
>> adjustments we will easily be below 250 matches.
>>
>
> I agree that it is efficient, but I worry that package level experimental
> is basically invisible to users. Since I sorted by filename it should be
> easy to write your name once and then drag it to a whole set of files?
> Really we mostly only care about "what file, and which KIND annotations are
> present". I just made a new tab with that info, but it did not gather all
> the different annotations that may be in the file.
>
> Kenn
>
>
>> Regards,
>> Ismaël
>>
>> [1]
>> https://lists.apache.org/thread.html/r73d3b19506ea435ee6be568ccc32065e36cd873dbbcf2a3e9049254e%40%3Cdev.beam.apache.org%3E
>>
>>
>>
>
>

Re: [DISCUSS] @Experimental annotations - processes and alternatives

Posted by Alexey Romanenko <ar...@gmail.com>.
Thanks Kenn for moving this forward. 

Though what still bugs me is: do we have a consensus about what we actually do with the different types of annotations?
Can we say, for example, that “@Experimental(Experimental.Kind.SOURCE_SINK)” is useless and we can get rid of it easily? Or, since the Schema API is still under development, is “@Experimental(Kind.SCHEMAS)” required everywhere Schema is used in a public API? And so on…
Does it make sense to break this list down by type of experimental annotation, with the final decision for each type depending on that?



Re: [DISCUSS] @Experimental annotations - processes and alternatives

Posted by Kenneth Knowles <ke...@apache.org>.
On Sun, Mar 8, 2020 at 1:55 PM Ismaël Mejía <ie...@gmail.com> wrote:

> Kenn can you adjust the script to match only source code files
> ... otherwise it produces a lot of extra false positives


I think the sheet only had false matches in build/ directories. Removed.
Can you comment on other cells that look like a new class of false
positives?


> Also can we extract the full annotation as a column so we can filter/group
> for the full kind (type) of the experimental annotation e.g.
> @Experimental(Kind.SCHEMAS),
>

This was already done. It is column D. Maybe it is off the side of the
screen for you?


> we agreed with Luke Cwik was to remove the Experimental annotations from
> ‘runners/core*’
>

Makes sense; this was never end-user facing.


> It is probably worth re-running the script against the latest master
> because the results in the spreadsheet do not correspond to the current
> master.


Hmmm. I just checked and the directory I ran it in has a detached
github/master checked out. So it might be a little stale, but not much.
Since people have started to sign up, it would be a shame to reset the
sheet. The files are probably still worth looking at, even if the line
numbers don't match, and if a file was already processed that is an easy
case.


> We also introduced package-level Experimental annotations
> (package-info.java), so these can easily account for 50 duplicates that
> should probably be trimmed and handled by the same person who is covering
> the corresponding files in the package. With all these adjustments we
> will easily be below 250 matches.
>

I agree that it is efficient, but I worry that package level experimental
is basically invisible to users. Since I sorted by filename it should be
easy to write your name once and then drag it to a whole set of files?
Really we mostly only care about "what file, and which KIND annotations are
present". I just made a new tab with that info, but it did not gather all
the different annotations that may be in the file.

Kenn


> Regards,
> Ismaël
>
> [1]
> https://lists.apache.org/thread.html/r73d3b19506ea435ee6be568ccc32065e36cd873dbbcf2a3e9049254e%40%3Cdev.beam.apache.org%3E
>
>
>

Re: [DISCUSS] @Experimental annotations - processes and alternatives

Posted by Ismaël Mejía <ie...@gmail.com>.
Kenn, can you adjust the script to match only source code files, e.g. with
`--include \*.java --include \*.py --include \*.go`? Otherwise it produces a
lot of extra false positives due to HTML files and cache files. Also, can we
extract the full annotation as a column so we can filter/group by the full kind
(type) of the experimental annotation, e.g. @Experimental(Kind.SCHEMAS),
@Experimental(Kind.SOURCE_SINK), etc.?

This way we can group occurrences per kind and quickly triage some of them which
are clearly still experimental (and with ongoing independent stabilization
efforts [1]) like these:
@Experimental(Kind.SCHEMAS)
@Experimental(Kind.SPLITTABLE_DO_FN)
@Experimental(Kind.PORTABILITY)
(and probably @Experimental(Kind.CONTEXTFUL))
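
As a rough illustration of that per-kind grouping, here is a small Python
sketch. The helper names `kinds_by_file` and `count_kinds` are made up, and it
assumes annotations are written as @Experimental(Kind.FOO) or
@Experimental(Experimental.Kind.FOO). A bare @Experimental with no kind is not
captured, and only directories literally named `build` are skipped.

```python
import os
import re
from collections import Counter

# Only source files, mirroring the --include filters suggested above.
SOURCE_SUFFIXES = (".java", ".py", ".go")
# Matches @Experimental(Kind.FOO) and @Experimental(Experimental.Kind.FOO).
KIND = re.compile(r"@Experimental\(\s*(?:Experimental\.)?Kind\.([A-Z_]+)\s*\)")


def kinds_by_file(root):
    """Map each source file under root to the sorted Kind names it uses."""
    result = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune build output so generated copies are not counted.
        dirnames[:] = [d for d in dirnames if d != "build"]
        for name in filenames:
            if not name.endswith(SOURCE_SUFFIXES):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as handle:
                found = set(KIND.findall(handle.read()))
            if found:
                result[path] = sorted(found)
    return result


def count_kinds(per_file):
    """Count how many files mention each Kind."""
    return Counter(kind for kinds in per_file.values() for kind in kinds)
```

Because it collects every kind found in a file, the result could also feed a
"what file, and which kinds" tab directly.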

Over the last few weeks I have been adjusting the Experimental annotations to
follow the @Experimental(Kind.FOO) pattern with this future triage in mind, so
it is good to see the effort may pay off :) As part of this work, one idea we
agreed on with Luke Cwik was to remove the Experimental annotations from
‘runners/core*’ because historically Beam has not had strong compatibility
guarantees for users of these APIs (runner authors). It is probably worth
re-running the script against the latest master because the results in the
spreadsheet do not correspond to the current master. (Note that the remaining
External class is still tagged as Experimental because moving it into
‘sdks/java/core’ is still pending.)

Not related to Experimental, but worth mentioning: we also started tagging
sdks/java/core/src/main/java/org/apache/beam/sdk/util/*
sdks/java/core/src/main/java/org/apache/beam/sdk/testing/*
as @Internal for the same reasons. Classes in both packages are basically for
internal use in the Beam SDK harness, by runner authors, and in tests, and
pipeline authors should not rely on their stability.

We also introduced package-level Experimental annotations (package-info.java),
so these can easily account for 50 duplicates that should probably be trimmed
and handled by the same person who is covering the corresponding files in the
package. With all these adjustments we will easily be below 250 matches.

Regards,
Ismaël

[1] https://lists.apache.org/thread.html/r73d3b19506ea435ee6be568ccc32065e36cd873dbbcf2a3e9049254e%40%3Cdev.beam.apache.org%3E


